ALTERing a Huge MySQL Table

Percona's pt-online-schema-change (aka pt-osc) can do an ALTER with very little downtime. It does, however, require adding a TRIGGER to the table.

See also this blog about FOREIGN KEYs: pt-osc and FKs

gh-ost is a new and promising competitor to pt-online-schema-change; it uses the binlog.

As of version 5.6.7, ALTER TABLE can perform a variety of ALTERs without blocking other operations. For a list of what 5.7 can and cannot do via ALGORITHM=INPLACE, see 5.7 ALTER TABLE.

If you don't have the latest version, or you can't use Percona's solution, read on.

Overview of Solution

    1.  Build the 'Alter' script to copy rows into a new table (in clumps of rows)
    2.  Push code to add a Database Layer between client(s) and the database
    3.  Push code to augment the Database Layer to handle the effect of the Alter
    4.  Turn on the Alter
    5.  Push code to deactivate the Alter

The Layer may already be in place; it is 'best practice'. However, after seeing what is ahead (below), you may want to clean up the Layer somewhat. One example is to change a BEGIN...COMMIT into a single API call (if practical). If this leads to a cleaner API, that is a benefit in itself. Later, when you get to step 3, life will be simpler for other reasons as well.

Steps 2 and 3 are separate because I am assuming you also cannot afford screwups. Step 2 focuses on the API from Client to Layer; Step 3 focuses on tweaking the Layer for this Alter. The two concerns are separable, and it is safer to think about only one at a time.

"Turning on the Alter" is essentially running the script (Step 1) to copy the table over. This may take hours or days.

The guiding principles...
    ⚈  If the existing table continues to be read/written by the identical code as before, all queries on it should continue to work correctly.
    ⚈  At all times, rows with id <= $left_off (a "highwater mark") will be correctly transformed, inserted, updated, etc.
    ⚈  At the end, when $left_off is at the end of the table, the transformation is complete.

Shortcut

Most of the rest of this discussion centers on the complex case of a table that is being modified -- any row could be UPDATEd, INSERTs could go anywhere into the table. And it assumes a single machine, or a single Master.

Some likely special cases are covered near the end ("Alternative...")

Assumptions/Restrictions/Complications

Assumptions
    ⚈  This discussion assumes you can walk through the original table using an explicit PRIMARY or UNIQUE key.
    ⚈  Single-column (not 'compound') key is used to walk through the table.
    ⚈  You have enough disk space to simultaneously hold both the original table and the new table(s).
    ⚈  There is enough 'wiggle room' in the performance of the databases so that the overhead of this process (100%?) can be handled.
    ⚈  INSERTs are single-row, UPDATEs are single-table, DELETEs are single-row
    ⚈  You can modify the Layer to change how INSERTs, etc, operate.
    ⚈  SELECTs need no modification. (Nice, eh?)
    ⚈  UPDATE statements do not modify the column being used to walk through the table.
    ⚈  INSERT..ON DUPLICATE KEY UPDATE.. is not used
    ⚈  INSERT IGNORE does not depend on a secondary UNIQUE key
    ⚈  There are no FOREIGN KEYs in the definition of this table, nor any in other tables referencing this table.
    ⚈  Only one table is involved
    ⚈  You control all the write operations (no ad hoc queries from users)
    ⚈  Write operations do not use self-joins

Applicability
    ⚈  Engine -- The outline given here should work equally well for MyISAM or InnoDB.
    ⚈  Replication -- You can perform this on the read/write Master and have it propagate gracefully to the slaves.
    ⚈  Cluster (Galera, etc) -- It should work.

What to do if assumptions are not met?
    ⚈  No PRIMARY/UNIQUE key -- A non-unique key can be used, but this may lead to arbitrarily large locks on the table.
    ⚈  No explicit keys at all -- Punt. (LIMIT+OFFSET is not viable!)
    ⚈  Inadequate disk space -- Punt. Get space first.
    ⚈  Not enough performance 'wiggle room' -- If the system gets bogged down, the two dynamic tunables (clump_size and delay in the Migrate table) can be tweaked to slow down the Alter and make it less invasive.
    ⚈  Multi-row INSERT -- design the Layer's API to make it easy to determine which rows are below $left_off.
    ⚈  Multi-table UPDATE -- This is a challenge -- especially 'self-joins' that interrogate later parts of the table to decide on changes to earlier parts.
    ⚈  DELETE .. IN(...) -- pass the IN list in an array for easy handling
    ⚈  DELETE .. WHERE (multi-row) -- no problem
    ⚈  UPDATE that changes the key -- You must recognize the situation and write some messy code. (It would be handy if the API helped make this obvious.)
    ⚈  INSERT..ON DUPLICATE KEY UPDATE.. -- Don't know.
    ⚈  INSERT IGNORE -- The problem is that it might get INSERTed before the conflicting record is inserted. No workaround available.
    ⚈  FOREIGN KEYS -- have not investigated
    ⚈  Extra tables for normalization -- not a big deal
    ⚈  ad hoc write queries -- must disallow for the duration of the Alter.
    ⚈  Compound index -- The code is more complex, but doable. See Iterating through a compound key
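
For the multi-row INSERT item above, one way the Layer might decide which rows also belong in the new table is sketched here. It is only an illustration: the function name rows_to_mirror and the idea of passing rows as dicts with an "id" key are my assumptions, not part of any existing API.

```python
def rows_to_mirror(rows, left_off):
    """Pick the rows of a multi-row INSERT that must also go into the new table.

    Every row is inserted into the old table regardless. Rows whose key is at
    or below left_off lie in the part the Alter has already copied, so they
    must be inserted into the new table as well. Rows above left_off will be
    picked up later by the Alter itself.
    """
    return [row for row in rows if row["id"] <= left_off]
```

The same filter applies to an IN list passed as an array for DELETE: only the ids at or below left_off need to be applied to the new table.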

Database Layer

It is "best practice" to have a "Layer" between your clients and your database. SQL statements are only known to the Layer, not to the clients. By having the Layer, you are segregating "business logic" from "database details". For the purposes here, the "database details" will be changing; it is better to have such code isolated.

The Layer would be called from the Client with calls like "Insert this stuff...". If the 'stuff' needs massaging to be compatible with MySQL (eg, timestamps), then the Layer is the 'right' place to do the conversion. The Layer is also a good place to hide any "normalization" and "lookup" tables -- clients should not care whether a 'name' is stored in the table directly, or normalized into another table and only an id is put into the "Fact" table.

Keep the API clean and simple; hide any database complexity in the Layer. And, for this task, it will get complex.

For the task at hand, we will depend on the Layer to make it easy to migrate from one table to another, and to do that without any real knowledge of "business logic". This ignorance of the client side of things makes it easier and safer to write the code and have confidence that it is correct.

The Alter

This is a script that does the conversion by copying the data into a new table, which already has the 'new' schema elements (added columns, dropped indexes, etc).

    ⚈  CREATE TABLE -- the new table, with all new indexes/datatypes, etc.
    ⚈  Change the row in the helper table Migrate (running=1, etc) (below)
    ⚈  Loop (below)
    ⚈  Deactivate (running=0)
    ⚈  RENAME TABLE existing TO old, new TO existing;

Each iteration does:
    ⚈  Migrates a "clump" of, say, 100 rows from the old table to the new.
    ⚈  Any transforms (normalization, datatype conversion, etc) are done for those rows.
    ⚈  Locks out all 'write' operations while handling the clump. (More later)

The starting values for Migrate:
    ⚈  running = 1
    ⚈  clump_size = 100 (1000 if the existing table is InnoDB with no secondary keys; <100 if there are lots of indexes)
    ⚈  delay = 1 (second)
    ⚈  left_off_* -- whatever key is less than first value (eg, 0 or '')

The Loop should watch for running out of rows. When that happens, it needs to do the Deactivate and RENAME before the Unlock.

One Iteration (Copy one Clump)

    ⚈  Fetch the row from Migrate
    ⚈  Find the key of the 100th row after where you 'left_off'.
    ⚈  Lock (exclusive)
    ⚈  INSERT INTO new SELECT * FROM existing WHERE id > $left_off AND id <= $hundredth_hence;
    ⚈  If no more rows, Deactivate and RENAME
    ⚈  Update row in Migrate
    ⚈  Unlock
    ⚈  Sleep 1 second (dynamically tunable)

Notes:
    ⚈  The "Find" step is outside the Lock/Unlock so as to minimize the time in lock state.
    ⚈  Risk: with the Find step outside, you could end up with more than 100 rows to move. (Probably not a big deal.)
    ⚈  Lock/Unlock -- an exclusive LOCK on the Migrate table. Note that all INSERTs/UPDATEs/etc must also Lock that table
    ⚈  The SELECT step should include any transforms needed, build normalization records, and whatever else is required.
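
The iteration and the surrounding loop might be sketched as follows. This is a rough illustration under several assumptions: Python's built-in sqlite3 module stands in for MySQL, the two-column tables existing and new_table and the cut-down Migrate table are hypothetical, a single transaction stands in for the exclusive Lock/Unlock, the RENAME is only marked by a comment, and the demo uses delay = 0 rather than 1 second.

```python
import sqlite3
import time


def make_demo_db(n_rows: int) -> sqlite3.Connection:
    """Hypothetical in-memory setup: two tiny tables plus a cut-down Migrate row."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE existing  (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE new_table (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE Migrate (running INT, clump_size INT, delay REAL,
                              left_off_1 INT, clumps_moved INT, rows_moved INT);
        INSERT INTO Migrate VALUES (0, 100, 0, 0, 0, 0);
    """)
    db.executemany("INSERT INTO existing VALUES (?, ?)",
                   [(i, f"row {i}") for i in range(1, n_rows + 1)])
    db.commit()
    return db


def copy_one_clump(db: sqlite3.Connection) -> bool:
    """Copy one clump; return False (after deactivating) when no rows remain."""
    left_off, clump_size = db.execute(
        "SELECT left_off_1, clump_size FROM Migrate").fetchone()

    # "Find" step, outside the lock: key of the clump_size-th row after left_off.
    found = db.execute(
        "SELECT id FROM existing WHERE id > ? ORDER BY id LIMIT 1 OFFSET ?",
        (left_off, clump_size - 1)).fetchone()
    if found is None:
        # Fewer than clump_size rows remain; take whatever is left (maybe none).
        found = db.execute(
            "SELECT MAX(id) FROM existing WHERE id > ?", (left_off,)).fetchone()
    upper = found[0]

    with db:  # stand-in for Lock (exclusive) ... Unlock
        if upper is None:
            db.execute("UPDATE Migrate SET running = 0")  # Deactivate
            # The RENAME TABLE would go here, before the Unlock.
            return False
        moved = db.execute(
            "INSERT INTO new_table SELECT * FROM existing"
            " WHERE id > ? AND id <= ?", (left_off, upper)).rowcount
        db.execute(
            "UPDATE Migrate SET left_off_1 = ?,"
            " clumps_moved = clumps_moved + 1, rows_moved = rows_moved + ?",
            (upper, moved))
    return True


def run_alter(db: sqlite3.Connection) -> None:
    """Turn on the Alter and loop until the table has been copied."""
    with db:
        db.execute("UPDATE Migrate SET running = 1")
    while copy_one_clump(db):
        # Re-read delay every time so it can be tuned while the Alter runs.
        (delay,) = db.execute("SELECT delay FROM Migrate").fetchone()
        time.sleep(delay)
```

Calling run_alter(make_demo_db(250)) copies the rows in clumps of 100, 100 and 50. In a real migration the SELECT inside the INSERT would also carry whatever transforms the new schema needs.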

Layer's INSERTs

The database Layer is vital for giving us a simple way to modify all INSERT/UPDATE/REPLACE/DELETE statements. All client write operations must be going through the Layer. (SELECTs should go through the Layer, but that does not matter for this discussion.)

I will discuss 'simple' write operations only.

Around every atomic operation, you need to add the following:
    ⚈  Lock (read-lock) on Migrate
    ⚈  Fetch the row from Migrate
    ⚈  If not running, then skip most of these steps. (Alter has not yet started, or has finished.)
    ⚈  Perform the SQL statement on old table
    ⚈  Modify statement to include AND id <= $left_off, and perform it on new table.
    ⚈  Unlock
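
As a minimal sketch of those steps for single-row writes, the Layer calls could look something like this. The assumptions are mine: sqlite3 again stands in for MySQL, the tables existing and new_table with columns id and val are hypothetical, the function names are invented for illustration, and a transaction stands in for the read-lock on Migrate.

```python
import sqlite3


def _migrate_state(db: sqlite3.Connection):
    """Fetch (running, left_off) from the single row of Migrate."""
    return db.execute("SELECT running, left_off_1 FROM Migrate").fetchone()


def layer_insert(db: sqlite3.Connection, row_id: int, val: str) -> None:
    with db:  # stand-in for Lock (read-lock on Migrate) ... Unlock
        running, left_off = _migrate_state(db)
        db.execute("INSERT INTO existing (id, val) VALUES (?, ?)", (row_id, val))
        # INSERT has no WHERE clause, so the id <= left_off test is done in code.
        if running and row_id <= left_off:
            db.execute("INSERT INTO new_table (id, val) VALUES (?, ?)",
                       (row_id, val))


def layer_update_val(db: sqlite3.Connection, row_id: int, val: str) -> None:
    with db:
        running, left_off = _migrate_state(db)
        db.execute("UPDATE existing SET val = ? WHERE id = ?", (val, row_id))
        if running:
            # Same statement on the new table, limited to rows already copied.
            db.execute("UPDATE new_table SET val = ? WHERE id = ? AND id <= ?",
                       (val, row_id, left_off))


def layer_delete(db: sqlite3.Connection, row_id: int) -> None:
    with db:
        running, left_off = _migrate_state(db)
        db.execute("DELETE FROM existing WHERE id = ?", (row_id,))
        if running:
            db.execute("DELETE FROM new_table WHERE id = ? AND id <= ?",
                       (row_id, left_off))
```

Rows above left_off are deliberately left alone in the new table; the Alter will copy their current state when it reaches them.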

For "transactions" in BEGIN..COMMIT, either make the whole transaction a single API call, or carefully coordinate the Lock/Unlock with BEGIN..COMMIT. BEGIN should map to the Lock and Fetch steps; COMMIT should map to the Unlock step.

REPLACE should be split into what it actually does -- a DELETE and an INSERT. Both of these should be inside a single Lock/Unlock pair.
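
A possible shape for such a REPLACE call, purely as a sketch: sqlite3 stands in for MySQL, the tables existing and new_table (columns id, val) and the name layer_replace are hypothetical, and one transaction plays the role of the single Lock/Unlock pair.

```python
import sqlite3


def layer_replace(db: sqlite3.Connection, row_id: int, val: str) -> None:
    """Emulate REPLACE as DELETE + INSERT under one Lock/Unlock."""
    with db:  # one transaction stands in for the single Lock ... Unlock pair
        running, left_off = db.execute(
            "SELECT running, left_off_1 FROM Migrate").fetchone()
        targets = ["existing"]
        if running and row_id <= left_off:
            targets.append("new_table")  # row is at or below the highwater mark
        for table in targets:
            db.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))
            db.execute(f"INSERT INTO {table} (id, val) VALUES (?, ?)",
                       (row_id, val))
```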

INSERT .. ON DUPLICATE KEY UPDATE .., especially if it includes a SELECT, can probably be broken into two steps, as discussed in High Speed Ingestion (normalization section).

As noted earlier, some multi-row and multi-table write operations get more complicated.

Helper Table: Migrate

This table has only 1 row.

      CREATE TABLE Migrate (
         running TINYINT UNSIGNED NOT NULL DEFAULT '0',
         clump_size INT UNSIGNED NOT NULL DEFAULT '100',
         delay FLOAT NOT NULL DEFAULT '1',
         left_off_1 ...,  -- INT, VARCHAR, etc, matching first field in KEY being used
         left_off_2 ...,  -- more field(s), if needed
         clumps_moved INT UNSIGNED NOT NULL DEFAULT '0',
         rows_moved BIGINT UNSIGNED NOT NULL DEFAULT '0',
         lock_time DOUBLE NOT NULL DEFAULT '0',
         move_time DOUBLE NOT NULL DEFAULT '0',
         sleep_time DOUBLE NOT NULL DEFAULT '0',
         last_move TIMESTAMP NOT NULL
      ) ENGINE=MyISAM;
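
To illustrate how the single row might be seeded and then retuned while the Alter runs, here is a small sketch. It rests on assumptions of mine: sqlite3 stands in for MySQL, the table is cut down to four columns with an integer left_off_1, and the helper names are invented.

```python
import sqlite3


def create_and_seed_migrate(db: sqlite3.Connection, first_key: int = 0) -> None:
    """Create a cut-down Migrate table and insert its one and only row."""
    db.execute(
        "CREATE TABLE Migrate ("
        " running INTEGER NOT NULL DEFAULT 0,"
        " clump_size INTEGER NOT NULL DEFAULT 100,"
        " delay REAL NOT NULL DEFAULT 1,"
        " left_off_1 INTEGER NOT NULL)")
    # first_key must be less than the first key value in the table being copied.
    db.execute("INSERT INTO Migrate (left_off_1) VALUES (?)", (first_key,))
    db.commit()


def throttle(db: sqlite3.Connection, clump_size: int, delay: float) -> None:
    """Change the two dynamic tunables; takes effect if the copy loop re-reads them."""
    with db:
        db.execute("UPDATE Migrate SET clump_size = ?, delay = ?",
                   (clump_size, delay))
```

For example, throttle(db, 50, 2.5) would halve the clump size and lengthen the pause between clumps if the system gets bogged down.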