MyISAM to InnoDB

Table of Contents

The Task
INDEX Issues
Non-INDEX Issues
Postlog
Brought to you by Rick James

The Task


You have decided to change some table(s) from MyISAM to InnoDB. That should be as simple as ALTER TABLE foo ENGINE=InnoDB. But you have heard that there might be some subtle issues.

This is describes the possible issues and what do do about them.

Recommendation. One way to assist in searching for issues is to do (at least in *nix)
      mysqldump --no-data --all-databases >schemas
      egrep 'CREATE|PRIMARY' schemas   # Focusing on PRIMARY KEYs
      egrep 'CREATE|FULLTEXT' schemas  # Looking for FULLTEXT indexes
      egrep 'CREATE|KEY' schemas       # Looking for various combinations of indexes

Understanding how the indexes work will help you better understand what might run faster or slower in InnoDB.

INDEX Issues


(Most of these Recommendations and some of these Facts have exceptions.)

Fact. Every InnoDB table has a PRIMARY KEY. If you do not provide one, then the first non-NULL UNIQUE key is used. If that can't be done, then a 6-byte, hidden, integer is provided.

Recommendation. Look for tables without a PRIMARY KEY. Explicitly specify a PRIMARY KEY, even if it an artificial AUTO_INCREMENT. This is not an absolute requirement, but it is a stronger admonishment for InnoDB than for MyISAM. Some day you may need to walk through the table; without an explicit PK, you can't do it.

Fact. The fields of the PRIMARY KEY are included in each Secondary key.

    ⚈  Check for redundant indexes with this in mind.
    PRIMARY KEY(id),
    INDEX(b), -- effectively the same as INDEX(b, id)
    INDEX(b, id) -- effectively the same as INDEX(b)
    (Keep one of the INDEXes, not both)
    ⚈  Note subtle things like
    PRIMARY KEY(id),
    UNIQUE(b), -- keep for uniqueness constraint
    INDEX(b, id) -- DROP this one
    ⚈  Also, since the PK and the data coexist:
    PRIMARY KEY(id),
    INDEX(id, b) -- DROP this one; it adds almost nothing

Contrast. This feature of MyISAM is not available in InnoDB; the value of `id` will start over at 1 for each different value of `abc`:
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        PRIMARY KEY (abc, id)
A way to simulate the MyISAM 'feature' might be something like: What you want is this, but it won't work because it is referencing the table twice:
      INSERT INTO foo
         (other, id, ...)
         VALUES
         (123, (SELECT MAX(id)+1 FROM foo WHERE other = 123), ...);
Instead, you need some variant on this. (You may already have a BEGIN...COMMIT.)
      BEGIN;
      SELECT @id := MAX(id)+1 FROM foo WHERE other = 123 FOR UPDATE;
      INSERT INTO foo
         (other, id, ...)
         VALUES
         (123, @id, ...);
      COMMIT;
Having a transaction is mandatory to prevent another thread from grabbing the same id.

Recommendation. Look for such PRIMARY KEYs. If you find such, ponder how to change the design. There is no straightforward workaround. However, the following may be ok. Be sure that the datatype for id is big enough since it won't start over. You cannot assume id starts at 1, and there will be gaps. The table cannot be simply ALTERed, nor can you dump and reload the data; you must write code to recompute the ids.
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        PRIMARY KEY (abc, id),
        UNIQUE(id)

Recommendation. Keep the PRIMARY KEY short. If you have Secondary keys, remember that they include the fields of the PK. A long PK would make the Secondary keys bulky. Well, maybe not -- if the is a lot of overlap in fields. Example: PRIMARY KEY(a,b,c), INDEX(c,b,a) -- no extra bulk.

Recommendation. Check AUTO_INCREMENT sizes.
    ⚈  BIGINT is almost never needed. It wastes at least 4 bytes per row (versus INT).
    ⚈  Almost always use UNSIGNED and NOT NULL.
    ⚈  MEDIUMINT UNSIGNED (16M max) might suffice instead of INT
    ⚈  Be sure to be pessimistic -- it is painful to ALTER later.

Contrast. "Vertical Partitioning". This is where you artificially split a table to move bulky columns (eg, a BLOB) into another, parallel, table. It is beneficial in MyISAM to avoid stepping over the blob when you don't need to read it. InnoDB stores BLOB and TEXT differently -- 767 bytes are in the record, the rest is in some other block. So, it may (or may not) be worth putting the tables back together. Caution: An InnoDB row is limited to 8KB, and the 767 counts against that.

Fact. FULLTEXT and SPATIAL indexes are not available in InnoDB. FULLTEXT is available in InnoDB as of 5.6, but there are a number of differences.

Recommendation. Search for such indexes. If not yet upgraded, keep such tables in MyISAM. Better yet, do Vertical Partitioning (see above) to split out the minimum number of columns from InnoDB.

Fact. The maximum length of an INDEX is different between the Engines. (This change is not likely to hit you, but watch out.) MyISAM allows 1000 bytes; InnoDB allows 767 bytes (per column in an index), just big enough for a VARCHAR(255) CHARACTER SET utf8.
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Contrast. SHOW TABLE STATUS's Index_length differs. In MyISAM, space for all indexes, including a PRIMARY KEY, is included. For InnoDB, the PK is included with the data; Index_length is the total size for the secondary keys.

Fact. In InnoDB, the PRIMARY KEY is included in the data. Hence, exact match by PK ("point query") may be a little faster with InnoDB. And, "range" scans by PK are likely to be faster -- no back and forth between the index and the data.

Fact. In InnoDB, a lookup by Secondary Key traverses the secondary key's BTree, grabs the PRIMARY KEY, then traverses the PK's BTree. Hence, secondary key lookups are a little more cumbersome.

Contrast. The fields of the PRIMARY KEY are included in each Secondary key. This may lead to "Using index" (in the EXPLAIN plan) for InnoDB for cases where it did not happen in MyISAM. (This is a slight performance boost, and counteracts the double-lookup otherwise needed.) However, when "Using index" would be useful on the PRIMARY KEY, MyISAM would do an "index scan", yet InnoDB effectively has to do a "table scan".

Same as MyISAM. Almost always
      INDEX(a)   -- DROP this one because the other one handles it.
      INDEX(a,b)

Same as MyISAM. The optimizer almost never uses two indexes in a single SELECT. (5.1 occasionally does "index merge".) SELECT in subqueries and UNIONs can independently pick indexes.

Same as MyISAM. The optimizer does a very poor job of the construct IN ( SELECT ... ). Turn it into a JOIN. (5.6.7 and MariaDB 5.5 improve on this.)

Subtle issue. When you DELETE a row, the AUTO_INCREMENT id may be burned. Ditto for REPLACE, which is a DELETE plus an INSERT.

Subtle issue. INSERT IGNORE burns ids because it allocates values before checking for duplicate keys.

InnoDB only. A ROLLBACK (explicit or implicit) will burn any ids already allocated to INSERTs.

REPLACE = DELETE + INSERT, so the INSERT comments apply to REPLACE.

Subtle issue. After a crash, the next id to be assigned may or may not be what you expect; this varies with Engine.

Very subtle contrast. Replication occurs on COMMIT. If you have multiple threads using transactions, the AUTO_INCREMENTs can arrive at a slave out of order for InnoDB. One transaction BEGINs, grabs an id. Then another transaction grabs an id but COMMITs before the first finishes.

Same as MyISAM. "Prefix" indexing is usually bad in both InnoDB and MyISAM. Example: INDEX(foo(30))

Non-INDEX Issues


Disk space for InnoDB is likely to be 2-3 times as much as for MyISAM.

MyISAM and InnoDB use RAM radically differently. If you change all your tables, you should make significant adjustments:
    ⚈  key_buffer_size -- small but non-zero; say, 10M;
    ⚈  innodb_buffer_pool_size -- 70% of available RAM

InnoDB has essentially no need for CHECK, OPTIMIZE, or ANALYZE. Remove them from your maintenance scripts. (No real harm if you keep them.)

Backup scripts may need checking. A MyISAM table can be backed up by copying three files. There is no corresponding technique with InnoDB. Similarly, capturing a table or database for copying from production to a development environment is not possible. Change to mysqldump.

Understand autocommit and BEGIN/COMMIT.
    ⚈  (default) autocommit = 1: In the absence of any BEGIN or COMMIT statements, every statement is a transaction by itself. This is close to the MyISAM behavior, but is not really the best.
    ⚈  autocommit = 0: COMMIT will close a transaction and start another one. To me, this is kludgy.
    ⚈  (recommended) BEGIN...COMMIT gives you control over what sequence of operation(s) are to be considered a transaction and "atomic". Include the ROLLBACK statement if you need to undo stuff back to the BEGIN.

Perl's DBIx::DWIW and Java's JDBC have API calls to do BEGIN and COMMIT. These are probably better than 'executing' BEGIN and COMMIT.

Test for errors everywhere! Because InnoDB uses row-level locking, it can stumble into deadlocks that you are not expecting. The engine will automatically ROLLBACK to the BEGIN. The normal recovery is to redo, beginning at the BEGIN. Note that this is a strong reason to have BEGINs.

Use SELECT ... FOR UPDATE in a transaction when you need to 'lock' the row SELECTed until you can do something to it (eg UPDATE).

LOCK/UNLOCK TABLES -- remove them. Replace them (sort of) with BEGIN ... COMMIT. (Yes, LOCK will work, but it is less efficient, and may have subtle issues.)

In 5.1, ALTER ONLINE TABLE can speed up some operations significantly. (Normally ALTER TABLE copies the table over and rebuilds the indexes.)

The "limits" on virtually everything are different between MyISAM and InnoDB. Unless you have huge tables, wide rows, lots of indexes, etc, you are unlikely to stumble into a different limit.

Mixture of MyISAM and InnoDB? This is OK. But there are caveats.
    ⚈  RAM settings for caches should be adjusted to accordingly.
    ⚈  JOINing tables of different Engines works.
    ⚈  A transaction that affects tables of both types can ROLLBACK InnoDB changes, but will leave MyISAM changes intact.
    ⚈  Replication: MyISAM statements are replicated when finished; InnoDB statements are held until the COMMIT.

FIXED (vs DYNAMIC) is meaningless in InnoDB.

PARTITION -- You can partition MyISAM and InnoDB tables. Remember the screwball rule: You must either
    ⚈  have no UNIQUE (or PRIMARY) keys, or
    ⚈  have the value you are "partitioning on" in every UNIQUE key.
The former is not advised for InnoDB.

Postlog


See also
Manual page
a primer in FOREIGN KEYs

Refreshed: Oct, 2012


Contact me by posting a question at MySQL Forums :: InnoDB
-- Rick James

Rick's MySQL Documents

MySQL Documents by Rick James

Tips, Debugging, HowTos, Optimizations, etc...

Rick's RoTs (Rules of Thumb -- lots of tips)
Memory Allocation (caching, etc)
Character Set and Collation problem solver
Converting from MyISAM to InnoDB -- includes differences between them
Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
Compound INDEXes plus other insights into the mysteries of INDEXing
Partition Maintenance (DROP+REORG) for time series
Entity-Attribute-Value -- a common, poorly performing, design pattern; plus an alternative
Find the nearest 10 pizza parlors -- efficient searching on Latitude + Longitude (another PARITION use)
Alter of a Huge table -- Mostly obviated by 5.6
Latest 10 news articles -- how to optimize the schema and code for such
Pagination, not with OFFSET, LIMIT
Data Warehouse techniques (esp., Summary Tables)
Techniques on efficiently finding a random row (On beyond ORDER BY RAND())
GUID/UUID Performance (type 1 only)
IP Range Table Performance -- or other disjoint ranges
MySQL Limits -- built-in hard limits
Galera Notes (Percona XtraDB Cluster, MariaDB 10, or manually installed)
Rollup Unique User Counts
5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users
Best of MySQL Forum -- index of lots of tips, discussions, etc