MyISAM to InnoDB

The Task

Configuration

INDEX Issues

INDEX issue -- 2-column PK

INDEX issue -- 767 limit

Non-INDEX Issues

Non-INDEX issue -- Transactions

pt-online-schema-change

FULLTEXT

Why InnoDB is preferred

Brought to you by Rick James

The Task

You have decided to change some table(s) from MyISAM to InnoDB. That should be as simple as ALTER TABLE foo ENGINE=InnoDB. But you have heard that there might be some subtle issues.

This is describes the possible issues and what do do about them.

Recommendation. One way to assist in searching for issues is to do (at least in *nix)

      mysqldump --no-data --all-databases >schemas
      egrep 'CREATE|PRIMARY' schemas   # Focusing on PRIMARY KEYs
      egrep 'CREATE|FULLTEXT' schemas  # Looking for FULLTEXT indexes
      egrep 'CREATE|KEY' schemas       # Looking for various combinations of indexes

Understanding how the indexes work will help you better understand what might run faster or slower in InnoDB.

If you have not decided to move from MyISAM, maybe you should reconsider. Oracle's MySQL 8.0 is making serious changes toward getting rid of MyISAM. (MariaDB is not yet making such noises.) Partitioning will no longer work in MyISAM in 8.0.

Usage of clustering technology (eg, Galera) does not handle MyISAM, or at least not well.

This manual page covers many of the same issues as this document. The focus is different; read both. Converting Tables from MyISAM to InnoDB

Configuration

RAM needs to be allocated differently depending whether you have all-MyISAM, all-InnoDB, or a mixture. See Memory Allocation

INDEX Issues

(Most of the "Recommendations" and some of these "Facts" have exceptions.)

Fact. Every InnoDB table has a PRIMARY KEY. If you do not provide one, then the first non-NULL UNIQUE key is used. If that can't be done, then a 6-byte, hidden, integer is provided.

Recommendation. Look for tables without a PRIMARY KEY. Explicitly specify a PRIMARY KEY, even if it is an artificial AUTO_INCREMENT. This is not an absolute requirement, but it is a stronger admonishment for InnoDB than for MyISAM. Some day you may need to walk through the table; without an explicit PK, you can't do it.

Fact. In InnoDB, the fields of the PRIMARY KEY are included in each Secondary key.

⚈ Check for redundant indexes with this in mind.

    PRIMARY KEY(id),
    INDEX(b),    -- effectively the same as INDEX(b, id)
    INDEX(b, id) -- effectively the same as INDEX(b)
    (Keep one of the INDEXes, not both)

⚈ Note subtle things like

    PRIMARY KEY(id),
    UNIQUE(b),   -- keep for uniqueness constraint
    INDEX(b, id) -- DROP this one

⚈ Also, since the PK and the data cohabitate the same BTree:

    PRIMARY KEY(id),
    INDEX(id, b) -- DROP this one; it adds almost nothing

Recommendation. Keep the PRIMARY KEY short. This recommendation has so many exceptions that I hate to even recommend it. Some people quite it as an absolute rule. Well, that's an old wives' tale.

If you have Secondary keys, remember that they include the fields of the PK. A long PK would make the Secondary keys bulky. Well, maybe not -- if the is a lot of overlap in fields. Example: PRIMARY KEY(a,b,c), INDEX(c,b,a) -- no extra bulk.

Recommendation. Check AUTO_INCREMENT sizes.
    ⚈  BIGINT is almost never needed. It wastes at least 4 bytes per row (versus INT).
    ⚈  Almost always use UNSIGNED and NOT NULL.
    ⚈  MEDIUMINT UNSIGNED (3 bytes, 16M max) might suffice instead of INT; see also SMALLINT and TINYINT.
    ⚈  Be sure to be pessimistic -- it is painful to ALTER later.

Contrast. "Vertical Partitioning". This is where you artificially split a table to move bulky columns (eg, a BLOB) into another, parallel, table. It is beneficial in MyISAM to avoid stepping over the blob when you don't need to read it. InnoDB stores BLOB and TEXT differently -- some bytes may be in the record, the rest is in some other block. So, it may (or may not) be worth putting the tables back together. Caution: There are cases where an InnoDB table definition will be "too big". It may be possible to use a different ROW_FORMAT to avoid this problem.

Fact. SPATIAL indexes are not available in InnoDB until 5.7.5. FULLTEXT is available in InnoDB as of 5.6, but there are a number of differences. (See below)

Recommendation. Search for such indexes. If not yet upgraded, keep such tables in MyISAM. Better yet, do Vertical Partitioning (see above) to split out the minimum number of columns from InnoDB.

Contrast. The Index_length in SHOW TABLE STATUS differs. In MyISAM, space for all indexes, including a PRIMARY KEY, is included. For InnoDB, the PK is included with the data; Index_length is the total size for the secondary keys.

Fact. In InnoDB, the PRIMARY KEY is included in the data. Hence, exact match by PK ("point query") may be a little faster with InnoDB. And, "range" scans by PK are likely to be faster -- no back and forth between the index and the data.

Fact. In InnoDB, a lookup by Secondary Key drills down the secondary key's BTree, grabs the PRIMARY KEY, then drills down the PK's BTree. Hence, secondary key lookups are a little more cumbersome.

Contrast. The fields of the PRIMARY KEY are included in each Secondary key. This may lead to "Using index" (in the EXPLAIN plan) for InnoDB for cases where it did not happen in MyISAM. (This is a slight performance boost, and counteracts the double-lookup otherwise needed.) However, when "Using index" would be useful on the PRIMARY KEY, MyISAM would do an "index range scan", yet InnoDB effectively has to do a "table range scan". "Using index" means "covering index". "Using index condition" (aka ICP) is not the same as "Using index".

Same as MyISAM. Almost always

      INDEX(a)   -- DROP this one because the other one handles it.
      INDEX(a,b)

Same as MyISAM. The optimizer almost never uses two indexes in a single SELECT. (5.1 occasionally does "index merge".) SELECTs in subqueries and UNIONs can independently pick indexes.

Same as MyISAM. The optimizer does a very poor job of the construct IN ( SELECT ... ). Turn it into a JOIN. (5.6.7 and MariaDB 5.5 improve on this.) Still, it is often better to turn it into a JOIN.

Subtle issue. When you DELETE a row, the AUTO_INCREMENT id may be burned. Ditto for REPLACE, which is a DELETE plus an INSERT.

Subtle issue. INSERT IGNORE burns ids because it allocates values before checking for duplicate keys. See this for a workaround: INSERT IGNORE

InnoDB only. A ROLLBACK (explicit or implicit) will burn any ids already allocated to INSERTs.

Contrast & Subtle issue. After a crash, the next id to be assigned may or may not be what you expect; this varies with Engine.

Contrast & Subtle issue. After any shutdown and restart, InnoDB will discover the next AUTO_INCREMENT via MAX(id). So, if the last id were deleted before the shutdown, that id will be reused. (This is being "fixed" in version 8.0.)

Very subtle contrast. Replication occurs for InnoDB at COMMIT time. If you have multiple threads using transactions, the AUTO_INCREMENTs can arrive at a replicata out of order for InnoDB. One transaction BEGINs, grabs an id. Then another transaction grabs an id but COMMITs before the first finishes.

Same as MyISAM. "Prefix" indexing is usually bad in both InnoDB and MyISAM. Example: INDEX(foo(30))

More index issues are covered in the companion blog Index Cookbook

INDEX issue -- 2-column PK

Contrast. This feature of MyISAM is not available in InnoDB; the value of id will start over at 1 for each different value of abc:

        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        PRIMARY KEY (abc, id)

A way to simulate the MyISAM 'feature' might be something like: What you want is this, but it won't work because it is referencing the table twice:

      INSERT INTO foo
         (other, id, ...)
         VALUES
         (123, (SELECT MAX(id)+1 FROM foo WHERE other = 123), ...);

Instead, you need some variant on this. (You may already have a BEGIN...COMMIT.)

      BEGIN;
      SELECT @id := IFNULL(MAX(id),0) + 1 FROM foo WHERE other = 123 FOR UPDATE;
      INSERT INTO foo
         (other, id, ...)
         VALUES
         (123, @id, ...);
      COMMIT;

Having a transaction is mandatory to prevent another thread from grabbing the same id.

Recommendation. Look for such PRIMARY KEYs. If you find such, ponder how to change the design. There is no straightforward workaround. However, the following may be ok. Be sure that the datatype for id is big enough so it won't overflow. The existing ids will have dups, but future ones should be ok:

        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        PRIMARY KEY (abc, id),
        INDEX(id)

Note that I said INDEX, not UNIQUE for id.

StackOverflow discussion of 2-col AI
Solving with a TRIGGER

INDEX issue -- 767 limit

Fact. The maximum length of an INDEX is different between the Engines. (This change is not likely to hit you, but watch out.) MyISAM allows 1000 bytes; InnoDB allows 767 bytes (per column in an index), just big enough for a VARCHAR(255) CHARACTER SET utf8 or VARCHAR(191) CHARACTER SET utf8mb4

ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

The limit is being raised to 3072 in 5.7 (MariaDB 10.2). If you are running an older version, see my companion blog on 'Limits' for Various 767 workarounds

If you are using MySQL 8 or MariaDB 10.2, see this for a "windowing" approach: https://stackoverflow.com/questions/75821533/composite-primary-key-with-auto-increment-in-innodb#

Non-INDEX Issues

Disk space for InnoDB is likely to be 2-3 times as much as for MyISAM.

MyISAM and InnoDB use RAM radically differently. If you change all your tables, you should make significant adjustments:
    ⚈  key_buffer_size -- small but non-zero; say, 10M;
    ⚈  innodb_buffer_pool_size -- 70% of available RAM. (Use a smaller % if you have under 4GB of RAM.)

InnoDB has essentially no need for CHECK, OPTIMIZE, or ANALYZE. Remove them from your maintenance scripts. (If you keep them, they will run, just not be useful.)

Backup scripts may need checking. A MyISAM table can be backed up by copying three files. There is no corresponding technique with InnoDB. Similarly, capturing one table or database for copying from production to a development environment is not possible. Change to mysqldump or XtraBackup or etc.

In 5.1, ALTER ONLINE TABLE can speed up some operations significantly. (Normally ALTER TABLE copies the table over and rebuilds the indexes.) Later versions call it ALTER TABLE ... ALGORITHM=INPLACE. For some operations, it is much less invasive. MyISAM does not have this feature.

The "limits" on virtually everything are different between MyISAM and InnoDB. Unless you have huge tables, wide rows, lots of indexes, etc, you are unlikely to stumble into a different limit. See Limits

Mixture of MyISAM and InnoDB? This is OK. But there are caveats.
    ⚈  RAM settings for caches should be adjusted to accordingly.
    ⚈  JOINing tables of different Engines works.
    ⚈  A transaction that affects tables of both types can ROLLBACK InnoDB changes but will leave MyISAM changes intact.
    ⚈  Replication: MyISAM statements are replicated when finished; InnoDB statements are held until the COMMIT.

FIXED (vs DYNAMIC) is meaningless in InnoDB. (FIXED is virtually useless in MyISAM.)

PARTITION -- You can partition MyISAM (before 8.0) and InnoDB tables. Remember the screwball rule: You must either
    ⚈  have no UNIQUE (or PRIMARY) keys, or
    ⚈  have the value you are "partitioning on" in every UNIQUE key.
The former is not advised for InnoDB.

Non-INDEX issue -- Transactions

Learn to use "transactions". This is a feature that does not exist in MyISAM.

If you currently have LOCK TABLES and UNLOCK TABLES, remove them. Replace them (sort of) with BEGIN ... COMMIT. (Yes, LOCK TABLE will work, but it is less efficient, and may have subtle issues.)

Understand autocommit and BEGIN/COMMIT.
    ⚈  (default) autocommit = 1: In the absence of any BEGIN or COMMIT statements, every statement is a transaction by itself. This is close to the MyISAM behavior, but is not really the best.
    ⚈  autocommit = 0: In this 'mode', COMMIT will close a transaction and start another one. To me, this is kludgy. If you forget to COMMIT, it will be a nasty bug to locate.
    ⚈  (recommended) BEGIN...COMMIT gives you control over what sequence of operation(s) are to be considered a transaction and "atomic". Include the ROLLBACK statement if you need to undo stuff back to the BEGIN.

START TRANSACTION is a synonym for BEGIN.

Perl's DBIx::DWIW and Java's JDBC have API calls to do BEGIN and COMMIT. These are probably better than 'executing' BEGIN and COMMIT.

Test for errors everywhere! Because InnoDB uses row-level locking, it can stumble into deadlocks that you are not expecting. The engine will automatically ROLLBACK the state of the tables to what they were at the time of the BEGIN. The normal recovery is to redo, beginning at the BEGIN. Note that this is a strong reason to have BEGINs.

Even test after COMMIT. Some day you may move to a Galera cluster solution; this requires checking for errors after COMMIT. Local deadlocks are caught before that, but cluster-wide deadlocks are not caught until COMMIT time.

Use SELECT ... FOR UPDATE in a transaction when you need to 'lock' the row SELECTed until you can do something to it (eg UPDATE).

pt-online-schema-change

pt-online-schema-change
should be able to convert MyISAM to InnoDB with essentially no downtime. Some comments:

    ⚈  Galera-based systems (PXC, MariaDB+Galera) do not like MyISAM; there are some limitations. It would probably be better to convert before loading into Galera.
    ⚈  You should consider doing all the necessary changes (to indexes, etc) to each table at the same time.
    ⚈  Since you would be trying to do the conversions "live", LOCK TABLES may work slightly differently as you gradually convert multiple tables.
    ⚈  There may be other caveats.

FULLTEXT

Beginning with 5.6, FULLTEXT is available in InnoDB. Here are most of the differences.

    ⚈  VARIABLES (in my.cnf) are different. SHOW VARIABLES LIKE '%ft%';
    ⚈  min "word" length defaults to 3 instead of 4
    ⚈  short words with "+" before them are ignored in MyISAM, but cause InnoDB to never match
    ⚈  the relevance values are computed differently, so ORDER BY may sort differently
    ⚈  IN BOOLEAN MODE on an InnoDB table requires the MATCH() column list to exactly match the FULLTEXT definition. (MyISAM did not.)
    ⚈  For MyISAM tables (only), boolean-mode searches can be done on nonindexed columns, although they are likely to be slow.
    ⚈  There may be cases in InnoDB where you 'need' to rebuild the index.
    ⚈  For InnoDB, a + before a quoted word must be outside the quotes.
    ⚈  Disable InnoDB stopwords via innodb_ft_enable_stopword=OFF.

(Caveat: This list is likely to be incomplete.)

A suggestion on FT's own id
Discussion of InnoDB FT drawbacks and workarounds

Why InnoDB is preferred

    ⚈  Recovery from mysql crash is automatic (versus having to run REPAIR TABLE).
    ⚈  Transactional integrity is available.
    ⚈  Row-level locking is much more efficient than table-level locking for high concurrency.
    ⚈  MyISAM, though still supported, is not supported actively, and may soon be deprecated, perhaps even removed.
    ⚈  MyISAM cannot be used in Galera-based clustering HA solutions.

On the flip side:

    ⚈  InnoDB tables have a bigger disk footprint.

Generating ALTERs

To generate all the ALTERs to convert all the MyISAM tables to InnoDB:

    SELECT  CONCAT('USE ', table_schema, ';  ALTER TABLE ', table_name, ' ENGINE=InnoDB;')
        FROM  information_schema.tables
        WHERE  engine = 'MyISAM'
          AND  table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');

Files

In MyISAM, you can move the .frm, .MYD, and .MYI files around in the disk system. InnoDB is laid out different. Do not try anything similar. See mysqldump and/or transportable tablespaces for alternatives.

Myths about MyISAM

The MyISAM vs InnoDB Myth "InnoDB is not good as MyISAM"

That's an old wives' tale. Erase it from your mind.

    ⚈  InnoDB has improved a lot since that rumor was started.
    ⚈  You now have user information split across two tables; the small overhead of doing such is probably worse than having all the info neatly together in one InnoDB table.

Bottom line: Use InnoDB for all tables. There are very few exceptions to this simple rule. In no particular order:

    ⚈  InnoDB tables usually have a 2x-3x larger disk footprint. But, so what, disks are huge.
    ⚈  COUNT(*) without WHERE is 'instantaneous' in MyISAM.
    ⚈  2-col AUTO_INCREMENT -- standard in MyISAM; clumsy to simulate in InnoDB. (Rarely asked for.)
    ⚈  Performance in obscure cases. (No specifics come to mind at the moment.)
    ⚈  One might quibble that the differences in FULLTEXT constitute an issue.

On the flip side, Oracle has taken the stand that MyISAM will be removed from MySQL.

The CHAR vs VARCHAR Myth "CHAR is better than VARCHAR"

Another _old_wives' tale. Even in MyISAM, that quote is often taken out of context.

    ⚈  Even in context it is rarely valid
    ⚈  If you have variable length data, the savings for I/O is higher than the alleged savings of CHAR over VARCHAR.
    ⚈  In InnoDB, CHAR and VARCHAR are mostly implemented identically.

Bottom line: Use CHAR only for strings that are truly fixed length.

The need-to-optimize-the-little-things Myth

I'll start with the 'answer' first.

Even before looking at the data, there are other tasks.

    1.  Receive the query, possibly across a WAN. (Up to milliseconds.)
    2.  Parse the tokens in the query.
    3.  Figure out which table is being used for each column named in the query.
    4.  Open the tables.
    5.  Invoke the Optimizer to deduce the best way to perform the query. This will involve locating all the possible indexes, doing probes into the tables to gather statistics, etc.
    6.  Run the query.

In the grand scheme of things, locating a record is far more costly than anything that is done with the record. (This is a generalization, not an absolute.)

    1.  Locate the record -- perhaps via an index, perhaps "next after" the last record fetched.
    2.  Fetch the block containing the record. This is probably cached in the buffer pool, but it might need to be fetched from disk. So, this step might be nanoseconds, or it could be milliseconds.
    3.  Dissect the block to find the row in question. This might include scanning the "history list" if multiple transactions are running and the "isolation mode" needs to be consulted to decide which copy of the row is "visible".
    4.  Now that you have the row, the columns need to be picked apart -- even with off-word-boundary issues, byte scans, NULL checks, length checks (eg, for VARchar, etc, we are talking nanoseconds per column.
    5.  "Endianness" slips in about here. MySQL can handle big-endian and little-endian hardware architecture with binary compatibility. This implies that for some hardware-dependent situations, it must swap bytes to get the column value into the right "endianism".
    6.  Do something with the column. This may be simply copying it intact; it may be applying a function (collation, summation, sqrt, whatever). Again nanoseconds.

Now, what was your question? Oh, yeah, you were concerned about some tiny part of the last step.

General

    ⚈  Word boundaries, for various hardware, software, and design reasons are not worth thinking about.
    ⚈  For large tables, I/O is a much bigger factor in performance than fixed versus variable length things.
    ⚈  Fixed length in MyISAM had very few advantages. Most vanish when you aren't doing UPDATE or DELETE + INSERT.
    ⚈  All columns in a row needed to be "fixed", else it was "variable".
    ⚈  InnoDB possibly has zero benefit from "fixed".
    ⚈  MyISAM was designed before variable length charsets (utf8, etc) were added. Even CHAR is effectively variable length when using utf8.

(And stop reading any MySQL reference that is over a decade old. Note that I somewhat keep my articles up-to-date.)

Postlog

See also Manual page: Converting Tables from MyISAM to InnoDB
a primer in FOREIGN KEYs
Windowing function to simulate 2-col PK in MySQL 8.0 or MariaDB 10.2
Sequence (as replacement for auto_increment)

Refreshed: Oct, 2012; FULLTEXT: June, 2015; Refreshed and added Generating ALTERs: March, 2016; Refresh Aug, 2017; Add "Myths": Mar, 2020

-- Rick James

MySQL Documents by Rick James

HowTo Techniques for Optimizing Tough Tasks:

Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)

Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
Chunking lengthy DELETE/UPDATE/etc.

Data Warehouse techniques:
Data Warehouse Overview Summary Tables High speed ingestion Bulk Normalization

Schema and code design for large Sensor database

Entity-Attribute-Value (EAV) -- a common, poorly performing, design pattern; plus an alternative

5 methods for 'Find Nearest'

Lat/Lng search to Find the nearest 10 pizza parlors
Lat/Long representation choices

Z-Order 'find nearest'

Pagination, not with OFFSET, LIMIT

Techniques on efficiently finding a random row (On beyond ORDER BY RAND())

GUID/UUID Performance (type 1 only)

IP Range Table Performance -- or other disjoint ranges

Rollup Unique User Counts

Alter of a Huge table -- Mostly obviated by 5.6

Efficient List of Latest 10 news articles

Build and execute a Pivot SELECT (showing rows as columns)

(Groupwise Max): Efficiently find largest row(s) for each group

Other Tips, Tuning, Debugging, Optimizations, etc...

Rick's RoTs (Rules of Thumb -- lots of tips)

Datatypes and building a good schema

Memory Allocation (caching, etc)

Character Set and Collation problem solver
Trouble with UTF-8 If you want case folding, but accent sensitivity, please file a request at https://bugs.mysql.com .
Python tips, PHP tips, other language tips
utf8 Collations utf8mb4 Collations on 8.0

Converting from MyISAM to InnoDB -- includes differences between them

Compound INDEXes plus other insights into the mysteries of INDEXing

Cookbook for Creating Indexes
Many-to-many mapping table Handler counts wp_postmeta UNION+OFFSET

MySQL Limits -- built-in hard limits
767-byte INDEX limit

Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)

5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users

Analyze MySQL Performance
Analyze VARIABLEs and GLOBAL STATUS Analyze SlowLog

My slides from conferences
MiniFest 2021 - Rick James & Daniel Black - Answering on Stack Overflow(+comments) - MariaDB Frontlines
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB

Contact me via LinkedIn; be sure to include a brief teaser in the Invite request:

Did my articles help you out? Like what you see? Consider donating:

☕️ Buy me a Banana Latte and bagel ($10) There is no obligation but it would put a utf8mb4 smiley 🙂 on my face, instead of the Mojibake "ðŸ™‚"

MyISAM to InnoDB

Table of Contents

The Task

Configuration

INDEX Issues

INDEX issue -- 2-column PK

INDEX issue -- 767 limit

Non-INDEX Issues

Non-INDEX issue -- Transactions

pt-online-schema-change

FULLTEXT

Why InnoDB is preferred

Generating ALTERs

Files

Myths about MyISAM

Postlog

MySQL Documents by Rick James

Did my articles help you out? Like what you see? Consider donating: