The Problem Space
Let's say you have "news articles" (rows in a table) and want a web page showing the latest ten articles about a particular topic.
Variants on "topic":
⚈ Provider (of news article)
⚈ Manufacturer (of item for sale)
⚈ Ticker (financial stock)
Variants on "news article"
⚈ Item for sale
⚈ Blog comment
⚈ Blog thread
Variants on "latest"
⚈ Publication date (unix_timestamp)
⚈ Most popular (keep the count)
⚈ Most emailed (keep the count)
⚈ Manual ranking (1..10 -- 'top ten')
Variants on "10" - there is nothing sacred about "10" in this discussion.
The Performance Issues
Currently you have a table (or a column) that relates the topic to the article. The SELECT statement to find the latest 10 articles has grown in complexity, and performance is poor. You have focused on what index to add, but nothing seems to work.
⚈ If there are multiple topics for each article, you need a many-to-many table.
⚈ You have a flag "is_deleted" that needs filtering on.
⚈ You want to "paginate" the list (ten articles per page, for as many pages as necessary).
First, let me give you the solution, then I will elaborate on why it works well.
⚈ One new table called, say, Lists.
⚈ Lists has _exactly_ 3 columns: topic, article_id, sequence
⚈ Lists has _exactly_ 2 indexes: PRIMARY KEY(topic, sequence, article_id), INDEX(article_id)
⚈ Only viewable articles are in Lists. (This avoids the filtering on "is_deleted", etc)
⚈ Lists is InnoDB. (This gets "clustering".)
⚈ sequence is typically the date of the article, but could be some other ordering.
⚈ topic could be a string, or could be an id that maps to the 'topic', the choice is not critical to this discussion.
⚈ article_id is a link to the bulky row in another table(s) that provide all the details about the article.
Find the latest 10 articles for a topic:
FROM Articles a
JOIN Lists s ON s.article_id = a.article_id
WHERE s.topic = ?
ORDER BY s.sequence DESC
You must _not_ have any WHERE condition touching columns in Articles.
When you mark an article for deletion; you _must_ remove it from Lists:
DELETE FROM Lists
WHERE article_id = ?;
I emphasize "must" because "deleted" flags and other filtering is often the root of
Why it works
By now, you may have discovered why it works.
The big goal is to minimize the disk hits. Let's itemize how few disk hits are needed. When finding the latest articles with 'normal' code, you will probably find that it is doing significant scans of the Articles table, failing to quickly home in on the 10 rows you want. With this design, there is only one extra disk hit:
⚈ 1 disk hit: 10 adjacent, narrow, rows in Lists -- probably in a single "block".
⚈ 10 disk hits: The 10 articles. (These hits are unavoidable, but may be cached.)
The PRIMARY KEY, and using InnoDB, makes these quite efficient.
OK, you pay for this by removing things that you should avoid.
⚈ 1 disk hit: INDEX(article_id) - finding a few ids
⚈ A few more disk hits to DELETE rows from Lists.
This is a small price to pay -- and you are not paying it while the user is waiting for the page to render.
-- Rick James
MySQL Documents by Rick James
HowTo Techniques for Optimizing Tough Tasks:
Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)
Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
Chunking lengthy DELETE/UPDATE/etc.
Data Warehouse techniques:
Overview Summary Tables High speed ingestion
Entity-Attribute-Value -- a common, poorly performing, design pattern (EAV); plus an alternative
Find the nearest 10 pizza parlors -- efficient searching on Latitude + Longitude (another PARITION use)
Lat/Long representation choices
Pagination, not with OFFSET, LIMIT
Techniques on efficiently finding a random row (On beyond ORDER BY RAND())
GUID/UUID Performance (type 1 only)
IP Range Table Performance -- or other disjoint ranges
Rollup Unique User Counts
Alter of a Huge table -- Mostly obviated by 5.6
Latest 10 news articles -- how to optimize the schema and code for such
Build and execute a "Pivot" SELECT (showing rows as columns)
Find largest row for each group ("groupwise max")
Other Tips, Tuning, Debugging, Optimizations, etc...
Rick's RoTs (Rules of Thumb -- lots of tips)
Memory Allocation (caching, etc)
Character Set and Collation problem solver
Trouble with UTF-8 If you want case folding, but accent sensitivity, please file a request at http://bugs.mysql.com .
Python tips, PHP tips, other language tips
utf8 Collations utf8mb4 Collations on 8.0
Converting from MyISAM to InnoDB -- includes differences between them
Compound INDEXes plus other insights into the mysteries of INDEXing
Cookbook for Creating Indexes
Many-to-many mapping table wp_postmeta UNION+OFFSET
MySQL Limits -- built-in hard limits
767-byte INDEX limit
Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)
5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users
Request for tuning / slowlog info
Best of MySQL Forum -- index of lots of tips, discussions, etc
Analyze MySQL Performance
Analyze VARIABLEs and GLOBAL STATUS Analyze SlowLog
My slides from conferences
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB
(older ones upon request)
Contact me via LinkedIn; be sure to include a brief teaser in the Invite request: