Pagination, not with OFFSET, LIMIT
The Desire
You have a Web site with news articles, or a blog, or some other list of items
that is too long for a single page. So, you decide to break it
into chunks of, say, 10 items and provide a [Next]
button to go to the next "page".
You discover OFFSET and LIMIT in MySQL (or MariaDB) and decide that is the obvious way to do it.
SELECT *
FROM items
WHERE messy_filtering
ORDER BY date DESC
LIMIT $N OFFSET $M
Note that the requirement calls for a [Next] link on each page so
that the user can 'page' through the data.
The user does not really need "GoTo Page ___".
Jumping to the [First] or [Last] page may be useful.
The Problem
All is well -- until you have 50,000 items in a list.
And someone tries to walk through all 5000 pages.
That 'someone' could be a search engine crawler.
Where's the problem? Performance.
Your web page is doing SELECT ... OFFSET 49990 LIMIT 10
(or the equivalent LIMIT 49990,10).
MySQL has to find all 50,000 rows, step over the first 49,990, then deliver
the 10 for that distant page.
If it is a crawler ('spider') that reads all the pages, it actually
touches about 125,000,000 rows to read all 5,000 pages:
page k steps over 10*k rows, so the total is 10*(1+2+...+5000) = 10*5000*5001/2, about 125 million.
Reading the entire table, just to get a distant page, can be so much
I/O that it can cause timeouts on the web page. Or it can interfere
with other activity, causing other things to be slow.
Other Bugs
In addition to a performance problem, ...
⚈ If an item is inserted or deleted between the time you look at one page and the next, you could miss an item, or see an item duplicated.
⚈ The pages are not easily bookmarked or sent to someone else because the contents shift over time.
⚈ The WHERE clause and the ORDER BY may even make it so that all 50,000 items have to be read, just to find the 10 items for page 1!
What to Do?
Hardware? No, that's just a bandaid. The data will continue to grow
and even the new hardware won't handle it.
Better INDEX? No. You must get away from reading the
entire table to get the 5000th page.
Build another table saying where the pages start? Get real!
That would be a maintenance nightmare, and expensive.
Bottom line: Don't use OFFSET; instead remember where you "left off".
First page (latest 10 items):
SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
Next page (second 10):
SELECT ... WHERE ... AND id < $left_off ORDER BY id DESC LIMIT 10
With INDEX(id), this suddenly becomes very efficient.
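A minimal sketch of that pattern, using a hypothetical items table (id is the PRIMARY KEY, so it is already indexed):

    CREATE TABLE items (
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(100) NOT NULL,
        dt DATETIME NOT NULL
    );

    -- First page (latest 10 items):
    SELECT id, title
        FROM items
        ORDER BY id DESC
        LIMIT 10;

    -- The application remembers the smallest id it displayed, say 49991:
    SELECT id, title
        FROM items
        WHERE id < 49991      -- the "left off" value
        ORDER BY id DESC
        LIMIT 10;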
Implementation -- Getting Rid of OFFSET
You are probably doing this now:
ORDER BY datetime DESC LIMIT 49990,10
You probably have some unique id on the table.
This can probably be used for "left off".
Currently, the [Next] button probably has a url something like
?topic=xyz&page=4999
The 'topic' (or 'tag' or 'provider' or 'user' or etc) says which set of items are being displayed.
The product of page*limit gives the OFFSET.
(The "limit=10" might be in the url, or might be hard-coded;
this choice is not relevant to this discussion. I will omit it from the url for brevity.)
The new variant would be
?topic=xyz&id=12345.
(Note: the 12345 is not computable from 4999.)
By using INDEX(topic, id) you can efficiently say
WHERE topic = 'xyz'
AND id >= 12345
ORDER BY id
LIMIT 10
That will hit only 10 rows. This is a huge improvement for later pages.
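For reference, a suitable index could be added like this (table name is hypothetical):

    ALTER TABLE items ADD INDEX(topic, id);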
Now for more details.
Implementation -- "Left Off"
What if there are exactly 10 rows left when you display the current page?
It would make the UI nice if you grayed out the [Next] button, wouldn't it?
(Or you could suppress the button altogether.)
How to do that? Instead of LIMIT 10, use LIMIT 11.
That will give you the 10 items needed for the current page, plus
an indication of whether there is another page. And the id for that page.
So, take the 11th id for the [Next] button:
<a href=?topic=xyz&id=$id11>Next</a>
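Here is a sketch of that query, with hypothetical names; rows 1-10 are displayed, and the 11th row's id, if any, becomes $id11:

    SELECT id, title
        FROM items
        WHERE topic = 'xyz'
          AND id >= 12345     -- the "left off" id from the url
        ORDER BY id
        LIMIT 11;             -- 10 to display, plus 1 to probe for a next page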
If you have a composite key for "left off", then it is a bit more complex; see "compound leftoff".
Implementation -- Links Beyond [Next]
Let's extend the 11 trick to also find the next 5 pages and
build links for them.
Plan A is to say LIMIT 51. If you are on page 12, that would give
you links for pages 13 (using the 11th id) through 17 (using the 51st).
Plan B is to do two queries, one to get the 10 items for the current page,
the other to get the next 41 ids (LIMIT 10, 41) for the next 5 pages.
Which Plan to pick? It depends on many things, so benchmark.
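To make Plan B concrete, a sketch with hypothetical names; with INDEX(topic, id), the second query is resolved entirely in the index:

    -- the 10 items for the current page:
    SELECT id, title
        FROM items
        WHERE topic = 'xyz'
          AND id >= 12345
        ORDER BY id
        LIMIT 10;

    -- the ids at positions 11..51; every 10th one starts one of the next 5 pages:
    SELECT id
        FROM items
        WHERE topic = 'xyz'
          AND id >= 12345
        ORDER BY id
        LIMIT 10, 41;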
A Reasonable set of Links
Reaching forward and backward by 5 pages is not too much work.
It would take two separate queries to find the ids in both directions.
Also, having links that take you to the First and Last pages
would be easy to do. No id is needed; they can be something like
<a href=?topic=xyz&id=FIRST>First</a>
<a href=?topic=xyz&id=LAST>Last</a>
The client would recognize those, then generate a SELECT with something like
WHERE topic = 'xyz'
ORDER BY id ASC -- ASC for First; DESC for Last
LIMIT 10
The Last items would be delivered in reverse order.
Either deal with that in the client, or make the SELECT more complex:
( SELECT ...
WHERE topic = 'xyz'
ORDER BY id DESC
LIMIT 10
) ORDER BY id ASC
Let's say you are on page 12 of lots of pages. It could show these links:
[First] ... [7] [8] [9] [10] [11] 12 [13] [14] [15] [16] [17] ... [Last]
where the ellipsis is displayed literally.
Some end cases:
Page one of three:
First [2] [3]
Page one of many:
First [2] [3] [4] [5] ... [Last]
Page two of many:
[First] 2 [3] [4] [5] ... [Last]
If you jump to the Last page, you don't know what page number it is.
So, the best you can do is perhaps:
[First] ... [Prev] Last
Why it Works
The goal is to touch only the relevant rows,
not all the rows leading up to the desired rows.
This is nicely achieved, except for building links to the "next 5 pages".
That may (or may not) be efficiently resolved by the simple SELECT id, discussed above.
The reason that may not be efficient deals with the WHERE clause.
Let's discuss the optimal and suboptimal indexes.
For this discussion, I am assuming
⚈ The datetime field might have duplicates -- this can cause troubles
⚈ The id field is unique
⚈ The id field is close enough to datetime-ordered to be used instead of datetime.
Very efficient -- it does all the work in INDEX(topic, id):
WHERE topic = 'xyz'
AND id >= 876
ORDER BY id ASC
LIMIT 10,41
That will hit 51 consecutive index entries, 0 data rows.
Inefficient -- it must reach into the data rows to check is_deleted:
WHERE topic = 'xyz'
AND id >= 876
AND is_deleted = 0
ORDER BY id ASC
LIMIT 10,41
That will hit at least 51 consecutive index entries, plus at least
51 randomly located data rows.
Efficient -- back to the previous degree of efficiency via INDEX(topic, is_deleted, id):
WHERE topic = 'xyz'
AND id >= 876
AND is_deleted = 0
ORDER BY id ASC
LIMIT 10,41
Note how all the = parts of the WHERE come first;
then come the >= and the ORDER BY, both on id.
This means that the INDEX can be used for all the WHERE, plus the ORDER BY.
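In DDL terms (hypothetical table name):

    ALTER TABLE items
        ADD INDEX(topic, is_deleted, id);  -- '=' columns first, then the range/ORDER BY column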
"Items 11-20 out of 12345"
You lose the "out of" except when the count is small.
Instead, say something like
Items 11-20 out of Many
Alternatively... Only a few searches will have too many items to count.
Keep another table with the search criteria and a count.
This count can be computed daily (or hourly) by some background script.
When discovering that the topic is a busy one,
look it up in the table to get
Items 11-20 out of about 49,000
The background script would round off the count.
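A sketch of such a side table and its periodic refresh; the names are hypothetical, and the ROUND() does the rounding mentioned above:

    CREATE TABLE topic_counts (
        topic VARCHAR(64) NOT NULL PRIMARY KEY,
        item_count INT UNSIGNED NOT NULL     -- rounded estimate
    );

    REPLACE INTO topic_counts (topic, item_count)
        SELECT topic, 1000 * ROUND(COUNT(*) / 1000)  -- round to the nearest 1000
            FROM items
            GROUP BY topic;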
The quick way to get an estimated number of rows for an InnoDB table is
SELECT table_rows
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'database_name'
AND TABLE_NAME = 'table_name'
However, it does not allow for the WHERE clause that you probably have.
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are now deprecated and will be removed in a future MySQL version. ...
COUNT(*) is subject to certain optimizations. SQL_CALC_FOUND_ROWS causes some optimizations to be disabled. -- From the MySQL 8.0.17 Changelog
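The manual's suggested replacement is to run the page query, then a separate COUNT(*) with the same WHERE clause; a sketch:

    SELECT id, title
        FROM items
        WHERE topic = 'xyz'
        ORDER BY id
        LIMIT 10;

    SELECT COUNT(*)
        FROM items
        WHERE topic = 'xyz';   -- this is the potentially expensive count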
Complex WHERE, or JOIN
If the search criteria cannot be confined to an INDEX in a single table,
this technique is doomed. I have another paper that discusses "Lists",
which solves that (with extra development work), and even improves
on what is discussed here.
How Much Faster?
This depends on
⚈ How many rows (total)
⚈ Whether the WHERE clause prevented the efficient use of the ORDER BY
⚈ Whether the data is bigger than the cache.
This last one kicks in when building one page requires reading more
data from disk than can be cached. At that point, the problem goes from
being CPU-bound to being I/O-bound. This is likely to suddenly
slow down the loading of a page by a factor of 10.
What is Lost
⚈ Cannot "jump to Page N", for an arbitrary N. Why would you want to do that?
⚈ When walking backward from the end, you do not know the page numbers.
⚈ The code is more complex.
Pagination and UNION
If you must use OFFSET, but the query involves UNION, this is the pattern to use:
( SELECT ... ORDER BY x LIMIT 10 )
UNION
( SELECT ... ORDER BY x LIMIT 10 )
ORDER BY x LIMIT 10
Since you cannot predict which SELECT(s) the "10" will come from, you need to get 10 from each, then further whittle down the list, repeating both the ORDER BY and the LIMIT.
For the 4th page of 10 items, this pattern is needed:
( SELECT ... ORDER BY x LIMIT 40 )
UNION
( SELECT ... ORDER BY x LIMIT 40 )
ORDER BY x LIMIT 30, 10
That is, collect 4 pages' worth of rows in each SELECT, then do the OFFSET on the UNIONed result.
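To make that concrete, a sketch of the page-4 pattern with hypothetical table names:

    ( SELECT id, title, dt
          FROM news
          WHERE topic = 'xyz'
          ORDER BY dt DESC
          LIMIT 40 )
    UNION DISTINCT
    ( SELECT id, title, dt
          FROM archive
          WHERE topic = 'xyz'
          ORDER BY dt DESC
          LIMIT 40 )
    ORDER BY dt DESC
    LIMIT 30, 10;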
Postlog
Designed about 2007;
posted 2012;
cosmetic refresh: 2017
UNION: Sep, 2017
A forum discussion
Luke calls it "seek method" or "keyset pagination"
More by Luke
Mixing random and pagination
jOOQ using the Seek Method
Lists
Multiple discussions of this technique
another discussion by 'Timekiller'
Why Most Programmers Get Pagination Wrong
First/Prev/Next/Last
Blog on Scanning Large Table
kids and their pets