Pagination

Table of Contents

The Desire

The Problem

Other Bugs

What to Do?

Implementation -- Getting Rid of OFFSET

Implementation -- "Left Off"

Implementation -- Links Beyond [Next]

A Reasonable set of Links

Why it Works

"Items 11-20 out of 12345"

Complex WHERE, or JOIN

How Much Faster?

What is Lost

Pagination and UNION

Postlog

Brought to you by Rick James


The Desire


You have a Web site with news articles, or a blog, or some other thing with a list of things that might be too long for a single page. So, you decide to break it into chunks of, say, 10 items and provide a [Next] button to go the next "page".

You discover OFFSET and LIMIT in MySQL (or MariaDB) and decide that is the obvious way to do it.
    SELECT  *
        FROM  items
        WHERE  messy_filtering
        ORDER BY  date DESC
        OFFSET  $M  LIMIT $N

Note that the problem requirement needs a [Next] link on each page so that the user can 'page' through the data. He does not really need "GoTo Page ___". Jump to the [First] or [Last] page may be useful.

The Problem


All is well -- until you have 50,000 items in a list. And someone tries to walk through all 5000 pages. That 'someone' could be a search engine crawler.

Where's the problem? Performance. Your web page is doing SELECT ... OFFSET 49990 LIMIT 10 (or the equivalent LIMIT 49990,10). MySQL has to find all 50,000 rows, step over the first 49,990, then deliver the 10 for that distant page.

If it is a crawler ('spider') that read all the pages, then it actually touched about 125,000,000 items to read all 5,000 pages.

Reading the entire table, just to get a distant page, can be so much I/O that it can cause timeouts on the web page. Or it can interfere with other activity, causing other things to be slow.

Other Bugs


In addition to a performance problem, ...
    ⚈  If an item is inserted or deleted between the time you look at one page and the next, you could miss an item, or see an item duplicated.
    ⚈  The pages are not easily bookmarked or sent to someone else because the contents shift over time.
    ⚈  The WHERE clause and the ORDER BY may even make it so that all 50,000 items have to be read, just to find the 10 items for page 1!

What to Do?


Hardware? No, that's just a bandaid. The data will continue to grow and even the new hardware won't handle it.

Better INDEX? No. You must get away from reading the entire table to get the 5000th page.

Build another table saying where the pages start? Get real! That would be a maintenance nightmare, and expensive.

Bottom line: Don't use OFFSET; instead remember where you "left off".
First page (latest 10 items):
    SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
Next page (second 10):
    SELECT ... WHERE ... AND id < $left_off  ORDER BY id DESC  LIMIT 10
With INDEX(id), this suddenly becomes very efficient.

Implementation -- Getting Rid of OFFSET


You are probably doing this now: ORDER BY datetime DESC LIMIT 49990,10 You probably have some unique id on the table. This can probably be used for "left off".

Currently, the [Next] button probably has a url something like ?topic=xyz&page=4999 The 'topic' (or 'tag' or 'provider' or 'user' or etc) says which set of items are being displayed. The product of page*limit gives the OFFSET. (The "limit=10" might be in the url, or might be hard-coded; this choice is not relevant to this discussion. I will omit it from the url for brevity.)

The new variant would be ?topic=xyz&id=12345. (Note: the 12345 is not computable from 4999.) By using INDEX(topic, id) you can efficiently say
    WHERE topic = 'xyz'
      AND id >= 1234
    ORDER BY id
    LIMIT 10
That will hit only 10 rows. This is a huge improvement for later pages. Now for more details.

Implementation -- "Left Off"


What if there are exactly 10 rows left when you display the current page? It would make the UI nice if you grayed out the [Next] button, wouldn't it? (Or you could suppress the button all together.)

How to do that? Instead of LIMIT 10, use LIMIT 11. That will give you the 10 items needed for the current page, plus an indication of whether there is another page. And the id for that page.

So, take the 11th id for the [Next] button: <a href=?topic=xyz&id=$id11>Next</a>

If you have a composite key for "left off", then it is a bit more complex. See
compound leftoff

Implementation -- Links Beyond [Next]


Let's extend the 11 trick to also find the next 5 pages and build links for them.

Plan A is to say LIMIT 51. If you are on page 12, that would give you links for pages 13 (using 11th id) through pages 17 (51st).

Plan B is to do two queries, one to get the 10 items for the current page, the other to get the next 41 ids (LIMIT 10, 41) for the next 5 pages.

Which Plan to pick? It depends on many things, so benchmark.

A Reasonable set of Links


Reaching forward and backward by 5 pages is not too much work. It would take two separate queries to find the ids in both directions. Also, having links that take you to the First and Last pages would be easy to do. No id is needed; they can be something like
    <a href=?topic=xyz&id=FIRST>First</a>
    <a href=?topic=xyz&id=LAST>Last</a>
The client would recognize those, then generate a SELECT with something like
    WHERE topic = 'xyz'
    ORDER BY id ASC -- ASC for First; DESC for Last
    LIMIT 10
The Last items would be delivered in reverse order. Either deal with that in the client, or make the SELECT more complex:
    ( SELECT ...
        WHERE topic = 'xyz'
        ORDER BY id DESC
        LIMIT 10
    ) ORDER BY id ASC

Let's say you are on page 12 of lots of pages. It could show these links:
    [First] ... [7] [8] [9] [10] [11] 12 [13] [14] [15] [16] [17] ... [Last]
where the ellipsis is really used. Some end cases:
Page one of three:
    First [2] [3]
Page one of many:
    First [2] [3] [4] [5] ... [Last]
Page two of many:
    [First] 2 [3] [4] [5] ... [Last]
If you jump to the Last page, you don't know what page number it is.
So, the best you can do is perhaps:
    [First] ... [Prev] Last

Why it Works


The goal is to touch only the relevant rows, not all the rows leading up to the desired rows. This is nicely achieved, except for building links to the "next 5 pages". That may (or may not) be efficiently resolved by the simple SELECT id, discussed above. The reason that may not be efficient deals with the WHERE clause.

Let's discuss the optimimal and suboptimal indexes.

For this discussion, I am assuming
    ⚈  The datetime field might have duplicates -- this can cause troubles
    ⚈  The id field is unique
    ⚈  The id field is close enough to datetime-ordered to be used instead of datetime.

Very efficient -- it does all the work in INDEX(topic, id):
    WHERE topic = 'xyz'
      AND id >= 876
    ORDER BY id ASC
    LIMIT 10,41
That will hit 51 consecutive index entries, 0 data rows.

Inefficient -- it must reach into the data to get deleted:
    WHERE topic = 'xyz'
      AND id >= 876
      AND is_deleted = 0
    ORDER BY id ASC
    LIMIT 10,41
That will hit at least 51 consecutive index entries, plus at least 51 randomly located data rows.

Efficient -- back to the previous degree of efficiency via INDEX(topic, is_deleted, id):
    WHERE topic = 'xyz'
      AND id >= 876
      AND is_deleted = 0
    ORDER BY id ASC
    LIMIT 10,41
Note how all the = parts of the WHERE come first; then comes both the >= and ORDER BY, both on id. This means that the INDEX can be used for all the WHERE, plus the ORDER BY.

"Items 11-20 out of 12345"


You lose the "out of" except when the count is small. Instead, say something like
    Items 11-20 out of Many

Alternatively... Only a few searches will have too many items to count. Keep another table with the search criteria and a count. This count can be computed daily (or hourly) by some background script. When discovering that the topic is a busy one, look it up in the table to get
    Items 11-20 out of about 49,000
The background script would round off the count.

The quick way to get an estimated number of rows for an InnoDB table is
    SELECT  table_rows
        FROM  information_schema.TABLES
        WHERE  TABLE_SCHEMA = 'database_name'
          AND  TABLE_NAME = 'table_name'
However, it does not allow for the WHERE clause that you probably have.

The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are now deprecated and will be removed in a future MySQL version. ... COUNT(*) is subject to certain optimizations. SQL_CALC_FOUND_ROWS causes some optimizations to be disabled. -- From the MySQL 8.0.17 Changelog

Complex WHERE, or JOIN


If the search criteria cannot be confined to an INDEX in a single table, this technique is doomed. I have another paper that discusses "Lists", which solves that (which extra development work), and even improves on what is discussed here.

How Much Faster?


This depends on
    ⚈  How many rows (total)
    ⚈  Whether the WHERE clause prevented the efficient use of the ORDER BY
    ⚈  Whether the data is bigger than the cache.
This last one kicks in when building one page requires reading more data from disk can be cached. At that point, the problem goes from being CPU-bound to being I/O-bound. This is likely to suddenly slow down the loading of a pages by a factor of 10.

What is Lost


    ⚈  Cannot "jump to Page N", for an arbitrary N. Why would you want to do that?
    ⚈  Walking backward from the end does not know the page numbers.
    ⚈  The code is more complex.

Pagination and UNION


If you must use OFFSET, but the query involves UNION, this is the pattern to use:
( SELECT ... ORDER BY x  LIMIT 10 )
UNION
( SELECT ... ORDER BY x  LIMIT 10 )
ORDER BY x  LIMIT 10
Since you cannot predict which SELECT(s) will the "10" will come from, you need to get 10 from each, then further whittle down the list, repeating both the ORDER BY and LIMIT.

For the 4th page of 10 items, this pattern is needed:
( SELECT ... ORDER BY x  LIMIT 40 )
UNION
( SELECT ... ORDER BY x  LIMIT 40 )
ORDER BY x  LIMIT 30, 10
That is, collect 4 page's worth in each SELECT, then do the OFFSET in the UNION.

Postlog


Designed about 2007;   posted 2012;   cosmetic refresh: 2017   UNION: Sep, 2017

A forum discussion
Luke calls it "seek method" or "keyset pagination"
More by Luke
Mixing random and pagination
jOOQ using the Seek Method
Lists
Multiple discussions of this technique
another discussion by 'Timekiller'
Federico Razzoli's discussion
Why Most Programmers Get Pagination Wrong
First/Prev/Next/Last
Blog on Scanning Large Table

-- Rick James

MySQL Documents by Rick James

HowTo Techniques for Optimizing Tough Tasks:

Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)

Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
    Chunking lengthy DELETE/UPDATE/etc.

Data Warehouse techniques:
    Overview   Summary Tables   High speed ingestion   Bulk Normalization  

Entity-Attribute-Value -- a common, poorly performing, design pattern (EAV); plus an alternative

5 methods for 'Find Nearest'

Find the nearest 10 pizza parlors -- efficient searching on Latitude + Longitude (another PARITION use)
    Lat/Long representation choices

Z-Order 'find nearest'(under construction)

Pagination, not with OFFSET, LIMIT

Techniques on efficiently finding a random row (On beyond ORDER BY RAND())

GUID/UUID Performance (type 1 only)

IP Range Table Performance -- or other disjoint ranges

Rollup Unique User Counts

Alter of a Huge table -- Mostly obviated by 5.6

Latest 10 news articles -- how to optimize the schema and code for such

Build and execute a "Pivot" SELECT (showing rows as columns)

Find largest row for each group ("groupwise max")

Other Tips, Tuning, Debugging, Optimizations, etc...

Rick's RoTs (Rules of Thumb -- lots of tips)

Datatypes and building a good schema

Memory Allocation (caching, etc)

Character Set and Collation problem solver
    Trouble with UTF-8   If you want case folding, but accent sensitivity, please file a request at http://bugs.mysql.com .
    Python tips,   PHP tips,   other language tips
    utf8 Collations   utf8mb4 Collations on 8.0

Converting from MyISAM to InnoDB -- includes differences between them

Compound INDEXes plus other insights into the mysteries of INDEXing

Cookbook for Creating Indexes
    Many-to-many mapping table   Handler counts   wp_postmeta   UNION+OFFSET

MySQL Limits -- built-in hard limits
    767-byte INDEX limit

Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)

5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users

Analyze MySQL Performance
    Analyze VARIABLEs and GLOBAL STATUS     Analyze SlowLog

My slides from conferences
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB
(older ones upon request)

Contact me via LinkedIn; be sure to include a brief teaser in the Invite request:   View Rick James's profile on LinkedIn

Did my articles help you out? Like what you see? Consider donating:

☕️ Buy me a Banana Latte ($4) There is no obligation but it would put a utf8mb4 smiley 🙂 on my face, instead of the Mojibake "🙂"