ZOrder in MySQL

Table of Contents

The Problem

2D to 1D

Where it goes wrong

Creating the ZOrder number

Poles (> 85 deg)

Multiple chunks

Reference code

Postlog

Brought to you by Rick James

UNDER CONSTUCTION

The Problem


The uptimate goal (here) is to provide an efficient way to "find the ten coffee shops nearest me" in a huge MySQL database.

Indexes are the way any RDBMS provides efficient access to a few rows out of millions. But indexes are one-dimensional, and we need a two-dimensional index.

ZOrder is a technique for combining two numbers, such as latitude and longitude, into one for the purpose of making a 1D index of 2D data.

It is mostly successful in that two points near to each other on the map are _usually_ near each other in the index. This allows for "find nearest" to mostly work efficiently.

Because of those "mostlys", the implementation and use of a ZOrder index is complicated. This document discusses such an implementation.

2D to 1D


Think of the fingers of one of your hands being the bits of the integer representing "latitude". Ditto for your other hand and "longitude". Now fold your hands; notice how the 'bits' are interleved. This represents turning two dimensional numbers in to a single, "ZOrder", number.

The "Z" comes from going back to the map and connecting, in zorder, the items. It looks like a zig-zag line that is "self-similar". The repeating pattern is a Z (or N or C or U, depending on how you interlace your fingers).

Wikipedia - Self-similarity
Wikipedia on Z-roder curve

Where it goes wrong


Mostly, nearby items on the globa are nearby in the index. But, while it is busy keeping some items nearby, others are not being included. In the worst case, two 'adjacent' items can be as far as halfway across the index. Furtunately, this is an extreme case, but it needs to be handled in the algorithm. To do so, the algorithm first looks for items nearby, then discovers a boundary between "nearby" and "far" in each direction (north or south, and east or west). Then it makes a separate effort to collect items in those "far" areas.

Creating the ZOrder number


    1.  convert lat and lng to integers, shifted and scaled sufficiently. I chose to scale them up to 25-bit integers so that there would be very little roundoff when comparing two nearly equally-distant items.
    2.  if 5 stages (ceil(log2(25))), spread the bits apart.
    3.  OR together the two bit strings (without the 1s hitting).

Poles (> 85 deg)


If there are only a small number of items above +/-85 degrees latitude, then the proposed code is to artificially comput the zorder number with latitude to +/-85.xx degrees.

Multiple chunks


The worst case for zorder is perhaps latitude 0 at longitude 0. The values for the combined zorder number in the 4 quadrants adjacent to that spot are radically different. Hence it would be unwise to try to fetch all the data with a single SELECT. Instead, the algorithm first discovers the "square" containing the source location and with the boundaries at the most significant zorder values. These boundaries would have the most trailing zero bits. Then do 4 selects, one in each quadrant.

However, optimizations can change those "4" selects into between 2 and 6

    1.  (Select #1) Scan forward in the box containing the source. ("forward" meaning ORDER BY zorder ASC)
    2.  (Select #2) Scan backward in the box. See if enough rows have been found in the first 2 selects to satisfy the query and all the items are still within the box. If so, quit.
    3.  (Select #3 - maybe) If selects 1 or 2 failed to reach all the way to the corners of the box, then recalculate a smaller bounding box and query it entirely. Again, this might lead to completion.
    4.  (Select #4 - maybe) Step over the closest edge and scan a rectangle outside the box.
    5.  (Select #5 - maybe) Ditto with the next closest edge.
    6.  (Select #6 - maybe) Finally scan the corner between them.

After most steps, decrease the radius if possible. That is, if looking for 10 items, lower the radius to the distance to the 10th closest item (if you have 10 so far).

Much of this is messy to do in a Stored Procedure; instead the reference code uses PHP.

Reference code


Here is a working implementation tested against 3.1M cities in the world. There were no items near poles; the code does not account for such (sorry, polar bears).

See ...(link to be provided)

Postlog


While I conceived the idea in about 2009, I did not actually implement it until 2019.
-- Rick James

MySQL Documents by Rick James

HowTo Techniques for Optimizing Tough Tasks:

Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)

Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
    Chunking lengthy DELETE/UPDATE/etc.

Data Warehouse techniques:
    Overview   Summary Tables   High speed ingestion  

Entity-Attribute-Value -- a common, poorly performing, design pattern (EAV); plus an alternative

5 methods for 'Find Nearest'

Find the nearest 10 pizza parlors -- efficient searching on Latitude + Longitude (another PARITION use)
    Lat/Long representation choices

Z-Order 'find nearest'(under construction)

Pagination, not with OFFSET, LIMIT

Techniques on efficiently finding a random row (On beyond ORDER BY RAND())

GUID/UUID Performance (type 1 only)

IP Range Table Performance -- or other disjoint ranges

Rollup Unique User Counts

Alter of a Huge table -- Mostly obviated by 5.6

Latest 10 news articles -- how to optimize the schema and code for such

Build and execute a "Pivot" SELECT (showing rows as columns)

Find largest row for each group ("groupwise max")

Other Tips, Tuning, Debugging, Optimizations, etc...

Rick's RoTs (Rules of Thumb -- lots of tips)

Memory Allocation (caching, etc)

Character Set and Collation problem solver
    Trouble with UTF-8   If you want case folding, but accent sensitivity, please file a request at http://bugs.mysql.com .
    Python tips,   PHP tips,   other language tips
    utf8 Collations   utf8mb4 Collations on 8.0

Converting from MyISAM to InnoDB -- includes differences between them

Compound INDEXes plus other insights into the mysteries of INDEXing

Cookbook for Creating Indexes
    Many-to-many mapping table   wp_postmeta   UNION+OFFSET

MySQL Limits -- built-in hard limits
    767-byte INDEX limit

Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)

5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users

Request for tuning / slowlog info

Best of MySQL Forum -- index of lots of tips, discussions, etc

Analyze MySQL Performance
    Analyze VARIABLEs and GLOBAL STATUS     Analyze SlowLog

My slides from conferences
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB
(older ones upon request)

Contact me via LinkedIn; be sure to include a brief teaser in the Invite request:   View Rick James's profile on LinkedIn

Did my articles help you out? Like what you see? Consider donating:

☕️ Buy me a Banana Latte ($4) There is no obligation but it would put a utf8mb4 smiley 🙂 on my face, instead of the Mojibake "🙂"