The uptimate goal (here) is to provide an efficient way to "find the ten coffee shops nearest me" in a huge MySQL database.
Indexes are the way any RDBMS provides efficient access to a few rows out of millions. But indexes are one-dimensional, and we need a two-dimensional index.
ZOrder is a technique for combining two numbers, such as latitude and longitude, into one for the purpose of making a 1D index of 2D data.
It is mostly successful in that two points near to each other on the map are _usually_ near each other in the index. This allows for "find nearest" to mostly work efficiently.
Because of those "mostlys", the implementation and use of a ZOrder index is complicated. This document discusses such an implementation.
2D to 1D
Think of the fingers of one of your hands being the bits of the integer representing "latitude". Ditto for your other hand and "longitude". Now fold your hands; notice how the 'bits' are interleved. This represents turning two dimensional numbers in to a single, "ZOrder", number.
The "Z" comes from going back to the map and connecting, in zorder, the items. It looks like a zig-zag line that is "self-similar". The repeating pattern is a Z (or N or C or U, depending on how you interlace your fingers).
Wikipedia - Self-similarity
Wikipedia on Z-roder curve
Where it goes wrong
Mostly, nearby items on the globa are nearby in the index. But, while it is busy keeping some items nearby, others are not being included. In the worst case, two 'adjacent' items can be as far as halfway across the index. Furtunately, this is an extreme case, but it needs to be handled in the algorithm. To do so, the algorithm first looks for items nearby, then discovers a boundary between "nearby" and "far" in each direction (north or south, and east or west). Then it makes a separate effort to collect items in those "far" areas.
Creating the ZOrder number
1. convert lat and lng to integers, shifted and scaled sufficiently. I chose to scale them up to 25-bit integers so that there would be very little roundoff when comparing two nearly equally-distant items.
2. if 5 stages (ceil(log2(25))), spread the bits apart.
3. OR together the two bit strings (without the 1s hitting).
Poles (> 85 deg)
If there are only a small number of items above +/-85 degrees latitude, then the proposed code is to artificially comput the zorder number with latitude to +/-85.xx degrees.
The worst case for zorder is perhaps latitude 0 at longitude 0. The values for the combined zorder number in the 4 quadrants adjacent to that spot are radically different. Hence it would be unwise to try to fetch all the data with a single SELECT. Instead, the algorithm first discovers the "square" containing the source location and with the boundaries at the most significant zorder values. These boundaries would have the most trailing zero bits. Then do 4 selects, one in each quadrant.
However, optimizations can change those "4" selects into between 2 and 6
1. (Select #1) Scan forward in the box containing the source. ("forward" meaning ORDER BY zorder ASC)
2. (Select #2) Scan backward in the box. See if enough rows have been found in the first 2 selects to satisfy the query and all the items are still within the box. If so, quit.
3. (Select #3 - maybe) If selects 1 or 2 failed to reach all the way to the corners of the box, then recalculate a smaller bounding box and query it entirely. Again, this might lead to completion.
4. (Select #4 - maybe) Step over the closest edge and scan a rectangle outside the box.
5. (Select #5 - maybe) Ditto with the next closest edge.
6. (Select #6 - maybe) Finally scan the corner between them.
After most steps, decrease the radius if possible. That is, if looking for 10 items, lower the radius to the distance to the 10th closest item (if you have 10 so far).
Much of this is messy to do in a Stored Procedure; instead the reference code uses PHP.
Here is a working implementation tested against 3.1M cities in the world. There were no items near poles; the code does not account for such (sorry, polar bears).
See ...(link to be provided)
While I conceived the idea in about 2009, I did not actually implement it until 2019.
-- Rick James
MySQL Documents by Rick James
HowTo Techniques for Optimizing Tough Tasks:
Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)
Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
Chunking lengthy DELETE/UPDATE/etc.
Data Warehouse techniques:
Overview Summary Tables High speed ingestion
Entity-Attribute-Value -- a common, poorly performing, design pattern (EAV); plus an alternative
5 methods for 'Find Nearest'
Find the nearest 10 pizza parlors -- efficient searching on Latitude + Longitude (another PARITION use)
Lat/Long representation choices
Z-Order 'find nearest'(under construction)
Pagination, not with OFFSET, LIMIT
Techniques on efficiently finding a random row (On beyond ORDER BY RAND())
GUID/UUID Performance (type 1 only)
IP Range Table Performance -- or other disjoint ranges
Rollup Unique User Counts
Alter of a Huge table -- Mostly obviated by 5.6
Latest 10 news articles -- how to optimize the schema and code for such
Build and execute a "Pivot" SELECT (showing rows as columns)
Find largest row for each group ("groupwise max")
Other Tips, Tuning, Debugging, Optimizations, etc...
Rick's RoTs (Rules of Thumb -- lots of tips)
Memory Allocation (caching, etc)
Character Set and Collation problem solver
Trouble with UTF-8 If you want case folding, but accent sensitivity, please file a request at http://bugs.mysql.com .
Python tips, PHP tips, other language tips
utf8 Collations utf8mb4 Collations on 8.0
Converting from MyISAM to InnoDB -- includes differences between them
Compound INDEXes plus other insights into the mysteries of INDEXing
Cookbook for Creating Indexes
Many-to-many mapping table wp_postmeta UNION+OFFSET
MySQL Limits -- built-in hard limits
767-byte INDEX limit
Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)
5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users
Request for tuning / slowlog info
Best of MySQL Forum -- index of lots of tips, discussions, etc
Analyze MySQL Performance
Analyze VARIABLEs and GLOBAL STATUS Analyze SlowLog
My slides from conferences
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB
(older ones upon request)
Contact me via LinkedIn; be sure to include a brief teaser in the Invite request: