UNDER CONSTUCTION
Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)
Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
Data Warehouse techniques:
Schema and code design for large Sensor database
Entity-Attribute-Value (EAV) -- a common, poorly performing, design pattern; plus an alternative
Lat/Lng search to Find the nearest 10 pizza parlors
The Problem
The uptimate goal (here) is to provide an efficient way to "find the ten coffee shops nearest me" in a huge MySQL database.
Indexes are the way any RDBMS provides efficient access to a few rows out of millions. But indexes are one-dimensional, and we need a two-dimensional index.
ZOrder is a technique for combining two numbers, such as latitude and longitude, into one
for the purpose of making a 1D index of 2D data.
It is mostly successful in that two points near to each other on the map are usually near each other in the index.
This allows for "find nearest" to mostly work efficiently.
Because of those "mostlys", the implementation and use of a ZOrder index is complicated. This document discusses such an implementation.
2D to 1D
Think of the fingers of one of your hands being the bits of the integer representing "latitude". Ditto for your other hand and "longitude".
Now fold your hands; notice how the 'bits' are interleved. This represents turning two dimensional numbers in to a single, "ZOrder", number.
The "Z" comes from going back to the map and connecting, in zorder, the items. It looks like a zig-zag line that is "self-similar".
The repeating pattern is a Z (or N or C or U, depending on how you interlace your fingers).
Wikipedia - Self-similarity
Wikipedia on Z-roder curve
Where it goes wrong
Mostly, nearby items on the globa are nearby in the index. But, while it is busy keeping some items nearby, others are not being included.
In the worst case, two 'adjacent' items can be as far as halfway across the index. Furtunately, this is an extreme case, but it needs to be handled in the algorithm. To do so, the algorithm first looks for items nearby, then discovers a boundary between "nearby" and "far" in each direction (north or south, and east or west). Then it makes a separate effort to collect items in those "far" areas.
Creating the ZOrder number
1. convert lat and lng to integers, shifted and scaled sufficiently. I chose to scale them up to 25-bit integers so that there would be very little roundoff when comparing two nearly equally-distant items.
2. if 5 stages (ceil(log2(25))), spread the bits apart.
3. OR together the two bit strings (without the 1s hitting).
Poles (> 85 deg)
If there are only a small number of items above +/-85 degrees latitude, then the proposed code is to artificially comput the zorder number with latitude to +/-85.xx degrees.
Multiple chunks
The worst case for zorder is perhaps latitude 0 at longitude 0. The values for the combined zorder number in the 4 quadrants adjacent to that spot are radically different. Hence it would be unwise to try to fetch all the data with a single SELECT. Instead, the algorithm first discovers the "square" containing the source location and with the boundaries at the most significant zorder values. These boundaries would have the most trailing zero bits. Then do 4 selects, one in each quadrant.
However, optimizations can change those "4" selects into between 2 and 6
1. (Select #1) Scan forward in the box containing the source. ("forward" meaning ORDER BY zorder ASC)
2. (Select #2) Scan backward in the box. See if enough rows have been found in the first 2 selects to satisfy the query and all the items are still within the box. If so, quit.
3. (Select #3 - maybe) If selects 1 or 2 failed to reach all the way to the corners of the box, then recalculate a smaller bounding box and query it entirely. Again, this might lead to completion.
4. (Select #4 - maybe) Step over the closest edge and scan a rectangle outside the box.
5. (Select #5 - maybe) Ditto with the next closest edge.
6. (Select #6 - maybe) Finally scan the corner between them.
After most steps, decrease the radius if possible. That is, if looking for 10 items, lower the radius to the distance to the 10th closest item (if you have 10 so far).
Much of this is messy to do in a Stored Procedure; instead the reference code uses PHP.
Reference code
Here is a working implementation tested against 3.1M cities in the world. There were no items near poles; the code does not account for such (sorry, polar bears).
See ...(link to be provided)
Postlog
While I conceived the idea in about 2009, I did not actually implement it until 2019.
-- Rick James
MySQL Documents by Rick James
HowTo Techniques for Optimizing Tough Tasks:
Chunking lengthy DELETE/UPDATE/etc.
Data Warehouse Overview
Summary Tables
High speed ingestion
Bulk Normalization
Lat/Long representation choices
Pagination, not with OFFSET, LIMIT
Techniques on efficiently finding a random row (On beyond ORDER BY RAND())
GUID/UUID Performance (type 1 only)
IP Range Table Performance -- or other disjoint ranges
Alter of a Huge table -- Mostly obviated by 5.6
Efficient List of Latest 10 news articles
Build and execute a Pivot SELECT (showing rows as columns)
(Groupwise Max): Efficiently find largest row(s) for each group
Other Tips, Tuning, Debugging, Optimizations, etc...
Rick's RoTs (Rules of Thumb -- lots of tips)
Datatypes and building a good schema
Memory Allocation (caching, etc)
Character Set and Collation problem solver
Trouble with UTF-8
If you want case folding, but accent sensitivity, please file a request at https://bugs.mysql.com .
Python tips,
PHP tips,
other language tips
utf8 Collations
utf8mb4 Collations on 8.0
Converting from MyISAM to InnoDB -- includes differences between them
Compound INDEXes plus other insights into the mysteries of INDEXing
Cookbook for Creating Indexes
Many-to-many mapping table
Handler counts
wp_postmeta
UNION+OFFSET
MySQL Limits -- built-in hard limits
767-byte INDEX limit
Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)
5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users
Analyze MySQL Performance
Analyze VARIABLEs and GLOBAL STATUS
Analyze SlowLog
My slides from conferences
MiniFest 2021 - Rick James & Daniel Black - Answering on Stack Overflow(+comments) - MariaDB Frontlines
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB
Contact me via LinkedIn; be sure to include a brief teaser in the Invite request: