EAV - Entity-Attribute-Value Implentation

Table of Contents

The Desires

It goes by various Names

Bad solution

The Problems

A Solution

But what about the ad hoc queries?

Why it Works

Details on the BLOB/JSON

A quick Example

FULLTEXT to the rescue

Conclusions

Postlog

Brought to you by Rick James


The Desires


    ⚈  Open-ended set of "attributes" (key=value) for each "entity". That is, the list of attributes is not known at development time, and will grow in the future. (This makes one column per attribute impractical.)
    ⚈  "ad hoc" queries testing attributes.
    ⚈  Attribute values come in different types (numbers, strings, dates, etc.)
    ⚈  Scale to lots of entities, yet perform well.

It goes by various Names

    ⚈  EAV -- Entity - Attribute - Value
    ⚈  key-value
    ⚈  RDF -- This is a flavor of EAV
    ⚈  MariaDB has
dynamic columns that look something like the solution below. They have the added advantage of being able to index the columns otherwise hidden in the blob. (There are caveats.)
    ⚈  5.7 Has JSON datatype, plus functions to access parts
    ⚈  MongoDB, CouchDB -- and others -- Not SQL-based.

Bad solution


    ⚈  Table with 3 columns: entity_id, key, value
    ⚈  The "value" is a string, or maybe multiple columns depending on datatype or other kludges.
    ⚈  a JOIN b ON a.entity=b.entity AND b.key='x' JOIN c ON ... WHERE a.value=... AND b.value=...

The Problems


    ⚈  The SELECTs get messy -- multiple JOINs
    ⚈  Datatype issues -- It's clumsy to be putting numbers into strings
    ⚈  Numbers stored in VARCHAR do not compare 'correctly', especially for range tests.
    ⚈  Bulky.
    ⚈  Dedupping the Values is clumsy.
    ⚈  Inefficient implementation of the K-V table. See the following on how to improve WordPress somewhat:
wp_postmeta

A Solution


Decide which columns need to be searched/sorted by SQL queries. No, you don't need all the columns to be searchable or sortable. Certain columns are frequently used for selection; identify these. You probably won't use all of them in all queries, but you will use some of them in every query.

The solution uses one table for all the EAV stuff. The columns include the searchable fields plus one TEXT/BLOB. Searchable fields are declared appropriately (INT, TIMESTAMP, etc). The BLOB contains JSON-encoding of all the extra fields.

The table should be ENGINE=InnoDB, hence it should have a PRIMARY KEY. The entitity_id is the 'natural' PK. Add a small number of other indexes (often 'composite') on the searchable fields. PARTITIONing is unlikely to be of any use, unless the Entities should purged after some time. (Example: News Articles)

But what about the ad hoc queries?


You have included the most important fields to search on -- date, category, etc. These should filter the data down significantly. When you also need to filter on something more obscure, that will be handled differently. The application code will look at the BLOB for that; more on this later.

Why it Works


    ⚈  You are not really going to search on more than a few fields.
    ⚈  The disk footprint is smaller; Smaller --> More cacheable --> Faster
    ⚈  It needs no JOINs
    ⚈  The indexes are useful
    ⚈  The one table has one row per entity, and can grow as needed. (EAV needs many rows per entity.)
    ⚈  Performance is as good as the indexes you have on the 'searchable fields'.
    ⚈  Optionally, you can duplicate the indexed fields in the BLOB.
    ⚈  Values missing from 'searchable fields' would need to be NULL (or whatever), and the code would need to deal with such.

Details on the BLOB/JSON


    ⚈  Build the extra (or all) key-value pairs in a hash (associative array) in your application. Encode it. COMPRESS it. Insert that string into the BLOB.
    ⚈  
JSON is recommended, but not mandatory; it is simpler than XML. Other serializations (eg, YAML) could be used.
    ⚈  COMPRESS the JSON and put it into a BLOB (or MEDIUMBLOB) instead of a TEXT field. Compression gives about 3x shrinkage.
    ⚈  When SELECTing, UNCOMPRESS the blob. Decode the string into a hash. You are now ready to interrogate/display any of the extra fields.
    ⚈  If you choose to use the JSON features of MariaDB or 5.7, you will have to forgo the compression feature described.
    ⚈  MySQL 5.7.8's JSON native JSON datatype uses a binary format for more efficient access.

For JSON in PHP, see $str = json_encode($hash); and $hash = json_decode($str, true); Also, compress() and uncompress().

A quick Example


For example, in Real Estate, num_bedrooms and price are far more useful to split out than has_fireplace or septic_tank. All buyers filter on the first two: only a few buyers filter on fireplaces or have a phobia about septic_tanks. So, the explicit columns would include num_bedrooms TINYINT UNSIGNED and price INT UNSIGNED; the others would be buried in the JSON.

FULLTEXT to the rescue


For Boolean attributes, it may be very efficient and practical to use a single TEXT column with a FULLTEXT index. Throw into that column a list of keywords (eg, "fireplace, septic_tank"). Generally do not put any numbers there. Nor short words or words with too many different spellings.

When building the query, leave out the MATCH(...) AGAINST(...) clause if you have not included any keywords. Include that clause only if there are mandatory items that are likely to be in the list.

These admonitions are aimed at using the FULLTEXT index when appropriate and not letting it get in the way of performance when it is not appropriate.

When the WHERE includes the MATCH...AGAINST, that part will be performed first. The other conditions won't use indexes.

Keep in mind that OR usually defeats any index optimizations.

Conclusions


    ⚈  Schema is reasonably compact (compression, real datatypes, less redundancy, etc, than EAV)
    ⚈  Queries are fast (since you have picked 'good' indexes)
    ⚈  Expandable (JSON is happy to have new fields)
    ⚈  Compatible (No 3rd party products, just supported products)
    ⚈  Range tests work (unlike storing INTs in VARCHARs)
    ⚈  (Drawback) Cannot use the non-indexed attributes in WHERE or ORDER BY clauses, must deal with that in the app. (5.7 partially alleviates this.)

Postlog


Posted Jan, 2014; Refreshed Feb, 2016; FULLTEXT May, 2020.

MariaDB's Dynamic Columns
MySQL 5.7's JSON

This looks very promising; I will need to do more research to see how much of this blog is obviated by it: Using MySQL as a Document Store in 5.7
more DocStore discussion
EAV Fail
Document store & multi-valued indexes of 8.0.17
Car database example
Real Estate example
Drawbacks of EAV
Complex Filtering
Indexing a JSON field is clumsy
An example

If you insist on EAV, set optimizer_search_depth=1.
-- Rick James

MySQL Documents by Rick James

HowTo Techniques for Optimizing Tough Tasks:

Partition Maintenance (DROP+REORG) for time series (includes list of PARTITION uses)

Big DELETEs - how to optimize -- and other chunking advice, plus a use for PARTITIONing
    Chunking lengthy DELETE/UPDATE/etc.

Data Warehouse techniques:
    Data Warehouse Overview   Summary Tables   High speed ingestion   Bulk Normalization  

Schema and code design for large Sensor database

Entity-Attribute-Value (EAV) -- a common, poorly performing, design pattern; plus an alternative

5 methods for 'Find Nearest'

Lat/Lng search to Find the nearest 10 pizza parlors
    Lat/Long representation choices

Z-Order 'find nearest'

Pagination, not with OFFSET, LIMIT

Techniques on efficiently finding a random row (On beyond ORDER BY RAND())

GUID/UUID Performance (type 1 only)

IP Range Table Performance -- or other disjoint ranges

Rollup Unique User Counts

Alter of a Huge table -- Mostly obviated by 5.6

Efficient List of Latest 10 news articles

Build and execute a Pivot SELECT (showing rows as columns)

(Groupwise Max): Efficiently find largest row(s) for each group

Other Tips, Tuning, Debugging, Optimizations, etc...

Rick's RoTs (Rules of Thumb -- lots of tips)

Datatypes and building a good schema

Memory Allocation (caching, etc)

Character Set and Collation problem solver
    Trouble with UTF-8   If you want case folding, but accent sensitivity, please file a request at https://bugs.mysql.com .
    Python tips,   PHP tips,   other language tips
    utf8 Collations   utf8mb4 Collations on 8.0

Converting from MyISAM to InnoDB -- includes differences between them

Compound INDEXes plus other insights into the mysteries of INDEXing

Cookbook for Creating Indexes
    Many-to-many mapping table   Handler counts   wp_postmeta   UNION+OFFSET

MySQL Limits -- built-in hard limits
    767-byte INDEX limit

Galera, tips on converting to (Percona XtraDB Cluster, MariaDB 10, or manually installed)

5.7's Query Rewrite -- perhaps 5.7's best perf gain, at least for this forum's users

Analyze MySQL Performance
    Analyze VARIABLEs and GLOBAL STATUS     Analyze SlowLog

My slides from conferences
MiniFest 2021 - Rick James & Daniel Black - Answering on Stack Overflow(+comments) - MariaDB Frontlines
Percona Live 4/2017 - Rick's RoTs (Rules of Thumb) - MySQL/MariaDB
Percona Live 4/2017 - Index Cookbook - MySQL/MariaDB
Percona Live 9/2015 - PARTITIONing - MySQL/MariaDB

Contact me via LinkedIn; be sure to include a brief teaser in the Invite request:   View Rick James's profile on LinkedIn

Did my articles help you out? Like what you see? Consider donating:

☕️ Buy me a Banana Latte and bagel ($10) There is no obligation but it would put a utf8mb4 smiley 🙂 on my face, instead of the Mojibake "🙂"