How should we model entity subtypes in a relational DBMS? In other words – how do we model inheritance in a relational database system? This is a question I’ve come across many times, but I never felt like I had the perfect answer.
These are some of the solutions I’ve encountered:
- Separate tables
- Single, sparse table
- Parent-with-child tables
- Parent with attribute-value child table
- ORDBMS type/table inheritance
- Single table with JSON
- Single table with dynamic columns (MariaDB)
We simply create a new table for each subtype and duplicate all the shared attributes:
CREATE TABLE t1 ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, f1 FLOAT NOT NULL ); CREATE TABLE t2 ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, f2 FLOAT NOT NULL );
Advantages: Easy to implement on the database side. There is no need for a
subtype column since the actual table names says which subtype it is.
Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to add a
LEFT JOIN for each table. This may not scale well if there are a lot of subtypes.
Single, sparse table
The idea with this solution is that similar entity types can cohabitate in a single, multipurpose table. We’re denormalising by placing all the subtype-specific attributes into NULL-able columns in a single table. We’ll probably want a special
type column to indicate which subtype each row is. If we need to enforce mandatory attributes (
NOT NULL) in the subtypes, we can use
CHECK CONSTRAINTs based on the
CREATE TABLE t ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, subtype enum('type1', 'type2') NOT NULL, f1 FLOAT, f2 FLOAT, CONSTRAINT float_check CHECK ((subtype='type1' AND f1 IS NOT NULL) OR (subtype='type2' AND f2 IS NOT NULL)) );
Advantages: Good performance and little complexity.
Disadvantages: Every row will have some unused columns (
NULL values), and enforcing mandatory attributes may be challenging if there are a lot of them.
This solution consists of a parent table for the shared attributes and child tables for the subtype-specific attributes. The child tables’ foreign key columns will also be their primary key columns.
CREATE TABLE parent ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY ); CREATE TABLE child1 ( id INT UNSIGNED PRIMARY KEY, f1 FLOAT NOT NULL, CONSTRAINT child1_fk_id FOREIGN KEY (id) REFERENCES parent(id) );
Advantages: This is a correct solution from a normalisation perspective. It’s now possible to have attributes for multiple subtypes for the same entity. This may be useful in some use-cases.
Disadvantages: Conversely, there is no way to enforce that there should be one and only one child per parent table, which may be what we want in many use-cases. Also, we need to add an extra
LEFT JOIN per child table to our
SELECT queries if we want to include all the subtypes. This may not scale well if we get a lot of different subtypes.
Parent with attribute-value child table
This is a variant of the parent-with-child tables solution above. We have a parent table for the shared attributes and a child table for the subtype-specific attributes. The child table’s foreign key column can be part ofits primary key. The subtype-specific attributes will have to be stored as strings.
CREATE TABLE parent ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY ); CREATE TABLE child ( id INT UNSIGNED, attribute VARCHAR(20) NOT NULL, val VARCHAR(100) NOT NULL, PRIMARY KEY(id, attribute), CONSTRAINT child_fk_id FOREIGN KEY (id) REFERENCES parent(id) );
Advantages: New attributes can be added without modifying the database schema, and an entity can be stored with any number of attributes specified. We can omit attributes that are not specified.
Disadvantages: We lose the attribute datatype because the values are stored as strings. We need to decide the length of the strings (attribute name and the value) at design-time, though, unless we use
TEXT. If there are a lot of attributes per entity subtype, then the child table can grow quite large.
ORDBMS type/table inheritance
In object-relational DBMSes you can use type or table inheritance to reuse all the attributes between subtypes, and then add extra subtype-specific attributes:
CREATE TABLE t ( id SERIAL PRIMARY KEY ); CREATE TABLE t1 ( f1 FLOAT ) INHERITS (t);
Advantages: This solution is probably closer to how most programmers think about inheritance, and there is no “visible” wasted space like we see in the single, sparse table solution.
Disadvantages: For queries that reference the subtype-specific columns, we’ll have to add
LEFT JOINs, just like in the parent-with-children tables solution, which may not scale well if there are a lot of subtypes.
Single table with JSON
The idea here is to have a single table, use normal columns for the shared attributes and then add a
JSON column for all the subtype-specific attributes.
CREATE TABLE t( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, subtype enum('type1', 'type2') NOT NULL, attributes JSON NOT NULL );
Advantages: JSON is a flexible format and modern RDBMSes have lots of built-in JSON functions. New attributes can be added without modifying the database schema.
Disadvantages: The attributes no longer have data types, and querying JSON attributes is slower than querying normal columns, although there are ways of adding indexes to specific attributes.
JSON is also not as efficient in terms of storage as are regular columns.
Single table with dynamic columns (MariaDB)
Similar to the single table with JSON, if we use MariaDB we can have a single table with dynamic columns stored in a BLOB column:
CREATE TABLE t( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, subtype enum('type1', 'type2') NOT NULL, attributes BLOB NOT NULL ); INSERT INTO t (subtype, attributes) VALUES ('type1', COLUMN_CREATE('wavelength', 553.4 AS DOUBLE, 'resolution', 2.4 AS DOUBLE, 'spots', 137 AS INTEGER ) );
Advantages: New attributes can be added without modifying the database schema, and yet we can also specify the datatype.
Disadvantages: Querying dynamic columns is slower than querying normal columns, although there are ways of adding indexes to specific attributes. Dynamic columns are not as efficient in terms of storage as are regular columns. We need to add a
BLOB to the table which comes with some issues, although if we know ahead of time that only a small number of attributes are needed per entity, then we can get away with a
VARCHAR. It’s also worth noting that dynamic columns are not well supported in third-party libraries.
Clearly there are more than one way to do this, and which solution is optimal for you will depend on your particular use-case. JSON is very flexible and allows developers or even application end-users to create new subtypes when/as needed. Table inheritance may be an option if you have an object-relational DBMS. Still, the single, sparse table is somewhat of a personal favourite of mine, at least for MariaDB and related DB systems. It’s very performant and easy to implement, although enforcing mandatory attributes can be challenging. (However, I suspect the main reason it’s not more widely used is because it’s aesthetically displeasing!) But I hope to follow-up with a more in-depth look at this solution in a later article.
Updates: 20. Sept. 2020: Two extra solutions were added.