How should we model entity subtypes in a relational DBMS? In other words - how do we model inheritance in a relational database system? This is a question I've come across many times, but I never felt like I had the perfect answer.
Below are the most common solutions I've encountered.
Table Of Contents
Separate tables
We simply create a new table for each subtype and duplicate all the shared attributes:
CREATE TABLE t1 (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
f1 FLOAT NOT NULL
);
CREATE TABLE t2 (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
f2 FLOAT NOT NULL
);
Advantages: Easy to implement on the database side. There is no need for a subtype
column since the actual table names says which subtype it is.
Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to have a SELECT
for each table and a UNION ALL
between each SELECT
. This may not scale well if there are a lot of subtypes.
Single, sparse table
The idea with this solution is that similar entity types can cohabitate in a single, multipurpose table. We're denormalising by placing all the subtype-specific attributes into NULL-able columns in a single table. We'll probably want a special type
column to indicate which subtype each row is. If we need to enforce mandatory attributes (NOT NULL) in the subtypes, we can use CHECK CONSTRAINTs based on the type
:
CREATE TABLE t (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
subtype enum('type1', 'type2') NOT NULL,
f1 FLOAT,
f2 FLOAT,
CONSTRAINT float_check CHECK
((subtype='type1' AND f1 IS NOT NULL) OR
(subtype='type2' AND f2 IS NOT NULL))
);
Advantages: Good performance and little complexity.
Disadvantages: Every row will have some unused columns (NULL
values).
Parent-with-child tables
This solution consists of a parent table for the shared attributes with a 1-to-1 relationship with child tables for the subtype-specific attributes. As such, the child tables' primary key columns will also be used to in their foreign keys to reference the parent.
However, there is a little bit more to this, and it's very clever1: In the parent table we also have a type_id
and a seemingly superfluous unique constraint on (id
, type_id
). The child tables also have a type_id
, and its foreign key references (id
, type_id
) in the parent. Assuming there are multiple child tables, these special measures are needed to enforce a true 1-to-1 relationship between a row in the parent table and a single row in only one of the child tables.
CREATE TABLE parent (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
type_id INT UNSIGNED NOT NULL,
CONSTRAINT parent_unq UNIQUE KEY(id, type_id)
);
CREATE TABLE child1 (
id INT UNSIGNED PRIMARY KEY,
type_id INT UNSIGNED GENERATED ALWAYS AS (1) VIRTUAL,
f1 FLOAT NOT NULL,
CONSTRAINT child1_fk_id
FOREIGN KEY (id, type_id)
REFERENCES parent(id, type_id)
);
Advantages: This is a correct solution from a normalisation perspective. It's now possible to have attributes for multiple subtypes for the same entity. This may be useful in some use-cases.
Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to add an extra INNER JOIN
per child table to our SELECT
query. This may not scale well if we get a lot of different subtypes.
ORDBMS type/table inheritance
In object-relational DBMSes you can use type or table inheritance to reuse all the attributes between subtypes, and then add extra subtype-specific attributes:
CREATE TABLE t (
id SERIAL PRIMARY KEY
);
CREATE TABLE t1 (
f1 FLOAT
) INHERITS (t);
Advantages: This solution is probably closer to how most programmers think about inheritance, and there is no "visible" wasted space like we see in the single, sparse table solution.
Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to use JOIN
or UNION
between each child table. This may not scale well if there are a lot of subtypes.
Single table with JSON
The idea here is to have a single table, use normal columns for the shared attributes and then add a JSON column for all the subtype-specific attributes.
CREATE TABLE t(
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
subtype enum('type1', 'type2') NOT NULL,
attributes JSON NOT NULL
);
Advantages: JSON is a flexible format and modern RDBMSes have lots of built-in JSON functions. New attributes can be added without modifying the database schema.
Disadvantages: The attributes no longer have data types, and querying JSON attributes is slower than querying normal columns, although there are ways of adding indexes to specific attributes. JSON is also not as efficient in terms of storage as are regular columns.
Conclusion
Clearly there are more than one way to do this, and unfortunately, which solution is optimal for you will depend on your particular use-case. JSON is very flexible and allows developers or even application end-users to create new subtypes when/as needed. Table inheritance may be an option if you have an object-relational DBMS. Still, I think the single, sparse table is my personal favourite because - in my opinion - it's very performant and its only disadvantage is really that it looks ugly. I hope to follow-up with a more in-depth look at this solution in a later article.
References
-
I learnt about this in Implementing Table Inheritance in SQL Server ↩︎