How should we model entity subtypes in a relational DBMS? In other words - how do we model inheritance in a relational database system? This is a question I've come across many times, but I never felt like I had the perfect answer.
Below are the most common solutions I've encountered.
Table Of Contents
We simply create a new table for each subtype and duplicate all the shared attributes:
CREATE TABLE t1 ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, f1 FLOAT NOT NULL ); CREATE TABLE t2 ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, f2 FLOAT NOT NULL );
Advantages: Easy to implement on the database side. There is no need for a
subtype column since the actual table names says which subtype it is.
Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to add a LEFT JOIN for each table. This may not scale well if there are a lot of subtypes.
Single, sparse table
The idea with this solution is that similar entity types can cohabitate in a single, multipurpose table. We're denormalising by placing all the subtype-specific attributes into NULL-able columns in a single table. We'll probably want a special
type column to indicate which subtype each row is. If we need to enforce mandatory attributes (NOT NULL) in the subtypes, we can use CHECK CONSTRAINTs based on the
CREATE TABLE t ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, subtype enum('type1', 'type2') NOT NULL, f1 FLOAT, f2 FLOAT, CONSTRAINT float_check CHECK ((subtype='type1' AND f1 IS NOT NULL) OR (subtype='type2' AND f2 IS NOT NULL)) );
Advantages: Good performance and little complexity.
Disadvantages: Every row will have some unused columns (
CREATE TABLE parent ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY ); CREATE TABLE child1 ( id INT UNSIGNED PRIMARY KEY, f1 FLOAT NOT NULL, CONSTRAINT child1_fk_id FOREIGN KEY (id) REFERENCES parent(id) );
Advantages: This is a correct solution from a normalisation perspective. It's now possible to have attributes for multiple subtypes for the same entity. This may be useful in some use-cases.
Disadvantages: Conversely, there is no way to enforce that there should be one and only one child per parent row, which may be what we want in many use-cases. Also, we need to add an extra
LEFT JOIN per child table to our
SELECT queries when displaying the subtypes together. This may not scale well if we get a lot of different subtypes.
ORDBMS type/table inheritance
In object-relational DBMSes you can use type or table inheritance to reuse all the attributes between subtypes, and then add extra subtype-specific attributes:
CREATE TABLE t ( id SERIAL PRIMARY KEY ); CREATE TABLE t1 ( f1 FLOAT ) INHERITS (t);
Advantages: This solution is probably closer to how most programmers think about inheritance, and there is no "visible" wasted space like we see in the single, sparse table solution.
Disadvantages: For queries that reference the subtype-specific columns, we'll have to add
LEFT JOINs, just like in the parent-with-children tables solution, which may not scale well if there are a lot of subtypes.
Single table with JSON
The idea here is to have a single table, use normal columns for the shared attributes and then add a JSON column for all the subtype-specific attributes.
CREATE TABLE t( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, subtype enum('type1', 'type2') NOT NULL, attributes JSON NOT NULL );
Advantages: JSON is a flexible format and modern RDBMSes have lots of built-in JSON functions. New attributes can be added without modifying the database schema.
Disadvantages: The attributes no longer have data types, and querying JSON attributes is slower than querying normal columns, although there are ways of adding indexes to specific attributes. JSON is also not as efficient in terms of storage as are regular columns.
Clearly there are more than one way to do this, and which solution is optimal for you will depend on your particular use-case. JSON is very flexible and allows developers or even application end-users to create new subtypes when/as needed. Table inheritance may be an option if you have an object-relational DBMS. Still, I think the single, sparse table is my personal favourite because - in my opinion - it's very performant and its only disadvantage is really that it looks ugly. I hope to follow-up with a more in-depth look at this solution in a later article.