How should we model entity subtypes in a relational DBMS? In other words - how do we model inheritance in a relational database system? This is a question I've come across many times, but I never felt like I had the perfect answer.

Below are the most common solutions I've encountered.

Separate tables

We simply create a new table for each subtype and duplicate all the shared attributes:

CREATE TABLE t1 (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  f1 FLOAT NOT NULL
);

CREATE TABLE t2 (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  f2 FLOAT NOT NULL
);

Advantages: Easy to implement on the database side. There is no need for a subtype column since the actual table names says which subtype it is.

Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to have a SELECT for each table and a UNION ALL between each SELECT. This may not scale well if there are a lot of subtypes.

Single, sparse table

The idea with this solution is that similar entity types can cohabitate in a single, multipurpose table. We're denormalising by placing all the subtype-specific attributes into NULL-able columns in a single table. We'll probably want a special type column to indicate which subtype each row is. If we need to enforce mandatory attributes (NOT NULL) in the subtypes, we can use CHECK CONSTRAINTs based on the type:

CREATE TABLE t (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  subtype enum('type1', 'type2') NOT NULL,
  f1 FLOAT,
  f2 FLOAT,
  CONSTRAINT float_check CHECK
    ((subtype='type1' AND f1 IS NOT NULL) OR
      (subtype='type2' AND f2 IS NOT NULL))
);

Advantages: Good performance and little complexity.

Disadvantages: Every row will have some unused columns (NULL values).

Parent-with-child tables

This solution consists of a parent table for the shared attributes with a 1-to-1 relationship with child tables for the subtype-specific attributes. As such, the child tables' primary key columns will also be used to in their foreign keys to reference the parent.

However, there is a little bit more to this, and it's very clever¹: In the parent table we also have a type_id and a seemingly superfluous unique constraint on (id, type_id). The child tables also have a type_id, and its foreign key references (id, type_id) in the parent. Assuming there are multiple child tables, these special measures are needed to enforce a true 1-to-1 relationship between a row in the parent table and a single row in only one of the child tables.

CREATE TABLE parent (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  type_id INT UNSIGNED NOT NULL,
  CONSTRAINT parent_unq UNIQUE KEY(id, type_id)
);

CREATE TABLE child1 (
  id INT UNSIGNED PRIMARY KEY,
  type_id INT UNSIGNED GENERATED ALWAYS AS (1) VIRTUAL,
  f1 FLOAT NOT NULL,
  CONSTRAINT child1_fk_id
    FOREIGN KEY (id, type_id)
      REFERENCES parent(id, type_id)
);

Advantages: This is a correct solution from a normalisation perspective. It's now possible to have attributes for multiple subtypes for the same entity. This may be useful in some use-cases.

Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to add an extra INNER JOIN per child table to our SELECT query. This may not scale well if we get a lot of different subtypes.

ORDBMS type/table inheritance

In object-relational DBMSes you can use type or table inheritance to reuse all the attributes between subtypes, and then add extra subtype-specific attributes:

CREATE TABLE t (
  id SERIAL PRIMARY KEY
);

CREATE TABLE t1 (
  f1 FLOAT
) INHERITS (t);

Advantages: This solution is probably closer to how most programmers think about inheritance, and there is no "visible" wasted space like we see in the single, sparse table solution.

Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to use JOIN or UNION between each child table. This may not scale well if there are a lot of subtypes.

Single table with JSON

The idea here is to have a single table, use normal columns for the shared attributes and then add a JSON column for all the subtype-specific attributes.

CREATE TABLE t(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  subtype enum('type1', 'type2') NOT NULL,
  attributes JSON NOT NULL
);

Advantages: JSON is a flexible format and modern RDBMSes have lots of built-in JSON functions. New attributes can be added without modifying the database schema.

Disadvantages: The attributes no longer have data types, and querying JSON attributes is slower than querying normal columns, although there are ways of adding indexes to specific attributes. JSON is also not as efficient in terms of storage as are regular columns.

Conclusion

Clearly there are more than one way to do this, and unfortunately, which solution is optimal for you will depend on your particular use-case. JSON is very flexible and allows developers or even application end-users to create new subtypes when/as needed. Table inheritance may be an option if you have an object-relational DBMS. Still, I think the single, sparse table is my personal favourite because - in my opinion - it's very performant and its only disadvantage is really that it looks ugly. I hope to follow-up with a more in-depth look at this solution in a later article.

References

I learnt about this in Implementing Table Inheritance in SQL Server ↩︎

Inheritance in ERDB Modelling

Table Of Contents