Inheritance in entity-relationship modelling

How should we model entity subtypes in a relational DBMS? In other words – how do we model inheritance in a relational database system? This is a question I’ve come across many times, but I never felt like I had the perfect answer.

These are some of the solutions I’ve encountered:

  1. Separate tables
  2. Single, sparse table
  3. Parent-with-child tables
  4. Parent with attribute-value child table
  5. ORDBMS type/table inheritance
  6. Single table with JSON
  7. Single table with dynamic columns (MariaDB)

Separate tables

We simply create a new table for each subtype and duplicate all the shared attributes:

CREATE TABLE t1 (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  f1 FLOAT NOT NULL
);

CREATE TABLE t2 (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  f2 FLOAT NOT NULL
);

Advantages: Easy to implement on the database side. There is no need for a subtype column since the actual table names says which subtype it is.

Disadvantages: In order to get a result-set containing rows from all the subtypes, we need to add a LEFT JOIN for each table. This may not scale well if there are a lot of subtypes.

Single, sparse table

The idea with this solution is that similar entity types can cohabitate in a single, multipurpose table. We’re denormalising by placing all the subtype-specific attributes into NULL-able columns in a single table. We’ll probably want a special type column to indicate which subtype each row is. If we need to enforce mandatory attributes (NOT NULL) in the subtypes, we can use CHECK CONSTRAINTs based on the type:

CREATE TABLE t (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, 
  subtype enum('type1', 'type2') NOT NULL, 
  f1 FLOAT, 
  f2 FLOAT, 
  CONSTRAINT float_check CHECK 
    ((subtype='type1' AND f1 IS NOT NULL) OR 
      (subtype='type2' AND f2 IS NOT NULL))
);

Advantages: Good performance and little complexity.

Disadvantages: Every row will have some unused columns (NULL values), and enforcing mandatory attributes may be challenging if there are a lot of them.

Parent-with-children tables

This solution consists of a parent table for the shared attributes and child tables for the subtype-specific attributes. The child tables’ foreign key columns will also be their primary key columns.

CREATE TABLE parent (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
);

CREATE TABLE child1 (
  id INT UNSIGNED PRIMARY KEY,
  f1 FLOAT NOT NULL, 
  CONSTRAINT child1_fk_id 
    FOREIGN KEY (id) 
      REFERENCES parent(id) 
);

Advantages: This is a correct solution from a normalisation perspective. It’s now possible to have attributes for multiple subtypes for the same entity. This may be useful in some use-cases.

Disadvantages: Conversely, there is no way to enforce that there should be one and only one child per parent table, which may be what we want in many use-cases. Also, we need to add an extra LEFT JOIN per child table to our SELECT queries if we want to include all the subtypes. This may not scale well if we get a lot of different subtypes.

Parent with attribute-value child table

This is a variant of the parent-with-child tables solution above. We have a parent table for the shared attributes and a child table for the subtype-specific attributes. The child table’s foreign key column can be part ofits primary key. The subtype-specific attributes will have to be stored as strings.

CREATE TABLE parent (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
);

CREATE TABLE child (
  id INT UNSIGNED,
  attribute VARCHAR(20) NOT NULL,
  val VARCHAR(100) NOT NULL,
  PRIMARY KEY(id, attribute),
  CONSTRAINT child_fk_id 
    FOREIGN KEY (id) 
      REFERENCES parent(id) 
);

Advantages: New attributes can be added without modifying the database schema, and an entity can be stored with any number of attributes specified. We can omit attributes that are not specified.

Disadvantages: We lose the attribute datatype because the values are stored as strings. We need to decide the length of the strings (attribute name and the value) at design-time, though, unless we use TEXT. If there are a lot of attributes per entity subtype, then the child table can grow quite large.

ORDBMS type/table inheritance

In object-relational DBMSes you can use type or table inheritance to reuse all the attributes between subtypes, and then add extra subtype-specific attributes:

CREATE TABLE t (
  id SERIAL PRIMARY KEY
);

CREATE TABLE t1 (
  f1 FLOAT
) INHERITS (t);

Advantages: This solution is probably closer to how most programmers think about inheritance, and there is no “visible” wasted space like we see in the single, sparse table solution.

Disadvantages: For queries that reference the subtype-specific columns, we’ll have to add LEFT JOINs, just like in the parent-with-children tables solution, which may not scale well if there are a lot of subtypes.

Single table with JSON

The idea here is to have a single table, use normal columns for the shared attributes and then add a JSON column for all the subtype-specific attributes.

CREATE TABLE t(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  subtype enum('type1', 'type2') NOT NULL, 
  attributes JSON NOT NULL
);

Advantages: JSON is a flexible format and modern RDBMSes have lots of built-in JSON functions. New attributes can be added without modifying the database schema.

Disadvantages: The attributes no longer have data types, and querying JSON attributes is slower than querying normal columns, although there are ways of adding indexes to specific attributes. JSON is also not as efficient in terms of storage as are regular columns.

Single table with dynamic columns (MariaDB)

Similar to the single table with JSON, if we use MariaDB we can have a single table with dynamic columns stored in a BLOB column:

CREATE TABLE t(
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  subtype enum('type1', 'type2') NOT NULL, 
  attributes BLOB NOT NULL
);

INSERT INTO t (subtype, attributes) 
  VALUES ('type1', 
    COLUMN_CREATE('wavelength', 553.4 AS DOUBLE, 
      'resolution', 2.4 AS DOUBLE, 
      'spots', 137 AS INTEGER
    )
  );  

Advantages: New attributes can be added without modifying the database schema, and yet we can also specify the datatype.

Disadvantages: Querying dynamic columns is slower than querying normal columns, although there are ways of adding indexes to specific attributes. Dynamic columns are not as efficient in terms of storage as are regular columns. We need to add a BLOB to the table which comes with some issues, although if we know ahead of time that only a small number of attributes are needed per entity, then we can get away with a VARCHAR. It’s also worth noting that dynamic columns are not well supported in third-party libraries.

Conclusion

Clearly there are more than one way to do this, and which solution is optimal for you will depend on your particular use-case. JSON is very flexible and allows developers or even application end-users to create new subtypes when/as needed. Table inheritance may be an option if you have an object-relational DBMS. Still, the single, sparse table is somewhat of a personal favourite of mine, at least for MariaDB and related DB systems. It’s very performant and easy to implement, although enforcing mandatory attributes can be challenging. (However, I suspect the main reason it’s not more widely used is because it’s aesthetically displeasing!) But I hope to follow-up with a more in-depth look at this solution in a later article.

Updates: 20. Sept. 2020: Two extra solutions were added.

2 Replies to “Inheritance in entity-relationship modelling”

  1. Hi Karl. Great summary!

    You didn’t mention the EAV pattern, probably because it’s not widely used nowadays – and anyway, it’s very bad for performance and scalability.
    But it’s interesting to mention that it can be used in a hybrid way, like Magento used to do in the past: it had an EAV schema where new data were written, and then data was periodically moved to “flat tables”, which were built to match your products attributes.
    However, Magento does not recommend to use flat tables anymore because this came with serious performance problems during the reindexing operation. Instead of implementing a better update process (something similar to what pt-osc does), they simply deprecated the feature and pretended that EAV is fine.
    But something even better could be done. Magento stored actual data into EAV tables, but if you think about it, there is probably no good reason for that. You could use an EAV model for metadata, where each row represents a table’s attribute. That is not supposed to change often (if ever), but just to be sure, this EAV could be versioned. Flat tables could be built on an EAV version, and data could be written exclusively into the flat tables.

    Federico

    1. Hi Federico! EAV – good point, I should have mentioned that. Perhaps it could be sufficient here to use a variant of parent-with-child tables where the child table has just the “AV” columns since we already know the “E”? So basically key-value pairs – and both key and value could be stored as strings. That is a solution I’ve used before for a particular use-case.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax