picture by Rudolf Vicek via Flickr
Generally the most pressing problems for software development concern quality, time to market, and cost. If you define referential integrity (RI) in your software you can improve all three of these items. RI improves quality by ensuring that data references truly exist and cannot be dangling. RI also reduces development time and lowers cost as it takes much less effort to define RI than to program the equivalent application code.
What is Referential Integrity?
A foreign key is a reference to a unique identifier that is usually the primary key. Foreign-to-primary-key bindings couple the tables of a relational database. For example, a database can have Customer and Order tables. A Customer can have many Orders; an Order is for one Customer. The Order table has a foreign key to the primary key of Customer.
Referential integrity (RI) is a constraint that ensures that every foreign key refers to an actual target. RI guarantees that the values referenced really exist and that foreign keys are not dangling.
Referential integrity is easy to specify. Database design software can generate referential integrity code automatically. For example, ERwin has explicit referential integrity options when generating schema.
Referential Integrity Actions
Referential integrity can not only keep a foreign key reference and primary key referent in sync, but can specify what to do if a database action would disrupt them. Referential integrity actions can apply both to deletions and updates. We favor using surrogate keys (database generated unique numbers) as primary keys. Surrogate keys never change, so there is a need for RI with deletions, but no need for RI with updates.
SQL has the following referential integrity actions.
- With a mandatory (not null) foreign key, the deletion of a referent record may imply the deletion of all referencing records. For example, the deletion of an Order could cause the deletion of the associated Order Items.
- No action. You may forbid the deletion of a referent record if there are referencing records. For example, we might forbid the deletion of an Address that is referenced by an open Order.
- Set null. With an optional (null allowed) foreign key, the deletion of a referent record could cause references to be set to null. For example, the firing of an employee might cause the account rep for an Order to be set to null.
- Set default. Upon deletion, you may set a foreign key to a default value instead of null. For example, alternatively the firing of an employee might cause the account rep to be set to a supervisor.
The performance overhead of referential integrity is usually minimal. When I tune database applications, I usually look to improve performance by at least an order of magnitude. The impact of referential integrity enforcement pales by comparison and is typically on the order of 10-20%.
Referential Integrity and Data Warehouses
Referential integrity can also be helpful for data warehouses. Warehouse data is loaded by special system code (ETL code). Such code is tightly controlled, but errors can still occur. Recently I worked on a data warehouse project where the database platform supported referential integrity definitions but lacked enforcement. We wrote our own code to periodically check foreign keys to ensure a lack of errors.
We used the following SQL code to find dangling foreign keys.
WHERE NOT EXISTS (
WHERE foreignKey = primaryKey );
As this code illustrates, every foreign key must be subsumed by a primary key.
Defective quality is often the most troublesome aspect of developing software. Bugs are difficult to find, costly to fix, and damaging to a software supplier’s reputation. Bugs in programming code are bad enough, but data bugs can be even worse. A database accumulates activity and it can be difficult to track a data error back to its source. Referential integrity guarantees that data references are intact, thereby increasing software quality.
If you don’t use database referential integrity, you will have to write additional application code (or take your chances with data errors). Referential integrity gives you the opportunity to write less code and deliver software more rapidly.