picture by strange little woman on stream via Flickr
This is the third and final blog in a series about database design errors. The two previous blogs addressed primary key and foreign key errors as well as confusion with many-to-many relationships. Now let’s discuss several other design errors. Our coverage is clearly not complete, but these are common mistakes that are found in practice.
Database normalization is important in that it helps to avoid redundancy. For example, the columns of a table should be fully dependent on the primary key. Otherwise there can be duplicate copies of data that are inconsistent.
Redundancy can also arise from business semantics. For example, a database with full transaction data as well as account balances has redundancy. An account balance can be computed by summing the pertinent transactions. An application may include both because it takes too long to reprocess all the transactions. But developers should recognize the redundancy and allow for it in their code.
For another example, consider an order comprised of order items. There are charges for orders such as taxes and shipping. There can also be charges for each individual order item. If an order record includes total cost, it must be kept consistent despite changes to items, item deletions, and item additions.
As a general rule, developers should use redundant data sparingly. Redundancy adds bulk to an application. It complicates development as it can be difficult to keep computed data consistent with base data. Redundancy also increases the odds of errors. Often it’s better to just compute summary data as needed on the fly.
A relational database is declarative. That means that it stores data as well as the description of data. When developers build a database, they should follow the same spirit.
This means that tables and columns should have meaningful names. They shouldn’t have anonymous names such as table1 and table2. A table should not store two different kinds of data in the same attribute – rather each kind of data should have an attribute of its own. Naming flaws can arise as an application evolves and is maintained. But developers should at least start with good names and minimize naming flaws.
Undefined encodings are another pitfall. Many databases represent enumerations as codes that programs must decipher. This is contrary to the spirit of relational databases. The value ‘001’ may have meaning to a business expert. But ‘small’ will have meaning to a wider audience. It’s acceptable to place abbreviations in the database as long as expanded, meaningful strings are stored there also.
Also be wary of binary data. Database blobs can be helpful for storing audio clips and images. But blobs should not be used to hide database details. If developers disguise data as binary data it can only be accessed by applications. You lose the power of SQL in querying the data. You risk losing the data’s meaning. Rather expand the data and include its structure in the database.
A database is not a flat file. All, but the smallest of applications, require more than just a few tables. Many problems arise when different kinds of data are forced together into the same table. The database becomes more complex. Data quality becomes more difficult to enforce. And development becomes a mess.
A database is not a spreadsheet. The data in a database should be described and structured. Here’s a table found during reverse engineering. A spreadsheet is a useful tool, but it has nothing to do with designing a database.