picture by Rob Blank via Flickr
That’s what we often tell our clients. If you build a database properly, performance will be fast for 80% of applications. The remaining 20% are outside the scope of this article – they have complex issues such as large schema, massive data, multi-user contention, and intricate logic.
Here are ten guidelines that developers can use to obtain good database performance. The guidelines apply to the day-to-day operational systems that run on a relational database.
- Prepare a data model. Building software is a costly and time-consuming endeavor. So why would you start without having a plan of attack? That is the purpose of a data model. A data model helps you think deeply about a database, choose proper abstractions, and decide how to structure the data before starting construction.
- Use the data model to create a database design. The data model is your plan of attack. The database design is the realization. You can carry forward the concepts, structure, and names of the data model to the database tables.
- Define a primary key for each table. A primary key consists of one or more columns that taken together uniquely identify each record in a table. Primary keys are important in that they give you the means to quickly access and cross reference individual records.
- Index every foreign key. A foreign key consists of one or more columns that reference a primary key. For example, the foreign key of table B references the primary key of table A. An index is a tree data structure for finding individual records that has a large fan out. It’s similar to the way that a phone book organizes by last name, then first name. Most database queries have foreign key to primary key traversals. If every table has a primary key and every foreign key has an index then these traversals will be fast. In contrast, a missing foreign key index can slow performance by orders of magnitude.
- Try to avoid denormalization. Denormalization combines tables so that their combined records can be accessed more quickly. Denormalization adds complexity to a database and reduces its flexibility. Most applications don’t require denormalization for good performance.
- Write “clean” SQL. Put join logic in the FROM clause and not in the WHERE clause. You can construct simple SQL queries as follows. First figure out the series of tables to traverse (the FROM clause). Then determine the desired fields (the SELECT clause). Then add the conditions that are needed to restrict the data (the WHERE clause). Clean SQL is easier to understand and therefore more likely to yield fast performance.
- Limit use of outer joins. An equijoin combines a record in one table with exactly one record in another table. An outer join combines a record in one with table with at most one record in another – for example, a person may have a primary place of business. Equijoins often occur via foreign key to primary key traversals and will be efficient if you follow our previous advice. The database algorithms for outer joins are more complex and slower than those for equijoins. Sometimes you can revise a query to forego outer joins.
- Iterate in SQL and not in programming code. We have seen developers use table scans and combine data with programming code. Where possible it is much better to use database joins and combine data in SQL. The difference in speed can be orders of magnitude.
- Try to forego nested queries. A nested query has one or more SQL statements nested within another. Sometimes such complexity is needed to express logic. More often you can rewrite a SQL query so that it is flat, consisting of single SELECT, FROM, and WHERE keywords. SQL optimizers try to convert nested queries to flat queries for better performance, but optimizers do make mistakes. If you can forego nested logic, you can avoid a possible surprise.
- Try to keep transaction scope small. In particular, don’t start a transaction and wait for user interaction. Small transactions allow greater concurrency and less delay from multi-user contention.
The previous advice will help you write database applications that run fast. Instead of worrying about performance, you can devote your attention to more important issues such as quality, extensibility, and speed of development.