Mayan ruins

Database Archaeology

picture by Abby Sasser via Flickr

Few database projects start with a clean slate. Many operational applications have legacy databases as a source of data and ideas. Analytical applications have the operational databases that are feeding data. Developers often encounter existing databases that are poorly documented and need to figure them out. We use the term database archaeology to refer to the study of database artifacts.

Continue reading Database Archaeology

cover for Agile Data Warehouse Design

Agile Data Warehouse Design video course

My new video course — Agile Data Warehouse Design — is now on the market.

A data warehouse is a database dedicated to decision support and business analysis. The inputs to a data warehouse are data from the day-to-day operational systems. A data warehouse integrates the input data and restructures it so that it is amenable to data mining.

With an agile approach developers build a data warehouse rapidly to get it into the hands of business users so they can give feedback. Where possible, SQL code substitutes for programming and ETL scripts.

The course is organized about a threaded case study for the retail industry.

A framework using the metaphor of bricks

Be Wary of Generic Database Layers

picture by Alejandro via Flickr

Over the years we’ve seen a number of projects where application architects use a generic layer to hide a database. This is a common approach with object-oriented languages accessing a relational database. Application code accesses the layer which in turn accesses the database. The use of a generic layer can be a valuable technique, but it is overused. Some architects seem to be unaware that there are other possibilities.

Continue reading Be Wary of Generic Database Layers

An arrow as a metaphor for referential integrity

Do Use Referential Integrity

picture by Rudolf Vicek via Flickr

Generally the most pressing problems for software development concern quality, time to market, and cost. If you define referential integrity (RI) in your software you can improve all three of these items. RI improves quality by ensuring that data references truly exist and cannot be dangling. RI also reduces development time and lowers cost as it takes much less effort to define RI than to program the equivalent application code.

Continue reading Do Use Referential Integrity

A physical filing cabinet

SQL Is More than Just Store and Retrieve

picture by shinichi via Flickr

I often work with programmers to build applications. It never ceases to amaze me that many programmers look at SQL as a store and retrieve mechanism and see nothing more. Their approach is to write programming code for all their logic. It’s not just a matter of programmers being unable to write complex SQL. It’s also a matter of them not imagining what is possible.

Continue reading SQL Is More than Just Store and Retrieve

Please Index Foreign Keys


picture by Travis via Flickr

In the database literature, the lore has been to first design a schema and then predict the query mix to decide on a tuning strategy. This approach can work, but it is risky and error prone. A small misjudgment of the query mix can lead to vastly inferior performance. For many applications a better strategy is to index every foreign key. That is often sufficient and further tuning is unnecessary.

Continue reading Please Index Foreign Keys

Six Data Modeling Patterns for Trees


picture by Thor via Flickr

Here’s a chapter from my book Patterns of Data Modeling. This chapter presents six data modeling patterns for trees — UML data models, IE data models, SQL queries, and examples. The book has much more information about patterns including patterns for directed graphs, patterns for undirected graphs, antipatterns, and archetypes.