picture by Abby Sasser via Flickr
Few database projects start with a clean slate. Many operational applications have legacy databases as a source of data and ideas. Analytical applications have the operational databases that are feeding data. Developers often encounter existing databases that are poorly documented and need to figure them out. We use the term database archaeology to refer to the study of database artifacts.
Human Archaeology and Database Archeology
Webster’s dictionary defines archaeology as “a science that deals with past human life and activities by studying the bones, tools, etc., of ancient people.” We see an analogy between the study of human artifacts and database artifacts. They were both created in the past. They are both relics that contain information about what happened before. They are both difficult to study but with effort can reveal insights.
Database Archaeology Is Also Called Database Reverse Engineering
Database archaeology has another name, database reverse engineering. Reverse engineering is the inverse to normal development – reverse engineers take an existing design and work backwards to infer the underlying logical intent. Database reverse engineering is the practice of reverse engineering for databases.
Database Reverse Engineering Motivations
Database reverse engineering can have various motivations.
- Recovery of data. Applications come and go. Data often survives. A new application may need to carry forward old data.
- Recovery of ideas. Databases are built with a model that is either explicitly constructed or implicit in the developers’ minds. Some past ideas have business value, are not easy to rediscover, and should be reused.
- Integration. Applications have more value when used together than as standalone software. Reverse engineering yields understanding which is a precondition for integration.
- Vendor assessment. Many vendor applications revolve about a database. The quality of the database is an indicator of the quality of the software. You can use reverse engineering to uncover the underlying model and assess its quality.
Some years ago we were testing a new vendor product. Our client was impressed with the marketing claims and wanted help.
We had problems from the start. Our systems administrator had a difficult time installing the software. We couldn’t tell if there was a problem with the software or with our systems administrator. We also had a programmer working with us who had trouble with the software. It was all quite perplexing. The vendor had an excellent grasp of business needs and on paper met many of the requirements.
So we decided to look under the covers at the database. We did not expect to find anything other than to deepen our understanding of the software. When we looked at the database everything suddenly became clear. The vendor did understand the business requirements but committed gross database errors.
Here’s an ERwin diagram showing a routine implementation of a many-to-many relationship as a junction table.
Here’s how the vendor implemented a many-to-many relationship. There were 10 parallel foreign keys (of which we show 4).
And they repeated this idiom over and over again throughout the database. It was suddenly quite clear why the software was hard to install, buggy to operate, and slow to run. We could imagine the mess of programming code. For example, to collect the many related entities – retrieve the first FK union the results for the second FK union the result for the third FK and so forth.
There’s a sequel to our story. A few months after our evaluation we got a chance to talk to the vendor’s developers. We
asked them about their flawed implementation of relationships. We discovered that they were programmers who hadn’t used a database before. They didn’t realize that they had made such a blunder.
Legacy applications can have an archaic architecture, slow performance, and inflexibility. However, just as with human archaeology, database artifacts can have value. Database relics can give insight to activities in the past. Sometimes there is knowledge to recover.