picture by londonbonsaipurple via Flickr
Here is a follow-on to our blog from several months ago on “Ten Reasons Why Developers Ignore Data Models”. Paraphrasing, reader Lawrence Hecht asked via Twitter if there is any way to tell from a schema if the developers had used a data model.
A trite answer is to look at the database schema and see if there is a tool legend. For example ER/Studio includes a preamble to their SQL generation. If a tool generated the SQL schema then there must have been a data model for the tool to operate upon.
* ER/Studio Data Architect 11.0 SQL Code Generation
* Project : standardDataModelExample.dm1
* Date Created : Thursday, October 15, 2015 09:46:51
* Target DBMS : Microsoft SQL Server 2012
More seriously, we can partially answer this question through database reverse engineering. Reverse engineering is the inverse to normal development – reverse engineers take an existing design and work backwards to infer the underlying logical intent. There are many reasons for reverse engineering of databases including salvage of past data, recovery of useful ideas, and support for software maintenance.
We start with an existing schema and supplement it with any other information sources. If we have access to developers, we ask them about their rationale. There may be a populated database available for query. Such queries can compensate for missing declarations in the schema, such as missing primary and foreign keys. Generally you can’t prove suppositions with data, but you can conclude that they are highly likely. There may also be documentation, such as a data dictionary, or existing software with which to experiment.
We have observed that some schema are quite regular while others are erratic in their design style. For example, if data types are consistent (such as for address fields) the schema is more likely to be created by a tool with data types assigned via domains.
Irregular schema are more likely to be human constructed. Human constructed schema may or may not have the substrate of a data model. Tools such as ERwin and ER/Studio support dictionaries for uniform abbreviation. Consistent abbreviations indicate the use of a model and tool. With ad-hoc abbreviations a model and tool are less likely. Any kind of systematic behavior raises the likelihood of design via a model and tool.
Generated names are often a clear indication of a tool and not a person. For example, ERwin generates index names like XIF1AccountCustomer, XIF2AccountCustomer, and XIF3AccountCustomer. ER/Studio generates index names like Ref14, Ref910, and Ref811. In addition, ER/Studio creates boilerplate comments and code to check the system catalog to determine if index creation was successful.
It’s not uncommon for a schema to have odd design paradigms. For example, we’ve seen many-to-many relationships implemented with parallel fields. If there’s a data model, there’s likely to be many-to-many relationships and few tools would generate such a strange design. In some schema foreign keys refer to candidate keys that are not primary keys. Data models lack mechanisms for such references to arbitrary candidate keys so tools would not emit such code.
Another indicator is the overall quality of a database design. A high-quality database design indicates a skilled developer. A skilled developer is more likely to use a model and tool than a developer with less ability.
As you can see, it’s often not absolutely determined that a schema came from a model, but a schema often has suggestive hints. Regularities in a schema suggest the use of a database design tool. And the use of a database design tool implies that there was an underlying data model.
Keep in mind that the schema may have originally been built with a tool and a model, but then manually patched over time. This is common. On occasion, we have seen schema where you can see the different design styles of several developers.