When To Use the UML for Databases

Picture by Lili Chin via Flickr

The UML is a popular notation for modeling software. Even though the UML was mostly developed for programming, it is also relevant for databases. This article takes a critical look at using the UML for databases.

What is the UML?

The UML (Unified Modeling Language) is a general-purpose notation for modeling software. It is a standard sponsored by the Object Management Group (OMG).

The UML was created in the mid-1990s as an outgrowth of interest in object-oriented technology. Nominally, the UML was intended to address all software development efforts, but politicking steered the standard towards programming. Despite that programming bias, the UML is still quite helpful for database purposes.

At the time, there were at least two motivations for creating the UML. The first was to resolve the Tower of Babel of competing notations. Several popular notations were in use, each with its own strengths and weaknesses. Many of the differences among them were arbitrary and detrimental to communication. The UML sought to unify and supplant these competing notations and largely succeeded.

A second, less noble, reason for the UML’s creation was to establish marketing buzz for vendors so that they could sell more products and services. It’s always helpful to have something new.

The UML has a variety of diagrams, one of which (the class model) pertains to databases. The class model specifies classes and their relationships. The UML class model is essentially a dialect of the Entity-Relationship approach that was introduced many years previously. The UML class model adds a few helpful features and some twists of notation.
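As a rough illustration of what a class model captures, the sketch below represents two classes and one association as plain Python data. The class names (Customer, Order), the association name (places), and the helper types UmlClass and Association are all hypothetical, invented for this example; they do not come from any UML tool or API. Note that the model carries attribute names and multiplicities but no data types, keys, or indexes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UmlClass:
    """A class in the model: a name plus attribute names (no database types)."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Association:
    """A relationship between two classes, with a multiplicity at each end."""
    name: str
    source: UmlClass
    source_multiplicity: str
    target: UmlClass
    target_multiplicity: str

    def describe(self) -> str:
        # Render the association as one readable line of text.
        return (f"{self.source.name} [{self.source_multiplicity}] "
                f"--{self.name}--> "
                f"{self.target.name} [{self.target_multiplicity}]")


# Hypothetical example: one customer places zero or more orders.
customer = UmlClass("Customer", ["name", "email"])
order = UmlClass("Order", ["orderDate", "total"])
places = Association("places", customer, "1", order, "0..*")

print(places.describe())
```

An Entity-Relationship diagram would express the same content as two entities joined by a one-to-many relationship, which is the sense in which the class model is a dialect of that approach.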

When to Use the UML for Databases

The UML offers real benefits for developing database applications.

We often use the UML when gathering requirements for operational applications. Most business staff are unfamiliar with software notations, and they relate better to the UML than to a conventional database notation such as IE (Information Engineering). We run data modeling sessions using the UML. We show the business staff the evolving model as we solicit their input, and we ask for their help in understanding the application. We tell them that we are the database experts and that they need not spend their time on our job. The UML puts the focus on capturing requirements and lets us defer database details.

We also find the UML to be helpful when working with complex and abstract models. The UML is more concise than conventional notations because it omits database details. A concise notation is conducive to deep thinking. For example, we consider the Common Warehouse Metamodel (CWM) to be an excellent model and it is expressed using the UML. (See the book by John Poole et al.) As another example, CA has documented the ERwin metamodel with the UML.

When Not to Use the UML

Even though we favor the UML, we try to be sensible with its use.

The UML is clearly lacking for database design. Some UML tools have database capabilities, but they do not have the design power of a true database tool such as ERwin or ER/Studio. A conventional notation, such as IE, shows the details of database design, which is helpful for generating code and supporting production maintenance.
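To make "the details of database design" concrete, here is a minimal sketch of the kind of information a database notation records and a class model typically leaves out: data types, nullability, primary keys, foreign keys, and indexes. The table and column names are hypothetical, and SQLite (through Python's standard sqlite3 module) merely stands in for whatever DBMS a real project would target.

```python
import sqlite3

# Hypothetical physical design for a Customer/Order model.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT UNIQUE
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    order_date  TEXT NOT NULL,
    total       NUMERIC NOT NULL
);
CREATE INDEX ix_order_customer ON customer_order (customer_id);
"""

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)

conn.execute(
    "INSERT INTO customer (customer_id, name, email) "
    "VALUES (1, 'Ann', 'ann@example.com')"
)
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-15', 99.50)")

# An order that points at a nonexistent customer should be rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 999, '2024-01-16', 5.00)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print("foreign key enforced:", fk_enforced)
```

Every line of the DDL is a design decision that matters for generated code and production maintenance, yet none of it is needed while requirements are still being gathered.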

We also forgo the UML notation when working with data warehouses. Data warehouses have a simple structure (the bus architecture using star schemas). This simplicity contrasts with the complex graph of tables found in many operational applications. For data warehouse applications, there is little benefit to using the UML. A conventional database notation suffices for both modeling and database design.
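The simplicity of a star schema can be seen in a small sketch: one fact table whose foreign keys point at the surrounding dimension tables, queried by joining outward from the fact. The table names, columns, and sample rows below are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL,
    year          INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
-- Fact table: one row per sale, keyed by its dimensions.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
INSERT INTO dim_date VALUES
    (20240115, '2024-01-15', 2024),
    (20250115, '2025-01-15', 2025);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES
    (20240115, 1, 2, 20.0),
    (20240115, 2, 1, 15.0),
    (20250115, 1, 3, 30.0);
""")

# A typical warehouse query: join the fact to its dimensions and aggregate.
rows = conn.execute("""
    SELECT d.year, p.product_name, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, p.product_name
    ORDER BY d.year, p.product_name
""").fetchall()

for row in rows:
    print(row)
```

Every relationship here is a plain one-to-many link from a dimension to the fact table, so a conventional database notation can express the whole model without strain.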

Sometimes, we choose a notation based on tool features. We chose ERwin for a recent enterprise data modeling project because of its polished reports and ability to import/export metadata with other tools.

In Conclusion

In practice, we often use the UML together with a database notation such as IE. The UML is a good language for the business and IE is a good language for IT. The use of two notations provides a clear demarcation between the role of the business and the role of IT.
