picture by Kanban Tool via Flickr
I’ve been practicing agile database techniques for about twenty years now. My use of agile techniques didn’t start as an explicit plan. Rather it evolved over time as I was working on consulting projects. It made sense to look for ways of working faster and better and with greater customer interaction.
I can think of at least three kinds of agile database techniques.
- For data modeling.
- For data warehouse development.
- For database reverse engineering.
Agile Data Modeling
Many persons think of data modeling as a laborious task. But that need not be the case. You can take data modeling out of the back room and do it in front of a customer. You might think that customers would be put off by the technical gore. But they are not. Business staff have a passion for their work and a vision of where they want to go. They are thrilled to have IT as a facilitator for their vision rather than a roadblock.
For agile data modeling, a developer needs to do the following.
- Be fluent with data modeling concepts and a data modeling tool.
- Have a “friendly” situation the first few times. You need to try your techniques on a nurturing audience until you gain confidence.
- Have at least one business champion in the audience. You need someone who understands the business deeply and will help drive others to consensus. You need someone to receive the output of your work.
- Have a solid project with a clear goal and business justification.
With agile data modeling, developers work quickly and deliver frequent model revisions. I prefer to have data modeling sessions no more than two hours long. The sessions are intense with everyone batting around ideas. After two hours everyone needs a rest. I also need time to clean up the model and catch up with documentation.
Agile techniques apply to all facets of data modeling, including day-to-day operational systems, data warehouses, master data models, and enterprise data models.
Agile Data Warehouse Development
Agile techniques are not only useful for modeling, but are also important for building a warehouse. There are two areas where we’ve been applying agile techniques to warehouse development.
The first is data staging. It’s a good practice is to buffer the warehouse proper from the operational sources with intermediary staging tables. Staging tables duplicate operational data and add metadata such as source system, processing date, effective timestamp, and expiration timestamp. Currently, in practice, most developers process staging data with custom ETL code. But we are aware of several tools (A2B Data, Attunity Replicate, FiveTran) that can automate the creation of staging table schema and their population.
We are also using agile techniques to improve warehouse data processing. We favor a mix of ETL and SQL. (ETL is an acronym for Extract, Transform, and Load.) We use ETL for core tables with complex logic. We use SQL for peripheral tables, and there’s many of them in a typical warehouse. It’s roughly 10 times faster to write SQL than ETL. The SQL code is also easier to evolve and easier to test. The amount of ETL that you can replace depends on warehouse complexity and your SQL skill. It’s a reasonable expectation to replace ETL with SQL for half the tables.
Both of these improvements reduce the quantity of ETL code to write and narrow the focus to where ETL is most valuable — for large data sets with multiple sources to integrate and process.
Agile in Database Reverse Engineering
We also use agile techniques during reverse engineering. Reverse engineering is the inverse to normal development – reverse engineers take an existing design and work backwards to infer the underlying logical intent. Database reverse engineering can have various motivations. You may be converting from one database platform to another. Or you may be trying to deepen understanding for maintenance. Or you may be trying to salvage requirements implicit in a schema for consideration by new software.
We usually start by analyzing the existing schema. A schema is often available and it is something that we can study quickly. We augment our study of schema with other available resources, such as documentation, data, and developer conversations.
A data modeling tool is an important resource for studying the schema. Most tools can import SQL schema so that you can see the tables and how they connect together. Often that is sufficient for reverse engineering analysis.
Agile development is a popular topic that has several database implications. An agile approach is helpful for data modeling, data warehouse development, and database reverse engineering.