Refactoring Databases

Even if you had perfect database design from the start, it is likely that requirements changed, and you have to change database schema accordingly. As there is legacy code, there can be legacy data, with schema designed for use cases which are no longer actual.

It is usually harder to fix data design than to fix code design, but it is doable. And as with code, you can apply refactoring techniques: improving the design without change in behavior.

A book “Refactoring Databases: Evolutionary Database Design” describes basics of database refactoring and lists different kinds of changes to relational databases. It starts from explanation why refactoring is a good thing, why it is important to make small incremental changes, and which organizational obstacles you can get on the way of implementing these techniques. You can imagine how hard it can be to make database changes if DBA and application developers are in different teams, and any database change requires coordination between these teams, sometimes using strict change management process. Basically, agile methodologies with their focus on cross-functional teams and short development cycles can help here.

When you start making changes, you notice that other programming practices can be useful for database development: version control, separate development, integration, staging and production environments and automated deployment.

And when all the tools and processes are in place, refactoring is easy. The book lists huge number of different refactorings: Add Column, Split Column, Introduce Common Format, MoveColumn, Drop Default, etc. There are detailed descriptions of the reasons to do each of them, steps to make the change, how it works in transition period, how to update application code and how to modify data. But most of them fit the same algorithm:

  1. Add new columns/tables/views/procedures.
  2. Add triggers to keep old columns/tables in sync with new ones.
  3. Modify/copy data to the new columns/tables.
  4. Modify application code to use a new schema.
  5. Drop old columns/tables/views/procedures after transition period is over.

When you remember this general algorithm, you can guess what to do in each case.

I would recommend reading this book if you are not convinced that evolutionary database development and database design refactoring is something useful, or if you are not familiar with agile development practices. Otherwise, this book will be not much helpful.

It is more useful to read this article, which describes common problems of working with legacy data and provides some guidance: http://www.agiledata.org/essays/legacyDatabases.html. And you can refer to the catalog of database refactorings for detailed descriptions of different schema changes: http://www.agiledata.org/essays/databaseRefactoringCatalog.html