Database versioning

Why

While using a code versioning process and pushing the code to the production environment without FTP-ing it is a first step in getting the things right, the immediate thing that you will probably think at, is how to version the database too. This is what we did. And we wanted to be part of the deployment process and also integrated with the development environments.

Before jumping in, to develop an in-house tool, I did some (Google)research.  What I’ve found was a few proprietary solutions available for MS SQL server, but I haven’t seen any functional tool/solution/script for the open source counterparts like MySQL or PostgreSQL.

How

Some terms first

Like with every process, we need some terms when referring to different parts or actions of this process. So, here they are:

  • baseline – represents the database schema including database objects like: triggers, views, etc; the file is named “xx.yy.zz.sql”
  • change script – represents a file that contains one ore more SQL commands; the file is named “xx.yy.zz.sql”
  • test data – represents a minimal test data that should work with the latest baseline and change scripts available;  the file is named “data.sql”
  • z_db_versioning – represents the table that will be created in each versioned database and will hold the current version of the database

Notes about revisions:

  • xx- represents the major no. of database
  • yy – the minor
  • zz – the revision point

Filesystem layout

Each project should accommodate, on its repository, the DB versioning directory structure. Each of this files are stored on SVN(or other source control system),  just like the usual code does.

  • the DB versioning related files go under “db” directory(this name is/should be configurable per project)
  • the baselines are stored under “/db/baselines/”
  • the change scripts under “/db/change_scripts/”
  • and the test data here “/db/test_data/”

Operations on localhost

Localhost is here referred as the developer machine.

Every structural change that is about to be applied to the database must be included in one ore more change scripts. It is recommended that a change script to contain a single SQL instruction, rather than a batch of SQL instructions.

It is the developer responsibility to commit “compatible” change script to SVN, in the way that the versioning system it is not concerned whether the change script is valid or not. When a certain change script ends up with an error, the system will simply skip it and it cannot be ever executed.

Operations on DEV

Here is the place where most of the versioning actions take place. Everytime the versioning system is being executed, some particular actions are taking place.

  1. the database is emptied(dropped, then re-created)
  2. the baseline is applied
  3. the change scripts are then applied
  4. finally, the test data is loaded

Because we can have multiple developers working on the same code base, on different development branches, and because all of them will need to test their code on the DEV machine, we can end-up with some incompatible database changes from one branch to another. That is why we chose to completely drop and re-create the database in the DB versioning process.

The versioning system will search and apply all change scripts that are higher than the latest baseline available on that branch. So, it will ignore the version number found in the z_db_versioning table. In fact this is the only environment where the z_db_versioning is being ignored.

Operations on STG

Here only the change scripts with the version number higher than DB version are being executed.  A change script can be applied only once no matter if it successfully got executed or ended up with a SQL error. While is very less likely to have change scripts running in errors, they are not applied again because they might be incompatible with the rest of the change scripts that were executed in the same batch.

Another key aspect of the STG server is that we are using it to generate the baselines whenever the developer/DBA thinks it is necessary. There is a good practice to always keep you database data in sync with the PROD environment to be able to see how the application behaves with the real data set.

Operations on PROD

Exactly like with the STG environment, only the change scripts with the version number higher than DB version are being executed. Like above, a change script can be applied only once.

What is not

Before closing this up, there are some things to mention about what DB versioning system is not or what it cannot do:

  • it is not database backup
  • it does not version data, but only the database schema and objects(triggers, views, rules, etc)
  • it cannot revert to a previous DB version; so it is incremental only

Leave a Reply

Your email address will not be published. Required fields are marked *