Decoupling Schema Database Migrations From Code Deployments

5th February, 2018 by

There are many frameworks out there which come with ORMs (object relational maps) which allow you to perform database migrations as part of your deployment. A  schema database migration is a process of updating a database to a new schema along with any data manipulations required. The idea is simple enough: you write some code requiring a new table in the database and package it with a migration. A migration is code executed by a migration tool that tells the database what changes to make – in this case, creating a new table.

The database will usually contain a versions table, and the migration tool will increase the desired version when you run a migration. When you deploy your code, the database version is checked against the desired version, and any new migrations are applied. This way, migrations fail fast. The app won’t start with an inconsistent database so you never end up with silent errors accumulating over time, causing data loss and instability. Some data-mappers like Doctrine can even generate the migrations automatically from code annotations.

Why Migration Tools are Useful

There are some real benefits to using a migration tool. Database changes are codified so other developers can review changes before they’re made to the production database; it’s easy to miss an index or specify the wrong engine when designing table structure. If developers are working with their own databases, migrations can decrease the overhead for testing. The reviewer simply checks out the branch from source control and runs the migration. Since migrations are checked into source control they can be tied to a feature and the business value. In a few months’ time, you can quickly see why a field was added and what code/feature depends on it. Finally, the database is always updated when new code is deployed, so there’s no need to worry before deployment that a database change was perhaps missed.

There are also some downsides to using a migration tool: updating the database is tightly coupled to the deployment. I have to deploy my codebase in order to run the migration, which can have some adverse effects. Since my new code relies on the new database structure, the schema migration must finish successfully in order for a successful deployment. Let’s dive deeper into some of the problems you can run into when pairing schema migrations with deployments.

The Downsides of Schema Migration Tools

Imagine I’ve been storing all of the passwords for my users in plain text in the database. Fixing it has been on my to-do list, but I’ve been too busy shipping other features that are visible to my users. The time has now come to fix this security issue, so I write my migration to generate a random, cryptographically secure salt for each password, hash each one with bcrypt and write the result back to the database. I’ve tested on my local environment and everything looks great. But when I deploy my new code to production, the migration hangs….oops.

While each encryption only takes 1ms, with 1,000,000 users this will take over 16 minutes to complete. 16 minutes of downtime for our users. Most frameworks insist the database versions match the desired version, and even if that’s not the case, the code expects the passwords to be hashed now. Most users will be unable to log in during this time, and those who are already logged in are in trouble too – all that math is hammering my web server!

OK, all your passwords are already cryptographically secure and you’re following industry best practices, so this is probably a bad example. But sometimes migrations can be problematic in the real world too. Magento recently released version 2.2 of their ecommerce platform, and they moved from storing serialized arrays in the database to representing them as JSON to help protect against remote code execution attacks. As part of the migration, they take all the existing serialized arrays and convert them to JSON – which could be a long process since logs were stored as serialized arrays. As in the above example, you’ll run into issues with downtime, but if you proactively clean up logs it’s a quite quick operation. In theory.

However, in our case, a recent serialized array was written with a length of 260 characters, and one of the tables has a serialized field with a data limit of only 255 characters. When the update script tries to unserialize the data, it finds that it’s invalid and throws an unhandled exception. The theoretically quick migration task dies. You’re left with an inconsistent database, code that refuses to run because the database needs to be updated, and a migration script that fails. And what’s worse is that the development team won’t have a good understanding of the issue because they didn’t write any of the code: it’s all framework. Definitely not a quick fix!

Small inconsistencies in data can lead to big issues with deployments when coupled with database migration.

Fixing the Issues

So, how do we fix these issues while still keeping the benefits of migrations? The answer: by decoupling database schema migrations from code deployment.

Using a combination of non-destructive schema changes and update triggers, it’s possible to keep a database in such a state that it works with both the current version of a program and the next version which requires a change to the database. By using an isolated tool or command to perform database migrations, you can migrate a database ahead of a feature roll-out, effectively mitigating any downtime induced during database update, and removing any risk of downtime caused by a failed migration. (Load on the database server will still be increased, however since the application code can work with both the old and the new versions of the database, the application will continue to run.) The migration code still lives inside your project’s codebase so you keep all the wins during testing and code reviews.

Migrating a database in a separate step makes sense for simple changes like adding a new column (add the new column, deploy the code that uses it) or removing a column (deploy code that no longer uses the column, delete the column), but it becomes more complicated for changes that are inherently tied to a version of code, like renaming a column. In this case, the migration creates a new column, adds an update trigger to the original column, and copies the content across. The database will now keep the two columns in sync automatically until the new code is deployed. At this point, the old column and update trigger can be removed. The book Refactoring Databases provides a comprehensive list of patterns for making incremental database changes like these.

However, these changes can come at a cost:

  • Added complexity with more changes: two (or more) sets of updates need to happen in order to keep the database deduplicated
  • Multiple versions to keep track of: Code is able to run against a set of database versions which is more complicated to track than just one.

For long release cycles, the migrations can become difficult to manage. Maintaining many updates, triggers, and cleanup steps is more complex, and you may have to change your code versioning conventions. It’s much easier if you’re practicing continuous deployment and making small, surgical changes with each deployment.

How Decoupling Helped LiveJournal Grow

We’ve known about the problems with big bang database migrations for some time. In the book Coders at Work, Brad Fitzpatrick—author of Memcached and creator of LiveJournal—describes a scenario in 1995 where his company ran a database migration for over a month in the background. As the user base continued to increase rapidly, the engineering team realized that the current database structure wouldn’t continue to scale. They knew they needed to make changes, but restructuring could be a massive ordeal, carrying with it risk of downtime and uncertainty.

Cue decoupling the large scale database migration from ongoing code deployments. Slowly migrating user tables from one main database into an array of new databases allowed LiveJournal to scale horizontally. By using a metadata table and migrating one user at a time, they were able to mitigate the estimated week of downtime that a big-bang migration would have caused and quickly reduce the load on their main database server.

Decoupling helped LiveJournal avoid a large scale downtime issue, which would have been devastating for their user base who relied on LiveJournal’s stability.

Determining the best migration strategy for your project

With this in mind, the migration tool that comes with your framework might be the best option for you. If you are developing a smaller project where downtime is acceptable or the application will be maintained but not developed, then decoupled deploys might be overkill for you. However, if you’re developing a mission-critical, large application that is constantly evolving then decoupling your migrations might save you many sleepless nights. If you are looking to run multiple application servers and update with rolling deploys or blue-green deployments then you will need to allow multiple versions of code to run against multiple database versions, so bite the bullet and build this into your deployment strategy from day one.

To get started in your project, you could try one of the many stand-alone tools to handle this for you, such as FlyAway or LiquiBase. People have also extracted the Ruby on Rails migration code to be a stand-alone tool and it’s possible the same has been done with your framework of choice if you are already using migrations in your project.