Django-migrations and its predecessor South have been around for ages, so it’s not usually necessary to test database migrations. But it just so happens that we’re introducing a dangerous type of migration—that is, one that introduces a new integrity constraint on our data. When I first ran the migration script against staging, I saw an error.
On larger projects, where you have sensitive data, you may want the additional confidence that comes from testing your migrations in a safe environment before applying them to production data, so this toy example will hopefully be a useful rehearsal.
Another common reason to want to test migrations is for speed—migrations often involve downtime, and sometimes, when they’re applied to very large datasets, they can take time. It’s good to know in advance how long that might be.
Here’s what happened to me when I first tried to deploy our new validation constraints in [chapter_18_second_deploy]:
$ cd deploy_tools $ fab deploy:[email protected] [...] Running migrations: Applying lists.0005_list_item_unique_together...Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 61, in execute return self.cursor.execute(sql, params) File "/usr/local/lib/python3.7/dist-packages/django/db/backends/sqlite3/base.py", line 475, in execute return Database.Cursor.execute(self, query, params) sqlite3.IntegrityError: columns list_id, text are not unique [...]
What happened was that some of the existing data in the database violated the integrity constraint, so the database was complaining when I tried to apply it.
In order to deal with this sort of problem, we’ll need to build a "data migration". Let’s first set up a local environment to test against.
We’ll use a copy of the live database to test our migration against.
Warning
|
Be very, very, very careful when using real data for testing. For example, you may have real customer email addresses in there, and you don’t want to accidentally send them a bunch of test emails. Ask me how I know this. |
Start a list with some duplicate items on your live site, as shown in A list with duplicate items.
Copy the database down from live:
$ scp [email protected]:\ /home/elspeth/sites/superlists.ottg.co.uk/database/db.sqlite3 . $ mv ../database/db.sqlite3 ../database/db.sqlite3.bak $ mv db.sqlite3 ../database/db.sqlite3
We now have a local database that has not been migrated, and that contains
some problematic data. We should see an error if we try to run migrate
:
$ python manage.py migrate --migrate python manage.py migrate Operations to perform: [...] Running migrations: [...] Applying lists.0005_list_item_unique_together...Traceback (most recent call last): [...] return Database.Cursor.execute(self, query, params) sqlite3.IntegrityError: columns list_id, text are not unique
Data migrations are a special type of migration that modifies data in the database rather than changing the schema. We need to create one that will run before we apply the integrity constraint, to preventively remove any duplicates. Here’s how we can do that:
$ git rm lists/migrations/0005_list_item_unique_together.py $ python manage.py makemigrations lists --empty Migrations for 'lists': 0005_auto_20140414_2325.py: $ mv lists/migrations/0005_*.py lists/migrations/0005_remove_duplicates.py
Check out the Django docs on data migrations for more info, but here’s how we add some instructions to change existing data:
# encoding: utf8
from django.db import models, migrations
def find_dupes(apps, schema_editor):
List = apps.get_model("lists", "List")
for list_ in List.objects.all():
items = list_.item_set.all()
texts = set()
for ix, item in enumerate(items):
if item.text in texts:
item.text = '{} ({})'.format(item.text, ix)
item.save()
texts.add(item.text)
class Migration(migrations.Migration):
dependencies = [
('lists', '0004_item_list'),
]
operations = [
migrations.RunPython(find_dupes),
]
We re-create the old migration using makemigrations
, which will ensure it
is now the sixth migration and has an explicit dependency on 0005
, the
data migration:
$ python manage.py makemigrations Migrations for 'lists': 0006_auto_20140415_0018.py: - Alter unique_together for item (1 constraints) $ mv lists/migrations/0006_* lists/migrations/0006_unique_together.py
We’re now ready to run our test against the live data:
$ cd deploy_tools $ fab deploy:[email protected] [...]
We’ll need to restart the live Gunicorn job too:
elspeth@server:$ sudo systemctl restart gunicorn-superlists.ottg.co.uk
And we can now run our FTs against staging:
$ STAGING_SERVER=staging.ottg.co.uk python manage.py test functional_tests [...] .... --------------------------------------------------------------------- Ran 4 tests in 17.308s OK
Everything seems in order! Let’s do it against live:
$ fab deploy:host=superlists.ottg.co.uk [superlists.ottg.co.uk] Executing task 'deploy' [...]
And that’s a wrap. git add lists/migrations
, git commit
, and so on.
This exercise was primarily aimed at building a data migration and testing it against some real data. Inevitably, this is only a drop in the ocean of the possible testing you could do for a migration. You could imagine building automated tests to check that all your data was preserved, comparing the database contents before and after. You could write individual unit tests for the helper functions in a data migration. You could spend more time measuring the time taken for migrations, and experiment with ways to speed it up by, for example, breaking up migrations into more or fewer component steps.
Remember that this should be a relatively rare case. In my experience, I haven’t felt the need to test 99% of the migrations I’ve worked on. But, should you ever feel the need on your project, I hope you’ve found a few pointers here to get started with.
- Be wary of migrations which introduce constraints
-
99% of migrations happen without a hitch, but be wary of any situations, like this one, where you are introducing a new constraint on columns that already exist.
- Test migrations for speed
-
Once you have a larger project, you should think about testing how long your migrations are going to take. Database migrations typically involve downtime, as, depending on your database, the schema update operation may lock the table it’s working on until it completes. It’s a good idea to use your staging site to find out how long a migration will take.
- Be extremely careful if using a dump of production data
-
In order to do so, you’ll want fill your staging site’s database with an amount of data that’s commensurate to the size of your production data. Explaining how to do that is outside of the scope of this book, but I will say this: if you’re tempted to just take a dump of your production database and load it into staging, be 'very' careful. Production data contains real customer details, and I’ve personally been responsible for accidentally sending out a few hundred incorrect invoices after an automated process on my staging server started processing the copied production data I’d just loaded into it. Not a fun afternoon.