Stop losing data when writing Django migrations!
François Farge · 7 min read
It is fairly obvious that database structure matters, at least if you decided to use a structured database technology. In that case, you want your database structure to match your functional and business needs as closely as possible, and to be strict enough that you can rely on it.
When working on Django projects, however, I sometimes found it painful to change the data structure when I needed to. Then I discovered a very useful tool offered by Django migrations: the RunPython operation.
Simple migration
Let’s take a simple example and assume that we have a single model called Order inside an orders application.
We would then have the following models.py file:
from django.db import models


class Order(models.Model):
    reference = models.CharField(max_length=8)
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    creation_date = models.DateField(auto_now=False, auto_now_add=False)
    due_date = models.DateField(auto_now=False, auto_now_add=False)
    customer_name = models.CharField(max_length=50)
    customer_address = models.CharField(max_length=50)
    customer_city = models.CharField(max_length=50)
    customer_zip_code = models.CharField(max_length=50)

    def __str__(self):
        # An order is displayed by its reference
        return self.reference
Running python manage.py makemigrations will produce the following migration file, called 0001_initial.py:
from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = []

    operations = [
        migrations.CreateModel(
            name="Order",
            fields=[
                (
                    "id",
                    models.AutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                ("reference", models.CharField(max_length=8)),
                ("amount", models.DecimalField(decimal_places=2, max_digits=8)),
                ("creation_date", models.DateField()),
                ("due_date", models.DateField()),
                ("customer_name", models.CharField(max_length=50)),
                ("customer_address", models.CharField(max_length=50)),
                ("customer_city", models.CharField(max_length=50)),
                ("customer_zip_code", models.CharField(max_length=50)),
            ],
        ),
    ]
Let’s break down what’s happening in this migration file. The migration is represented as a Python class with 3 attributes:
- initial states that this migration is the first of its Django app;
- dependencies lists all the migrations that need to be applied before this one can be applied. Here it is empty since the migration is the first of its Django app;
- operations is the most important part. It is a list of actions that Django will apply to the database. Here the database was empty, so the model is created from scratch with the CreateModel operation.
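To make the role of dependencies more concrete, here is a minimal sketch of how a set of dependencies determines the order in which migrations can be applied. It relies on the standard library's graphlib module rather than on Django's own migration loader, which is more involved. The 0002_customer migration name and the billing app are hypothetical and only serve the illustration.

```python
from graphlib import TopologicalSorter

# Each migration is identified by (app_label, migration_name) and maps to the
# set of migrations listed in its `dependencies` attribute.
# "0002_customer" and the "billing" app are hypothetical names.
dependencies = {
    ("orders", "0001_initial"): set(),
    ("orders", "0002_customer"): {("orders", "0001_initial")},
    ("billing", "0001_initial"): {("orders", "0002_customer")},
}

# static_order() yields every migration after all of its dependencies.
apply_order = list(TopologicalSorter(dependencies).static_order())

for app_label, name in apply_order:
    print(f"{app_label}.{name}")
```

An initial migration with empty dependencies, like the one above, can always come first.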
The python manage.py makemigrations command performs well when the changes are simple enough. For example, renaming a model or renaming a field is translated into the equivalent database operation.
After applying the migration, let’s create a few entities in the database.
Complex structure migration
Let’s start again from our previous example and observe something that you may already have noticed. The 4 fields customer_name, customer_address, customer_city, and customer_zip_code are functionally attached to the Order entity, while they represent another physical entity, a customer. And that is fine as long as there is no duplication. But now let’s imagine that for business purposes, you need to abstract a Customer entity to which the Order entities are linked.
Your models.py file would then be:
from django.db import models


class Customer(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=50)
    city = models.CharField(max_length=50)
    zip_code = models.CharField(max_length=50)

    def __str__(self):
        return self.name


class Order(models.Model):
    reference = models.CharField(max_length=8)
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    creation_date = models.DateField(auto_now=False, auto_now_add=False)
    due_date = models.DateField(auto_now=False, auto_now_add=False)
    customer = models.ForeignKey(Customer, on_delete=models.PROTECT)

    def __str__(self):
        return self.reference
This configuration now fits our functional needs. Let’s try to generate the migration with python manage.py makemigrations:
> You are trying to add a non-nullable field 'customer' to order without a default; we can't do that (the database needs something to populate existing rows).
Please select a fix:
1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
2) Quit, and let me add a default in models.py
There is a problem, however. Django recognizes that we are trying to create a customer field on the Order model that can’t be None, but for which the existing rows have no value. So it wants to add a non-null value to all the existing entries in the database, and asks us whether we want to provide it now or set it in the Order model.
Neither of these options satisfies us. We don’t want our orders to point to a customer that is either None or a single default customer, since that would lose all the existing customer data.
Granted, we could also execute this migration, then manually set the customer attribute of every Order instance, but that would force you to repeat the same manual action on all of your environments. It would also make the migration quite difficult to roll back.
Fortunately, Django comes with a built-in solution to deal with this limitation.
Generating a custom migration
First off, let’s start by generating an empty migration that we will then edit.
> python manage.py makemigrations orders --empty
The generated migration file will look like this:
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('orders', '0001_initial'),
    ]

    operations = [
    ]
We then need to build our custom migration. It needs to have 6 steps:
- Create the new Customer model in the database, with all the necessary fields.
- Create a nullable customer foreign key on Order.
- Set the Order fields to transfer as nullable.
- Transfer the data from the Order model to the Customer model.
- Set the new customer field on Order as non-nullable.
- Remove the old fields from the Order model.
The Django documentation explains all the database operations needed.
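The six steps above can also be pictured at the database level. The following is a rough sketch using Python's built-in sqlite3 module on a reduced version of the schema (two customer fields instead of four). It is not the SQL that Django generates: the table and column names follow Django's default naming, but the sample rows are made up, and since SQLite cannot change the nullability of a column in place, steps 5 and 6 are done here by rebuilding the table. Step 3 has no counterpart in this forward-only sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Starting point: a reduced, flat Order table with made-up sample rows.
conn.execute(
    """
    CREATE TABLE orders_order (
        id INTEGER PRIMARY KEY,
        reference TEXT NOT NULL,
        customer_name TEXT NOT NULL,
        customer_city TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO orders_order (reference, customer_name, customer_city) "
    "VALUES (?, ?, ?)",
    [("A1", "Alice", "Paris"), ("A2", "Alice", "Paris"), ("B1", "Bob", "Lyon")],
)

# Step 1: create the Customer table.
conn.execute(
    "CREATE TABLE orders_customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Step 2: add a nullable foreign key column on Order.
conn.execute(
    "ALTER TABLE orders_order "
    "ADD COLUMN customer_id INTEGER REFERENCES orders_customer (id)"
)

# Step 4: create one customer per distinct (name, city) pair, then link orders.
conn.execute(
    "INSERT INTO orders_customer (name, city) "
    "SELECT DISTINCT customer_name, customer_city FROM orders_order"
)
conn.execute(
    """
    UPDATE orders_order
    SET customer_id = (
        SELECT c.id FROM orders_customer AS c
        WHERE c.name = orders_order.customer_name
          AND c.city = orders_order.customer_city
    )
    """
)

# Steps 5 and 6: rebuild the table with a NOT NULL foreign key
# and without the old customer columns.
conn.execute(
    """
    CREATE TABLE orders_order_new (
        id INTEGER PRIMARY KEY,
        reference TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES orders_customer (id)
    )
    """
)
conn.execute(
    "INSERT INTO orders_order_new (id, reference, customer_id) "
    "SELECT id, reference, customer_id FROM orders_order"
)
conn.execute("DROP TABLE orders_order")
conn.execute("ALTER TABLE orders_order_new RENAME TO orders_order")
conn.commit()

customer_count = conn.execute("SELECT COUNT(*) FROM orders_customer").fetchone()[0]
linked = conn.execute(
    "SELECT o.reference, c.name FROM orders_order AS o "
    "JOIN orders_customer AS c ON c.id = o.customer_id ORDER BY o.reference"
).fetchall()
print(customer_count, linked)
```

The point of the ordering is visible here: the foreign key only becomes mandatory once every order has been linked, and the old columns only disappear once their content has been copied.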
Let’s write this migration:
from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    dependencies = [
        ("orders", "0001_initial"),
    ]

    operations = [
        # step 1: add the new Customer model
        migrations.CreateModel(
            name="Customer",
            fields=[
                (
                    "id",
                    models.AutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                ("name", models.CharField(max_length=50)),
                ("address", models.CharField(max_length=50)),
                ("city", models.CharField(max_length=50)),
                ("zip_code", models.CharField(max_length=50)),
            ],
        ),
        # step 2: add the nullable foreign key field `customer` to Order
        migrations.AddField(
            model_name="order",
            name="customer",
            field=models.ForeignKey(
                null=True,
                on_delete=django.db.models.deletion.PROTECT,
                to="orders.Customer",
            ),
        ),
        # step 3: set the order fields as nullable
        migrations.AlterField(
            model_name="order",
            name="customer_address",
            field=models.CharField(null=True, max_length=50),
        ),
        migrations.AlterField(
            model_name="order",
            name="customer_city",
            field=models.CharField(null=True, max_length=50),
        ),
        migrations.AlterField(
            model_name="order",
            name="customer_name",
            field=models.CharField(null=True, max_length=50),
        ),
        migrations.AlterField(
            model_name="order",
            name="customer_zip_code",
            field=models.CharField(null=True, max_length=50),
        ),
        # step 4: transfer data from Order to Customer
        ...
        # step 5: set the `customer` field as non-nullable
        migrations.AlterField(
            model_name="order",
            name="customer",
            field=models.ForeignKey(
                null=False,
                on_delete=django.db.models.deletion.PROTECT,
                to="orders.Customer",
            ),
        ),
        # step 6: remove the old Order fields
        migrations.RemoveField(model_name="order", name="customer_address",),
        migrations.RemoveField(model_name="order", name="customer_city",),
        migrations.RemoveField(model_name="order", name="customer_name",),
        migrations.RemoveField(model_name="order", name="customer_zip_code",),
    ]
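In this migration, steps 2 and 3 loosen the data structure, preparing it for the data transfer. Then steps 5 and 6 tighten it again! Step 3 mainly serves the rollback: when the migration is reversed, the old fields are re-created before the data is copied back into them, so they have to accept null values at that point.
Let’s now dive into step 4, where the magic happens!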
Write the data transfer function
In order to perform this operation, we are going to use the RunPython migration operation. It executes a Python function, which can use the ORM.
Its syntax is the following:
# step 4: transfer data from Order to Customer
migrations.RunPython(order_to_customer, reverse_code=customer_to_order)
In this step of the migration, we specify two functions:
- order_to_customer will be run when applying the migration;
- customer_to_order will be run when reversing the migration.
def order_to_customer(apps, schema_editor):
    # Use the versioned models provided by Django, not direct imports
    Order = apps.get_model("orders", "Order")
    Customer = apps.get_model("orders", "Customer")
    for order in Order.objects.all():
        # Reuse an existing customer when all four fields match
        customer, _ = Customer.objects.get_or_create(
            name=order.customer_name,
            address=order.customer_address,
            city=order.customer_city,
            zip_code=order.customer_zip_code,
        )
        order.customer = customer
        order.save()


def customer_to_order(apps, schema_editor):
    Order = apps.get_model("orders", "Order")
    for order in Order.objects.all():
        # Copy the customer data back onto the order fields
        order.customer_name = order.customer.name
        order.customer_address = order.customer.address
        order.customer_city = order.customer.city
        order.customer_zip_code = order.customer.zip_code
        order.save()
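These functions look a lot like what you would write to manually transfer the data from one model to the other. However, there is one trick. The models should not be imported the usual way with from orders.models import Order, Customer, because that would give the current version of the models, which may not match the database schema at this point in the migration history. Instead, we need to use the versioned models that Django provides through the apps argument passed to the migration function.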
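The behavior of the two functions can be checked without a database. The sketch below mimics them with plain dictionaries: the forward pass creates one customer per distinct set of customer fields, as get_or_create does, and the reverse pass copies the fields back onto each order. The helper names split_customers and merge_customers and the sample data are hypothetical, and only two of the four customer fields are kept for brevity.

```python
FIELDS = ("name", "city")  # reduced set of customer fields, for brevity


def split_customers(orders):
    """Forward pass: move customer fields out of orders, deduplicating customers."""
    customers = {}  # customer id -> customer fields
    ids_by_key = {}  # tuple of field values -> customer id (mimics get_or_create)
    new_orders = []
    for order in orders:
        key = tuple(order[f"customer_{field}"] for field in FIELDS)
        if key not in ids_by_key:
            ids_by_key[key] = len(customers) + 1
            customers[ids_by_key[key]] = dict(zip(FIELDS, key))
        new_orders.append(
            {"reference": order["reference"], "customer_id": ids_by_key[key]}
        )
    return new_orders, customers


def merge_customers(orders, customers):
    """Reverse pass: copy the customer fields back onto each order."""
    flat_orders = []
    for order in orders:
        customer = customers[order["customer_id"]]
        flat = {"reference": order["reference"]}
        flat.update({f"customer_{field}": customer[field] for field in FIELDS})
        flat_orders.append(flat)
    return flat_orders


original = [
    {"reference": "A1", "customer_name": "Alice", "customer_city": "Paris"},
    {"reference": "A2", "customer_name": "Alice", "customer_city": "Paris"},
    {"reference": "B1", "customer_name": "Bob", "customer_city": "Lyon"},
]

migrated, customers = split_customers(original)
restored = merge_customers(migrated, customers)
print(len(customers), restored == original)
```

Two orders sharing the same customer fields end up linked to a single customer, and the round trip gives back the original flat rows, which is the property that makes the migration reversible.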
Running the migration
After running this migration via python manage.py migrate, we can check that the migration has created the Customer instances and linked them to the correct Order instances.
[Screenshots: Customers, Orders]
Et voilà! This migration is now operational. It can also be easily rolled back with python manage.py migrate orders <previous_migration_id>. The data will then be transferred back to the Order model.
Conclusion
You now have a way to stop worrying about losing data when migrating your database with Django! The Django ORM is a great tool and although it does the job most of the time, understanding what happens under the hood is a great way of making your life easier when dealing with migrations.
Sources
- Django documentation on migrations:
- Link to the GitHub repo with the example: https://github.com/fargito/django-runpyton