
Data backwards compatibility: Evolving your database with no downtime

Published on February 8, 2021
Gábor Farkas, Cloud Architect

Evolving data structures in a continuously operated application with a NoSQL database can be challenging. These are some of the experiences we’ve had so far in the development and operation of AODocs, a cloud-native, serverless document management system used by millions of users all around the world.

No downtime

I’m not saying I miss developing classic Java EE applications with relational databases, but certain things were undoubtedly easier.

  • Working with maintenance downtimes between releases was a natural part of the release process. These time windows allowed us to execute update scripts on the database and verify data consistency before allowing the application to reconnect to the database.
  • There were well-established tools, like Liquibase, to manage database changes.

In many cloud-native applications with NoSQL databases this approach doesn’t really work.

  • In our example, we cannot afford maintenance downtime windows; the application has to be fully available all the time, all over the world.
  • We cannot execute schema update statements, as there is no schema, and there are no data update statements we can run on tables. Strictly speaking, there are no tables at all, just entity kinds. This means we have to perform data updates imperatively, processing the entities one by one.

Multiple application versions accessing the same database

Because we don’t have maintenance downtimes, there will be periods, however short, when multiple application versions access the same database. We also do progressive rollouts of major versions, during which two application versions live simultaneously for weeks. But even if we just push a new hotfix version, there will be a few minutes when requests from the old version are still executing while the new version has already started serving. This is unavoidable without an actual downtime.

These two application versions have to manage the data in a compatible way, both backwards and forwards. If a request from the new version updates the data in the new way, the old version still has to be able to read it. If the old version then overwrites the data with the old schema, the new version still has to be able to read it and operate normally.

The rule of 3 versions

Any change to the data structure has to be very carefully designed and coordinated. Even a slight change usually has to be considered across three application versions. Version N is where we first make the feature available to customers; to ensure compatibility, we usually need to start the groundwork in the previous version.

  1. vN-1 starts supporting the new data, in the sense that it must operate normally if it reads an entity that has the new data structure. It usually still writes the data in the old way, or at least in a way that’s compatible with vN-2 at the time of its release.
  2. vN can still read the old representation, but it writes the data in the new way.
  3. When the rollout of vN is complete, it can start supporting the new features associated with the new data, if any.
  4. During the lifetime of vN, we execute a data upgrade process to make sure that all entities are upgraded to the new representation.
  5. vN+1 cleans up the data migration code and removes read support for the old representation.

The actual timing of these releases may vary.

A very simple example

We are currently working with Java 8 on App Engine, with Cloud Datastore. In this setup, people usually use Objectify, the de facto standard library for mapping Java objects to Datastore entities.

Let’s assume we have a simple entity:
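A minimal sketch of what such an entity could look like (the class and field names here are illustrative, not the actual AODocs model):

```java
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    // Whether the user wants to receive reminder notifications
    private boolean reminderEnabled;
}
```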

The boolean field you see above stores whether the given user wishes to receive notifications from our application. (The @Data annotation is from Lombok.)

Let’s assume we’d like to evolve our reminder feature to allow the users to set the reminder frequency to daily. One way to represent this is to change the boolean field to an enum, supporting three values: NONE, ONCE, DAILY.
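For reference, a sketch of the new enum:

```java
public enum ReminderMode {
    NONE,  // no reminders at all
    ONCE,  // the previous behavior: a single reminder
    DAILY  // the new option
}
```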

Note: This is not an actual example from our product, and the representation is intentionally not the best one. This particular change could be modeled in other ways that require only a schema extension rather than a schema change. But not all model changes can be implemented without a schema change, and it’s often better to update the schema than to stick with a worse data model.

So we’ll have a schema update mapping that looks like this:

  • boolean false ↔ ReminderMode.NONE
  • boolean true ↔ ReminderMode.ONCE
  • ReminderMode.DAILY → no boolean equivalent

In version vN-1 we need to be able to read entities that were written according to the new structure. One way to do this in this example is to use the @AlsoLoad annotation provided by Objectify:
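A sketch of the vN-1 entity, reusing the hypothetical names from above. Objectify invokes the annotated method when the stored entity contains a property named reminderMode:

```java
import com.googlecode.objectify.annotation.AlsoLoad;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    private boolean reminderEnabled;

    // Default backwards mapping: any mode other than NONE counts as enabled.
    // Datastore stores enum values as strings, so we receive a raw string here.
    void importReminderMode(@AlsoLoad("reminderMode") String reminderMode) {
        this.reminderEnabled = reminderMode != null && !"NONE".equals(reminderMode);
    }
}
```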

If this code in version vN-1 encounters an entity with the new structure, it applies a default backwards mapping. Note that Objectify drops properties of the underlying entity that have no corresponding Java field declared, so if this version saves the entity, the reminderMode property will be gone again.

Then, on version vN, we can rely primarily on the new field, but we still need to be able to read entities from the old schema.
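A sketch of the vN entity, again with the hypothetical names; this time the old boolean property is mapped onto the enum at load time:

```java
import com.googlecode.objectify.annotation.AlsoLoad;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    private ReminderMode reminderMode;

    // Entities written by the old version only have the boolean property.
    void importLegacyFlag(@AlsoLoad("reminderEnabled") Boolean reminderEnabled) {
        if (reminderMode == null && reminderEnabled != null) {
            reminderMode = reminderEnabled ? ReminderMode.ONCE : ReminderMode.NONE;
        }
    }
}
```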

But what happens to the DAILY value?

You probably noticed that our backwards mapping from the enum to the boolean is not complete. If a customer sets the value to DAILY on the new version and an old version then updates the entity, the DAILY setting is simply lost, and the user is back to ONCE reminders.

This cannot always be completely avoided; how to handle it depends on the actual case.

  1. In this particular case we can reliably solve it by also storing the ReminderMode in vN-1 and only changing it when the user updates the setting on vN-1. This matters mainly when we have to roll back to the previous version for a longer period.
  2. In other cases it’s not easy to decide whether it makes sense to keep the new setting on the old version, even if it has no effect there. Either choice might lead to unexpected behavior for the end user in edge cases.
  3. In our particular example, if a user has already applied the new DAILY setting and we need to roll back for a longer period, they might observe that the reminder setting is back to ONCE. They might then configure some other setting that also suits their business case, and get confused when the daily reminders come back a few days later, once we continue the version rollout.

If the risk or the severity of an undesired behavior is high during this rollout period when two application versions operate at the same time, we can use feature flags so that customers can only start leveraging the new feature once the version rollout is complete.

Another approach

The reminders example could actually be migrated in two versions. We could avoid having to prepare for the migration in vN-1 if the code in vN declared both the boolean and the enum fields. In this case:

  • The backwards compatibility is achieved by populating the boolean field based on the enum when saving the entity.
  • The data migration is done by populating the enum field from the boolean, if the enum is null. This is also enough to ensure forwards compatibility. If the entity is saved by the previous version, the enum value will be null. This means that if the users have already set the DAILY value, it will be reset to ONCE if there’s a version rollback. Whether or not this is a problem depends on the actual case, discussed in the previous section.

This approach is more often applied when the data structure change is purely technical (not user-facing) or functionally equivalent.

The code example for our case:
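A sketch with the same illustrative names, using Objectify’s @OnLoad and @OnSave lifecycle callbacks:

```java
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.OnLoad;
import com.googlecode.objectify.annotation.OnSave;
import lombok.Data;

@Entity
@Data
public class UserSettings {
    @Id
    private Long id;

    private boolean reminderEnabled;   // old field, still maintained for vN-1
    private ReminderMode reminderMode; // new field, authoritative in vN

    // Migration on read: entities written by vN-1 have no enum value yet.
    @OnLoad
    void migrateLegacyFlag() {
        if (reminderMode == null) {
            reminderMode = reminderEnabled ? ReminderMode.ONCE : ReminderMode.NONE;
        }
    }

    // Backwards compatibility on write: keep the boolean in sync so that
    // vN-1 keeps working if it reads (or overwrites) this entity.
    @OnSave
    void syncLegacyFlag() {
        reminderEnabled = reminderMode != ReminderMode.NONE;
    }
}
```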

In this particular case there’s no required code modification for vN-1, but this cannot always be achieved. The general rule is that vN-1 has to work properly if it reads an entity written by vN. For example, if we add a new possible value to an enum-typed field, we have to add and handle that value in the previous version as well; otherwise we’ll get an exception when reading the entity.

A new uniqueness constraint

Let’s assume we have an entity that can be edited by customers. At some point we realize that the display name of this entity should be unique: users shouldn’t be able to have two entities with the same name. The rule of 3 versions applies here as well.

There’s nothing to do in vN-1 to ensure compatibility. (This may depend on how we interpret the rules, but let’s start here.)

vN will start applying the uniqueness constraint. The problem here is that vN-1 can still create duplicate names, and we haven’t migrated the data yet, so any previously created duplicates are still there. Depending on where exactly we apply the uniqueness check, this can be problematic if the check also prevents system actions on that entity.

  • One solution is to allow all updates to this entity, but if the name is changed, we only allow changing it to something unique (see the sketch after this list).
  • Another approach is to enforce the uniqueness rule only when an actual user is updating the value. This cannot always be easily identified: behind a regular REST API call from a frontend, for example, we cannot be sure whether there’s an actual user at the other end who can meaningfully handle our error message, or whether we would block some business integration process with our error.
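A minimal sketch of the first approach (hypothetical names; the in-memory set stands in for a real Datastore query):

```java
import java.util.Set;

public class DisplayNameValidator {

    private final Set<String> namesInUse; // stand-in for a Datastore lookup

    public DisplayNameValidator(Set<String> namesInUse) {
        this.namesInUse = namesInUse;
    }

    // Entities may keep a pre-existing duplicate name, but the name can only
    // be *changed* to a value that no other entity uses yet.
    public void checkUpdate(String previousName, String newName) {
        if (!newName.equals(previousName) && namesInUse.contains(newName)) {
            throw new IllegalArgumentException(
                "Display name already in use: " + newName);
        }
    }
}
```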

When vN is fully rolled out, we can execute a migration process that automatically assigns a unique name to each problematic entity.
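A minimal sketch of such a migration, assuming a hypothetical NamedEntity with a displayName property; the real process would run as a batch job over the Datastore entities and save each change:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DisplayNameMigration {

    // Gives every duplicate display name a unique numeric suffix,
    // e.g. "Report", "Report (2)", "Report (3)".
    static void deduplicate(List<NamedEntity> entities) {
        Map<String, Integer> seen = new HashMap<>();
        for (NamedEntity entity : entities) {
            int occurrence = seen.merge(entity.getDisplayName(), 1, Integer::sum);
            if (occurrence > 1) {
                entity.setDisplayName(
                    entity.getDisplayName() + " (" + occurrence + ")");
            }
        }
    }
}
```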

In vN+1 we can clean up the ‘eased’ rules and apply the uniqueness constraint completely.

No silver bullet

All this sounds cumbersome, and it really is. It’s often not just cumbersome but also complex: while the general pattern applies, every case is a bit different and requires very careful consideration, design, and execution.

We’ve added multiple checkpoints in various parts of our development process to ensure that all changes that might affect compatibility are noticed and properly managed.
