Enabling database migrations with cross-database PostgreSQL triggers
As part of my role at Lantum, I work with the team responsible for writing a new implementation of the part of the platform that allows workers to submit their timesheets and seek payment for their work.
Within the Lantum product, a worker, after completing a shift at a hospital, has a record of how many hours they worked and thus how much money they will receive for that work: their timesheet.
The system allows workers to make changes to the timesheet before they submit it to the hospital that they worked for (for example, stating that they started work earlier or finished later), and the hospital has the ability to approve or reject those changes before paying the worker.
The old backend system suffered from a poorly designed data structure: there was a lot of duplication, and data frequently went out of sync. We therefore decided to re-design the database in a more atomic, normalised way.
To complete that work, we realised pretty early on that the best way to change from one system to the other with minimal disruption was to write the new API first, complete with a better-designed, more normalised database schema, then migrate the data over and switch which service was used.
So that we could perform the migration early and minimise disruption for users, we decided on the following pattern:
- Update the code that writes to the database to use the new service
- Keep data replicated to the old database, so any services that need read access can still use the old service temporarily
- Migrate the code that reads from the database to read from the new service
We took this approach because there were far fewer places performing writes than performing reads, and the writes were the more important code paths to update, so we could prioritise them early.
How we planned to replicate the data
Since the two database schemas were different, I proposed that we use PostgreSQL triggers to keep the data replicated to the old database.
Triggers are a PostgreSQL feature that allows you to run a function before or after data is inserted into, updated in, or deleted from a table, or instead of an insert, update, or delete query run against a view. For our use case, this would allow us to know when data is inserted into the new database tables and replicate it to the old tables in the old format.
This was a very interesting challenge, and I wanted to share the procedure I followed for anybody else wanting to do something similar. For obvious reasons I have changed the names of columns, tables, and databases, but the procedure should be the same.
The database schemas that we’re working with
For the purpose of this blog post, consider that this is the schema as it is in the new database on the top, and as it was in the old database on the bottom:
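The exact types, and the way the new timesheet table references timesheet_options through a timesheet_options_id column, are simplified here for illustration:

-- New database: timesheet_new
CREATE TABLE timesheet_options (
    id          SERIAL PRIMARY KEY,
    hospital_id INTEGER NOT NULL
);

CREATE TABLE timesheet (
    id                   SERIAL PRIMARY KEY,
    worker_id            INTEGER NOT NULL,
    timesheet_options_id INTEGER NOT NULL REFERENCES timesheet_options (id),
    status               TEXT NOT NULL
);

-- Old database: timesheet_old
CREATE TABLE timesheet_old (
    id          INTEGER PRIMARY KEY,
    worker_id   INTEGER NOT NULL,
    hospital_id INTEGER NOT NULL,
    status      TEXT NOT NULL,
    is_approved BOOLEAN NOT NULL
);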
As you can see, the hospital_id column was moved to the timesheet_options table (there are sensible reasons for this move that I won't bore you with), and the is_approved column was removed, as it's just an indication of status == approved.
Creating a cross-database connection
This migration happens across two different databases: let's say from timesheet_old to timesheet_new. We need to set up a connection from timesheet_new to timesheet_old using the PostgreSQL Foreign Data Wrapper extension, postgres_fdw (available since PostgreSQL 9.3; the IMPORT FOREIGN SCHEMA command we use below needs 9.5). This will allow us to run queries on the timesheet_old database when records are inserted into the timesheet_new database.
First enable the FDW extension:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
Then we create a reference to the server and database that we want to connect to. In this case, it's a different database on the same server, so we use localhost, but it could be on a completely different server.
We also create a user mapping, which specifies the username and password we use to connect to the foreign database.
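The server name timesheet_old_server and the credentials below are placeholders for your own values:

CREATE SERVER timesheet_old_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5432', dbname 'timesheet_old');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER timesheet_old_server
    OPTIONS (user 'timesheet_user', password 'secret');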
Even though we have created a reference to the foreign server, we don't yet have visibility of the tables available on it. We could manually specify all the foreign tables using CREATE FOREIGN TABLE, or just import the whole schema right away. In this case, we opt for the latter.
Note that this is just a snapshot of the foreign schema at a point in time, not a live reference to it, so it does not update automatically. There is also no requirement for the foreign schema to match the actual physical schema on the foreign server, although queries may fail if the schemas do not match. There are, however, cases where you may not want to copy certain columns to the foreign schema, which I will get to later.
In order to create a schema for our foreign database and import the schema from the remote server, we run the following:
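Assuming the old tables live in the remote database's public schema:

CREATE SCHEMA IF NOT EXISTS timesheet_old_schema;

IMPORT FOREIGN SCHEMA public
    FROM SERVER timesheet_old_server
    INTO timesheet_old_schema;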
Now if you check the tables in timesheet_old_schema, you should be able to see all the tables that exist in the remote database. You can run any SQL queries on them as normal. For example, the following command should return data that exists in one of the foreign tables:
SELECT * FROM timesheet_old_schema.timesheet_old LIMIT 1;
Creating the database triggers
Now that we have defined the foreign server, imported the schema, and verified that the connection works, we can create the database triggers that will keep the old database up to date when records are inserted into the new one.
In order to create the triggers, we first need to define a function that will run when a record is inserted. Because this function runs as part of a trigger, it has access to two special variables, OLD and NEW. OLD is available for UPDATE and DELETE operations and holds the values of the existing record; NEW is available for inserts and updates and refers to the new record.
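A version of that function, using the simplified schema from earlier, might look roughly like this (the function name is my own choice, and an update can be replicated in the same way, either by checking TG_OP inside this function or with a separate function that runs an UPDATE against the old table):

CREATE OR REPLACE FUNCTION replicate_timesheet_to_old()
RETURNS trigger AS $$
BEGIN
    -- Copy the newly inserted timesheet into the old table, pulling
    -- hospital_id from timesheet_options and deriving is_approved
    -- from the status column.
    INSERT INTO timesheet_old_schema.timesheet_old
        (id, worker_id, hospital_id, status, is_approved)
    SELECT NEW.id,
           NEW.worker_id,
           opts.hospital_id,
           NEW.status,
           (NEW.status = 'approved')
    FROM timesheet_options AS opts
    WHERE opts.id = NEW.timesheet_options_id;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;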
As you can see, the syntax is mostly the same as a normal SQL query, although because we are using PL/pgSQL as the language, we could use some additional features (such as declaring variables) if we needed to. This function basically takes a record (NEW) and inserts the same data into the old schema after converting it to the correct format.
Finally, in order to actually run this function when a record is inserted into our new table, we create the trigger:
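The trigger name here is again our own choice:

CREATE TRIGGER timesheet_replicate
    AFTER INSERT ON timesheet
    FOR EACH ROW
    EXECUTE PROCEDURE replicate_timesheet_to_old();

(On PostgreSQL 11 and later, EXECUTE FUNCTION can be used instead of EXECUTE PROCEDURE.)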
After running this query, we now have a trigger running every time a new record is inserted into the new table! Let’s try it out:
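Using made-up IDs that line up with the output below:

INSERT INTO timesheet_options (id, hospital_id)
VALUES (7, 15);

INSERT INTO timesheet (id, worker_id, timesheet_options_id, status)
VALUES (40, 33, 7, 'approved');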
This should have created the following record in the old table:
SELECT * FROM timesheet_old_schema.timesheet_old;
 id | worker_id | hospital_id | status   | is_approved
----+-----------+-------------+----------+-------------
 40 |        33 |          15 | approved | true
As you can see, the data is converted into the format of the old table and inserted into that table as well. Clients can now continue reading from the old database until the relevant code has been rewritten.
Performing this migration and keeping the records in sync was a challenge I really enjoyed. I hope my write-up of how I achieved this with the Postgres FDW and AFTER INSERT/UPDATE triggers will help someone else.
Author
Erry Kostala
Erry is a Senior Software Engineer at Lantum. You can check out their Github here: https://github.com/errietta