
07 Oct 2020

Enabling database migrations with cross-database PostgreSQL triggers


As part of my role at Lantum, I work with the team responsible for writing a new implementation of the part of the platform that allows workers to submit their timesheets and seek payment for their work.

Within the Lantum product, a worker, after completing a shift at a hospital, has a record of how many hours they worked and thus how much money they will receive for that work: their timesheet. 

The system allows workers to make changes to the timesheet before they submit it to the hospital that they worked for (for example, stating that they started work earlier or finished later), and the hospital has the ability to approve or reject those changes before paying the worker.

The old backend system was badly designed, with a lot of duplication in its data structures that led to data going out of sync, so we decided to redesign the database in a more atomic, normalised way.

To complete that work, we realised pretty early on that the best way to change from one system to the other with minimal disruption was to write the new API, complete with a new, better-designed and more normalised database schema, then migrate the data over and switch which service is used.

So that we could perform the migration early and minimise disruption for users, we decided on the following pattern:

  • Update the code that writes to the database to use the new service
  • Keep data replicated to the old database, so any services that need read access can still use the old service temporarily
  • Migrate the code that reads from the database to read from the new service

We chose this order because there were fewer places performing writes than performing reads, and the writers were the more important places to update, so we could prioritise them early.

How we planned to replicate the data

Since the two database schemas were different, I proposed that we use PostgreSQL triggers to keep the data replicated to the old database. 

Triggers are a PostgreSQL feature that allow you to run queries before or after data is inserted to, updated, or deleted from a table, or instead of an insert, update, or delete query that’s run against a view. For our use case, this would allow us to know when data is inserted to the new database tables, and replicate it to the old tables using the old format.
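In its most generic form, a trigger ties a function to a table (or view) and a timing. As a rough sketch (all the names here are placeholders, not anything from our schema):

CREATE TRIGGER trigger_name
AFTER INSERT ON some_table          -- or BEFORE UPDATE, INSTEAD OF DELETE on a view, etc.
FOR EACH ROW                        -- run once per affected row
EXECUTE PROCEDURE some_function();  -- a function declared as RETURNS TRIGGER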

This was a very interesting challenge, and I wanted to share the procedure I followed for anybody else wanting to do something similar. For obvious reasons I have changed the names of columns, tables, and databases, but the procedure should be the same.

The database schemas that we’re working with

For the purpose of this blog post, consider the schema as it is in the new database (first) and as it was in the old database (second):

[Diagram: the new database, with timesheet_new and a separate timesheet_options table, alongside the old database, where a single timesheet_old table housed everything.]
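For clarity, here is a sketch of the two schemas as SQL DDL. The column types and constraints are illustrative rather than the real definitions, and the real tables had more columns than this:

-- New database: timesheet data normalised across two tables.
-- Types and constraints are illustrative.
CREATE TABLE timesheet_options (
    id          serial  PRIMARY KEY,
    hospital_id integer NOT NULL
);

CREATE TABLE timesheet_new (
    id                   serial  PRIMARY KEY,
    worker_id            integer NOT NULL,
    status               text    NOT NULL,
    timesheet_options_id integer NOT NULL REFERENCES timesheet_options (id)
);

-- Old database: everything in a single denormalised table.
CREATE TABLE timesheet_old (
    id          integer PRIMARY KEY,
    worker_id   integer NOT NULL,
    hospital_id integer NOT NULL,
    status      text    NOT NULL,
    is_approved boolean NOT NULL
);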

As you can see, the column hospital_id was moved to the timesheet_options table (there are sensible reasons for this move that I won't bore you with), and the is_approved column was removed, as it's just an indication of status == 'approved'.

Creating a cross-database connection

This migration happens across two different databases; let's say from timesheet_old to timesheet_new. We need to set up a connection from timesheet_new to timesheet_old using the PostgreSQL Foreign Data Wrapper (FDW), postgres_fdw, available since version 9.3 (the IMPORT FOREIGN SCHEMA command we use below requires 9.5). This will allow us to run queries on the timesheet_old database when records are inserted in the timesheet_new database.

First enable the FDW extension:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

Then we create a reference to the server and database that we want to connect to. In this case, it’s a different database on the same server, so we use localhost, but it could be on a completely different server.

We also create a user mapping, which specifies the username and password we use to connect to the foreign database.

CREATE SERVER timesheet_old_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', dbname 'timesheet_old');

-- This specifies which username and password to use.
CREATE USER MAPPING FOR CURRENT_USER
SERVER timesheet_old_server
OPTIONS (user 'username', password 'password');

Even though we have created a reference to the foreign server, we don't yet have visibility of the tables available on that server. We could manually specify each foreign table using CREATE FOREIGN TABLE, or import the whole schema in one go. In this case, we opted for the latter.

Note that this is just a snapshot of the foreign schema at that point in time, not a live reference, which means it does not update automatically. There is also no requirement for the imported schema to match the actual physical schema on the foreign server, although if the schemas do not match, queries may not succeed. There are, however, cases where you may not want to copy certain columns to the foreign schema.
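For example, if you only needed a subset of columns, you could declare the foreign table by hand instead of importing it. We didn't do this here, but a sketch (with an illustrative column list) would look like:

-- Declare a foreign table manually, exposing only some columns.
CREATE FOREIGN TABLE timesheet_old_schema.timesheet_old (
    id        integer,
    worker_id integer,
    status    text
)
SERVER timesheet_old_server
OPTIONS (schema_name 'public', table_name 'timesheet_old');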

In order to create a schema for our foreign database and import the schema from the remote server, we run the following:

CREATE SCHEMA timesheet_old_schema;

IMPORT FOREIGN SCHEMA public
FROM SERVER timesheet_old_server
INTO timesheet_old_schema;

Now if you check the tables in timesheet_old_schema you should be able to see all tables that exist in the remote database. You can run any SQL queries on them as normal. For example, the following command should return data that exists in one of the foreign tables:

SELECT * FROM timesheet_old_schema.timesheet_old LIMIT 1;

Creating the database triggers

Now that we have the foreign server defined, the schema imported and have verified that the connection works, we can actually create the database triggers that will keep our old database up to date when records are inserted into our new database.

In order to create the triggers, we first need to define a function that will run when the record is inserted. Because this function runs as part of a trigger, it has access to two special variables, OLD and NEW. OLD is available for UPDATE and DELETE queries and holds the previous values of the record. NEW is available for inserts and updates and refers to the new record.

CREATE OR REPLACE FUNCTION new_records_inserted()
RETURNS TRIGGER
AS $$
BEGIN
  INSERT INTO timesheet_old_schema.timesheet_old
  (id, worker_id, hospital_id, status, is_approved)
  VALUES (
    NEW.id,
    NEW.worker_id,
    (SELECT tso.hospital_id FROM timesheet_options tso
     WHERE tso.id = NEW.timesheet_options_id),
    NEW.status,
    (CASE WHEN NEW.status = 'approved' THEN true ELSE false END) -- is_approved
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

As you can see, the syntax is mostly the same as a normal SQL query, although because we are using PL/pgSQL as the language, we could use some additional features (such as declaring variables) if we needed to. This function takes a record (NEW) and inserts the same data into the old schema after converting it to the correct format.
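For example, the same function could be written with a declared variable for the hospital lookup. A sketch of that variant (not the code we actually ran):

CREATE OR REPLACE FUNCTION new_records_inserted()
RETURNS TRIGGER
AS $$
DECLARE
  -- A PL/pgSQL variable, declared up front and filled by SELECT ... INTO.
  looked_up_hospital_id integer;
BEGIN
  SELECT tso.hospital_id INTO looked_up_hospital_id
  FROM timesheet_options tso
  WHERE tso.id = NEW.timesheet_options_id;

  INSERT INTO timesheet_old_schema.timesheet_old
    (id, worker_id, hospital_id, status, is_approved)
  VALUES
    (NEW.id, NEW.worker_id, looked_up_hospital_id, NEW.status, NEW.status = 'approved');

  RETURN NEW;
END;
$$ LANGUAGE plpgsql;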

Finally, in order to actually run this function when a record is inserted to our new table, we create the trigger:

CREATE CONSTRAINT TRIGGER do_replicate_to_old_table
-- When to run the trigger; triggers can run AFTER or BEFORE an INSERT or
-- UPDATE (a constraint trigger like this one must be AFTER). In this case we
-- want to run the trigger after an INSERT statement is run.
AFTER INSERT ON timesheet_new
-- By making the trigger deferrable and initially deferred, it will only run
-- at the end of any transaction that the query may be run as part of.
DEFERRABLE INITIALLY DEFERRED
-- FOR EACH ROW or FOR EACH STATEMENT; if we went with FOR EACH STATEMENT, our
-- function would need to handle more than one insert at a time, as it's
-- possible to insert multiple rows in a single statement. We don't bulk-create
-- these records, so FOR EACH ROW is efficient enough.
FOR EACH ROW
-- Which function to call; this is the function that was defined earlier.
EXECUTE PROCEDURE new_records_inserted();

After running this query, we now have a trigger running every time a new record is inserted into the new table! Let’s try it out:

INSERT INTO timesheet_options (id, hospital_id) VALUES (14, 15);

INSERT INTO timesheet_new (worker_id, status, timesheet_options_id) VALUES (33, 'approved', 14);

This should have created the following record in the old table:

SELECT * FROM timesheet_old_schema.timesheet_old;

 id | worker_id | hospital_id | status   | is_approved
----+-----------+-------------+----------+-------------
 40 |        33 |          15 | approved | true

As you can see, the data is converted into the format of the old table and inserted into that table as well. Clients can now continue reading from the old database until the relevant code has been rewritten.
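We needed the same treatment for updates, which follow the same pattern. The code below is a sketch of what the companion AFTER UPDATE trigger and function could look like, not the exact code we shipped:

-- A sketch: replicate updates on timesheet_new into the old table.
CREATE OR REPLACE FUNCTION new_records_updated()
RETURNS TRIGGER
AS $$
BEGIN
  UPDATE timesheet_old_schema.timesheet_old
  SET worker_id   = NEW.worker_id,
      hospital_id = (SELECT tso.hospital_id FROM timesheet_options tso
                     WHERE tso.id = NEW.timesheet_options_id),
      status      = NEW.status,
      is_approved = (NEW.status = 'approved')
  WHERE id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER do_replicate_updates_to_old_table
AFTER UPDATE ON timesheet_new
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW
EXECUTE PROCEDURE new_records_updated();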

Performing this migration and keeping the records in sync was a challenge I really enjoyed. I hope my write-up of how I achieved this with the Postgres FDW and AFTER INSERT/UPDATE triggers will help someone else.

Author

Erry Kostala

Erry is a Senior Software Engineer at Lantum. You can check out their Github here: https://github.com/errietta
