Infrastructure at your Service

Daniel Westermann

PostgreSQL 15: Add row filters for logical replication

My last post about logical replication in PostgreSQL was in 2017 before PostgreSQL 10 was released later that year. There have been many enhancements to logical replication since then and today we’ll look at a new features that will probably come with PostgreSQL 15 later this year: Allow specifying row filters for logical replication of tables. What is it about? Up until today you can only replicate whole tables to a subscriber node. It is not possible to filter for specific rows or, in other words: You cannot replicate a subset of the rows in the table. With this commit this becomes possible, which is a great feature.

Lets start by creating two brand new instances, the first one will be used as the publisher, and the second one as the subscriber:

[email protected]:/home/postgres/ [pgdev] mkdir /var/tmp/pub && initdb -D /var/tmp/pub && echo "port=8888" >> /var/tmp/pub/postgresql.auto.conf
[email protected]:/home/postgres/ [pgdev] mkdir /var/tmp/sub && initdb -D /var/tmp/sub && echo "port=9999" >> /var/tmp/sub/postgresql.auto.conf

For being able to use logical replication, wal_level needs to be increased to “logical”, so lets do that and start up both instances:

[email protected]:/home/postgres/ [pgdev] echo "wal_level='logical'" >> /var/tmp/pub/postgresql.auto.conf
[email protected]:/home/postgres/ [pgdev] echo "wal_level='logical'" >> /var/tmp/sub/postgresql.auto.conf
[email protected]:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pub/ start -l /dev/null
[email protected]:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/sub/ start -l /dev/null
[email protected]:/home/postgres/ [pgdev] psql -p 8888 -l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

[email protected]:/home/postgres/ [pgdev] psql -p 9999 -l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(3 rows)

Although it is not really required (as long as table and column names match), the exactly same schema and table are now getting created in both clusters:

[email protected]:/home/postgres/ [pgdev] cat c.sql 
create schema a;
create table a.t ( a int primary key, b text, c date );
[email protected]:/home/postgres/ [pgdev] psql -f /home/postgres/c.sql -p 8888
CREATE SCHEMA
CREATE TABLE
[email protected]:/home/postgres/ [pgdev] psql -f /home/postgres/c.sql -p 9999
CREATE SCHEMA
CREATE TABLE

Lets assume we have a kind of tenancy in that table on the publisher, which is defined by the second (b) column:

[email protected]:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'aaa',now() from generate_series(1,1000) i" -p 8888
INSERT 0 1000
[email protected]:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'bbb',now() from generate_series(1001,2000) i" -p 8888
INSERT 0 1000
[email protected]:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'ccc',now() from generate_series(2001,3000) i" -p 8888
INSERT 0 1000

Up to PostgreSQL 14, the only option you have for logically replicating that table, is to replicate the whole table. Since the commit mentioned above we can filter on a subset of the rows:

[email protected]:/home/postgres/ [pgdev] psql -c "\h create publication" -p 8888
Command:     CREATE PUBLICATION
Description: define a new publication
Syntax:
CREATE PUBLICATION name
    [ FOR ALL TABLES
      | FOR publication_object [, ... ] ]
    [ WITH ( publication_parameter [= value] [, ... ] ) ]

where publication_object is one of:

    TABLE [ ONLY ] table_name [ * ] [ WHERE ( expression ) ] [, ... ]
    ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]

URL: https://www.postgresql.org/docs/devel/sql-createpublication.html

Lets assume we only want to replicate tenant ‘aaa’, so we can do it like this:

[email protected]:/home/postgres/ [pgdev] psql -c "create publication pub1 for table a.t where (b = 'aaa')" -p 8888
CREATE PUBLICATION

Once we consume that on the subscriber side we’ll only have the data for that tenant:

[email protected]:/home/postgres/ [pgdev] psql -c "create subscription sub1 connection 'port=8888' publication pub1" -p 9999
NOTICE:  created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
[email protected]:/home/postgres/ [pgdev] psql -c "select distinct(b), count(*) from a.t group by b" -p 9999
  b  | count 
-----+-------
 aaa |  1000
(1 row)

Really cool. Thanks to all involved for making this happen.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure