My last post about logical replication in PostgreSQL was in 2017 before PostgreSQL 10 was released later that year. There have been many enhancements to logical replication since then and today we’ll look at a new features that will probably come with PostgreSQL 15 later this year: Allow specifying row filters for logical replication of tables. What is it about? Up until today you can only replicate whole tables to a subscriber node. It is not possible to filter for specific rows or, in other words: You cannot replicate a subset of the rows in the table. With this commit this becomes possible, which is a great feature.
Lets start by creating two brand new instances, the first one will be used as the publisher, and the second one as the subscriber:
postgres@debian11pg:/home/postgres/ [pgdev] mkdir /var/tmp/pub && initdb -D /var/tmp/pub && echo "port=8888" >> /var/tmp/pub/postgresql.auto.conf postgres@debian11pg:/home/postgres/ [pgdev] mkdir /var/tmp/sub && initdb -D /var/tmp/sub && echo "port=9999" >> /var/tmp/sub/postgresql.auto.conf
For being able to use logical replication, wal_level needs to be increased to “logical”, so lets do that and start up both instances:
postgres@debian11pg:/home/postgres/ [pgdev] echo "wal_level='logical'" >> /var/tmp/pub/postgresql.auto.conf postgres@debian11pg:/home/postgres/ [pgdev] echo "wal_level='logical'" >> /var/tmp/sub/postgresql.auto.conf postgres@debian11pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/pub/ start -l /dev/null postgres@debian11pg:/home/postgres/ [pgdev] pg_ctl -D /var/tmp/sub/ start -l /dev/null postgres@debian11pg:/home/postgres/ [pgdev] psql -p 8888 -l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+----------+-------------+-------------+----------------------- postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres (3 rows) postgres@debian11pg:/home/postgres/ [pgdev] psql -p 9999 -l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+----------+-------------+-------------+----------------------- postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres + | | | | | postgres=CTc/postgres (3 rows)
Although it is not really required (as long as table and column names match), the exactly same schema and table are now getting created in both clusters:
postgres@debian11pg:/home/postgres/ [pgdev] cat c.sql create schema a; create table a.t ( a int primary key, b text, c date ); postgres@debian11pg:/home/postgres/ [pgdev] psql -f /home/postgres/c.sql -p 8888 CREATE SCHEMA CREATE TABLE postgres@debian11pg:/home/postgres/ [pgdev] psql -f /home/postgres/c.sql -p 9999 CREATE SCHEMA CREATE TABLE
Lets assume we have a kind of tenancy in that table on the publisher, which is defined by the second (b) column:
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'aaa',now() from generate_series(1,1000) i" -p 8888 INSERT 0 1000 postgres@debian11pg:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'bbb',now() from generate_series(1001,2000) i" -p 8888 INSERT 0 1000 postgres@debian11pg:/home/postgres/ [pgdev] psql -c "insert into a.t select i,'ccc',now() from generate_series(2001,3000) i" -p 8888 INSERT 0 1000
Up to PostgreSQL 14, the only option you have for logically replicating that table, is to replicate the whole table. Since the commit mentioned above we can filter on a subset of the rows:
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "h create publication" -p 8888 Command: CREATE PUBLICATION Description: define a new publication Syntax: CREATE PUBLICATION name [ FOR ALL TABLES | FOR publication_object [, ... ] ] [ WITH ( publication_parameter [= value] [, ... ] ) ] where publication_object is one of: TABLE [ ONLY ] table_name [ * ] [ WHERE ( expression ) ] [, ... ] ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ] URL: https://www.postgresql.org/docs/devel/sql-createpublication.html
Lets assume we only want to replicate tenant ‘aaa’, so we can do it like this:
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "create publication pub1 for table a.t where (b = 'aaa')" -p 8888 CREATE PUBLICATION
Once we consume that on the subscriber side we’ll only have the data for that tenant:
postgres@debian11pg:/home/postgres/ [pgdev] psql -c "create subscription sub1 connection 'port=8888' publication pub1" -p 9999 NOTICE: created replication slot "sub1" on publisher CREATE SUBSCRIPTION postgres@debian11pg:/home/postgres/ [pgdev] psql -c "select distinct(b), count(*) from a.t group by b" -p 9999 b | count -----+------- aaa | 1000 (1 row)
Really cool. Thanks to all involved for making this happen.