Infrastructure at your Service

Daniel Westermann

In-core logical replication will hit PostgreSQL 10

Finally in PostgreSQL 10 (expected to be released this September) a long awaited feature will probably appear: In-core logical replication. PostgreSQL supports physical replication since version 9.0 and now the next step happened with the implementation of logical replication. This will be a major help in upgrading PostgreSQL instances from one version to another with no (or almost no) downtime. In addition this can be used to consolidate data from various instances into one instance for reporting purposes or you can use it to distribute only a subset of your data to selected users on other instances. In contrast to physical replication logical replication works on the table level so you can replicate changes in one or more tables, one database are all databases in a PostgreSQL instance which is quite flexible.

In PostgreSQL logical replication is implemented using a publisher and subscriber model. This mean the publisher is the one who will send the data and the subscriber is the one who will receive and apply the changes. A subscriber can be a publisher as well so you can build cascading logical replication. Here is an overview of a possible setup:

pg-logocal-replication-overview

For setting up logical replication when you do not start with an empty database you’ll need to initially load the database where you want to replicate to. How can you do that? I have two PostgreSQL 10 instances (build from the git sources) running on the same host:

Role Port
Publisher 6666
Subsriber 6667

Lets assume we have this sample setup on the publisher instance:

drop table if exists t1;
create table t1 ( a int primary key
                , b varchar(100)
                );
with generator as 
 ( select a.*
     from generate_series ( 1, 5000000 ) a
    order by random()
 )
insert into t1 ( a,b ) 
     select a
          , md5(a::varchar)
       from generator;
select * from pg_size_pretty ( pg_relation_size ('t1' ));

On the subscriber instance there is the same table, but empty:

create table t1 ( a int primary key
                , b varchar(100)
                );

Before we start with the initial load lets take a look at the process list:

postgres@pgbox:/home/postgres/ [PUBLISHER] ps -ef | egrep "PUBLISHER|SUBSCRIBER"
postgres 17311     1  0 11:33 pts/0    00:00:00 /u01/app/postgres/product/dev/db_01/bin/postgres -D /u02/pgdata/PUBLISHER
postgres 17313 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: checkpointer process   
postgres 17314 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: writer process   
postgres 17315 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: wal writer process   
postgres 17316 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: autovacuum launcher process   
postgres 17317 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: stats collector process   
postgres 17318 17311  0 11:33 ?        00:00:00 postgres: PUBLISHER: bgworker: logical replication launcher   
postgres 17321     1  0 11:33 pts/1    00:00:00 /u01/app/postgres/product/dev/db_01/bin/postgres -D /u02/pgdata/SUBSCRIBER
postgres 17323 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: checkpointer process   
postgres 17324 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: writer process   
postgres 17325 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: wal writer process   
postgres 17326 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: autovacuum launcher process   
postgres 17327 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: stats collector process   
postgres 17328 17321  0 11:33 ?        00:00:00 postgres: SUBSCRIBER: bgworker: logical replication launcher   

You’ll notice that there is a new background process called “bgworker: logical replication launcher”. We’ll come back to that later.

Time to create our first publication on the publisher with the create publication command:

postgres@pgbox:/u02/pgdata/PUBLISHER/ [PUBLISHER] psql -X postgres
psql (10devel)
Type "help" for help.

postgres=# create publication my_first_publication for table t1;
CREATE PUBLICATION

On the subscriber we need to create a subscription by using the create subscription command:

postgres@pgbox:/u02/pgdata/SUBSCRIBER/ [SUBSCRIBER] psql -X postgres
psql (10devel)
Type "help" for help.

postgres=# create subscription my_first_subscription connection 'host=localhost port=6666 dbname=postgres user=postgres' publication my_first_publication;
ERROR:  could not create replication slot "my_first_subscription": ERROR:  logical decoding requires wal_level >= logical

Ok, good hint. After changing that on both instances:

postgres@pgbox:/home/postgres/ [SUBSCRIBER] psql -X postgres
psql (10devel)
Type "help" for help.

postgres=# create subscription my_first_subscription connection 'host=localhost port=6666 dbname=postgres user=postgres' publication my_first_publication;
CREATE SUBSCRIPTION

If you are not on super fast hardware and check the process list again you’ll see something like this:

postgres 19465 19079 19 11:58 ?        00:00:04 postgres: SUBSCRIBER: bgworker: logical replication worker for subscription 16390 sync 16384  

On the subscriber the “logical replication launcher” background process launched a worker process and syncs the table automatically (this can be avoided by using the “NOCOPY DATA”):

postgres=# show port;
 port 
------
 6667
(1 row)

postgres=# select count(*) from t1;
  count  
---------
 5000000
(1 row)

Wow, that was really easy. You can find more details in the logfile of the subscriber instance:

2017-04-13 11:58:15.099 CEST - 1 - 19087 -  - @ LOG:  starting logical replication worker for subscription "my_first_subscription"
2017-04-13 11:58:15.101 CEST - 1 - 19463 -  - @ LOG:  logical replication apply for subscription my_first_subscription started
2017-04-13 11:58:15.104 CEST - 2 - 19463 -  - @ LOG:  starting logical replication worker for subscription "my_first_subscription"
2017-04-13 11:58:15.105 CEST - 1 - 19465 -  - @ LOG:  logical replication sync for subscription my_first_subscription, table t1 started
2017-04-13 11:59:03.373 CEST - 1 - 19082 -  - @ LOG:  checkpoint starting: xlog
2017-04-13 11:59:37.985 CEST - 2 - 19082 -  - @ LOG:  checkpoint complete: wrote 14062 buffers (85.8%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=26.959 s, sync=2.291 s, total=34.740 s; sync files=13, longest=1.437 s, average=0.171 s; distance=405829 kB, estimate=405829 kB
2017-04-13 12:02:23.728 CEST - 2 - 19465 -  - @ LOG:  logical replication synchronization worker finished processing

On the publisher instance you get another process for sending the changes to the subscriber:

postgres 19464 18318  0 11:58 ?        00:00:00 postgres: PUBLISHER: wal sender process postgres ::1(41768) idle

Changes to the table on the publisher should now get replicated to the subscriber node:

postgres=# show port;
 port 
------
 6666
(1 row)
postgres=# insert into t1 (a,b) values (-1,'aaaaa');
INSERT 0 1
postgres=# update t1 set b='bbbbb' where a=-1;
UPDATE 1

On the subscriber node:

postgres=# show port;
 port 
------
 6667
(1 row)

postgres=# select * from t1 where a = -1;
 a  |   b   
----+-------
 -1 | aaaaa
(1 row)

postgres=# select * from t1 where a = -1;
 a  |   b   
----+-------
 -1 | bbbbb
(1 row)

As mentioned initially you can make the subscriber a publisher and the publisher a subscriber at the same time. So when we create this table on both instances:

create table t2 ( a int primary key );

Then create a publication on the subscriber node:

postgres=# create table t2 ( a int primary key );
CREATE TABLE
postgres=# show port;
 port 
------
 6667
(1 row)

postgres=# create publication my_second_publication for table t2;
CREATE PUBLICATION
postgres=# 

Then create the subscription to that on the publisher node:

postgres=# show port;
 port 
------
 6666
(1 row)

postgres=# create subscription my_second_subscription connection 'host=localhost port=6667 dbname=postgres user=postgres' publication my_second_publication;
CREATE SUBSCRIPTION

… we have a second logical replication the other way around:

postgres=# show port;
 port 
------
 6667
(1 row)
postgres=# insert into t2 values ( 1 );
INSERT 0 1
postgres=# insert into t2 values ( 2 );
INSERT 0 1
postgres=# 

On the other instance:

postgres=# show port;
 port 
------
 6666
(1 row)

postgres=# select * from t2;
 a 
---
 1
 2
(2 rows)

There are two new catalog views which give you information about subscriptions and publications:

postgres=# select * from pg_subscription;
 subdbid |        subname         | subowner | subenabled |                      subconninfo                       |      subslotname       |     subpublications     
---------+------------------------+----------+------------+--------------------------------------------------------+------------------------+-------------------------
   13216 | my_second_subscription |       10 | t          | host=localhost port=6667 dbname=postgres user=postgres | my_second_subscription | {my_second_publication}
(1 row)

postgres=# select * from pg_publication;
       pubname        | pubowner | puballtables | pubinsert | pubupdate | pubdelete 
----------------------+----------+--------------+-----------+-----------+-----------
 my_first_publication |       10 | f            | t         | t         | t
(1 row)

What a cool feature and so easy to use. Thanks to all who brought that into PostgreSQL 10, great work.

 

Leave a Reply


× 4 = twenty

Daniel Westermann
Daniel Westermann

Senior Consultant and Technology Leader Open Infrastructure