Infrastructure at your Service

Mouhamadou Diaw

Replicating specific tables in PostgreSQL 10 Beta with mimeo

In this blog I am going to test the extension mimeo with PostgreSQL 10 beta. Mimeo is a replication extension for copying specific tables in one of several specialized ways from any number of source databases to a destination database where mimeo is installed.
In our configuration we are going to replicate data on a same server but between 2 clusters running on different ports. But it’s same for different servers. The pg_hba.conf should be configured to allow remote connection.
Source
Hostname: pgservertools.localdomain (192.168.56.30)
Database: prima (port 5432)
Target
Hostname: pgservertools.localdomain (192.168.56.30)
Database: repl (port 5433)
The first thing is to install the extension on the destination server. For this we will use the command git to clone the extension directory on the server.
[root@pgservertools ~]# yum install perl-Git.noarch
[root@pgservertools ~]# git clone git://github.com/omniti-labs/mimeo.git
Cloning into 'mimeo'...
remote: Counting objects: 1720, done.
remote: Total 1720 (delta 0), reused 0 (delta 0), pack-reused 1720
Receiving objects: 100% (1720/1720), 1.24 MiB | 429.00 KiB/s, done.
Resolving deltas: 100% (1094/1094), done.
[root@pgservertools ~]#

Then in the mimeo directory let’s run following commands

[root@pgservertools mimeo]# make
[root@pgservertools mimeo]# make install

If there is no error, we can create our two databases. The source database will be named prima and the target will be named repl.

[postgres@pgservertools postgres]$ psql
psql (10beta2)
Type "help" for help.
.
postgres=# show port;
port
------
5432
(1 row)
.
postgres=# create database prima;
CREATE DATABASE
postgres=#


postgres=# show port;
port
------
5433
(1 row)
.
postgres=# create database repl;
CREATE DATABASE
postgres=#

Now we have to install the extension mimeo in the destination database repl.
The extension mimeo requires the extension dblink. If this extension is not present, an error will be raised

repl=# create schema mimeo;
CREATE SCHEMA
.
repl=# create extension mimeo schema mimeo;
ERROR: required extension "dblink" is not installed
HINT: Use CREATE EXTENSION ... CASCADE to install required extensions too.
repl=#

The extension dblink should be already present with the standard installation. This can be verified by listing files the extension directory.

[root@pgservertools extension]# pwd
/usr/pgsql-10/share/extension
.
[root@pgservertools extension]# ls -l dblink*
-rw-r--r--. 1 root root 419 Jul 13 12:15 dblink--1.0--1.1.sql
-rw-r--r--. 1 root root 2832 Jul 13 12:15 dblink--1.1--1.2.sql
-rw-r--r--. 1 root root 6645 Jul 13 12:15 dblink--1.2.sql
-rw-r--r--. 1 root root 170 Jul 13 12:15 dblink.control
-rw-r--r--. 1 root root 2863 Jul 13 12:15 dblink--unpackaged--1.0.sql
[root@pgservertools extension]#

So rexecuting the instruction with the cascade option will install the extension dblink.

repl=# create extension mimeo schema mimeo cascade;
NOTICE: installing required extension "dblink"
CREATE EXTENSION
repl=#

On the target database let’s create a user mimeo we will use for the replication and let’s give him all required privileges. Superuser is not needed by will also work.

repl=# create user mimeo password 'root';
CREATE ROLE
repl=# GRANT USAGE ON SCHEMA mimeo TO mimeo;
GRANT
repl=# GRANT USAGE ON SCHEMA public TO mimeo;
GRANT
repl=# GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA mimeo TO mimeo;
GRANT
repl=#

On the source database let’s create same user on the source and give him required privileges

prima=# create user mimeo password 'root';
CREATE ROLE
prima=# CREATE SCHEMA mimeo;
CREATE SCHEMA
prima=# ALTER SCHEMA mimeo OWNER TO mimeo;
ALTER SCHEMA
prima=#

Every source database needs to have its connection information stored in mimeo’s dblink_mapping_mimeo table on the destination database. You can have as many source databases as you need, which makes creating a central replication destination for many master databases easy. All data is pulled by the destination database, never pushed by the source.

repl=# INSERT INTO mimeo.dblink_mapping_mimeo (data_source, username, pwd)
VALUES ('host=192.168.56.30 port=5432 dbname=prima', 'mimeo', 'root');
INSERT 0 1
repl=#

On the source let’s create table to be replicated and insert some data

prima=# create table article(idart int primary key, name varchar(20));
CREATE TABLE


prima=# table article;
idart | name
-------+-------
1 | Paper
2 | Food
(2 rows)
prima=#

Grant required privilege to mimeo

prima=# grant select on article to mimeo;
GRANT
prima=# grant trigger on article to mimeo;
GRANT
prima=#

Now we are ready to start the replication. We have three methods of replication:
-Snapshot replication
-Incremental replication
-DML replication
We will discuss only for snapshot and DML methods. Indeed the incremental method can replicate only inserted and updated data. It will not replicate any deleted data. See the documentation here

Snapshot Replication
This method is the only one to replicate data and structure change (add column….)
To initialize the table we use the function snapshot_maker (as we are using snapshot replication) and we pass as arguments the table to be replicated and the id of the dblink we want to use.

repl=# select * from mimeo.dblink_mapping_mimeo ;
data_source_id | data_source | username | pwd
----------------+-------------------------------------------+----------+------
1 | host=192.168.56.30 port=5432 dbname=prima | mimeo | root

So following command is used to initialize the table

repl=# SELECT mimeo.snapshot_maker('public.article', 1);
NOTICE: attempting first snapshot
NOTICE: attempting second snapshot
NOTICE: Done
snapshot_maker
----------------
.
(1 row)
repl=#

And we can easily verify that the two tables are synchronized.

repl=# table article;
idart | name
-------+-------
1 | Paper
2 | Food
(2 rows)
repl=#

Now let’s insert new data in the source table and let’s see how to refresh the target table.

prima=# table article;
idart | name
-------+-------
1 | Paper
2 | Food
3 | Oil
4 | Books

On the target database we just have to use the refresh_snap function

repl=# SELECT mimeo.refresh_snap('public.article');
refresh_snap
--------------
.
(1 row)

And we see that the source table was updated.

repl=# table article;
idart | name
-------+-------
1 | Paper
2 | Food
3 | Oil
4 | Books
(4 rows)
repl=#

A refresh can be scheduled using crontab for example every two minutes in my case

[postgres@pgservertools ~]$ crontab -l
*/2 * * * * psql -p 5433 -d repl -c "SELECT mimeo.refresh_snap('public.article')";
[postgres@pgservertools ~]$

DML Replication
The snapshot method is easier to setup, but it is not recommended for large table as
a table setup with this method will have the entire contents refreshed every time it is run.
So for large tables DML replication is recommended.
Let’s create a table customers on the source

prima=# create table customers(idcust int primary key, name varchar(30));
CREATE TABLE
. ^
prima=# insert into customers values(1,'Dbi');
INSERT 0 1
prima=# insert into customers values(2,'XZZ');
INSERT 0 1
.
prima=# table customers
prima-# ;
idcust | name
--------+------
1 | Dbi
2 | XZZ
(2 rows)
prima=#

And let’s grant required privileges to mimeo on customers

prima=# grant select on customers to mimeo;
GRANT
prima=# grant trigger on customers to mimeo;
GRANT
prima=#

On the target we use the function dml_maker to replicate data. We can see that we can even change the name of the destination table.

repl=# SELECT mimeo.dml_maker('public.customers', 1, p_dest_table := 'public.customers_repl');
NOTICE: Creating objects on source database (function, trigger & queue table)...
NOTICE: Pulling data from source...
dml_maker
-----------
.
(1 row)

We can verify that the table customers_repl is created

repl=# \d
List of relations
Schema | Name | Type | Owner
--------+----------------+-------+----------
public | article | view | postgres
public | article_snap1 | table | postgres
public | article_snap2 | table | postgres
public | customers_repl | table | postgres
(4 rows)

And that data are replicated

repl=# select * from customers_repl ;
idcust | name
--------+------
1 | Dbi
2 | XZZ
(2 rows)

Now let’s insert again new data in the source table and let’s see how to refresh the target

prima=# insert into customers values(3,'Linux');
INSERT 0 1
prima=# insert into customers values(4,'Unix');
INSERT 0 1
.
prima=# table customers
;
idcust | name
--------+-------
1 | Dbi
2 | XZZ
3 | Linux
4 | Unix
(4 rows)

On the target database we have to run the refresh_dml function

repl=# SELECT mimeo.refresh_dml('public.customers_repl');
refresh_dml
-------------
.
(1 row)


repl=# table customers_repl;
idcust | name
--------+-------
1 | Dbi
2 | XZZ
3 | Linux
4 | Unix
(4 rows)
repl=#

Like the snapshot method a crontab can be scheduled .
Conclusion
In a previous blog, we saw that logical replication is now supported on PostgreSQL 10. But the extension mimeo can still be used.

 

Leave a Reply


8 + = eleven

Mouhamadou Diaw
Mouhamadou Diaw

Consultant