What if you could clone a PostgreSQL database instantly, without affecting the original source database, without impacting performance, and without any external tool, simply by using the Linux Btrfs storage layer?
This is what I will demonstrate in this blog post.

Introduction

Sometimes developers urgently need a copy of a PostgreSQL database where they can test new developments or make changes that won’t modify the source database.
The usual way is to ask for the most recent backup of the database and restore it on another server (staging or test).
The major problem with this approach is the time it takes, which grows with the database size, and nowadays waiting several hours every time is simply not affordable.
With SUSE Btrfs, however, it is possible to circumvent this pitfall by using a nice feature called “Copy-On-Write” snapshots, available by default since SLES 12.
Of course, as a prerequisite, your source PostgreSQL cluster must reside on a Btrfs filesystem.
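If you are not sure, you can quickly check the filesystem type of the directory that will host (or already hosts) your cluster with findmnt; on a default SLES installation the root filesystem should already report btrfs (adapt the path to your own layout):
sles15:~ # findmnt -no FSTYPE /
btrfs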

Installation

For my demonstration, which you can easily reproduce, I will use a minimal SLES 15 installation.
As usual, we start by creating a PostgreSQL user and group. We also add the user to the sudo configuration so we don’t have to jump back and forth between postgres and root every time.
sles15:~ # groupadd postgres
sles15:~ # useradd -g postgres -m postgres
sles15:~ # passwd postgres
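
One possible way to set up the sudo access, assuming passwordless sudo is acceptable in your environment (adjust to your own security policy), is a drop-in file under /etc/sudoers.d:
sles15:~ # echo "postgres ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/postgres
sles15:~ # chmod 440 /etc/sudoers.d/postgres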

We now need to install the required packages, create the installation directory, then download, build and install PostgreSQL 12.4.
postgres@sles15:~> sudo zypper -n install wget gcc readline-devel zlib-devel libopenssl-devel pam-devel libxml2-devel libxslt-devel openldap2-devel python3-devel sysstat make systemd-devel bzip2 llvm7-devel llvm7 clang7 libicu-devel

postgres@sles15:~> sudo mkdir -p /u01/app/postgres
postgres@sles15:~> sudo chown postgres:postgres /u01/app/postgres
postgres@sles15:~> wget https://ftp.postgresql.org/pub/source/v12.4/postgresql-12.4.tar.bz2
postgres@sles15:~> tar -axf postgresql-12.4.tar.bz2
postgres@sles15:~> cd postgresql-12.4/
postgres@sles15:~/postgresql-12.4> ./configure --prefix=/u01/app/postgres/product/12/db_4
postgres@sles15:~/postgresql-12.4> make all
postgres@sles15:~/postgresql-12.4> make install
postgres@sles15:~/postgresql-12.4> cd contrib
postgres@sles15:~/postgresql-12.4/contrib> make install
postgres@sles15:~/postgresql-12.4/contrib> cd ../..
postgres@sles15:~> rm -rf postgresql-12.4
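
A quick way to verify the freshly built binaries, and to avoid typing the full path every time, is to check the version and add the bin directory to the PATH of the postgres user (this is optional; later steps either use the full path or export PATH in the session):
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/postgres --version
postgres@sles15:~> echo 'export PATH=/u01/app/postgres/product/12/db_4/bin:$PATH' >> ~/.bash_profile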

We now create a new Btrfs filesystem and a subvolume for the source PostgreSQL cluster.
postgres@sles15:~> sudo mkdir -p /pgdatas
postgres@sles15:~> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 8M 0 part
├─sda2 8:2 0 18.6G 0 part /boot/grub2/i386-pc
├─sda3 8:3 0 9.4G 0 part /home
└─sda4 8:4 0 2G 0 part [SWAP]
sdb 8:16 0 52.7G 0 disk
sr0 11:0 1 373M 0 rom
postgres@sles15:~> sudo mkfs.btrfs /dev/sdb
postgres@sles15:~> exit
sles15:~ # echo "/dev/sdb /pgdatas btrfs defaults" >> /etc/fstab
sles15:~ # mount -a
sles15:~ # df -h /pgdatas
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         53G  3.8M   53G   1% /pgdatas
sles15:~ # su - postgres
postgres@sles15:~> sudo chown postgres:postgres /pgdatas/
postgres@sles15:~> sudo btrfs subvolume create /pgdatas/pg1
Create subvolume '/pgdatas/pg1'
postgres@sles15:~> sudo btrfs subvolume list /pgdatas/
ID 257 gen 8 top level 5 path pg1

Let’s create and start the PostgreSQL cluster to be cloned.
As we need some meaningful data for our tests, we will populate the cluster using pgbench, the PostgreSQL benchmarking tool, in order to get a 15 GB database.
postgres@sles15:~> sudo chown postgres:postgres /pgdatas/pg1
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/initdb -D /pgdatas/pg1
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/pg_ctl -D /pgdatas/pg1 -l /dev/null start
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/psql -l
List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |  Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
(3 rows)
postgres@sles15:~> export PATH=/u01/app/postgres/product/12/db_4/bin/:$PATH
postgres@sles15:~> createuser --login --pwprompt dbi
postgres@sles15:~> createdb -e --owner=dbi dbi
postgres@sles15:~> pgbench --initialize --scale=1000 -U dbi dbi
postgres@sles15:~> psql -c "select pg_size_pretty(pg_database_size('dbi'))"
pg_size_pretty
----------------
15 GB
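
As a cross-check, you can also look at the size of the data directory on disk with du; expect a figure somewhat larger than what pg_size_pretty reports, because of WAL and the other databases:
postgres@sles15:~> du -sh /pgdatas/pg1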

For my demonstration to be meaningful, I also need to generate some load on the source cluster, and I will again use pgbench for that, with a target rate of 60 transactions per second and 16 concurrent clients. I let it run for 10 minutes and, meanwhile, in another session I create two clones, staging and test, to show you that this takes virtually no resources on the original database.
postgres@sles15:~> pgbench -U dbi --rate=60 --client=16 --progress=5 --time=600 dbi
postgres@sles15:~> cd /pgdatas/
postgres@sles15:/pgdatas> time sudo btrfs subvolume snapshot pg1 staging
Create a snapshot of 'pg1' in './staging'
real 0m0.123s
user 0m0.011s
sys 0m0.034s
postgres@sles15:/pgdatas> time sudo btrfs subvolume snapshot pg1 test
Create a snapshot of 'pg1' in './test'
real 0m0.098s
user 0m0.024s
sys 0m0.014s
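
If you are curious about how much space the snapshots really consume, recent versions of btrfs-progs ship a btrfs filesystem du command that distinguishes data shared with the source subvolume from data exclusively owned by each snapshot; right after creation almost everything should be reported as shared:
postgres@sles15:/pgdatas> sudo btrfs filesystem du -s /pgdatas/pg1 /pgdatas/staging /pgdatas/test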

As you can see from the pgbench statistics below, there is absolutely no impact on performance, meaning this could easily be done on a production cluster.
progress: 5.0 s, 51.8 tps, lat 945.726 ms stddev 203.743, lag 655.979 ms
progress: 10.0 s, 67.0 tps, lat 245.902 ms stddev 259.702, lag 105.411 ms
progress: 15.0 s, 61.8 tps, lat 77.370 ms stddev 52.207, lag 0.556 ms
progress: 20.0 s, 61.2 tps, lat 67.853 ms stddev 42.487, lag 2.345 ms
progress: 25.0 s, 60.6 tps, lat 63.429 ms stddev 71.115, lag 3.930 ms
progress: 30.0 s, 66.2 tps, lat 49.639 ms stddev 49.599, lag 5.884 ms

Our two clones are ready, but some additional work is still needed.
Because the snapshot is an atomic copy of our PostgreSQL subvolume, it contains exactly the same files as the source, including postmaster.pid, which holds the process id of the still-running source cluster. PostgreSQL will refuse to start the clone while this stale pid file points at a running postmaster, so we remove it.
That is not enough, though: we also have to change the port, which we do by echoing the new port into postgresql.auto.conf.
postgres@sles15:/pgdatas> rm -f /pgdatas/staging/postmaster.pid
postgres@sles15:/pgdatas> rm -f /pgdatas/test/postmaster.pid
postgres@sles15:/pgdatas> echo "port=5433" > /pgdatas/staging/postgresql.auto.conf
postgres@sles15:/pgdatas> echo "port=5434" > /pgdatas/test/postgresql.auto.conf

The startup will take some time because the clone must be consistent and include everything that was committed on the source at snapshot time, so PostgreSQL will perform a crash recovery.
postgres@sles15:/pgdatas> pg_ctl -D /pgdatas/staging/ start
waiting for server to start....2020-09-23 16:03:41.930 CEST [4248] LOG: starting PostgreSQL 12.4 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 7.5.0, 64-bit
2020-09-23 16:03:41.934 CEST [4248] LOG: listening on IPv6 address "::1", port 5433
2020-09-23 16:03:41.934 CEST [4248] LOG: listening on IPv4 address "127.0.0.1", port 5433
2020-09-23 16:03:41.943 CEST [4248] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2020-09-23 16:03:41.980 CEST [4249] LOG: database system was interrupted; last known up at 2020-09-23 15:48:30 CEST
2020-09-23 16:03:42.137 CEST [4249] LOG: database system was not properly shut down; automatic recovery in progress
2020-09-23 16:03:42.144 CEST [4249] LOG: redo starts at 11/6545A890
.2020-09-23 16:03:43.034 CEST [4249] LOG: invalid record length at 11/66662ED8: wanted 24, got 0
2020-09-23 16:03:43.034 CEST [4249] LOG: redo done at 11/66662EB0
............2020-09-23 16:03:55.151 CEST [4248] LOG: database system is ready to accept connections
done
server started
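
The test clone is started exactly the same way and will listen on port 5434:
postgres@sles15:/pgdatas> pg_ctl -D /pgdatas/test/ start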

Test

To illustrate that our clones can be modified without affecting the source database, let’s run a simple test on both clones.
First we check the filler column of the pgbench_tellers table on all three clusters. Then we update it on both clones, delete most of the rows of pgbench_tellers on the test clone only, and check again.
for val in 2 3 4
do
psql -p 543${val} -U dbi -d dbi -c "select * from pgbench_tellers order by 1 limit 2"
psql -p 543${val} -U dbi -d dbi -c "select count(*) from pgbench_tellers"
done

Source
------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    24028 |
   2 |   1 |   -27698 |
 count
-------
 10000
Clone staging
-------------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    22651 |
   2 |   1 |   -34706 |
 count
-------
 10000
Clone test
----------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    22651 |
   2 |   1 |   -34706 |
# psql -p 5433 -d dbi -c "update pgbench_tellers set filler = 'I am the Clone of pg1'"
UPDATE 10000
# psql -p 5434 -d dbi -c "update pgbench_tellers set filler = 'I am the second Clone of pg1'"
UPDATE 10000
# psql -p 5434 -d dbi -c "delete from pgbench_tellers where tid > 100 and tid < 9800"
DELETE 9699
Source
------
 tid  | bid | tbalance | filler
------+-----+----------+--------
   61 |   7 |   -27082 |
 8892 | 890 |    14471 |
 count
-------
 10000
Clone staging
-------------
 tid | bid | tbalance | filler
-----+-----+----------+------------------------------
  73 |   8 |    25292 | I am the Clone of pg1
  48 |   5 |   -34248 | I am the Clone of pg1
 count
-------
 10000
Clone test
----------
 tid | bid | tbalance | filler
-----+-----+----------+------------------------------
  73 |   8 |    25292 | I am the second Clone of pg1
  48 |   5 |   -34248 | I am the second Clone of pg1
 count
-------
   301

When the tests are over, it’s easy to remove the clones; just don’t forget to stop them first, and check afterwards that the subvolumes have indeed been removed.
# pg_ctl -D /pgdatas/staging stop
# pg_ctl -D /pgdatas/test stop
# sudo btrfs subvolume delete staging
Delete subvolume (no-commit): '/pgdatas/staging'
# sudo btrfs subvolume delete test
Delete subvolume (no-commit): '/pgdatas/test'
# sudo btrfs subvolume list /pgdatas
ID 258 gen 6958 top level 5 path pg1

Conclusion

Instantly cloning a PostgreSQL cluster with Btrfs is easy and straightforward, and is useful for developers (validation procedures), rapid testing and even PostgreSQL upgrades.
It’s simple to set up, you don’t need a licence, there is no impact on performance, and it does not affect the source cluster.
So, happy instant cloning!