What if you could clone a PostgreSQL database instantly, without affecting the original source database, without impacting performance, and without any external tool, simply by using the Linux Btrfs storage layer?
This is what I will demonstrate in this blog post.

Introduction

Sometimes developers urgently need a copy of a PostgreSQL database where they can test new developments or make changes that won’t modify the source database.
The usual way is to ask for the most recent backup of the database and restore it on another server (staging or test).
The major problem with this approach is the time it takes, which grows with the database size, and nowadays waiting several hours every time is simply not affordable.
With SUSE Btrfs, however, it is possible to circumvent this pitfall by using a nice feature called “Copy-On-Write” snapshots, available by default since SLES 12.
Of course, as a prerequisite, your source PostgreSQL cluster must reside on a Btrfs filesystem.
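If you are not sure, you can quickly check the filesystem type of the directory that will host (or already hosts) your cluster with findmnt; on a default SLES installation the root filesystem should already report btrfs (adapt the path to your own layout):
sles15:~ # findmnt -no FSTYPE /
btrfs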

Installation

For my demonstration, which you can easily reproduce, I will use a minimal SLES 15 installation.
As usual, we start by creating a PostgreSQL user and group. We also add the user to the sudo configuration so we don’t have to jump back and forth between postgres and root every time.
sles15:~ # groupadd postgres
sles15:~ # useradd -g postgres -m postgres
sles15:~ # passwd postgres
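
One possible way to set up the sudo access, assuming passwordless sudo is acceptable in your environment (adjust to your own security policy), is a drop-in file under /etc/sudoers.d:
sles15:~ # echo "postgres ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/postgres
sles15:~ # chmod 440 /etc/sudoers.d/postgres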

We now need to install the required packages, create the installation directory, then download, build and install PostgreSQL 12.4.
postgres@sles15:~> sudo zypper -n install wget gcc readline-devel zlib-devel libopenssl-devel pam-devel libxml2-devel libxslt-devel openldap2-devel python3-devel sysstat make systemd-devel bzip2 llvm7-devel llvm7 clang7 libicu-devel

postgres@sles15:~> sudo mkdir -p /u01/app/postgres
postgres@sles15:~> sudo chown postgres:postgres /u01/app/postgres
postgres@sles15:~> wget https://ftp.postgresql.org/pub/source/v12.4/postgresql-12.4.tar.bz2
postgres@sles15:~> tar -axf postgresql-12.4.tar.bz2
postgres@sles15:~> cd postgresql-12.4/
postgres@sles15:~/postgresql-12.4> ./configure --prefix=/u01/app/postgres/product/12/db_4
postgres@sles15:~/postgresql-12.4> make all
postgres@sles15:~/postgresql-12.4> make install
postgres@sles15:~/postgresql-12.4> cd contrib
postgres@sles15:~/postgresql-12.4/contrib> make install
postgres@sles15:~/postgresql-12.4/contrib> cd ../..
postgres@sles15:~> rm -rf postgresql-12.4
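
A quick way to verify the freshly built binaries, and to avoid typing the full path every time, is to check the version and add the bin directory to the PATH of the postgres user (this is optional; later steps either use the full path or export PATH in the session):
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/postgres --version
postgres@sles15:~> echo 'export PATH=/u01/app/postgres/product/12/db_4/bin:$PATH' >> ~/.bash_profile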

We now create a new Btrfs filesystem and a subvolume for the source PostgreSQL cluster.
postgres@sles15:~> sudo mkdir -p /pgdatas
postgres@sles15:~> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 8M 0 part
├─sda2 8:2 0 18.6G 0 part /boot/grub2/i386-pc
├─sda3 8:3 0 9.4G 0 part /home
└─sda4 8:4 0 2G 0 part [SWAP]
sdb 8:16 0 52.7G 0 disk
sr0 11:0 1 373M 0 rom
postgres@sles15:~> sudo mkfs.btrfs /dev/sdb
postgres@sles15:~> exit
sles15:~ # echo "/dev/sdb /pgdatas btrfs defaults" >> /etc/fstab
sles15:~ # mount -a
sles15:~ # df -h /pgdatas
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         53G  3.8M   53G   1% /pgdatas
sles15:~ # su - postgres
postgres@sles15:~> sudo chown postgres:postgres /pgdatas/
postgres@sles15:~> sudo btrfs subvolume create /pgdatas/pg1
Create subvolume '/pgdatas/pg1'
postgres@sles15:~> sudo btrfs subvolume list /pgdatas/
ID 257 gen 8 top level 5 path pg1

Let’s create and start the PostgreSQL cluster to be cloned.
As we need some meaningful data for our tests, we will populate the cluster using pgbench, the PostgreSQL benchmarking tool, in order to get a 15 GB database.
postgres@sles15:~> sudo chown postgres:postgres /pgdatas/pg1
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/initdb -D /pgdatas/pg1
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/pg_ctl -D /pgdatas/pg1 -l /dev/null start
postgres@sles15:~> /u01/app/postgres/product/12/db_4/bin/psql -l
List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |  Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
(3 rows)
postgres@sles15:~> export PATH=/u01/app/postgres/product/12/db_4/bin/:$PATH
postgres@sles15:~> createuser --login --pwprompt dbi
postgres@sles15:~> createdb -e --owner=dbi dbi
postgres@sles15:~> pgbench --initialize --scale=1000 -U dbi dbi
postgres@sles15:~> psql -c "select pg_size_pretty(pg_database_size('dbi'))"
pg_size_pretty
----------------
15 GB
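
As a cross-check, you can also look at the size of the data directory on disk with du; expect a figure somewhat larger than what pg_size_pretty reports, because of WAL and the other databases:
postgres@sles15:~> du -sh /pgdatas/pg1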

For my demonstration to be meaningful, I also need to generate some load on the source cluster, and I will again use pgbench for that, with a target rate of 60 transactions per second and 16 concurrent clients. I let it run for 10 minutes and, meanwhile, in another session I create two clones, staging and test, to show you that this takes virtually no resources on the original database.
postgres@sles15:~> pgbench -U dbi --rate=60 --client=16 --progress=5 --time=600 dbi
postgres@sles15:~> cd /pgdatas/
postgres@sles15:/pgdatas> time sudo btrfs subvolume snapshot pg1 staging
Create a snapshot of 'pg1' in './staging'
real 0m0.123s
user 0m0.011s
sys 0m0.034s
postgres@sles15:/pgdatas> time sudo btrfs subvolume snapshot pg1 test
Create a snapshot of 'pg1' in './test'
real 0m0.098s
user 0m0.024s
sys 0m0.014s
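
If you are curious about how much space the snapshots really consume, recent versions of btrfs-progs ship a btrfs filesystem du command that distinguishes data shared with the source subvolume from data exclusively owned by each snapshot; right after creation almost everything should be reported as shared:
postgres@sles15:/pgdatas> sudo btrfs filesystem du -s /pgdatas/pg1 /pgdatas/staging /pgdatas/test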

As you can see from the pgbench statistics below, there is absolutely no impact on performance, meaning this could easily be done on a production cluster.
progress: 5.0 s, 51.8 tps, lat 945.726 ms stddev 203.743, lag 655.979 ms
progress: 10.0 s, 67.0 tps, lat 245.902 ms stddev 259.702, lag 105.411 ms
progress: 15.0 s, 61.8 tps, lat 77.370 ms stddev 52.207, lag 0.556 ms
progress: 20.0 s, 61.2 tps, lat 67.853 ms stddev 42.487, lag 2.345 ms
progress: 25.0 s, 60.6 tps, lat 63.429 ms stddev 71.115, lag 3.930 ms
progress: 30.0 s, 66.2 tps, lat 49.639 ms stddev 49.599, lag 5.884 ms

Our two clones are ready, but some additional work is still needed.
Because the snapshot is an atomic copy of our PostgreSQL subvolume, it contains exactly the same files as the source, including postmaster.pid, which holds the process id of the still-running source cluster. PostgreSQL will refuse to start the clone while this stale pid file points at a running postmaster, so we remove it.
That is not enough, though: we also have to change the port, which we do by echoing the new port into postgresql.auto.conf.
postgres@sles15:/pgdatas> rm -f /pgdatas/staging/postmaster.pid
postgres@sles15:/pgdatas> rm -f /pgdatas/test/postmaster.pid
postgres@sles15:/pgdatas> echo "port=5433" > /pgdatas/staging/postgresql.auto.conf
postgres@sles15:/pgdatas> echo "port=5434" > /pgdatas/test/postgresql.auto.conf

The startup will take some time because the clone must be consistent and include everything that was committed on the source at snapshot time, so PostgreSQL will perform a crash recovery.
postgres@sles15:/pgdatas> pg_ctl -D /pgdatas/staging/ start
waiting for server to start....2020-09-23 16:03:41.930 CEST [4248] LOG: starting PostgreSQL 12.4 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 7.5.0, 64-bit
2020-09-23 16:03:41.934 CEST [4248] LOG: listening on IPv6 address "::1", port 5433
2020-09-23 16:03:41.934 CEST [4248] LOG: listening on IPv4 address "127.0.0.1", port 5433
2020-09-23 16:03:41.943 CEST [4248] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2020-09-23 16:03:41.980 CEST [4249] LOG: database system was interrupted; last known up at 2020-09-23 15:48:30 CEST
2020-09-23 16:03:42.137 CEST [4249] LOG: database system was not properly shut down; automatic recovery in progress
2020-09-23 16:03:42.144 CEST [4249] LOG: redo starts at 11/6545A890
.2020-09-23 16:03:43.034 CEST [4249] LOG: invalid record length at 11/66662ED8: wanted 24, got 0
2020-09-23 16:03:43.034 CEST [4249] LOG: redo done at 11/66662EB0
............2020-09-23 16:03:55.151 CEST [4248] LOG: database system is ready to accept connections
done
server started
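
The test clone is started exactly the same way and will listen on port 5434:
postgres@sles15:/pgdatas> pg_ctl -D /pgdatas/test/ start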

Test

To illustrate that our clones can be modified without affecting the source database, let’s run a simple test on both clones.
First we check the filler column of the pgbench_tellers table on all three clusters. Then we update it on both clones, delete most of the rows of pgbench_tellers on the test clone only, and check again.
for val in 2 3 4
do
psql -p 543${val} -U dbi -d dbi -c "select * from pgbench_tellers order by 1 limit 2"
psql -p 543${val} -U dbi -d dbi -c "select count(*) from pgbench_tellers"
done

Source
------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    24028 |
   2 |   1 |   -27698 |
 count
-------
 10000
Clone staging
-------------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    22651 |
   2 |   1 |   -34706 |
 count
-------
 10000
Clone test
----------
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |    22651 |
   2 |   1 |   -34706 |
# psql -p 5433 -d dbi -c "update pgbench_tellers set filler = 'I am the Clone of pg1'"
UPDATE 10000
# psql -p 5434 -d dbi -c "update pgbench_tellers set filler = 'I am the second Clone of pg1'"
UPDATE 10000
# psql -p 5434 -d dbi -c "delete from pgbench_tellers where tid > 100 and tid < 9800"
DELETE 9699
Source
------
 tid  | bid | tbalance | filler
------+-----+----------+--------
   61 |   7 |   -27082 |
 8892 | 890 |    14471 |
 count
-------
 10000
Clone staging
-------------
 tid | bid | tbalance | filler
-----+-----+----------+------------------------------
  73 |   8 |    25292 | I am the Clone of pg1
  48 |   5 |   -34248 | I am the Clone of pg1
 count
-------
 10000
Clone test
----------
 tid | bid | tbalance | filler
-----+-----+----------+------------------------------
  73 |   8 |    25292 | I am the second Clone of pg1
  48 |   5 |   -34248 | I am the second Clone of pg1
 count
-------
   301

When the tests are over, it’s easy to remove the clones; just don’t forget to stop them first, and check afterwards that the subvolumes have indeed been removed.
# pg_ctl -D /pgdatas/staging stop
# pg_ctl -D /pgdatas/test stop
# sudo btrfs subvolume delete staging
Delete subvolume (no-commit): '/pgdatas/staging'
# sudo btrfs subvolume delete test
Delete subvolume (no-commit): '/pgdatas/test'
# sudo btrfs subvolume list /pgdatas
ID 258 gen 6958 top level 5 path pg1

Conclusion

Instantly cloning a PostgreSQL cluster with Btrfs is easy and straightforward, and is useful for developers (validation procedures), rapid testing and even PostgreSQL upgrades.
It’s simple to set up, you don’t need a licence, there is no impact on performance, and it does not affect the source cluster.
So, happy instant cloning!