Infrastructure at your Service

Mouhamadou Diaw

Backup and Restore PostgreSQL with PgBackRest II

In a precedent blog I shown a basic utilization of PgBackRest which is a tool to backup and restore PostgreSQL databases. In this blog I am going to talk some useful features of this tool. In practical examples we will see some tasks we can do with this tool. Of course the official documentation remains the best source of knowledges.

Encryption
Nowadays encryption of backups is very critical and is mandatory for many companies. PgBackRest allows us to encrypt the repository where backups are stored. A passphrase is used to encrypt/decrypt files of the repository. As you may already know, it is recommended to use a strong passphrase. In the following demonstration we use the openssl to generate a passphrase.

[postgres@pgserver ~]$ openssl rand -base64 48
FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
[postgres@pgserver ~]$

Once the passphrase generated, we can update the PgBackRest configuration file with
2 options: repo-cipher-pass and repo-cipher-type

[postgres@pgserver clustpgserver]$ cat /etc/pgbackrest.conf
[global] repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc
[clustpgserver] db-path=/var/lib/pgsql/10/data
retention-full=2

The next step is to create the stanza

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-13 13:54:50.447 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:55:04.520 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

As we can see the system automatically detect that the repository is encrypted and then will rewrite the command including the –repo-cipher-pass and the –repo-cipher-type options. After the creation of the stanza we can check the status of our stanza

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-13 13:56:08.999 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:57:08.026 P00 INFO: WAL segment 00000002000000000000004C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000200000000/00000002000000000000004C-f5ced60cd351d74a91c9ce2e913b761144165e28.gz'
2018-02-13 13:57:08.030 P00 INFO: check command end: completed successfully

Everything seems fine, so let’s run a backup. Note that outputs are truncated

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-13 14:01:40.012 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-13 14:01:54.118 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-13 14:01:52": backup begins after the next regular checkpoint completes
...
type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-13 14:35:08.281 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-13 14:35:08.801 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

In a non-encrypted repository, file backup.info can be read. Now with encryption the if we try to read the file backup.info in the repository, we cannot.

[postgres@pgserver clustpgserver]$ less /var/lib/pgbackrest/backup/clustpgserver/backup.info
"/var/lib/pgbackrest/backup/clustpgserver/backup.info" may be a binary file. See it anyway?

And using the command strings, we can see that the file is encrypted.

[postgres@pgserver clustpgserver]$ strings /var/lib/pgbackrest/backup/clustpgserver/backup.info
Salted__Fx
.;Ru
cz4@
do:t
\pi3"E
VUSO
}a.R*
Wx5M
,?,W
3CXWB
[postgres@pgserver clustpgserver]$

From now, backups cannot be used unless the password is provided.

Restore in another location
PgBackRest allows to restore to another location. This can be useful if we want to duplicate our cluster on the same server or to another server. In the following demonstration, let’s duplicate on the same server.
The data directory of the source cluster is /var/lib/pgsql/10/data

postgres=# show data_directory;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

To duplicate to a new data directory /u01/devdata for example, the option –db-path is used

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata restore


2018-02-14 09:40:05.755 P01 INFO: restore file /u01/devdata/base/1/13657 (0B, 100%)
2018-02-14 09:40:05.773 P01 INFO: restore file /u01/devdata/base/1/13652 (0B, 100%)
2018-02-14 09:40:05.811 P01 INFO: restore file /u01/devdata/base/1/13647 (0B, 100%)
2018-02-14 09:40:05.983 P01 INFO: restore file /u01/devdata/base/1/13642 (0B, 100%)
2018-02-14 09:40:06.067 P00 INFO: write /u01/devdata/recovery.conf
2018-02-14 09:40:14.403 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-14 09:40:30.187 P00 INFO: restore command end: completed successfully

After the duplicate don’t forget to change the port (as we are in the same server) and then start your new cluster

postgres=# show data_directory ;
data_directory
----------------
/u01/devdata
(1 row)
postgres=#

Restore specific databases
With PgBackRest, we can restore specific user databases. Note that built-in databases (template0, template1 and postgres) are always restored.
Let’s show an example. In our source cluster we actually have two databases test and sandbox.

sandbox=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

In sandbox we have a table mytab with 2 rows

sandbox=# \c sandbox
You are now connected to database "sandbox" as user "postgres".
sandbox=# table mytab;
id
----
1
2
(2 rows)

Now let’s restore the cluster but only with test database, the option –db-include will be used.

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata --db-include=test restore
2018-02-14 10:11:00.948 P00 INFO: restore command begin 1.28: --db-include=test=1 --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-14 10:11:05.137 P00 INFO: restore backup set 20180214-095439F_20180214-100446I
2018-02-14 10:11:25.110 P00 INFO: remap $PGDATA directory to /u01/devdata
...

After the restore completed, let’s start the new cluster and let’s verify present databases.

[postgres@pgserver devdata]$ psql -p 5436
psql (10.1)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

What!! the sandbox is still present despite the use of option –include-db=test. But if we try to connect to sandbox database. We get an error.

postgres=# \c sandbox
FATAL: relation mapping file "base/24581/pg_filenode.map" contains invalid data
Previous connection kept
postgres=#

And if we compare at OS level the size of files of the database at the source cluster and at the target

[postgres@pgserver log]$ du -sh /var/lib/pgsql/10/data/base/24581
7.8M /var/lib/pgsql/10/data/base/24581
[postgres@pgserver log]$ du -sh /u01/devdata/base/24581
16K /u01/devdata/base/24581
[postgres@pgserver log]$

We can see that at the target cluster, sandbox uses less disk space during the selective restore than it would have if the entire database had been restored. To finish the selective restore, we have to manually drop the sandbox database. Indeed PgBackRest cannot automatically drop the database because the cluster is not accessible until the recovery process finishes.

postgres=# drop database sandbox;
DROP DATABASE
postgres=#

Automatic cleanup of expired backups
Another nice feature of PgBackRest is that expired backups are automatically removed.
If we check our pgbackrest.conf file, we see that the retention-full is set to 2. This means that 2 full backups will be maintained. So if we do a third full backup, the first full backup and all corresponding incremental and differential backups will be expired and removed

[postgres@pgserver log]$ cat /etc/pgbackrest.conf
[global] repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc
[clustpgserver] db-path=/var/lib/pgsql/10/data
retention-full=2
[postgres@pgserver log]$

Let’s do a quick demonstration. Actually we have 2 full backups

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000002000000000000004E / 000000020000000000000056
full backup: 20180213-140152F
timestamp start/stop: 2018-02-13 14:01:52 / 2018-02-13 14:32:00
wal start/stop: 00000002000000000000004E / 00000002000000000000004E
database size: 577MB, backup size: 577MB
repository size: 28.8MB, repository backup size: 28.8MB
incr backup: 20180213-140152F_20180213-152509I
timestamp start/stop: 2018-02-14 09:31:03 / 2018-02-14 09:33:17
wal start/stop: 000000020000000000000052 / 000000020000000000000052
database size: 30.7MB, backup size: 285.3KB
repository size: 3.6MB, repository backup size: 24.3KB
backup reference list: 20180213-140152F
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
[postgres@pgserver log]$

And we can confirm by executing a simple ls in the repository where backups are stored

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 13 14:32 20180213-140152F
drwxr-x---. 3 postgres postgres 69 Feb 14 09:33 20180213-140152F_20180213-152509I
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 33 Feb 14 10:08 latest -> 20180214-095439F_20180214-100446I
[postgres@pgserver clustpgserver]$ ls -ld
drwxr-x---. 7 postgres postgres 4096 Feb 14 10:08 .
[postgres@pgserver clustpgserver]$ ls -ld *

Now let’s do a third full backup

[postgres@pgserver clustpgserver]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-14 10:55:52.250 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full

2018-02-14 11:19:02.001 P00 INFO: backup command end: completed successfully
2018-02-14 11:19:02.107 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-14 11:19:02.928 P00 INFO: expire full backup set: 20180213-140152F, 20180213-140152F_20180213-152509I
2018-02-14 11:22:08.759 P00 INFO: remove expired backup 20180213-140152F_20180213-152509I
2018-02-14 11:22:09.000 P00 INFO: remove expired backup 20180213-140152F

2018-02-14 11:22:49.387 P00 INFO: expire command end: completed successfully
[postgres@pgserver clustpgserver]$

We can see that at the end of backups, some old backups are expired and removed. We can also confirm this by listing files in the repository

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 69 Feb 14 11:13 20180214-105603F
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:19 backup.info
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:20 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 16 Feb 14 11:14 latest -> 20180214-105603F
[postgres@pgserver clustpgserver]$

Point-in-Time Recovery
PgBackRest can also do a point-in-time recovery. Let’s drop table article in the database test

test=# table article;
nom
---------
printer
(1 row)
.
test=# select now();
now
-------------------------------
2018-02-14 11:39:28.024378+01
(1 row)
.
test=# drop table article;
DROP TABLE
.
test=# table article;
ERROR: relation "article" does not exist
LINE 1: table article;
^
test=#

And now let’s restore until just before we drop the table let’s say 2018-02-14 11:39:28.
But as we have many backup sets we have to restore from a backup done before the table was dropped.
If we check our backups, we have to restore from the full backup: 20180214-105603F
which was taken before table article was dropped.

[postgres@pgserver devdata]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 000000020000000000000054 / 00000002000000000000005A
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
full backup: 20180214-105603F
timestamp start/stop: 2018-02-14 10:56:03 / 2018-02-14 11:12:26

wal start/stop: 000000020000000000000058 / 000000020000000000000058
database size: 38.3MB, backup size: 38.3MB
repository size: 4.5MB, repository backup size: 4.5MB
incr backup: 20180214-105603F_20180214-121044I
timestamp start/stop: 2018-02-14 12:10:44 / 2018-02-14 12:15:14
wal start/stop: 00000002000000000000005A / 00000002000000000000005A
database size: 38.3MB, backup size: 1.1MB
repository size: 4.5MB, repository backup size: 140.8KB
backup reference list: 20180214-105603F
[postgres@pgserver devdata]$

For the restore we use the option –set which allows us to specify the backup set we want to use for the restore. Note also the use of –type=time and –target

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --type=time "--target=2018-02-14 11:39:28.024378+01" --db-path=/u01/devdata --set=20180214-105603F restore
2018-02-14 13:36:50.848 P00 INFO: restore command begin 1.28: --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --set=20180214-105603F --stanza=clustpgserver "--target=2018-02-14 11:39:28.024378+01" --type=time
2018-02-14 13:37:03.406 P00 INFO: restore backup set 20180214-105603F
...

At the end of the restore let’s see the contents of the recovery.done file

[postgres@pgserver devdata]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --db1-path=/u01/devdata --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'
recovery_target_time = '2018-02-14 11:39

If we start our new cluster, we can see in log files that PITR is starting
2018-02-14 13:54:23.824 CET [10049] LOG: starting point-in-time recovery to 2018-02-14 11:39:28.024378+01
And once the recovery finished, we can verify that the table article is present

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# \d article
Table "public.article"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
nom | character varying(50) | | |
.
test=# table article;
nom
---------
printer
(1 row)
test=#

Conclusion:
In this blog I talked about some features about PgBackRest. But as already specified, there are many, many other options with this wonderful tool. The official documentation can give more information. In future blogs we will explore more with advanced configuration.

Leave a Reply

Mouhamadou Diaw
Mouhamadou Diaw

Consultant