Infrastructure at your Service

Daniel Westermann

Using pgBackRest to backup your PostgreSQL instances to a s3 compatible storage

When it comes to backup and restore of PostgreSQL we usually recommend pgBackRest to our customers. This tool comes with many options and features and should bring everything you need. Most of the times we install a dedicated backup host for pgBackRest, which centralizes all backups of the PostgreSQL instances. Sometimes we have several of those, e.g. one for production instances, another for test and development or even a dedicated one for development. How you organize this is mostly driven by business requirements. Most of the times, these dedicated backups hosts get backed up by an enterprise backup solution for long term archiving. Another option you have is, to write the backups directly to a S3 compatible storage solution and this is the topic of this post.

We could just use an AWS S3 bucket for the scope of this post, but for this you’ll need access to AWS. Being able to re-play the demos of this blog without access to external sources has always been a priority for me, so we will be using a solution that is freely available for testing. While searching for a free AWS S3 compatible storage solution I came across a MinIO. MinIO is quite easy to setup on a single node for testing. All you need to do is this:

[email protected]:/home/postgres/ [pg140] mkdir /u02/minio/
[email protected]:/home/postgres/ [pg140] wget https://dl.min.io/server/minio/release/linux-amd64/minio
[email protected]:/home/postgres/ [pg140] chmod +x mini
[email protected]:/home/postgres/ [pg140] ./minio server /u02/minio/

If you want to start it automatically when the machine is starting up, you can use this systemd service file as a template:

[email protected]:/home/postgres/ [pg140] cat /etc/systemd/system/minio.service 
[Unit]
Description=MinIO
Documentation=https://docs.min.io
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=/usr/local

User=postgres
Group=postgres
ProtectProc=invisible

ExecStart=/home/postgres/minio server /u02/minio/ --console-address :8888
# User: minioadmin
# Pwd: minioadmin

# Let systemd restart this service always
Restart=always

# Specifies the maximum file descriptor number that can be opened by this process
LimitNOFILE=65536

# Specifies the maximum number of threads this process can create
TasksMax=infinity

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=infinity
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

Once MinIO is running, you can point your browser to http://[HOST]:8888

Obviously we’ll need a new bucket we can write our backups to, so let’s create a new bucket:


Here you will see the limitations of the single node test setup: Some features are not available and you of course should not use this kind of setup in production:

For accessing a bucket you need an access key and a secret, so we need to create those too:


Before you continue with configuring pgBackRest it is a good idea to test access to the bucket with a command line utility. The AWS Cli is freely available, easy to install and can be used for that:

[email protected]:/home/postgres/ [pg140] curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
[email protected]:/home/postgres/ [pg140] unzip awscliv2.zip 
[email protected]:/home/postgres/ [pg140] sudo ./aws/install
[email protected]:/home/postgres/ [pg140] rm -rf awscliv2.zip aws/
[email protected]:/home/postgres/ [pg140] /usr/local/bin/aws --version
aws-cli/2.3.6 Python/3.8.8 Linux/5.10.0-9-amd64 exe/x86_64.debian.11 prompt/off

The AWS command line utilities by default read the credentials form “~/.aws/credentials”, this is where we need to add the access key and the secret we’ve created above:

[email protected]:/home/postgres/ [pg140] aws configure
AWS Access Key ID [****************7890]: 1234567890
AWS Secret Access Key [****************4321]: 0987654321
Default region name [eu-central-1]: eu-central-1
Default output format : text

Having that in place let’s test if we can see the bucket:

[email protected]:/home/postgres/ [pg140] aws configure set default.s3.signature_version s3v4
[email protected]:/home/postgres/ [pg140] aws --endpoint-url http://localhost:9000 s3 ls
2021-11-16 10:25:50 postgresql-backups

Looks good. Now we need to tell pgBackRest about the repository. My current configuration on the backup host looks like this:

[email protected]:/home/postgres/ [pg140] cat /u01/app/postgres/local/dmk/etc/pgbackrest.conf 
[global]
repo1-path=/u02/backups
repo1-retention-full=2
repo1-retention-diff=1
log-path=/u01/app/postgres/local/dmk/log
log-level-file=detail
spool-path=/tmp
start-fast=y
archive-async=y
expire-auto=y
compress-type=bz2
process-max=4

[PG14]
pg1-path=/u02/pgdata/14/PG1/
pg1-host=192.168.100.170
pg1-user=postgres
pg1-port=5432
pg2-path=/u02/pgdata/14/PG1/
pg2-host=192.168.100.171
pg2-user=postgres
pg2-port=5432
pg3-path=/u02/pgdata/14/PG1/
pg3-host=192.168.100.172
pg3-user=postgres
pg3-port=5432

This means I have one repository pointing to a local directory. The stanza below the global configuration is a three node Patroni cluster. As pgBackRest supports multiple repositories we can just add our new repository like this:

[email protected]:/home/postgres/ [pg140] cat /u01/app/postgres/local/dmk/etc/pgbackrest.conf
[global]
repo1-path=/u02/backups
repo1-retention-full=2
repo1-retention-diff=1
repo2-s3-bucket=postgresql-backups
repo2-s3-endpoint=192.168.100.173
repo2-s3-key=1234567890
repo2-s3-key-secret=0987654321
repo2-s3-region=us-west-rack1
repo2-storage-port=9000
repo2-storage-verify-tls=n
repo2-s3-uri-style=path
repo2-type=s3
log-path=/u01/app/postgres/local/dmk/log
log-level-file=detail
spool-path=/tmp
start-fast=y
archive-async=y
expire-auto=y
compress-type=bz2
process-max=4
...

Let’s try to create the stanza:

[email protected]:/home/postgres/ [pg140] pgbackrest --stanza=PG14 stanza-create --log-level-console=info 
2021-11-16 12:09:07.682 P00   INFO: stanza-create command begin 2.36: --exec-id=2255-0ef6d4c5 --log-level-console=info --log-level-file=detail --log-path=/u01/app/postgres/local/dmk/log --pg1-host=192.168.100.170 --pg2-host=192.168.100.171 --pg3-host=192.168.100.172 --pg1-path=/u02/pgdata/14/PG1/ --pg2-path=/u02/pgdata/14/PG1/ --pg3-path=/u02/pgdata/14/PG1/ --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-user=postgres --pg2-user=postgres --pg3-user=postgres --repo1-path=/u02/backups --repo2-s3-bucket=postgresql-backups --repo2-s3-endpoint=127.0.0.1 --repo2-s3-key= --repo2-s3-key-secret= --repo2-s3-region=eu-central-1 --repo2-s3-uri-style=path --repo2-storage-port=9000 --no-repo2-storage-verify-tls --repo2-type=s3 --stanza=PG14
2021-11-16 12:09:09.972 P00   INFO: stanza-create for stanza 'PG14' on repo1
2021-11-16 12:09:09.977 P00   INFO: stanza 'PG14' already exists on repo1 and is valid
2021-11-16 12:09:09.977 P00   INFO: stanza-create for stanza 'PG14' on repo2
ERROR: [101]: TLS error [1:336130315] wrong version number
2021-11-16 12:11:10.430 P00   INFO: stanza-create command end: aborted with exception [101]

This means we need an encrypted connection, otherwise it will not work. MinIO provides a handy utility to create a self signed certificate:

[email protected]:/home/postgres/ [pg140] wget https://github.com/minio/certgen/releases/download/v0.0.2/certgen-linux-amd64
[email protected]:/home/postgres/ [pg140] chmod +x certgen-linux-amd64 
[email protected]:/home/postgres/ [pg140] ./certgen-linux-amd64 -ca -host "192.168.100.173"   # replace with your IP
2021/11/16 14:25:33 wrote public.crt
2021/11/16 14:25:33 wrote private.key
[email protected]:/home/postgres/ [pg140] mv private.key public.crt .minio/certs/
[email protected]:/home/postgres/ [pg140] sudo systemctl restart minio

Once MinIO is restarted we can use https to access MinIO and creating the stanza works smoothly:

[email protected]:/home/postgres/ [pg140] pgbackrest --stanza=PG14 stanza-create --log-level-console=info 
2021-11-16 14:40:17.086 P00   INFO: stanza-create command begin 2.36: --exec-id=2491-22a7ebd4 --log-level-console=info --log-level-file=detail --log-path=/u01/app/postgres/local/dmk/log --pg1-host=192.168.100.170 --pg2-host=192.168.100.171 --pg3-host=192.168.100.172 --pg1-path=/u02/pgdata/14/PG1/ --pg2-path=/u02/pgdata/14/PG1/ --pg3-path=/u02/pgdata/14/PG1/ --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-user=postgres --pg2-user=postgres --pg3-user=postgres --repo1-path=/u02/backups --repo2-s3-bucket=postgresql-backups --repo2-s3-endpoint=192.168.100.173 --repo2-s3-key= --repo2-s3-key-secret= --repo2-s3-region=us-west-rack1 --repo2-s3-uri-style=path --repo2-storage-port=9000 --no-repo2-storage-verify-tls --repo2-type=s3 --stanza=PG14
2021-11-16 14:40:19.613 P00   INFO: stanza-create for stanza 'PG14' on repo1
2021-11-16 14:40:19.618 P00   INFO: stanza 'PG14' already exists on repo1 and is valid
2021-11-16 14:40:19.618 P00   INFO: stanza-create for stanza 'PG14' on repo2
2021-11-16 14:40:19.982 P00   INFO: stanza-create command end: completed successfully (2901ms)

If you take a look into the bucket you’ll see that the usual directories for pgBackRest have been created:

Before doing any backups lets perform a pgBackRest check to confirm everything is fine:

[email protected]:/u01/app/postgres/local/dmk/ [pg140] pgbackrest --stanza=PG14 check --log-level-console=info 
2021-11-16 16:59:34.501 P00   INFO: check command begin 2.36: --exec-id=2917-47e567f2 --log-level-console=info --log-level-file=detail --log-path=/u01/app/postgres/local/dmk/log --pg1-host=192.168.100.170 --pg2-host=192.168.100.171 --pg3-host=192.168.100.172 --pg1-path=/u02/pgdata/14/PG1/ --pg2-path=/u02/pgdata/14/PG1/ --pg3-path=/u02/pgdata/14/PG1/ --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-user=postgres --pg2-user=postgres --pg3-user=postgres --repo1-path=/u02/backups --repo2-s3-bucket=postgresql-backups --repo2-s3-endpoint=192.168.100.173 --repo2-s3-key= --repo2-s3-key-secret= --repo2-s3-region=us-west-rack1 --repo2-s3-uri-style=path --repo2-storage-port=9000 --no-repo2-storage-verify-tls --repo2-type=s3 --stanza=PG14
2021-11-16 16:59:36.842 P00   INFO: check repo1 (standby)
2021-11-16 16:59:36.848 P00   INFO: check repo2 (standby)
2021-11-16 16:59:36.866 P00   INFO: switch wal not performed because this is a standby
2021-11-16 16:59:36.870 P00   INFO: check repo1 configuration (primary)
2021-11-16 16:59:36.870 P00   INFO: check repo2 configuration (primary)
2021-11-16 16:59:37.089 P00   INFO: check repo1 archive for WAL (primary)
2021-11-16 16:59:38.292 P00   INFO: WAL segment 00000002000000000000000F successfully archived to '/u02/backups/archive/PG14/14-1/0000000200000000/00000002000000000000000F-a63ad380ed332ebebf528c0726694cca5c2214ab.gz' on repo1
2021-11-16 16:59:38.292 P00   INFO: check repo2 archive for WAL (primary)
2021-11-16 16:59:38.295 P00   INFO: WAL segment 00000002000000000000000F successfully archived to '/var/lib/pgbackrest/archive/PG14/14-1/0000000200000000/00000002000000000000000F-a63ad380ed332ebebf528c0726694cca5c2214ab.gz' on repo2
2021-11-16 16:59:38.599 P00   INFO: check command end: completed successfully (4099ms)

Looks good, now we can do a backup:

[email protected]:/u01/app/postgres/local/dmk/ [pg140] pgbackrest --stanza=PG14 backup --log-level-console=info --repo=2
2021-11-16 20:45:57.845 P00   INFO: backup command begin 2.36: --compress-type=bz2 --exec-id=3004-6b8fb3c3 --expire-auto --log-level-console=info --log-level-file=detail --log-path=/u01/app/postgres/local/dmk/log --pg1-host=192.168.100.170 --pg2-host=192.168.100.171 --pg3-host=192.168.100.172 --pg1-path=/u02/pgdata/14/PG1/ --pg2-path=/u02/pgdata/14/PG1/ --pg3-path=/u02/pgdata/14/PG1/ --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-user=postgres --pg2-user=postgres --pg3-user=postgres --process-max=4 --repo=2 --repo1-path=/u02/backups --repo1-retention-diff=1 --repo2-retention-diff=1 --repo1-retention-full=2 --repo2-retention-full=2 --repo2-s3-bucket=postgresql-backups --repo2-s3-endpoint=192.168.100.173 --repo2-s3-key= --repo2-s3-key-secret= --repo2-s3-region=us-west-rack1 --repo2-s3-uri-style=path --repo2-storage-port=9000 --no-repo2-storage-verify-tls --repo2-type=s3 --stanza=PG14 --start-fast
WARN: no prior backup exists, incr backup has been changed to full
2021-11-16 20:46:00.941 P00   INFO: execute non-exclusive pg_start_backup(): backup begins after the requested immediate checkpoint completes
2021-11-16 20:46:01.990 P00   INFO: backup start archive = 000000020000000000000011, lsn = 0/11000028
WARN: resumable backup 20211116-144452F of same type exists -- remove invalid files and resume
2021-11-16 20:46:08.369 P00   INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
2021-11-16 20:46:08.913 P00   INFO: backup stop archive = 000000020000000000000011, lsn = 0/11000138
2021-11-16 20:46:08.943 P00   INFO: check archive for segment(s) 000000020000000000000011:000000020000000000000011
2021-11-16 20:46:09.873 P00   INFO: new backup label = 20211116-144452F
2021-11-16 20:46:10.090 P00   INFO: full backup size = 25.3MB, file total = 957
2021-11-16 20:46:10.090 P00   INFO: backup command end: completed successfully (12252ms)
2021-11-16 20:46:10.091 P00   INFO: expire command begin 2.36: --exec-id=3004-6b8fb3c3 --log-level-console=info --log-level-file=detail --log-path=/u01/app/postgres/local/dmk/log --repo=2 --repo1-path=/u02/backups --repo1-retention-diff=1 --repo2-retention-diff=1 --repo1-retention-full=2 --repo2-retention-full=2 --repo2-s3-bucket=postgresql-backups --repo2-s3-endpoint=192.168.100.173 --repo2-s3-key= --repo2-s3-key-secret= --repo2-s3-region=us-west-rack1 --repo2-s3-uri-style=path --repo2-storage-port=9000 --no-repo2-storage-verify-tls --repo2-type=s3 --stanza=PG14
2021-11-16 20:46:10.371 P00   INFO: expire command end: completed successfully (280ms)

All done and we have our backups in a S3 compatible storage.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Daniel Westermann
Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure