Infrastructure at your Service

Daniel Westermann

PostgreSQL 13: Backup validation and backup manifests

A lot of features are currently being committed for PostgreSQL 13. The one we will look at in this post is something I am sure many PostgreSQL users have been waiting for: finally there is a native way to validate your base backups, pg_validatebackup. This new binary validates a base backup against a backup manifest, which pg_basebackup writes automatically when you take a backup. Let's see how that works.

When you take a base backup in PostgreSQL 13 without any specific flags, you will find a new file in the backup directory:

$ mkdir /var/tmp/backup
$ pg_basebackup -D /var/tmp/backup/
$ ls /var/tmp/backup/backup_manifest
/var/tmp/backup/backup_manifest

This is the so-called backup manifest. When you have a look at it, you'll notice that it is a simple JSON file:

$ head -n 10 /var/tmp/backup/backup_manifest
{ "PostgreSQL-Backup-Manifest-Version": 1,
"Files": [
{ "Path": "backup_label", "Size": 225, "Last-Modified": "2020-04-03 19:51:48 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "3cbc1336" },
{ "Path": "global/1262", "Size": 8192, "Last-Modified": "2020-04-03 18:52:13 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "f98856b1" },
{ "Path": "global/2964", "Size": 0, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
{ "Path": "global/1213", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "860d02d5" },
{ "Path": "global/1260", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "3b8ad06a" },
{ "Path": "global/1261", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "968c06d9" },
{ "Path": "global/1214", "Size": 8192, "Last-Modified": "2020-04-03 18:52:12 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "2f187a01" },
{ "Path": "global/2396", "Size": 8192, "Last-Modified": "2020-04-03 18:52:13 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "d3229ead" },

The manifest lists every file in the backup together with its size, its last-modified timestamp and a checksum, which is generated with CRC32C by default. pg_basebackup comes with new options you can use to change the algorithm used to create the checksums:

$ pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.

Usage:
  pg_basebackup [OPTION]...

Options controlling the output:
  -D, --pgdata=DIRECTORY receive base backup into directory
  -F, --format=p|t       output format (plain (default), tar)
  -r, --max-rate=RATE    maximum transfer rate to transfer data directory
                         (in kB/s, or use suffix "k" or "M")
  -R, --write-recovery-conf
                         write configuration for replication
  -T, --tablespace-mapping=OLDDIR=NEWDIR
                         relocate tablespace in OLDDIR to NEWDIR
      --waldir=WALDIR    location for the write-ahead log directory
  -X, --wal-method=none|fetch|stream
                         include required WAL files with specified method
  -z, --gzip             compress tar output
  -Z, --compress=0-9     compress tar output with given compression level

General options:
  -c, --checkpoint=fast|spread
                         set fast or spread checkpointing
  -C, --create-slot      create replication slot
  -l, --label=LABEL      set backup label
  -n, --no-clean         do not clean up after errors
  -N, --no-sync          do not wait for changes to be written safely to disk
  -P, --progress         show progress information
  -S, --slot=SLOTNAME    replication slot to use
  -v, --verbose          output verbose messages
  -V, --version          output version information, then exit
      --no-slot          prevent creation of temporary replication slot
      --no-verify-checksums
                         do not verify checksums
      --no-estimate-size do not estimate backup size in server side
      --no-manifest      suppress generation of backup manifest
      --manifest-force-encode
                         hex encode all filenames in manifest
      --manifest-checksums=SHA{224,256,384,512}|CRC32C|NONE
                         use algorithm for manifest checksums
  -?, --help             show this help, then exit
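The default CRC32C is the Castagnoli variant of CRC-32, which is not what Python's zlib.crc32 computes. As a minimal sketch of where values such as "00000000" for the empty file global/2964 come from, the table-driven Python code below computes CRC-32C. The helper manifest_hex() is a hypothetical name, and its byte order is an assumption: on a little-endian machine PostgreSQL appears to hex-encode the raw in-memory bytes of the 32-bit CRC, so a manifest string may not match the CRC written as an ordinary hex number.

```python
"""Minimal CRC-32C (Castagnoli) sketch, the default manifest checksum."""

# Reflected form of the Castagnoli polynomial 0x1EDC6F41.
_POLY = 0x82F63B78


def _make_table():
    """Precompute the 256-entry lookup table for the reflected CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ _POLY
            else:
                crc >>= 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """CRC-32C of data: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def manifest_hex(data: bytes) -> str:
    """Hypothetical helper. Assumption: the manifest stores the CRC's raw
    bytes in little-endian (x86-64) order, hex-encoded."""
    return crc32c(data).to_bytes(4, "little").hex()


if __name__ == "__main__":
    # Standard CRC-32C check value for the ASCII string "123456789".
    print(hex(crc32c(b"123456789")))  # 0xe3069283
    # An empty file yields 00000000, like global/2964 (Size 0) above.
    print(manifest_hex(b""))  # 00000000
```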

You can also go back to the previous behavior and disable generation of the backup manifest altogether with --no-manifest. Once the backup and its manifest are in place, you can use pg_validatebackup to check the integrity of what pg_basebackup wrote:

$ pg_validatebackup --help
pg_validatebackup validates a backup against the backup manifest.

Usage:
  pg_validatebackup [OPTION]... BACKUPDIR

Options:
  -e, --exit-on-error         exit immediately on error
  -i, --ignore=RELATIVE_PATH  ignore indicated path
  -m, --manifest=PATH         use specified path for manifest
  -n, --no-parse-wal          do not try to parse WAL files
  -s, --skip-checksums        skip checksum verification
  -w, --wal-directory=PATH    use specified path for WAL files
  -V, --version               output version information, then exit
  -?, --help                  show this help, then exit

In its simplest form this is just:

$ pg_validatebackup /var/tmp/backup/
backup successfully verified
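To illustrate what "validating against the manifest" means at the file level, here is a minimal Python sketch. It is not how pg_validatebackup is implemented, and it assumes a backup taken with --manifest-checksums=SHA256 (or another SHA variant) so that hashlib can recompute the checksums; CRC32C and NONE entries only get the existence and size checks. The real tool does more, for example parsing the required WAL (which -n/--no-parse-wal switches off). The function name validate() and the fallback to an "Encoded-Path" key for hex-encoded file names are assumptions made for this sketch.

```python
"""Hypothetical mini-validator: re-checks existence, size and SHA checksums
of the files listed in a PostgreSQL 13 backup_manifest."""
import hashlib
import json
import os
import sys

# Manifest algorithm names mapped to hashlib constructors.
_SHA = {
    "SHA224": hashlib.sha224,
    "SHA256": hashlib.sha256,
    "SHA384": hashlib.sha384,
    "SHA512": hashlib.sha512,
}


def sha_of_file(path, algorithm):
    """Hex digest of a file, read in 64 KiB chunks."""
    h = _SHA[algorithm]()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def validate(backup_dir, manifest_path=None):
    """Return a list of problems; an empty list means every check passed."""
    manifest_path = manifest_path or os.path.join(backup_dir, "backup_manifest")
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)

    problems = []
    for entry in manifest["Files"]:
        # Assumption: names are under "Path", or hex-encoded under
        # "Encoded-Path" (e.g. with --manifest-force-encode).
        if "Path" in entry:
            rel = entry["Path"]
        else:
            rel = bytes.fromhex(entry["Encoded-Path"]).decode("utf-8", "replace")
        full = os.path.join(backup_dir, rel)
        if not os.path.isfile(full):
            problems.append(f"{rel}: missing")
            continue
        if os.path.getsize(full) != entry["Size"]:
            problems.append(f"{rel}: size differs from manifest")
            continue
        algo = entry.get("Checksum-Algorithm")
        # Only SHA checksums are recomputed; CRC32C/NONE are skipped here.
        if algo in _SHA and sha_of_file(full, algo) != entry["Checksum"].lower():
            problems.append(f"{rel}: checksum mismatch")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = validate(sys.argv[1])
    print("\n".join(issues) if issues else "all listed files match")
```

Called as `python validate_sketch.py /var/tmp/backup/` (file name hypothetical), it prints one line per problem, or a success message if every listed file matches.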

That is a really cool feature.


Daniel Westermann, Principal Consultant & Technology Leader Open Infrastructure