As you might know, PostgreSQL 9.1 has been out of support since last September. But, no surprise, there are still plenty of PostgreSQL 9.1 instances out there which need to be upgraded to a supported release. At one of our customers we have exactly this situation: the production environment is running on PostgreSQL 9.1.8 and needs to be upgraded to a recent release (9.5.5 in this case). To make life a little more fun there are additional constraints: this is a PostgreSQL streaming replication configuration, so there is a standby database involved; the allowed downtime is 30 minutes; and the current systems are under space pressure and need to be reused. In this post we’ll look at how you can do the upgrade with minimal downtime (without using logical replication).
First, let’s build the test environment. We need two systems, one for the master instance (192.168.22.32) and one for the standby (192.168.22.33). On both of these systems we’ll need PostgreSQL 9.1.8 installed, so:
postgres@debian6pg:~$ wget https://ftp.postgresql.org/pub/source/v9.1.8/postgresql-9.1.8.tar.bz2
postgres@debian6pg:~$ tar -axf postgresql-9.1.8.tar.bz2
postgres@debian6pg:~$ cd postgresql-9.1.8/
postgres@debian6pg:~/postgresql-9.1.8$ PGHOME=/u01/app/postgres/product/91/db_8
postgres@debian6pg:~/postgresql-9.1.8$ SEGSIZE=2
postgres@debian6pg:~/postgresql-9.1.8$ BLOCKSIZE=8
postgres@debian6pg:~/postgresql-9.1.8$ WALSEGSIZE=16
postgres@debian6pg:~/postgresql-9.1.8$ ./configure --prefix=${PGHOME} --exec-prefix=${PGHOME} --bindir=${PGHOME}/bin --libdir=${PGHOME}/lib --sysconfdir=${PGHOME}/etc --includedir=${PGHOME}/include --datarootdir=${PGHOME}/share --datadir=${PGHOME}/share --with-pgport=5432 --with-perl --with-python --with-tcl --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --with-segsize=${SEGSIZE} --with-blocksize=${BLOCKSIZE} --with-wal-segsize=${WALSEGSIZE}
postgres@debian6pg:~/postgresql-9.1.8$ make world
postgres@debian6pg:~/postgresql-9.1.8$ make install
postgres@debian6pg:~/postgresql-9.1.8$ cd contrib
postgres@debian6pg:~/postgresql-9.1.8/contrib$ make install
postgres@debian6pg:~/postgresql-9.1.8/contrib$ cd ../..
postgres@debian6pg:~$ rm -rf postgresql-9.1.8*
Once this is available on both nodes we can initialize our master instance:
postgres@debian6pg:~$ /u01/app/postgres/product/91/db_8/bin/initdb -D /u02/pgdata/testmig -X /u03/pgdata/testmig
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "english".

creating directory /u02/pgdata/testmig ... ok
creating directory /u03/pgdata/testmig ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 24MB
creating configuration files ... ok
creating template1 database in /u02/pgdata/testmig/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

    /u01/app/postgres/product/91/db_8/bin/postgres -D /u02/pgdata/testmig
or
    /u01/app/postgres/product/91/db_8/bin/pg_ctl -D /u02/pgdata/testmig -l logfile start
(I am assuming that passwordless ssh authentication is already set up between the nodes for what follows.) Set up authentication:
postgres@debian6pg:/u03$ echo "host    replication    postgres    192.168.22.32/32    trust" >> /u02/pgdata/testmig/pg_hba.conf
postgres@debian6pg:/u03$ echo "host    replication    postgres    192.168.22.33/32    trust" >> /u02/pgdata/testmig/pg_hba.conf
Adjust the parameters:
postgres@debian6pg:/u03$ sed -i 's/#wal_level = minimal/wal_level = hot_standby/g' /u02/pgdata/testmig/postgresql.conf
postgres@debian6pg:/u03$ sed -i 's/#max_wal_senders = 0/max_wal_senders = 10/g' /u02/pgdata/testmig/postgresql.conf
postgres@debian6pg:/u03$ sed -i 's/#wal_keep_segments = 0/wal_keep_segments = 100/g' /u02/pgdata/testmig/postgresql.conf
postgres@debian6pg:/u03$ sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '*'/g" /u02/pgdata/testmig/postgresql.conf
postgres@debian6pg:/u03$ sed -i 's/#logging_collector = off/logging_collector = on/g' /u02/pgdata/testmig/postgresql.conf
postgres@debian6pg:/u03$ mkdir /u02/pgdata/testmig/pg_log
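One caveat with sed edits like these: if a pattern does not match (for example, because a default was already changed), sed does nothing and still exits with 0. A small sketch of a sanity check (the helper name is ours, not part of the original procedure):

```shell
# Print only the *uncommented* occurrences of the settings we just changed;
# every setting we expect should show up exactly once.
active_settings() {
  grep -E '^(wal_level|max_wal_senders|wal_keep_segments|listen_addresses|logging_collector)' "$1"
}

# Usage on the master:
#   active_settings /u02/pgdata/testmig/postgresql.conf
```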
Start and stop the instance:
postgres@debian6pg:/u03$ export PATH=/u01/app/postgres/product/91/db_8/bin/:$PATH
postgres@debian6pg:/u03$ pg_ctl -D /u02/pgdata/testmig/ start -l /u02/pgdata/testmig/pg_log/log.log
postgres@debian6pg:/u03$ pg_ctl -D /u02/pgdata/testmig/ stop
Ready to setup the standby:
postgres@debian6pg:/u03$ cd /u02
postgres@debian6pg:/u02$ rsync -r pgdata/ 192.168.22.33:/u02/pgdata
postgres@debian6pg:/u02$ cd /u03
postgres@debian6pg:/u03$ rsync -r pgdata/ 192.168.22.33:/u03/pgdata
postgres@debian6pg:/u03$ ssh 192.168.22.33 "ln -s /u03/pgdata/testmig/ /u02/pgdata/testmig/pg_xlog"
postgres@debian6pg:/u03$ scp /u02/pgdata/testmig/pg_hba.conf 192.168.22.33:/u02/pgdata/testmig/pg_hba.conf
postgres@debian6pg:/u03$ ssh 192.168.22.33 "echo \"standby_mode = on\" >> /u02/pgdata/testmig/recovery.conf"
postgres@debian6pg:/u03$ ssh 192.168.22.33 "echo \"primary_conninfo = 'host=192.168.22.32 port=5432 user=postgres'\" >> /u02/pgdata/testmig/recovery.conf"
postgres@debian6pg:/u03$ ssh 192.168.22.33 "echo \"trigger_file = '/u02/pgdata/testmig/up_slave'\" >> /u02/pgdata/testmig/recovery.conf"
postgres@debian6pg:/u03$ ssh 192.168.22.33 "sed -i 's/#hot_standby = off/hot_standby = on/g' /u02/pgdata/testmig/postgresql.conf"
postgres@debian6pg:/u03$ ssh 192.168.22.33 "sed -i 's/#logging_collector = off/logging_collector = on/g' /u02/pgdata/testmig/postgresql.conf"
postgres@debian6pg:/u03$ ssh 192.168.22.33 "mkdir -p /u02/pgdata/testmig/pg_log"
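An alternative to appending recovery.conf line by line over ssh (which needs careful quoting) is to build the file locally and ship it in one copy. A sketch using this post's host and paths:

```shell
# Build recovery.conf locally; the heredoc avoids any shell quoting issues
# with the single quotes recovery.conf parameters need.
cat > /var/tmp/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=192.168.22.32 port=5432 user=postgres'
trigger_file = '/u02/pgdata/testmig/up_slave'
EOF

# Then copy it to the standby in one go:
#   scp /var/tmp/recovery.conf 192.168.22.33:/u02/pgdata/testmig/recovery.conf
```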
Start the master:
postgres@debian6pg:/u03$ pg_ctl -D /u02/pgdata/testmig/ start -l /u02/pgdata/testmig/pg_log/log.log
Start the standby:
# standby side
postgres@debian6pg:/u03$ export PATH=/u01/app/postgres/product/91/db_8/bin/:$PATH
postgres@debian6pg:/u03$ pg_ctl -D /u02/pgdata/testmig/ start -l /u02/pgdata/testmig/pg_log/log.log
… and if everything went fine we should see this in the log of the standby instance:
LOG: database system was shut down at 2017-01-18 07:28:02 CET
LOG: entering standby mode
LOG: consistent recovery state reached at 0/16BCBB0
LOG: database system is ready to accept read only connections
LOG: record with zero length at 0/16BCBB0
LOG: streaming replication successfully connected to primary
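Instead of eyeballing the logfile, the startup check can be scripted. A small helper (the function name and the default timeout are ours) that polls the standby's log until it reports readiness:

```shell
# Poll a logfile until the standby reports it accepts read-only connections;
# give up after a configurable number of seconds (default 30).
wait_ready() {
  log="$1"; timeout="${2:-30}"
  i=0
  while [ "$i" -lt "$timeout" ]; do
    grep -q "database system is ready to accept read only connections" "$log" && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_ready /u02/pgdata/testmig/pg_log/log.log 60 && echo "standby is up"
```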
A quick check on the standby to confirm that it is operating in recovery mode:
postgres@debian6pg:~$ psql
psql (9.1.8)
Type "help" for help.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
In the case we had at the customer, the adminpack extension was installed in the postgres database and the pg_trgm and pg_buffercache extensions in the application database, so let’s do the same here on the master (this will get replicated to the standby automatically):
postgres@debian6pg:/u03$ psql
psql (9.1.8)
Type "help" for help.

postgres=# create extension adminpack;
CREATE EXTENSION
postgres=# create database testmig;
CREATE DATABASE
postgres=# \c testmig
You are now connected to database "testmig" as user "postgres".
testmig=# create extension pg_trgm;
CREATE EXTENSION
testmig=# create extension pg_buffercache;
CREATE EXTENSION
testmig=#
Quickly confirm that it is there on the standby:
postgres=# \dx
                        List of installed extensions
   Name    | Version |   Schema   |               Description
-----------+---------+------------+-----------------------------------------
 adminpack | 1.0     | pg_catalog | administrative functions for PostgreSQL
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# \c testmig
You are now connected to database "testmig" as user "postgres".
testmig=# \dx
                                        List of installed extensions
      Name      | Version |   Schema   |                            Description
----------------+---------+------------+-------------------------------------------------------------------
 pg_buffercache | 1.0     | public     | examine the shared buffer cache
 pg_trgm        | 1.0     | public     | text similarity measurement and index searching based on trigrams
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)
Finally, some sample data generated with pgbench:
postgres@debian6pg:/u03$ pgbench -i testmig -s 10
Should be there on the standby as well:
testmig=# select count(*) from pgbench_accounts;
  count
---------
 1000000
(1 row)

testmig=#
This is, more or less, the situation to start from. How can we upgrade this to PostgreSQL 9.5.5 with minimal downtime and without using logical replication? Obviously we’ll need to get PostgreSQL 9.5.5 installed on both systems before we can do anything further, so:
postgres@debian6pg:~$ wget https://ftp.postgresql.org/pub/source/v9.5.5/postgresql-9.5.5.tar.bz2
postgres@debian6pg:~$ tar -axf postgresql-9.5.5.tar.bz2
postgres@debian6pg:~$ cd postgresql-9.5.5/
postgres@debian6pg:~/postgresql-9.5.5$ PGHOME=/u01/app/postgres/product/95/db_5
postgres@debian6pg:~/postgresql-9.5.5$ SEGSIZE=2
postgres@debian6pg:~/postgresql-9.5.5$ BLOCKSIZE=8
postgres@debian6pg:~/postgresql-9.5.5$ WALSEGSIZE=16
postgres@debian6pg:~/postgresql-9.5.5$ ./configure --prefix=${PGHOME} --exec-prefix=${PGHOME} --bindir=${PGHOME}/bin --libdir=${PGHOME}/lib --sysconfdir=${PGHOME}/etc --includedir=${PGHOME}/include --datarootdir=${PGHOME}/share --datadir=${PGHOME}/share --with-pgport=5432 --with-perl --with-python --with-tcl --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --with-segsize=${SEGSIZE} --with-blocksize=${BLOCKSIZE} --with-wal-segsize=${WALSEGSIZE}
postgres@debian6pg:~/postgresql-9.5.5$ make world
postgres@debian6pg:~/postgresql-9.5.5$ make install
postgres@debian6pg:~/postgresql-9.5.5$ cd contrib
postgres@debian6pg:~/postgresql-9.5.5/contrib$ make install
postgres@debian6pg:~/postgresql-9.5.5/contrib$ cd ../..
postgres@debian6pg:~$ rm -rf postgresql-9.5.5*
Then we need a new cluster initialized with the new version of PostgreSQL on the master:
postgres@debian6pg:~$ /u01/app/postgres/product/95/db_5/bin/initdb -D /u02/pgdata/testmig95/ -X /u03/pgdata/testmig95/
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /u02/pgdata/testmig95 ... ok
creating directory /u03/pgdata/testmig95 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /u02/pgdata/testmig95/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /u01/app/postgres/product/95/db_5/bin/pg_ctl -D /u02/pgdata/testmig95/ -l logfile start
Shut down the master and record the latest checkpoint location (this is where your downtime starts):
postgres@debian6pg:/u02$ pg_ctl -D /u02/pgdata/testmig stop -m fast
LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: autovacuum launcher shutting down
LOG: shutting down
waiting for server to shut down....LOG: database system is shut down
 done
server stopped
postgres@debian6pg:/u02/pgdata/testmig$ pg_controldata | grep "Latest checkpoint location"
Latest checkpoint location:           0/C619840
Shut down the standby and compare the latest checkpoint location:
# standby side
postgres@debian6pg:/u02/pgdata/testmig$ pg_ctl -D /u02/pgdata/testmig/ stop -m fast
waiting for server to shut down.... done
server stopped
postgres@debian6pg:/u02/pgdata/testmig$ pg_controldata | grep "Latest checkpoint location"
Latest checkpoint location:           0/C619840
As both checkpoint locations match we can be sure that the standby applied all changes and there is no difference in data.
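Comparing the two LSNs by hand works, but it is also easy to script. A sketch (the helper function is ours; on a real two-node setup you would run the standby half over ssh):

```shell
# Extract the "Latest checkpoint location" LSN from pg_controldata output
# read on stdin.
latest_ckpt() {
  awk -F': *' '/^Latest checkpoint location/ {print $2}'
}

# On a real system (with the 9.1 binaries on the PATH):
#   m=$(pg_controldata /u02/pgdata/testmig | latest_ckpt)
#   s=$(ssh 192.168.22.33 "pg_controldata /u02/pgdata/testmig" | latest_ckpt)
#   [ "$m" = "$s" ] && echo "standby has applied everything"
```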
Save your configuration files:
postgres@debian6pg:/u02$ cp /u02/pgdata/testmig/postgresql.conf /var/tmp
postgres@debian6pg:/u02$ cp /u02/pgdata/testmig/pg_hba.conf /var/tmp
Run pg_upgrade on the master with link (-k) mode (if you have many cores on your box you can use the “-j” option to parallelize pg_upgrade):
postgres@debian6pg:/u02$ export PGDATAOLD=/u02/pgdata/testmig/
postgres@debian6pg:/u02$ export PGDATANEW=/u02/pgdata/testmig95/
postgres@debian6pg:/u02$ export PGBINOLD=/u01/app/postgres/product/91/db_8/bin/
postgres@debian6pg:/u02$ export PGBINNEW=/u01/app/postgres/product/95/db_5/bin/
postgres@debian6pg:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_upgrade -k
(Usually you’d do a “-c” check run before doing the real upgrade.) In link mode the files get hard-linked instead of copied, which is much faster and saves disk space. The downside is that you cannot revert to the old cluster in case anything goes wrong. If all goes fine, it looks like this:
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for invalid "line" user columns ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Setting oldest multixact ID on new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Setting minmxid counter in new cluster ok
Adding ".old" suffix to old global/pg_control ok
If you want to start the old cluster, you will need to remove
the ".old" suffix from /u02/pgdata/testmig/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
Linking user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to analyze new cluster ok
Creating script to delete old cluster ok
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
./analyze_new_cluster.sh
Running this script will delete the old cluster's data files:
./delete_old_cluster.sh
Restore the configuration files:
postgres@debian6pg:/u02$ mkdir -p /u02/pgdata/testmig95/pg_log
postgres@debian6pg:/u02$ cp /var/tmp/postgresql.conf /u02/pgdata/testmig95/postgresql.conf
postgres@debian6pg:/u02$ cp /var/tmp/pg_hba.conf /u02/pgdata/testmig95/pg_hba.conf
Start and stop the upgraded instance and check that everything is fine in the log file:
postgres@debian6pg:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_ctl -D /u02/pgdata/testmig95/ -l /u02/pgdata/testmig95/pg_log/log.log start
postgres@debian6pg:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_ctl -D /u02/pgdata/testmig95/ stop
At this point you could already bring the cluster back online, and your downtime would be over if you planned to rebuild the standby later. If you want to do the standby now, first save its configuration files:
# standby side
postgres@debian6pg:/u03$ cp /u02/pgdata/testmig/postgresql.conf /var/tmp
postgres@debian6pg:/u03$ cp /u02/pgdata/testmig/pg_hba.conf /var/tmp
postgres@debian6pg:/u03$ cp /u02/pgdata/testmig/recovery.conf /var/tmp
Sync the directories from the master to the standby (this will be very fast because it will create hard links on the standby server instead of copying the user files):
postgres@debian6pg:/u03$ cd /u02/pgdata
postgres@debian6pg:/u02$ rsync --archive --delete --hard-links --size-only testmig testmig95 192.168.22.33:/u02/pgdata
postgres@debian6pg:/u02$ cd /u03
postgres@debian6pg:/u03$ rsync -r pgdata/testmig95 192.168.22.33:/u03/pgdata/testmig95
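To see why this rsync stays small: after pg_upgrade -k, the old and new cluster reference the same file data through hard links, and --hard-links (-H) tells rsync to recreate that sharing on the target instead of transferring the data twice. A tiny local illustration (temporary directories only, nothing from the real clusters):

```shell
# Simulate what pg_upgrade -k leaves behind: one data file, two names.
tmp=$(mktemp -d)
mkdir "$tmp/old" "$tmp/new" "$tmp/dst"
echo "some table data" > "$tmp/old/relfile"
ln "$tmp/old/relfile" "$tmp/new/relfile"   # hard link, as created by pg_upgrade -k

# Sync both trees in one rsync run, preserving hard links:
rsync -a -H "$tmp/old" "$tmp/new" "$tmp/dst/"

# On the target the two names still share one inode (link count 2),
# so the file's data was sent only once:
stat -c %h "$tmp/dst/old/relfile"

rm -rf "$tmp"
```

Both source directories have to be in the same rsync invocation, otherwise rsync cannot know the two names belong to the same file.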
Restore the configuration files on the standby:
postgres@debian6pg:/u03$ cp /var/tmp/postgresql.conf /u02/pgdata/testmig95/postgresql.conf
postgres@debian6pg:/u03$ cp /var/tmp/pg_hba.conf /u02/pgdata/testmig95/pg_hba.conf
postgres@debian6pg:/u03$ cp /var/tmp/recovery.conf /u02/pgdata/testmig95/recovery.conf
Start the master:
postgres@debian6pg:/u03$ export PATH=/u01/app/postgres/product/95/db_5/bin:$PATH
postgres@debian6pg:/u03$ pg_ctl -D /u02/pgdata/testmig95/ start -l /u02/pgdata/testmig95/pg_log/log.log
Start the standby:
postgres@debian6pg:/u03$ export PATH=/u01/app/postgres/product/95/db_5/bin:$PATH
postgres@debian6pg:/u03$ pg_ctl -D /u02/pgdata/testmig95/ start -l /u02/pgdata/testmig95/pg_log/log.log
Check the standby’s logfile:
LOG: database system was shut down at 2017-01-19 07:51:24 GMT
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
LOG: started streaming WAL from primary at 0/E000000 on timeline 1
LOG: consistent recovery state reached at 0/E024D38
LOG: redo starts at 0/E024D38
LOG: database system is ready to accept read only connections
Do some checks to see that everything is there on the standby:
postgres@debian6pg:~$ psql
psql (9.5.5)
Type "help" for help.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

postgres=# \dx
                        List of installed extensions
   Name    | Version |   Schema   |               Description
-----------+---------+------------+-----------------------------------------
 adminpack | 1.0     | pg_catalog | administrative functions for PostgreSQL
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# \c testmig
You are now connected to database "testmig" as user "postgres".
testmig=# \dx
                                        List of installed extensions
      Name      | Version |   Schema   |                            Description
----------------+---------+------------+-------------------------------------------------------------------
 pg_buffercache | 1.0     | public     | examine the shared buffer cache
 pg_trgm        | 1.0     | public     | text similarity measurement and index searching based on trigrams
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

testmig=# \d
              List of relations
 Schema |       Name       | Type  |  Owner
--------+------------------+-------+----------
 public | pg_buffercache   | view  | postgres
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(5 rows)

testmig=# select count(*) from pgbench_accounts;
  count
---------
 1000000
(1 row)
Run the analyze_new_cluster.sh script on the master:
postgres@debian6pg:~$ ./analyze_new_cluster.sh
This script will generate minimal optimizer statistics rapidly
so your system is usable, and then gather statistics twice more
with increasing accuracy. When it is done, your system will
have the default level of optimizer statistics.

If you have used ALTER TABLE to modify the statistics target for
any tables, you might want to remove them and restore them after
running this script because they will delay fast statistics generation.

If you would like default statistics as quickly as possible, cancel
this script and run:
    "/u01/app/postgres/product/95/db_5/bin/vacuumdb" --all --analyze-only

vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "testmig": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "testmig": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics
vacuumdb: processing database "testmig": Generating default (full) optimizer statistics
Now you can delete the old cluster on the master:
postgres@debian6pg:~$ ./delete_old_cluster.sh
Then either copy the script to the standby or delete the old cluster there manually:
postgres@debian6pg:~$ rm -rf /u02/pgdata/testmig
postgres@debian6pg:~$ rm -rf /u03/pgdata/testmig
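If you are nervous about the rm -rf, a small guard can make sure no postmaster is still using the old data directory (the helper name is ours; note that a stale pid file left over from a crash would also trigger it):

```shell
# Returns success (0) if a postmaster.pid still exists under the given data
# directory, i.e. an instance may still be running there.
old_cluster_in_use() {
  [ -f "$1/postmaster.pid" ]
}

# Only remove the old cluster when it is not in use:
#   old_cluster_in_use /u02/pgdata/testmig || rm -rf /u02/pgdata/testmig /u03/pgdata/testmig
```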
Done. Hope this helps …