From PostgreSQL 9.1.8 to PostgreSQL 9.5.5 including a standby database with minimal downtime

Daniel Westermann

As you might know, PostgreSQL 9.1 has been out of support since last September. But, no surprise, there are still plenty of PostgreSQL 9.1 instances out there which need to be upgraded to a supported release. At one of our customers we have exactly this situation: the production environment is running PostgreSQL 9.1.8 and needs to be upgraded to a recent release (9.5.5 in this case). To make life a little more fun there are additional constraints: this is a PostgreSQL streaming replication configuration, so there is a standby database involved; the allowed downtime is 30 minutes; and there is space pressure on the current systems, which also need to be reused. In this post we’ll look at how you can do the upgrade with minimal downtime (without using logical replication).

First, let’s build the test environment. We need two systems, one for the master instance (192.168.22.32) and one for the standby (192.168.22.33). On both of these systems we’ll need PostgreSQL 9.1.8 installed, so:

[email protected]:~$ wget https://ftp.postgresql.org/pub/source/v9.1.8/postgresql-9.1.8.tar.bz2
[email protected]:~$ tar -axf postgresql-9.1.8.tar.bz2
[email protected]:~$ cd postgresql-9.1.8/
[email protected]:~/postgresql-9.1.8$ PGHOME=/u01/app/postgres/product/91/db_8
[email protected]:~/postgresql-9.1.8$ SEGSIZE=2
[email protected]:~/postgresql-9.1.8$ BLOCKSIZE=8
[email protected]:~/postgresql-9.1.8$ WALSEGSIZE=16
[email protected]:~/postgresql-9.1.8$ ./configure --prefix=${PGHOME} \
                                                   --exec-prefix=${PGHOME} \
                                                   --bindir=${PGHOME}/bin \
                                                   --libdir=${PGHOME}/lib \
                                                   --sysconfdir=${PGHOME}/etc \
                                                   --includedir=${PGHOME}/include \
                                                   --datarootdir=${PGHOME}/share \
                                                   --datadir=${PGHOME}/share \
                                                   --with-pgport=5432 \
                                                   --with-perl \
                                                   --with-python \
                                                   --with-tcl \
                                                   --with-openssl \
                                                   --with-pam \
                                                   --with-ldap \
                                                   --with-libxml \
                                                   --with-libxslt \
                                                   --with-segsize=${SEGSIZE} \
                                                   --with-blocksize=${BLOCKSIZE} \
                                                   --with-wal-segsize=${WALSEGSIZE}
[email protected]:~/postgresql-9.1.8$ make world
[email protected]:~/postgresql-9.1.8$ make install
[email protected]:~/postgresql-9.1.8$ cd contrib
[email protected]:~/postgresql-9.1.8/contrib$ make install
[email protected]:~/postgresql-9.1.8/contrib$ cd ../..
[email protected]:~$ rm -rf postgresql-9.1.8*
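
A quick sanity check that the binaries work (repeat this on the second node):

[email protected]:~$ /u01/app/postgres/product/91/db_8/bin/postgres --version
postgres (PostgreSQL) 9.1.8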

Once this is available on both nodes we can initialize our master instance:

[email protected]:~$ /u01/app/postgres/product/91/db_8/bin/initdb -D /u02/pgdata/testmig -X /u03/pgdata/testmig
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "english".

creating directory /u02/pgdata/testmig ... ok
creating directory /u03/pgdata/testmig ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 24MB
creating configuration files ... ok
creating template1 database in /u02/pgdata/testmig/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

    /u01/app/postgres/product/91/db_8/bin/postgres -D /u02/pgdata/testmig
or
    /u01/app/postgres/product/91/db_8/bin/pg_ctl -D /u02/pgdata/testmig -l logfile start

(I am assuming that passwordless ssh authentication is already set up between the nodes for the following; if not, a minimal sketch is shown below.)
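
If you still need to set this up, one common way is the following (a sketch, assuming OpenSSH; repeat in the other direction if you also want to connect from the standby to the master):

[email protected]:~$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
[email protected]:~$ ssh-copy-id 192.168.22.33

Set up authentication: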

[email protected]:/u03$ echo "host    replication     postgres       192.168.22.32/32        trust" >> /u02/pgdata/testmig/pg_hba.conf
[email protected]:/u03$ echo "host    replication     postgres       192.168.22.33/32        trust" >> /u02/pgdata/testmig/pg_hba.conf

Adjust the parameters:

[email protected]:/u03$ sed -i 's/#wal_level = minimal/wal_level = hot_standby/g' /u02/pgdata/testmig/postgresql.conf
[email protected]:/u03$ sed -i 's/#max_wal_senders = 0/max_wal_senders = 10/g' /u02/pgdata/testmig/postgresql.conf
[email protected]:/u03$ sed -i 's/#wal_keep_segments = 0/wal_keep_segments = 100/g' /u02/pgdata/testmig/postgresql.conf
[email protected]:/u03$ sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '*'/g" /u02/pgdata/testmig/postgresql.conf
[email protected]:/u03$ sed -i 's/#logging_collector = off/logging_collector = on/g' /u02/pgdata/testmig/postgresql.conf
[email protected]:/u03$ mkdir /u02/pgdata/testmig/pg_log    
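
A quick way to verify that all the adjusted parameters are now active in the file (just one way of checking, of course):

[email protected]:/u03$ grep -E "^(listen_addresses|wal_level|max_wal_senders|wal_keep_segments|logging_collector)" /u02/pgdata/testmig/postgresql.conf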

Start and stop the instance:

[email protected]:/u03$ export PATH=/u01/app/postgres/product/91/db_8/bin/:$PATH
[email protected]:/u03$ pg_ctl -D /u02/pgdata/testmig/ start -l /u02/pgdata/testmig/pg_log/log.log
[email protected]:/u03$ pg_ctl -D /u02/pgdata/testmig/ stop

Ready to set up the standby:

[email protected]:/u03$ cd /u02    
[email protected]:/u02$ rsync -r pgdata/ 192.168.22.33:/u02/pgdata
[email protected]:~$ cd /u03
[email protected]:/u03$ rsync -r pgdata/ 192.168.22.33:/u03/pgdata
[email protected]:/u03$ ssh 192.168.22.33 "ln -s /u03/pgdata/testmig/ /u02/pgdata/testmig/pg_xlog"
[email protected]:/u03$ scp /u02/pgdata/testmig/pg_hba.conf 192.168.22.33:/u02/pgdata/testmig/pg_hba.conf
[email protected]:/u03$ ssh 192.168.22.33 "echo \"standby_mode = on\" >> /u02/pgdata/testmig/recovery.conf"
[email protected]:/u03$ ssh 192.168.22.33 "echo \"primary_conninfo = 'host=192.168.22.32 port=5432 user=postgres'\" >> /u02/pgdata/testmig/recovery.conf"
[email protected]:/u03$ ssh 192.168.22.33 "echo \"trigger_file = '/u02/pgdata/testmig/up_slave'\" >> /u02/pgdata/testmig/recovery.conf"
[email protected]:/u03$ ssh 192.168.22.33 "sed -i 's/#hot_standby = off/hot_standby = on/g' /u02/pgdata/testmig/postgresql.conf"
[email protected]:/u03$ ssh 192.168.22.33 "sed -i 's/#logging_collector = off/#logging_collector = on/g' /u02/pgdata/testmig/postgresql.conf"
[email protected]:/u03$ ssh 192.168.22.33 "mkdir -p /u02/pgdata/testmig/pg_log"

Start the master:

[email protected]:/u03$ pg_ctl -D /u02/pgdata/testmig/ start -l /u02/pgdata/testmig/pg_log/log.log

Start the standby:

# standby side
[email protected]:/u03$ export PATH=/u01/app/postgres/product/91/db_8/bin/:$PATH
[email protected]:/u03$ pg_ctl -D /u02/pgdata/testmig/ start -l /u02/pgdata/testmig/pg_log/log.log

… and if everything went fine we should see this in the log of the standby instance:

LOG: database system was shut down at 2017-01-18 07:28:02 CET
LOG: entering standby mode
LOG: consistent recovery state reached at 0/16BCBB0
LOG: database system is ready to accept read only connections
LOG: record with zero length at 0/16BCBB0
LOG: streaming replication successfully connected to primary

A quick check on the standby to confirm that it is operating in recovery mode:

[email protected]:~$ psql
psql (9.1.8)
Type "help" for help.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)
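
On the master you can additionally query pg_stat_replication; with the standby attached you should see exactly one row in state “streaming” (the location columns will of course vary):

postgres=# select client_addr, state, sent_location, replay_location from pg_stat_replication;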

In the case we had at the customer the adminpack extension was installed in the postgres database, and the pg_trgm and pg_buffercache extensions in the application database, so let’s do the same here on the master (this will get replicated to the standby automatically):

[email protected]:/u03$ psql
psql (9.1.8)
Type "help" for help.

postgres=# create extension adminpack;
CREATE EXTENSION
postgres=# create database testmig;
CREATE DATABASE
postgres=# \c testmig
You are now connected to database "testmig" as user "postgres".
testmig=# create extension pg_trgm;
CREATE EXTENSION
testmig=# create extension pg_buffercache;
CREATE EXTENSION
testmig=# 

Quickly confirm that everything is there on the standby:

postgres=# \dx
                        List of installed extensions
   Name    | Version |   Schema   |               Description               
-----------+---------+------------+-----------------------------------------
 adminpack | 1.0     | pg_catalog | administrative functions for PostgreSQL
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# \c testmig
You are now connected to database "testmig" as user "postgres".
testmig=# \dx
                                       List of installed extensions
      Name      | Version |   Schema   |                            Description                            
----------------+---------+------------+-------------------------------------------------------------------
 pg_buffercache | 1.0     | public     | examine the shared buffer cache
 pg_trgm        | 1.0     | public     | text similarity measurement and index searching based on trigrams
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

Finally, some sample data generated with pgbench:

[email protected]:/u03$ pgbench -i -s 10 testmig

Should be there on the standby as well:

testmig=# select count(*) from pgbench_accounts;
  count  
---------
 1000000
(1 row)
testmig=# 

This is, more or less, the situation to start from. How can we upgrade this to PostgreSQL 9.5.5 with minimal downtime and without using logical replication? Obviously we’ll need to get PostgreSQL 9.5.5 installed on both systems before we can do anything further, so:

[email protected]:~$ wget https://ftp.postgresql.org/pub/source/v9.5.5/postgresql-9.5.5.tar.bz2
[email protected]:~$ tar -axf postgresql-9.5.5.tar.bz2 
[email protected]:~$ cd postgresql-9.5.5/
[email protected]:~/postgresql-9.5.5$ PGHOME=/u01/app/postgres/product/95/db_5
[email protected]:~/postgresql-9.5.5$ SEGSIZE=2
[email protected]:~/postgresql-9.5.5$ BLOCKSIZE=8
[email protected]:~/postgresql-9.5.5$ WALSEGSIZE=16
[email protected]:~/postgresql-9.5.5$ ./configure --prefix=${PGHOME} \
                                                   --exec-prefix=${PGHOME} \
                                                   --bindir=${PGHOME}/bin \
                                                   --libdir=${PGHOME}/lib \
                                                   --sysconfdir=${PGHOME}/etc \
                                                   --includedir=${PGHOME}/include \
                                                   --datarootdir=${PGHOME}/share \
                                                   --datadir=${PGHOME}/share \
                                                   --with-pgport=5432 \
                                                   --with-perl \
                                                   --with-python \
                                                   --with-tcl \
                                                   --with-openssl \
                                                   --with-pam \
                                                   --with-ldap \
                                                   --with-libxml \
                                                   --with-libxslt \
                                                   --with-segsize=${SEGSIZE} \
                                                   --with-blocksize=${BLOCKSIZE} \
                                                   --with-wal-segsize=${WALSEGSIZE}
[email protected]:~/postgresql-9.5.5$ make world
[email protected]:~/postgresql-9.5.5$ make install
[email protected]:~/postgresql-9.5.5$ cd contrib
[email protected]:~/postgresql-9.5.5/contrib$ make install
[email protected]:~/postgresql-9.5.5/contrib$ cd ../..
[email protected]:~$ rm -rf postgresql-9.5.5*
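
Again, a quick sanity check of the new binaries (on both nodes):

[email protected]:~$ /u01/app/postgres/product/95/db_5/bin/postgres --version
postgres (PostgreSQL) 9.5.5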

Then we need a new cluster initialized with the new version of PostgreSQL on the master:

[email protected]:~$ /u01/app/postgres/product/95/db_5/bin/initdb -D /u02/pgdata/testmig95/ -X /u03/pgdata/testmig95/ 
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /u02/pgdata/testmig95 ... ok
creating directory /u03/pgdata/testmig95 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /u02/pgdata/testmig95/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /u01/app/postgres/product/95/db_5/bin/pg_ctl -D /u02/pgdata/testmig95/ -l logfile start

Shut down the master and record the latest checkpoint location (this is where your downtime starts):

[email protected]:/u02$ pg_ctl -D /u02/pgdata/testmig stop -m fast
LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
waiting for server to shut down....LOG:  database system is shut down
 done
server stopped

[email protected]:/u02/pgdata/testmig$ pg_controldata /u02/pgdata/testmig | grep "Latest checkpoint location"
Latest checkpoint location:           0/C619840

Shut down the standby and compare the latest checkpoint location:

# standby side
[email protected]:/u02/pgdata/testmig$ pg_ctl -D /u02/pgdata/testmig/ stop -m fast
waiting for server to shut down.... done
server stopped

[email protected]:/u02/pgdata/testmig$ pg_controldata /u02/pgdata/testmig | grep "Latest checkpoint location"
Latest checkpoint location:           0/C619840

As both checkpoint locations match, we can be sure that the standby applied all changes and that there is no difference in the data.
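
If you script this step you can compare the two locations automatically before proceeding. A minimal sketch (assuming passwordless ssh, the paths used in this setup, and both instances already shut down):

PGBIN=/u01/app/postgres/product/91/db_8/bin
# latest checkpoint location on the master (local)
CKPT_MASTER=$(${PGBIN}/pg_controldata /u02/pgdata/testmig | awk -F: '/Latest checkpoint location/ {gsub(/ /,"",$2); print $2}')
# latest checkpoint location on the standby (via ssh)
CKPT_STANDBY=$(ssh 192.168.22.33 "${PGBIN}/pg_controldata /u02/pgdata/testmig" | awk -F: '/Latest checkpoint location/ {gsub(/ /,"",$2); print $2}')
if [ "${CKPT_MASTER}" = "${CKPT_STANDBY}" ]; then
  echo "checkpoint locations match (${CKPT_MASTER}), safe to continue"
else
  echo "checkpoint locations differ (${CKPT_MASTER} vs ${CKPT_STANDBY}), do NOT continue"
  exit 1
fi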

Save your configuration files:

[email protected]:/u02$ cp /u02/pgdata/testmig/postgresql.conf /var/tmp
[email protected]:/u02$ cp /u02/pgdata/testmig/pg_hba.conf /var/tmp

Run pg_upgrade on the master in link (“-k”) mode (if you have many cores on your box you can additionally use the “-j” option to parallelize pg_upgrade). First set the environment:

[email protected]:/u02$ export PGDATAOLD=/u02/pgdata/testmig/
[email protected]:/u02$ export PGDATANEW=/u02/pgdata/testmig95/
[email protected]:/u02$ export PGBINOLD=/u01/app/postgres/product/91/db_8/bin/
[email protected]:/u02$ export PGBINNEW=/u01/app/postgres/product/95/db_5/bin/
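
Usually you’d do a “-c” check run before the real upgrade; in check mode pg_upgrade only validates the two clusters and modifies nothing:

[email protected]:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_upgrade -c

If the checks pass, do the real upgrade in link mode: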

[email protected]:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_upgrade -k 

When using link mode the files get hard-linked instead of copied, which is much faster and saves disk space. The downside is that you cannot revert to the old cluster in case anything goes wrong once the new cluster has been started. When it goes fine, it looks like this:


Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for invalid "line" user columns                    ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows on the new cluster                        ok
Deleting files from new pg_clog                             ok
Copying old pg_clog to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Setting oldest multixact ID on new cluster                  ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Setting minmxid counter in new cluster                      ok
Adding ".old" suffix to old global/pg_control               ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /u02/pgdata/testmig/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.

Linking user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

Restore the configuration files:

[email protected]:/u02$ mkdir -p /u02/pgdata/testmig95/pg_log
[email protected]:/u02$ cp /var/tmp/postgresql.conf /u02/pgdata/testmig95/postgresql.conf  
[email protected]:/u02$ cp /var/tmp/pg_hba.conf /u02/pgdata/testmig95/pg_hba.conf 

Start and stop the upgraded instance and check that everything is fine in the log file:

[email protected]:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_ctl -D /u02/pgdata/testmig95/ -l /u02/pgdata/testmig95/pg_log/log.log start    
[email protected]:/u02$ /u01/app/postgres/product/95/db_5/bin/pg_ctl -D /u02/pgdata/testmig95/ stop   
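
One quick way to scan the logs for anything suspicious (the pg_ctl logfile lives in pg_log as well, so a single pattern covers everything):

[email protected]:/u02$ grep -iE "error|fatal|panic" /u02/pgdata/testmig95/pg_log/*.log

If that returns nothing, we are fine.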

You could keep the upgraded cluster running from this point on, in which case your downtime is already over; the standby can be re-built later. If you want to bring the standby back right now, first save the configuration files on the standby:

# standby side
[email protected]:/u03$ cp /u02/pgdata/testmig/postgresql.conf /var/tmp
[email protected]:/u03$ cp /u02/pgdata/testmig/pg_hba.conf /var/tmp
[email protected]:/u03$ cp /u02/pgdata/testmig/recovery.conf /var/tmp

Sync the directories from the master to the standby. This is very fast: “--size-only” skips the old data files which already exist on the standby, and “--hard-links” re-creates the hard links pg_upgrade created, instead of copying the user data files over the network again:

[email protected]:/u03$ cd /u02/pgdata   
[email protected]:/u02/pgdata$ rsync --archive --delete --hard-links --size-only testmig testmig95 192.168.22.33:/u02/pgdata
[email protected]:/u02$ cd /u03
[email protected]:/u03$ rsync -r pgdata/testmig95 192.168.22.33:/u03/pgdata

Restore the configuration files on the standby:

[email protected]:/u03$ cp /var/tmp/postgresql.conf /u02/pgdata/testmig95/postgresql.conf 
[email protected]:/u03$ cp /var/tmp/pg_hba.conf /u02/pgdata/testmig95/pg_hba.conf
[email protected]:/u03$ cp /var/tmp/recovery.conf /u02/pgdata/testmig95/recovery.conf

Note that the restored recovery.conf still points trigger_file to the old data directory (/u02/pgdata/testmig/up_slave). Once the old cluster is deleted that directory is gone, so you should adjust the path, e.g. to /u02/pgdata/testmig95/up_slave.

Start the master:

[email protected]:/u03$ export PATH=/u01/app/postgres/product/95/db_5/bin:$PATH
[email protected]:/u03$ pg_ctl -D /u02/pgdata/testmig95/ start -l /u02/pgdata/testmig95/pg_log/log.log

Start the standby:

[email protected]:/u03$ export PATH=/u01/app/postgres/product/95/db_5/bin:$PATH
[email protected]:/u03$ pg_ctl -D /u02/pgdata/testmig95/ start -l /u02/pgdata/testmig95/pg_log/log.log

Check the standby’s logfile:


LOG:  database system was shut down at 2017-01-19 07:51:24 GMT
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  entering standby mode
LOG:  started streaming WAL from primary at 0/E000000 on timeline 1
LOG:  consistent recovery state reached at 0/E024D38
LOG:  redo starts at 0/E024D38
LOG:  database system is ready to accept read only connections

Do some checks to see that everything is there on the standby:

[email protected]:~$ psql
psql (9.5.5)
Type "help" for help.

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

postgres=# \dx
                        List of installed extensions
   Name    | Version |   Schema   |               Description               
-----------+---------+------------+-----------------------------------------
 adminpack | 1.0     | pg_catalog | administrative functions for PostgreSQL
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# \c testmig
You are now connected to database "testmig" as user "postgres".
testmig=# \dx
                                       List of installed extensions
      Name      | Version |   Schema   |                            Description                            
----------------+---------+------------+-------------------------------------------------------------------
 pg_buffercache | 1.0     | public     | examine the shared buffer cache
 pg_trgm        | 1.0     | public     | text similarity measurement and index searching based on trigrams
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)

testmig=# \d
              List of relations
 Schema |       Name       | Type  |  Owner   
--------+------------------+-------+----------
 public | pg_buffercache   | view  | postgres
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(5 rows)

testmig=# select count(*) from pgbench_accounts;
  count  
---------
 1000000
(1 row)
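
On the upgraded master, pg_stat_replication should again show the standby streaming (the view and these columns are unchanged between 9.1 and 9.5):

postgres=# select client_addr, state, sent_location, replay_location from pg_stat_replication;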

Run the analyze_new_cluster.sh script on the master:

[email protected]:~$ ./analyze_new_cluster.sh
This script will generate minimal optimizer statistics rapidly
so your system is usable, and then gather statistics twice more
with increasing accuracy.  When it is done, your system will
have the default level of optimizer statistics.

If you have used ALTER TABLE to modify the statistics target for
any tables, you might want to remove them and restore them after
running this script because they will delay fast statistics generation.

If you would like default statistics as quickly as possible, cancel
this script and run:
    "/u01/app/postgres/product/95/db_5/bin/vacuumdb" --all --analyze-only

vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "testmig": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "testmig": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics
vacuumdb: processing database "testmig": Generating default (full) optimizer statistics

Now you can delete the old cluster on the master:

[email protected]:~$ ./delete_old_cluster.sh

Then either copy the script to the standby, or delete the old cluster on the standby manually:

[email protected]:~$ rm -rf /u02/pgdata/testmig
[email protected]:~$ rm -rf /u03/pgdata/testmig

Done. Hope this helps …
