Infrastructure at your Service

Daniel Westermann

Vertically scale your PostgreSQL infrastructure with pgpool – 2 – Automatic failover and reconfiguration

In the first post of this little series we did the basic setup of pgpool and configured the watchdog. As of now, pgpool uses a VIP which is active on one of the pgpool nodes and fails over to the other node in case the node the VIP currently runs on goes down for any reason. This provides the very same address for the application or clients to connect to and eliminates the single point of failure you have when only one pgpool instance is running. What is still missing is a mechanism to automatically fail over to a promoted standby instance in case the PostgreSQL master goes down. We even want to be able to automatically reconfigure the old master as a new standby that follows the new master. This is what this post is about …

Let's take a look at the picture again:

pgpool-architecture

What happens when we stop the standby instance? Currently both instances are in status “up” when you ask pgpool (notice that I connect to the 192.168.22.38 address which is the VIP that is managed by pgpool):

[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 -c "show pool_nodes" postgres
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | 192.168.22.35 | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

Let's create a new connection first to see what happens to existing connections:

[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 postgres
psql (9.6.2 dbi services build)
Type "help" for help.

postgres=# select now();
              now              
-------------------------------
 2017-03-24 10:53:16.077351+01
(1 row)


postgres=#

Shut down the standby instance:

[email protected]:/home/postgres/ [PG1] psql -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 t
(1 row)

[email protected]:/home/postgres/ [PG1] pg_ctl -D /u02/pgdata/PG1/ stop -m fast
waiting for server to shut down.... done
server stopped

What happened to our connection?

postgres=# select now();
              now              
-------------------------------
 2017-03-24 10:54:46.802984+01
(1 row)

Nothing, we can just keep using it. What is pgpool telling us about our instances?

[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 -c "show pool_nodes" postgres
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | up     | 0.500000  | primary | 4          | true              | 0
 1       | 192.168.22.35 | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

As expected, the standby node is now down. This is the easy case: just start the standby again and you are fine:

[email protected]:/home/postgres/ [PG1] pg_ctl -D /u02/pgdata/PG1/ start
server starting
[email protected]:/home/postgres/ [PG1] psql -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 t
(1 row)

… and pgpool should detect this as well:

[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 -c "show pool_nodes" postgres
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | up     | 0.500000  | primary | 5          | true              | 0
 1       | 192.168.22.35 | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

No? What we need to do is attach the node again:

[[email protected]_pgpool_m1 ~]$ pcp_attach_node -n 1
Password: 
pcp_attach_node -- Command Successful
[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 -c "show pool_nodes" postgres
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | up     | 0.500000  | primary | 5          | true              | 0
 1       | 192.168.22.35 | 5432 | up     | 0.500000  | standby | 0          | false             | 0
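Re-attaching by hand gets tedious if standbys are bounced regularly. This can be scripted: the following is a small sketch of my own (the helper name and the parsing are mine, not part of pgpool) that extracts the ids of nodes pgpool reports as down, so they can be fed to pcp_attach_node one by one:

```shell
# Sketch: list the node ids that pgpool reports as "down" so they can be
# passed to pcp_attach_node. Input is the "show pool_nodes" output;
# column 1 is node_id, column 4 is status. NR > 2 skips the two header lines.
down_node_ids() {
    awk -F'|' 'NR > 2 && $4 ~ /down/ { gsub(/ /, "", $1); print $1 }'
}

# Example with output captured as above (prints the id of the down node):
down_node_ids <<'EOF'
 node_id |   hostname    | port | status 
---------+---------------+------+--------
 0       | 192.168.22.34 | 5432 | up     
 1       | 192.168.22.35 | 5432 | down   
EOF
```

In a live setup you would pipe the real output into it, e.g. `psql -h 192.168.22.38 -c "show pool_nodes" postgres | down_node_ids`, and loop over the result with `pcp_attach_node -n`.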

Now comes the not so easy part: what happens if the primary instance goes down? This is where the failover_command specified in pgpool.conf comes into play:

[[email protected]_pgpool_m1 etc]$ cat pgpool.conf | grep failover_command
failover_command = '/home/postgres/failover.sh %d "%h" %p %D %m %M "%H" %P'

When pgpool detects that the master is gone, the failover_command will be executed. This is a very simple failover script (located on the pgpool hosts); the parameters are passed into the script by pgpool automatically:

#!/bin/sh -x
# Execute command by failover.
# special values:  %d = node id
#                  %h = host name
#                  %p = port number
#                  %D = database cluster path
#                  %m = new master node id
#                  %M = old master node id
#                  %H = new master node host name
#                  %P = old primary node id
#                  %% = '%' character
failed_node_id=$1
failed_host_name=$2
failed_port=$3
failed_db_cluster=$4
new_master_id=$5
old_master_id=$6
new_master_host_name=$7
old_primary_node_id=$8

if [ "$failed_node_id" = "$old_primary_node_id" ]; then    # master failed
    ssh -T postgres@$new_master_host_name /home/postgres/promote.sh
fi
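Because the script only acts when the failed node was the old primary, the decision logic is easy to sanity-check outside pgpool. A dry-run sketch (the argument values are the ones pgpool would pass in this two-node setup; the echo stands in for the ssh call):

```shell
# Dry run of the failover decision: same positional parameters as
# failover.sh, with the ssh call replaced by an echo.
failover_dry_run() {
    failed_node_id=$1
    new_master_host_name=$7
    old_primary_node_id=$8
    if [ "$failed_node_id" = "$old_primary_node_id" ]; then
        echo "would promote ${new_master_host_name}"
    else
        echo "node ${failed_node_id} was a standby, nothing to do"
    fi
}

# primary (node 0) fails: node 1 on 192.168.22.35 gets promoted
failover_dry_run 0 "192.168.22.34" 5432 "/u02/pgdata/PG1" 1 0 "192.168.22.35" 0
# standby (node 1) fails: no promotion happens
failover_dry_run 1 "192.168.22.35" 5432 "/u02/pgdata/PG1" 0 0 "192.168.22.34" 0
```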

The promote.sh script is very simple as well and must be available on all PostgreSQL nodes:

#!/bin/bash
PGDATA="/u02/pgdata/PG1"
PATH="/u01/app/postgres/product/96/db_2/bin/:$PATH"
export PATH PGDATA
pg_ctl promote -D ${PGDATA} >> /var/tmp/failover.log
psql -c "select * from pg_create_physical_replication_slot('standby1')" postgres >> /var/tmp/failover.log

So once we shut down the master, this gets triggered and the standby is promoted. How can we re-attach the old master as a new standby? The magic is in here:

[[email protected]_pgpool_m1 ~]$ grep recovery_1st_stage_command /u01/app/postgres/product/pgpool-II/etc/pgpool.conf
recovery_1st_stage_command = 'recover.sh'

In the pgpool.conf file a script is specified which is called when a recovery is triggered. This script needs to be in PGDATA, otherwise it will not work. It looks like this (on both nodes, of course):

[email protected]:/u02/pgdata/PG1/ [PG1] cat $PGDATA/recover.sh 
#!/bin/sh
set -x
master_node_host_name=`hostname`
master_db_cluster=$1
recovery_node_hostname=$2
recovery_db_cluster=$3
tmp=/tmp/mytemp$$
trap "rm -f $tmp" 0 1 2 3 15

PGDATA="/u02/pgdata/PG1"
PATH="/u01/app/postgres/product/96/db_2/bin/:$PATH"

export PATH PGDATA

# make sure the old master is down
ssh -T $recovery_node_hostname /home/postgres/stop_instance.sh
# rewind the old master
ssh -T ${recovery_node_hostname} /home/postgres/rewind.sh ${master_node_host_name}

cat > $tmp <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$master_node_host_name user=postgres'
primary_slot_name = 'standby1'
recovery_target_timeline = 'latest'
trigger_file = '/u02/pgdata/PG1/failover_trigger'
EOF

scp $tmp $recovery_node_hostname:$recovery_db_cluster/recovery.conf

ssh -T ${recovery_node_hostname} /home/postgres/start_instance.sh

The “set -x” at the top is quite important because this script is executed from inside PostgreSQL and you’ll see its output in the PostgreSQL logfile. The scripts executed within the recover.sh script here are:

[email protected]:/home/postgres/ [PG1] cat stop_instance.sh 
#!/bin/bash
PGDATA="/u02/pgdata/PG1"
PATH="/u01/app/postgres/product/96/db_2/bin/:$PATH"
pg_ctl -D $PGDATA stop -m immediate 

[email protected]:/home/postgres/ [PG1] cat rewind.sh 
#!/bin/bash
master_node_host_name=$1
PGDATA="/u02/pgdata/PG1"
PATH="/u01/app/postgres/product/96/db_2/bin/:$PATH"
pg_rewind --target-pgdata=${PGDATA} --source-server="port=5432 user=postgres dbname=postgres host=${master_node_host_name}" >> /var/tmp/recover.log

[email protected]:/home/postgres/ [PG1] cat start_instance.sh 
#!/bin/bash
PGDATA="/u02/pgdata/PG1"
PATH="/u01/app/postgres/product/96/db_2/bin/:$PATH"
pg_ctl -w -s -D $PGDATA start 2>/dev/null 1>/dev/null < /dev/null &
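One caveat worth noting for rewind.sh: pg_rewind can only rewind a cluster that either was initialized with data checksums or writes WAL hints. Assuming checksums were not enabled at initdb time, both nodes need this in postgresql.conf (a restart is required for it to take effect):

```
# required for pg_rewind when data checksums are not enabled
wal_log_hints = on
```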

Having this in place, we can rebuild the old master:

[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 -c "show pool_nodes" postgres
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | down   | 0.500000  | standby | 0          | false             | 0
 1       | 192.168.22.35 | 5432 | up     | 0.500000  | primary | 0          | true              | 0

[[email protected]_pgpool_m1 ~]$ pcp_recovery_node -n 0 -w
pcp_recovery_node -- Command Successful

[[email protected]_pgpool_m1 ~]$ psql -h 192.168.22.38 -c "show pool_nodes" postgres
 node_id |   hostname    | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+---------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.22.34 | 5432 | up     | 0.500000  | standby | 0          | false             | 0
 1       | 192.168.22.35 | 5432 | up     | 0.500000  | primary | 0          | true              | 0

The output in the PostgreSQL logfile of the master then looks similar to this:

++ hostname
+ master_node_host_name=pgpool2
+ master_db_cluster=/u02/pgdata/PG1
+ recovery_node_hostname=192.168.22.34
+ recovery_db_cluster=/u02/pgdata/PG1
+ tmp=/tmp/mytemp10901
+ trap 'rm -f /tmp/mytemp10901' 0 1 2 3 15
+ PGDATA=/u02/pgdata/PG1
+ PATH=/u01/app/postgres/product/96/db_2/bin/:/u01/app/postgres/product/96/db_2bin:/u01/app/postgres/product/96/db_2/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/postgres/.local/bin:/home/postgres/bin:/sbin:/usr/edb-bart/bin:/usr/edb-efm/bin:/u01/app/postgres/product/pgpool-II/bin:/u01/app/postgres/local/dmk/bin
+ export PATH PGDATA
+ ssh -T 192.168.22.34 /home/postgres/stop_instance.sh
pg_ctl: PID file "/u02/pgdata/PG1/postmaster.pid" does not exist
Is server running?
+ ssh -T 192.168.22.34 /home/postgres/rewind.sh pgpool2
+ cat
+ scp /tmp/mytemp10901 pgpool1:/u02/pgdata/PG1/recovery.conf
+ ssh 192.168.22.34 /home/postgres/start_instance.sh
+ psql -c 'select '\''done'\''' postgres
 ?column? 
----------
 done
(1 row)

Time: 0.140 ms
+ rm -f /tmp/mytemp10901

I have to admit that I needed a small hack: at the end of the recovery pgpool tries to execute $PGDATA/pgpool_remote_start, and as I do not want to do it that way I just created this file on both nodes:

[email protected]:/u02/pgdata/PG1/ [PG1] cat $PGDATA/pgpool_remote_start
exit 0
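For completeness: a conventional pgpool_remote_start would ssh to the freshly recovered node and start it. pgpool invokes the script with the remote host name and the remote data directory. The following is only a sketch of that convention (the SSH_CMD override is my own addition so the logic can be dry-run; the pg_ctl path is the one used throughout this setup), not what I use here:

```shell
# Sketch of a conventional pgpool_remote_start. pgpool calls it with the
# host and the data directory of the node that was just recovered.
# SSH_CMD defaults to "ssh -T"; override it (e.g. with echo) for a dry run.
pgpool_remote_start() {
    dest_node_host=$1
    dest_node_pgdata=$2
    ${SSH_CMD:-ssh -T} "$dest_node_host" \
        "/u01/app/postgres/product/96/db_2/bin/pg_ctl -w -D ${dest_node_pgdata} start 2>/dev/null 1>/dev/null < /dev/null &"
}
```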

In the next post we’ll add a second standby system and then teach pgpool how a remaining standby can be told to follow a new master automatically. We’ll also need to adjust the scripts here to be more dynamic.

6 Comments

  • Rob says:

    In your recover.sh script.

    The
    scp $tmp $recovery_node_host_name:$recovery_db_cluster/recovery.conf
    should be
    scp $tmp $recovery_node_hostname:$recovery_db_cluster/recovery.conf

Or am I missing something here?

  • Gianfranco says:

    Hi Daniel,

    Do you have both databases in synchronous replication mode? Are you able to write into the master node when you turn off the standby node?

    When we write into master node with the standby node stopped, all transactions remain waiting for commit on the standby node:

    psql (10.4)
    SSL connection (protocol: TLSv1.2, cipher: AES256-GCM-SHA384, bits: 256, compression: off)
    Type “help” for help.

    postgres=# create database test;

    ^CCancel request sent
    WARNING: canceling wait for synchronous replication due to user request
    DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
    CREATE DATABASE
    postgres=#

    Thanks in advance and best regards,

    Gianfranco

    • Daniel Westermann says:

      Hi Gianfranco,

no, in that example it was asynchronous. When you go for synchronous, the master will halt when the standby is not reachable. If you want synchronous replication you should probably implement a quorum set of synchronous standby instances.

      Cheers,
      Daniel

  • Natalie Reno says:

    You have the same setup with recover.sh in the configuration file with no input variables, but input variables defined in the script. What am I missing here? All other scripts require the input variables (i.e. failover and follow_master). This is not currently working for me on our setup.


Daniel Westermann
Principal Consultant & Technology Leader Open Infrastructure