In this post we will have a look at switchover and failover of a Patroni cluster. As well as a look at the maintenance mode Patroni offers, which gives the opportunity to prevent from an automatic failover.

Switchover

There are two possibilities to run a switchover, either in scheduled mode or immediately.

1. Scheduled Switchover

postgres@patroni1:/home/postgres/ [PG1] patronictl switchover
Master [patroni1]:
Candidate ['patroni2', 'patroni3'] []: patroni2
When should the switchover take place (e.g. 2019-10-08T11:31 )  [now]: 2019-10-08T10:32
Current cluster topology
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
Are you sure you want to schedule switchover of cluster PG1 at 2019-10-08T10:32:00+02:00, demoting current master patroni1? [y/N]: y
2019-10-08 10:31:14.89236 Switchover scheduled
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
 Switchover scheduled at: 2019-10-08T10:32:00+02:00
                    from: patroni1
                      to: patroni2
postgres@patroni1:/home/postgres/ [PG1]

That’s it. At the given time, the switchover will take place. All you see in the logfile is an entry like this

Oct  8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,006 INFO: Manual scheduled failover at 2019-10-08T10:32:00+02:00
Oct  8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,016 INFO: Got response from patroni2 http://192.168.22.112:8008/patroni: {"database_system_identifier": "6745341072751547355", "postmaster_start_time": "2019-10-08 10:09:40.217 CEST", "timeline": 2, "cluster_unlocked": false, "patroni": {"scope": "PG1", "version": "1.6.0"}, "state": "running", "role": "replica", "xlog": {"received_location": 83886560, "replayed_timestamp": null, "paused": false, "replayed_location": 83886560}, "server_version": 110005}
Oct  8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,113 INFO: manual failover: demoting myself
Oct  8 10:32:01 patroni1 patroni: 2019-10-08 10:32:01,256 INFO: Leader key released
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,271 INFO: Local timeline=2 lsn=0/6000028
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,279 INFO: master_timeline=3
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,281 INFO: master: history=1#0110/5000098#011no recovery target specified
Oct  8 10:32:03 patroni1 patroni: 2#0110/6000098#011no recovery target specified
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,282 INFO: closed patroni connection to the postgresql cluster
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,312 INFO: postmaster pid=11537
Oct  8 10:32:03 patroni1 patroni: 192.168.22.111:5432 - no response
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.325 CEST - 1 - 11537 -  - @ - 0LOG:  listening on IPv4 address "192.168.22.111", port 5432
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.328 CEST - 2 - 11537 -  - @ - 0LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.339 CEST - 3 - 11537 -  - @ - 0LOG:  redirecting log output to logging collector process
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.339 CEST - 4 - 11537 -  - @ - 0HINT:  Future log output will appear in directory "pg_log".
Oct  8 10:32:04 patroni1 patroni: 192.168.22.111:5432 - accepting connections
Oct  8 10:32:04 patroni1 patroni: 192.168.22.111:5432 - accepting connections
Oct  8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,895 INFO: Lock owner: patroni2; I am patroni1
Oct  8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,895 INFO: does not have lock
Oct  8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,896 INFO: establishing a new patroni connection to the postgres cluster

2. Immediate switchover

Here you start the same way as for planned switchover, but the switchover will take place immediatelly.

postgres@patroni1:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 |        | running |  1 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 | Leader | running |  1 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  1 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
postgres@patroni1:/home/postgres/ [PG1] patronictl switchover
Master [patroni2]:
Candidate ['patroni1', 'patroni3'] []: patroni1
When should the switchover take place (e.g. 2019-10-08T11:09 )  [now]:
Current cluster topology
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 |        | running |  1 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 | Leader | running |  1 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  1 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster PG1, demoting current master patroni2? [y/N]: y
2019-10-08 10:09:38.88046 Successfully switched over to "patroni1"
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  1 |           |
|   PG1   | patroni2 | 192.168.22.112 |        | stopped |    |   unknown |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  1 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
postgres@patroni1:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
postgres@patroni1:/home/postgres/ [PG1]

Failover

In difference to the switchover, the failover is executed automatically, when the Leader node is getting unavailable for unplanned reason.
You can only adjust some database parameter to affect the failover.

The parameters for failover arre also managed using patronictl. But they are not in the parameter section, they are above. so let’s say, we adjust one parameter and add one paramter to not use the default anymore.

postgres@patroni1:/u01/app/postgres/local/dmk/etc/ [PG1] patronictl edit-config
postgres@patroni1:/u01/app/postgres/local/dmk/etc/ [PG1] patronictl edit-config
---
+++
@@ -1,5 +1,6 @@
-loop_wait: 7
+loop_wait: 10
 maximum_lag_on_failover: 1048576
+master_start_timeout: 240
 postgresql:
   parameters:
     archive_command: /bin/true

Apply these changes? [y/N]: y
Configuration changed

Afterwards there is no need to restart the database. Changes take affect immediately. So the failover can be configured according to every special need. A list of all possible parameter changes can be found here .

Maintenance mode

In some cases it is necessary to do maintenance on a single node and you do not want Patroni to manage the cluster. This can be needed for e.g. release updates.
When Patroni paused, it won’t change the state of PostgeSQL. For example it will not try to start the cluster when it is stopped.

So let’s do an example. We will pause the cluster, stop the replica, upgrade from 9.6.8 to 9.6.13 and afterwards start the replica again. In case we do not pause the replica, the database will be started automatically by Patroni.

postgres@patroni1:/home/postgres/ [PG1] patronictl pause
Success: cluster management is paused
You have new mail in /var/spool/mail/opendb
postgres@patroni1:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
 Maintenance mode: on

On the replica

postgres@patroni2:/home/postgres/ [PG1] pg_ctl stop -D /u02/pgdata/96/PG1/ -m fast

postgres@patroni2:/home/postgres/ [PG1] export PATH= /u01/app/postgres/product/PG96/db_13/bin:$PATH
postgres@patroni2:/home/postgres/ [PG1] export PORT=5432
postgres@patroni2:/home/postgres/ [PG1] which pg_ctl
/u01/app/opendb/product/PG96/db_13/bin/pg_ctl

postgres@patroni2:/home/postgres/ [PG1] pg_ctl -D /u02/pgdata/96/PG1 start
server starting
postgres@patroni2:/home/postgres/ [PG1] 2019-10-08 17:25:28.358 CEST - 1 - 23192 -  - @ - 0LOG:  redirecting log output to logging collector process
2019-10-08 17:25:28.358 CEST - 2 - 23192 -  - @ - 0HINT:  Future log output will appear in directory "pg_log".

postgres@patroni2:/home/postgres/ [PG1] psql -c "select version()" postgres
                                                           version
------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.13 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

postgres@patroni2:/home/postgres/ [PG1] patronictl resume
Success: cluster management is resumed

postgres@patroni2:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  5 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  5 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  5 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+

You can do this on the other nodes as well.

Conclusion

Switchover is quite easy and for all the test I did so far it was really reliable. As well as the failover, here you just have to think about adjusting the parameters to your needs. Not in every case it is the best solution to wait 5 min for a failover.