Infrastructure at your Service

Julia Gugel

Patroni Operations – switchover and failover

In this post we will have a look at switchover and failover of a Patroni cluster. As well as a look at the maintenance mode Patroni offers, which gives the opportunity to prevent from an automatic failover.

Switchover

There are two possibilities to run a switchover, either in scheduled mode or immediately.

1. Scheduled Switchover

[email protected]:/home/postgres/ [PG1] patronictl switchover
Master [patroni1]:
Candidate ['patroni2', 'patroni3'] []: patroni2
When should the switchover take place (e.g. 2019-10-08T11:31 )  [now]: 2019-10-08T10:32
Current cluster topology
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
Are you sure you want to schedule switchover of cluster PG1 at 2019-10-08T10:32:00+02:00, demoting current master patroni1? [y/N]: y
2019-10-08 10:31:14.89236 Switchover scheduled
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
 Switchover scheduled at: 2019-10-08T10:32:00+02:00
                    from: patroni1
                      to: patroni2
[email protected]:/home/postgres/ [PG1]

That’s it. At the given time, the switchover will take place. All you see in the logfile is an entry like this

Oct  8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,006 INFO: Manual scheduled failover at 2019-10-08T10:32:00+02:00
Oct  8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,016 INFO: Got response from patroni2 http://192.168.22.112:8008/patroni: {"database_system_identifier": "6745341072751547355", "postmaster_start_time": "2019-10-08 10:09:40.217 CEST", "timeline": 2, "cluster_unlocked": false, "patroni": {"scope": "PG1", "version": "1.6.0"}, "state": "running", "role": "replica", "xlog": {"received_location": 83886560, "replayed_timestamp": null, "paused": false, "replayed_location": 83886560}, "server_version": 110005}
Oct  8 10:32:00 patroni1 patroni: 2019-10-08 10:32:00,113 INFO: manual failover: demoting myself
Oct  8 10:32:01 patroni1 patroni: 2019-10-08 10:32:01,256 INFO: Leader key released
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,271 INFO: Local timeline=2 lsn=0/6000028
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,279 INFO: master_timeline=3
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,281 INFO: master: history=1#0110/5000098#011no recovery target specified
Oct  8 10:32:03 patroni1 patroni: 2#0110/6000098#011no recovery target specified
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,282 INFO: closed patroni connection to the postgresql cluster
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03,312 INFO: postmaster pid=11537
Oct  8 10:32:03 patroni1 patroni: 192.168.22.111:5432 - no response
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.325 CEST - 1 - 11537 -  - @ - 0LOG:  listening on IPv4 address "192.168.22.111", port 5432
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.328 CEST - 2 - 11537 -  - @ - 0LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.339 CEST - 3 - 11537 -  - @ - 0LOG:  redirecting log output to logging collector process
Oct  8 10:32:03 patroni1 patroni: 2019-10-08 10:32:03.339 CEST - 4 - 11537 -  - @ - 0HINT:  Future log output will appear in directory "pg_log".
Oct  8 10:32:04 patroni1 patroni: 192.168.22.111:5432 - accepting connections
Oct  8 10:32:04 patroni1 patroni: 192.168.22.111:5432 - accepting connections
Oct  8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,895 INFO: Lock owner: patroni2; I am patroni1
Oct  8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,895 INFO: does not have lock
Oct  8 10:32:04 patroni1 patroni: 2019-10-08 10:32:04,896 INFO: establishing a new patroni connection to the postgres cluster

2. Immediate switchover

Here you start the same way as for planned switchover, but the switchover will take place immediatelly.

[email protected]:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 |        | running |  1 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 | Leader | running |  1 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  1 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
[email protected]:/home/postgres/ [PG1] patronictl switchover
Master [patroni2]:
Candidate ['patroni1', 'patroni3'] []: patroni1
When should the switchover take place (e.g. 2019-10-08T11:09 )  [now]:
Current cluster topology
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 |        | running |  1 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 | Leader | running |  1 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  1 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster PG1, demoting current master patroni2? [y/N]: y
2019-10-08 10:09:38.88046 Successfully switched over to "patroni1"
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  1 |           |
|   PG1   | patroni2 | 192.168.22.112 |        | stopped |    |   unknown |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  1 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
[email protected]:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
[email protected]:/home/postgres/ [PG1]

Failover

In difference to the switchover, the failover is executed automatically, when the Leader node is getting unavailable for unplanned reason.
You can only adjust some database parameter to affect the failover.

The parameters for failover arre also managed using patronictl. But they are not in the parameter section, they are above. so let’s say, we adjust one parameter and add one paramter to not use the default anymore.

[email protected]:/u01/app/postgres/local/dmk/etc/ [PG1] patronictl edit-config
[email protected]:/u01/app/postgres/local/dmk/etc/ [PG1] patronictl edit-config
---
+++
@@ -1,5 +1,6 @@
-loop_wait: 7
+loop_wait: 10
 maximum_lag_on_failover: 1048576
+master_start_timeout: 240
 postgresql:
   parameters:
     archive_command: /bin/true

Apply these changes? [y/N]: y
Configuration changed

Afterwards there is no need to restart the database. Changes take affect immediately. So the failover can be configured according to every special need. A list of all possible parameter changes can be found here .

Maintenance mode

In some cases it is necessary to do maintenance on a single node and you do not want Patroni to manage the cluster. This can be needed for e.g. release updates.
When Patroni paused, it won’t change the state of PostgeSQL. For example it will not try to start the cluster when it is stopped.

So let’s do an example. We will pause the cluster, stop the replica, upgrade from 9.6.8 to 9.6.13 and afterwards start the replica again. In case we do not pause the replica, the database will be started automatically by Patroni.

[email protected]:/home/postgres/ [PG1] patronictl pause
Success: cluster management is paused
You have new mail in /var/spool/mail/opendb
[email protected]:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  2 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  2 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  2 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+
 Maintenance mode: on

On the replica

[email protected]:/home/postgres/ [PG1] pg_ctl stop -D /u02/pgdata/96/PG1/ -m fast

[email protected]:/home/postgres/ [PG1] export PATH= /u01/app/postgres/product/PG96/db_13/bin:$PATH
[email protected]:/home/postgres/ [PG1] export PORT=5432
[email protected]:/home/postgres/ [PG1] which pg_ctl
/u01/app/opendb/product/PG96/db_13/bin/pg_ctl

[email protected]:/home/postgres/ [PG1] pg_ctl -D /u02/pgdata/96/PG1 start
server starting
[email protected]:/home/postgres/ [PG1] 2019-10-08 17:25:28.358 CEST - 1 - 23192 -  - @ - 0LOG:  redirecting log output to logging collector process
2019-10-08 17:25:28.358 CEST - 2 - 23192 -  - @ - 0HINT:  Future log output will appear in directory "pg_log".

[email protected]:/home/postgres/ [PG1] psql -c "select version()" postgres
                                                           version
------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.13 dbi services build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

[email protected]:/home/postgres/ [PG1] patronictl resume
Success: cluster management is resumed

[email protected]:/home/postgres/ [PG1] patronictl list
+---------+----------+----------------+--------+---------+----+-----------+
| Cluster |  Member  |      Host      |  Role  |  State  | TL | Lag in MB |
+---------+----------+----------------+--------+---------+----+-----------+
|   PG1   | patroni1 | 192.168.22.111 | Leader | running |  5 |       0.0 |
|   PG1   | patroni2 | 192.168.22.112 |        | running |  5 |       0.0 |
|   PG1   | patroni3 | 192.168.22.113 |        | running |  5 |       0.0 |
+---------+----------+----------------+--------+---------+----+-----------+

You can do this on the other nodes as well.

Conclusion

Switchover is quite easy and for all the test I did so far it was really reliable. As well as the failover, here you just have to think about adjusting the parameters to your needs. Not in every case it is the best solution to wait 5 min for a failover.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Julia Gugel
Julia Gugel

Consultant