
Daniel Westermann

EDB Failover Manager 2.1, (two) new features

In the last post we upgraded EDB EFM from version 2.0 to 2.1. In this post we’ll look at two of the new features:

  • Failover Manager now simplifies cluster startup with the auto.allow.hosts property
  • efm promote now includes a -switchover option; it instructs Failover Manager to perform a failover, promoting a Standby to Master, and then return the old Master node to the cluster as a Standby node
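
In a nutshell, the two features boil down to one property and one command, both shown in action below:

auto.allow.hosts=true                            # in efm.properties, on every node
/usr/edb-efm/bin/efm promote efm -switchover     # switchover in one command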

Let’s go …

My failover cluster status is still fine:

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       
	Standby     192.168.22.243       UP     UP        
	Master      192.168.22.245       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/3C000220       
	Standby     192.168.22.243       0/3C000220       

	Standby database(s) in sync with master. It is safe to promote.

The first bit we’re going to change is the auto.allow.hosts property on the database servers. According to the documentation this should eliminate the need to explicitly allow the hosts to join the cluster; registration should happen automatically. So, let’s change it from “false” to “true” on all nodes:
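
A quick way to flip the property on every node is a one-liner like this (a sketch; run it in the directory that contains efm.properties, which depends on your installation):

# on each node, from the directory holding efm.properties
sed -i 's/^auto.allow.hosts=false/auto.allow.hosts=true/' efm.properties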

[[email protected] efm-2.1]$ grep allow.hosts efm.properties
auto.allow.hosts=true

… and then let’s add all nodes to the efm.nodes file on the witness:

[[email protected] efm-2.1]$ cat efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.244:9998 192.168.22.243:9998 192.168.22.245:9998

When we now shut down the EFM service on all hosts and bring it up again on the witness, what is the result?

[[email protected] efm-2.1]$ systemctl stop efm-2.1.service  # do this on all hosts
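
If you do not want to log in to every host separately, a small loop does the same (a sketch, assuming password-less SSH as root to all three nodes):

for h in 192.168.22.244 192.168.22.243 192.168.22.245; do
    ssh root@${h} "systemctl stop efm-2.1.service"
done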

Let’s start it on the witness again:

[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
	(List is empty.)

Promote Status:

Did not find XLog location for any nodes.

So far so good, all nodes are in the “Allowed” list. What happens when we start EFM on the current primary node?

[[email protected] efm-2.1]$  systemctl start efm-2.1.service
[[email protected] efm-2.1]$ 

We should see the node as a member now without explicitly allowing it to join:

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       
	Master      192.168.22.245       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
	(List is empty.)

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/3D000060       

	No standby databases were found.
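
For comparison: with EFM 2.0 we would have had to allow the node explicitly before it could join, along the lines of the allow-node subcommand below; with auto.allow.hosts=true this step is gone:

/usr/edb-efm/bin/efm allow-node efm 192.168.22.245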

Cool, same on the standby node:

[[email protected] edb-efm]$ cat efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.244:9998
[[email protected] edb-efm]$ systemctl start efm-2.1.service

What is the status now?

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       
	Master      192.168.22.245       UP     UP        
	Standby     192.168.22.243       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/3D000060       
	Standby     192.168.22.243       0/3D000060       

	Standby database(s) in sync with master. It is safe to promote.

Perfect. This makes bringing up a failover cluster a bit easier, with fewer things to remember.

Coming to the “big” new feature (at least in my opinion): switching over to the standby and automatically reconfiguring the old master as a standby that follows the new master. According to the docs, all we need to do is this:

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm promote efm -switchover

Does it really work?

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.

Hmm, let’s check the status:

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       
	Master      192.168.22.243       UP     UP        
	Standby     192.168.22.245       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.245 192.168.22.243

Membership coordinator: 192.168.22.244

Standby priority host list:
	192.168.22.245

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.243       0/410000D0       
	Standby     192.168.22.245       0/410000D0       

	Standby database(s) in sync with master. It is safe to promote.
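
To double check that the old master (192.168.22.245) is really running as a standby now, we can ask PostgreSQL directly; pg_is_in_recovery() returns true on a standby (a quick sanity check, independent of EFM):

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)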

It really worked! And back again:

[[email protected] ~]$ /usr/edb-efm/bin/efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original master. Run the 'cluster-status' command for information about the new cluster state.

[[email protected] ~]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

    Agent Type  Address              Agent  DB       Info
    --------------------------------------------------------------
    Witness     192.168.22.244       UP     N/A       
    Standby     192.168.22.243       UP     UP        
    Master      192.168.22.245       UP     UP        

Allowed node host list:
    192.168.22.244 192.168.22.245 192.168.22.243

Membership coordinator: 192.168.22.244

Standby priority host list:
    192.168.22.243

Promote Status:

    DB Type     Address              XLog Loc         Info
    --------------------------------------------------------------
    Master      192.168.22.245       0/480001A8       
    Standby     192.168.22.243       0/480001A8       

    Standby database(s) in sync with master. It is safe to promote.
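
As a last check you can verify that the VIP moved along with the master role (a sketch; the first command is meant to be run on the current master):

ip addr show | grep 192.168.22.250    # the VIP should be bound on the current master
ping -c 1 192.168.22.250              # ... and answer from anywhere in the network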

Cool, that is really a great new feature.

3 Comments

  • Olli Leivo says:

    Could you please share the archive_command and recovery.conf of the setup you used?
    I think those details are good to know when doing the configuration.

    • Daniel Westermann says:

      Hi Olli,

      my archive_command just copies to the BART server in this case:

      postgres=# show archive_command ;
                                 archive_command
      ------------------------------------------------------------------
       scp %p [email protected]:/u90/pgdata/backup/pgsite1/archived_wals/%f
      (1 row)

      The recovery.conf looks like this:

      [email protected]:/u02/pgdata/PGSITE1/ [PGSITE1] cat recovery.conf
      standby_mode = 'on'
      primary_slot_name = 'standby1'
      primary_conninfo = 'user=postgres password=xxxx host=192.168.22.245 port=4445 sslmode=prefer sslcompression=1'
      recovery_target_timeline = 'latest'
      trigger_file='/u02/pgdata/PGSITE2/trigger_file'

      Cheers,
      Daniel

  • guest says:

    Hello. I’m using EFM 2.1.
    When my master db stopped (pg_ctl stop), my standby db became master. That’s OK.
    But when my master server loses its network, the standby db does not become master. Please help me.
    My efm.properties on all servers:
    auto.allow.hosts=true
    auto.failover=true
    auto.reconfigure=true
    promotable=true


Daniel Westermann
Principal Consultant & Technology Leader Open Infrastructure