Daniel Westermann

Maintenance scenarios with EDB Failover Manager (3) – Witness node

In the last posts (Maintenance scenarios with EDB Failover Manager (1) – Standby node, Maintenance scenarios with EDB Failover Manager (2) – Primary node) we looked at how to perform maintenance operations on the master as well as on the standby node in a failover cluster managed by EDB Failover Manager. What is still open is how to perform maintenance operations on the witness node. So, let’s go.

The current status of the cluster is fine:

[email protected]:/home/postgres/ [pg950] /usr/efm-2.0/bin/efm cluster-status efm
Cluster Status: efm
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Standby     192.168.22.243       UP     UP        
	Witness     192.168.22.244       UP     N/A       
	Master      192.168.22.245       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/380000D0       
	Standby     192.168.22.243       0/380000D0       

	Standby database(s) in sync with master. It is safe to promote.

Probably the only situations you’ll need to take care of are when you want to reboot the witness node for any reason or when you need to take down the efm service. What happens when you stop the service on the witness?

[[email protected] ~] systemctl stop efm-2.0.service

Checking the status on either the master or the standby node:

[[email protected] efm-2.0] /usr/efm-2.0/bin/efm cluster-status efm
Cluster Status: efm
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Standby     192.168.22.243       UP     UP        
	Master      192.168.22.245       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/380000D0       
	Standby     192.168.22.243       0/380000D0       

	Standby database(s) in sync with master. It is safe to promote.

The witness disappeared. How do you bring it back? Re-create the efm.nodes file on the witness node:

[[email protected] efm-2.0] pwd
/etc/efm-2.0
[[email protected] efm-2.0] cat efm.nodes
# List of node address:port combinations separated by whitespace.
192.168.22.243:9998 192.168.22.244:9998 192.168.22.245:9998
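
If you would rather not type the list by hand, the file can be generated. The following is only a minimal sketch: write_efm_nodes is a hypothetical helper (it is not part of EFM), and it assumes the file needs nothing more than the comment line and the address:port list shown above. Ownership and permissions of the file are not handled here, so check them on your system.

```shell
#!/bin/sh
# Sketch: rebuild an efm.nodes file from a list of node addresses.
# write_efm_nodes is a hypothetical helper, not something shipped with EFM.
# Usage: write_efm_nodes <file> <port> <address>...
write_efm_nodes() {
    nodes_file=$1
    port=$2
    shift 2

    line=""
    for addr in "$@"; do
        # Append "address:port", separating entries with a single space.
        line="${line:+$line }${addr}:${port}"
    done

    {
        printf '%s\n' '# List of node address:port combinations separated by whitespace.'
        printf '%s\n' "$line"
    } > "$nodes_file"
}
```

For the cluster in this post the call would be: write_efm_nodes /etc/efm-2.0/efm.nodes 9998 192.168.22.243 192.168.22.244 192.168.22.245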

Start the service:

[[email protected] efm-2.0] systemctl start efm-2.0.service

… and you’ll be back in business:

[[email protected] efm-2.0] /usr/efm-2.0/bin/efm cluster-status efm
Cluster Status: efm
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Master      192.168.22.245       UP     UP        
	Witness     192.168.22.244       UP     N/A       
	Standby     192.168.22.243       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/380000D0       
	Standby     192.168.22.243       0/380000D0       

	Standby database(s) in sync with master. It is safe to promote.
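
If you want to check this from a script instead of reading the output yourself, something like the following could work. It is a sketch only: agent_state is a hypothetical function name, and it assumes the cluster-status layout shown in this post (an "Agent Type" table whose third column is the agent state, terminated by an empty line).

```shell
#!/bin/sh
# Sketch: read `efm cluster-status` output on stdin and print the "Agent"
# column for the first row of the given agent type (Master, Standby, Witness).
# Prints MISSING if the agent table contains no such row.
# agent_state is a hypothetical helper and relies on the layout shown above.
agent_state() {
    awk -v type="$1" '
        # The header line of the agent table switches parsing on.
        $1 == "Agent" && $2 == "Type" { in_table = 1; next }
        # The first empty line after the table switches it off again, so the
        # "Promote Status" rows further down are never matched.
        in_table && NF == 0           { in_table = 0 }
        in_table && $1 == type        { print $3; found = 1; exit }
        END { if (!found) print "MISSING" }
    '
}
```

Usage: /usr/efm-2.0/bin/efm cluster-status efm | agent_state Witness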

What do you need to do if you have to reboot the witness? Exactly the same 🙂 Maintenance on the witness node is quite easy.
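
Put together, the sequence could be sketched as below. This is not an official procedure: witness_maintenance, run, DRY_RUN and the .bak copy of efm.nodes are all assumptions made for illustration. With DRY_RUN=1 (the default in this sketch) the commands are only printed, so you can review them before running anything as root.

```shell
#!/bin/sh
# Sketch of the witness maintenance sequence from this post.
# All names here (witness_maintenance, run, DRY_RUN, the .bak copy) are
# hypothetical; only the systemctl and efm commands come from the post.
DRY_RUN=${DRY_RUN:-1}
NODES_FILE=${NODES_FILE:-/etc/efm-2.0/efm.nodes}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

witness_maintenance() {
    # Assumption: keep a copy of the node list while it is still complete.
    run cp -p "$NODES_FILE" "$NODES_FILE.bak"
    run systemctl stop efm-2.0.service
    # ... do the actual maintenance work here ...
    # Re-create efm.nodes before the agent is started again.
    run cp -p "$NODES_FILE.bak" "$NODES_FILE"
    run systemctl start efm-2.0.service
    run /usr/efm-2.0/bin/efm cluster-status efm
}
```

When the maintenance is a reboot, a single function obviously cannot run across it, so the steps after the placeholder comment would have to be executed once the node is back up.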

Btw: If you want, you can meet us at the Swiss PGDAY.

2 Comments

  • PC says:

    Hi Daniel, what will happen if the witness node has failed or is down and then the master node crashes? In that scenario, will the promotion of the slave to master happen automatically or not? We tested and actually found that the promotion did happen. But as you have shown above, and also according to the docs, the promotion (failover) should not have happened.
    Thanks

    • Daniel Westermann says:

      Hi PC,

      good question. In the example above the master was always there, so it is not exactly the scenario you’re describing. I would say that when the witness and the master are down, what you want is for the standby/replica to take over; at least this is what I would expect. I’ll do some more tests when I have time.

      Cheers,
      Daniel

Daniel Westermann

Principal Consultant & Technology Leader Open Infrastructure