A few days ago EnterpriseDB released a new version of its EDB Failover Manager which brings a feature that really sounds great: “Controlled switchover and switchback for easier maintenance and disaster recovery tests”. This is exactly what you want when you are used to operating Oracle Data Guard: switching back and forth as you like without caring much about the old master, which is simply converted to a standby that follows the new master automatically. This post is about upgrading EFM from version 2.0 to 2.1.

As I still have the environment available which was used for describing the maintenance scenarios with EDB Failover Manager (Maintenance scenarios with EDB Failover Manager (1) – Standby node, Maintenance scenarios with EDB Failover Manager (2) – Primary node and Maintenance scenarios with EDB Failover Manager (3) – Witness node) I will use the same environment to upgrade to the new release. Let’s start …

This is the current status of my failover cluster:

[[email protected] ~]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Master      192.168.22.245       UP     UP        
	Witness     192.168.22.244       UP     N/A       
	Standby     192.168.22.243       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.245 192.168.22.243

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/3B01C5E0       
	Standby     192.168.22.243       0/3B01C5E0       

	Standby database(s) in sync with master. It is safe to promote.
[[email protected] ~]$ 

Obviously you have to download the new version to begin the upgrade. Once the rpm is available, install it on all the nodes:

[[email protected] tmp]$ yum localinstall efm21-2.1.0-1.rhel7.x86_64.rpm
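
If the nodes can all be reached over ssh, a small loop saves some typing. This is just a convenience sketch: it assumes root ssh access and that the rpm was copied to /tmp on every node; the addresses are the ones from this setup:

for h in 192.168.22.243 192.168.22.244 192.168.22.245; do
  ssh root@${h} "yum -y localinstall /tmp/efm21-2.1.0-1.rhel7.x86_64.rpm"
done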

EFM 2.1 comes with a utility command that helps in upgrading a cluster. Invoke it on each node:

[[email protected] tmp]$ /usr/efm-2.1/bin/efm upgrade-conf efm
Processing efm.properties file.
Setting new property node.timeout to 40 (sec) based on existing timeout 5000 (ms) and max tries 8.

Processing efm.nodes file.

Upgrade of files is finished. Please ensure that the new file permissions match those of the template files before starting EFM.
The db.service.name property should be set before starting a non-witness agent.
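
The last line of that output is important: db.service.name is new in 2.1 and must be set on the master and the standby before their agents are started. The value below is an assumption for an Advanced Server 9.5 setup; use whatever name your database service is registered under:

# in /etc/efm-2.1/efm.properties on the master and the standby
# the service name is an assumption, adjust it to your installation
db.service.name=ppas-9.5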

This created the new configuration files in /etc/efm-2.1, the directory that was added when the new version was installed:

[[email protected] tmp]$ ls /etc/efm-2.1
efm.nodes  efm.nodes.in  efm.properties  efm.properties.in

All the values from the old EFM cluster should have been carried over to the new configuration files:

[[email protected] efm-2.1]$ pwd
/etc/efm-2.1
[[email protected] efm-2.1]$ cat efm.properties | grep daniel
user.email=daniel.westermann...
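
If you want to double-check that nothing was lost in the conversion, diff the old and the new files (assuming the 2.0 files live in /etc/efm-2.0; the new 2.1 parameters will of course show up as additions):

diff /etc/efm-2.0/efm.properties /etc/efm-2.1/efm.properties
diff /etc/efm-2.0/efm.nodes /etc/efm-2.1/efm.nodes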

Before going further, check the new configuration parameters for EFM 2.1 (a short example of how they appear in efm.properties follows the list):

auto.allow.hosts
auto.resume.period
db.service.name
jvm.options
minimum.standbys
node.timeout
promotable
recovery.check.period
script.notification
script.resumed

I’ll leave everything as it was before for now. Notice that a new service was created:

[[email protected] efm-2.1]$ systemctl list-unit-files | grep efm
efm-2.0.service                             enabled 
efm-2.1.service                             disabled

Let’s shut down the old service on all nodes and then start the new one. Step 1 (on all nodes):

[[email protected] efm-2.1]$ systemctl stop efm-2.0.service
[[email protected] efm-2.1]$ systemctl disable efm-2.0.service
rm '/etc/systemd/system/multi-user.target.wants/efm-2.0.service'
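
As a reader points out in the comments below, the whole old cluster can also be stopped in one go from a single node, which additionally leaves the old efm.nodes files untouched:

/usr/efm-2.0/bin/efm stop-cluster efm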

Then enable the new service:

[[email protected] efm-2.1]$ systemctl enable efm-2.1.service
ln -s '/usr/lib/systemd/system/efm-2.1.service' '/etc/systemd/system/multi-user.target.wants/efm-2.1.service'
[[email protected] efm-2.1]$ systemctl list-unit-files | grep efm
efm-2.0.service                             disabled
efm-2.1.service                             enabled 

Make sure your efm.nodes file contains all the nodes that make up the cluster; in my case:

[[email protected] efm-2.1]$ cat efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.243:9998 192.168.22.244:9998 192.168.22.245:9998

Let’s start the new service on the witness node first:

[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       

Allowed node host list:
	192.168.22.244

Membership coordinator: 192.168.22.244

Standby priority host list:
	(List is empty.)

Promote Status:

Did not find XLog location for any nodes.

Looks good. Are we really running the new version?

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm -v
Failover Manager, version 2.1.0

Looks fine as well. Time to add the other nodes:

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm add-node efm 192.168.22.243
add-node signal sent to local agent.
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm add-node efm 192.168.22.245
add-node signal sent to local agent.
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Witness     192.168.22.244       UP     N/A       

Allowed node host list:
	192.168.22.244 192.168.22.243

Membership coordinator: 192.168.22.244

Standby priority host list:
	(List is empty.)

Promote Status:

Did not find XLog location for any nodes.
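
If you do not want to run add-node for every member, the new auto.allow.hosts parameter is worth a look: when it is set to true in a node’s efm.properties, the node’s address is added to the allowed host list automatically when its agent starts. Convenient for test setups like this one; in production you may prefer to keep it at false and add the nodes explicitly:

# in efm.properties, per node
auto.allow.hosts=true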

Proceed on the master:

[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ systemctl status efm-2.1.service
efm-2.1.service - EnterpriseDB Failover Manager 2.1
   Loaded: loaded (/usr/lib/systemd/system/efm-2.1.service; enabled)
   Active: active (running) since Thu 2016-09-08 12:04:11 CEST; 25s ago
  Process: 4020 ExecStart=/bin/bash -c /usr/efm-2.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 4075 (java)
   CGroup: /system.slice/efm-2.1.service
           └─4075 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/java -cp /usr/e...

Sep 08 12:04:07 ppasstandby systemd[1]: Starting EnterpriseDB Failover Manager 2.1...
Sep 08 12:04:08 ppasstandby sudo[4087]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-... efm
Sep 08 12:04:08 ppasstandby sudo[4098]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-... efm
Sep 08 12:04:08 ppasstandby sudo[4114]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/usr/... efm
Sep 08 12:04:08 ppasstandby sudo[4125]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/usr/... efm
Sep 08 12:04:10 ppasstandby sudo[4165]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...9998
Sep 08 12:04:10 ppasstandby sudo[4176]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...4075
Sep 08 12:04:11 ppasstandby systemd[1]: Started EnterpriseDB Failover Manager 2.1.
Hint: Some lines were ellipsized, use -l to show in full.

And then continue on the standby:

[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ systemctl status efm-2.1.service
efm-2.1.service - EnterpriseDB Failover Manager 2.1
   Loaded: loaded (/usr/lib/systemd/system/efm-2.1.service; enabled)
   Active: active (running) since Thu 2016-09-08 12:05:28 CEST; 3s ago
  Process: 3820 ExecStart=/bin/bash -c /usr/efm-2.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 3875 (java)
   CGroup: /system.slice/efm-2.1.service
           └─3875 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/jav...

Sep 08 12:05:24 edbppas systemd[1]: Starting EnterpriseDB Failover Manager 2.1...
Sep 08 12:05:25 edbppas sudo[3887]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...efm
Sep 08 12:05:25 edbppas sudo[3898]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...efm
Sep 08 12:05:25 edbppas sudo[3914]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:25 edbppas sudo[3925]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:25 edbppas sudo[3945]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:28 edbppas sudo[3981]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...998
Sep 08 12:05:28 edbppas sudo[3994]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...875
Sep 08 12:05:28 edbppas systemd[1]: Started EnterpriseDB Failover Manager 2.1.
Hint: Some lines were ellipsized, use -l to show in full.

What is the cluster status now?

[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm 
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

	Agent Type  Address              Agent  DB       Info
	--------------------------------------------------------------
	Master      192.168.22.245       UP     UP        
	Witness     192.168.22.244       UP     N/A       
	Standby     192.168.22.243       UP     UP        

Allowed node host list:
	192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
	192.168.22.243

Promote Status:

	DB Type     Address              XLog Loc         Info
	--------------------------------------------------------------
	Master      192.168.22.245       0/3B01C7A0       
	Standby     192.168.22.243       0/3B01C7A0       

	Standby database(s) in sync with master. It is safe to promote.

Cool. Back in operation on the new release. Quite easy.
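
And since controlled switchover is the headline feature of this release, the first test is now only one command away. I have not run it here, so treat this as a sketch and check the 2.1 documentation for the exact syntax:

# promote the standby and reconfigure the old master as a standby of the new one
/usr/edb-efm/bin/efm promote efm -switchover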

PS: Remember to re-point your symlinks in /etc and /usr if you created symlinks for ease of use.
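
All the efm calls above went through /usr/edb-efm, which is exactly such a symlink. In my case that means the following on each node (the /etc link name is an assumption, use whatever you created):

ln -sfn /usr/efm-2.1 /usr/edb-efm
ln -sfn /etc/efm-2.1 /etc/edb-efm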

2 Comments

  • Bobby Bissett says:

    Very nice writeup. One minor suggestion is that, when stopping the old EFM cluster, you might want to use the “/usr/efm-2.0/bin/efm stop-cluster efm” command. It will save you a little time, but more importantly will leave all of the old efm.nodes files intact instead of rewriting them as nodes leave the cluster one at a time. You don’t need those old files obviously, but they might be useful for reference.

    Cheers,
    Bobby
