A few days ago EnterpriseDB released a new version of its EDB Failover Manager which brings one feature that really sounds great: “Controlled switchover and switchback for easier maintenance and disaster recovery tests”. This is exactly what you want when you are used to operating Oracle Data Guard: switching back and forth as you like without caring much about the old master. The old master is simply converted to a standby which follows the new master automatically. This post is about upgrading EFM from version 2.0 to 2.1.
As I still have the environment available which was used for describing the maintenance scenarios with EDB Failover Manager (Maintenance scenarios with EDB Failover Manager (1) – Standby node, Maintenance scenarios with EDB Failover Manager (2) – Primary node and Maintenance scenarios with EDB Failover Manager (3) – Witness node) I will use the same environment to upgrade to the new release. Let's start …
This is the current status of my failover cluster:
[[email protected] ~]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
Automatic failover is disabled.

        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Master      192.168.22.245       UP     UP
        Witness     192.168.22.244       UP     N/A
        Standby     192.168.22.243       UP     UP

Allowed node host list:
        192.168.22.244 192.168.22.245 192.168.22.243

Standby priority host list:
        192.168.22.243

Promote Status:

        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.22.245       0/3B01C5E0
        Standby     192.168.22.243       0/3B01C5E0

        Standby database(s) in sync with master. It is safe to promote.
[[email protected] ~]$
Obviously you have to download the new version to begin the upgrade. Once the rpm is available on all nodes, simply install it everywhere:
[[email protected] tmp]$ yum localinstall efm21-2.1.0-1.rhel7.x86_64.rpm
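If you want to double-check that the package really landed on every node before you continue, querying the rpm database is a quick way to do so (the package name efm21 is taken from the rpm file above):

# should report something like efm21-2.1.0-1.rhel7.x86_64 on each node
rpm -q efm21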
EFM 2.1 comes with a utility command that helps in upgrading a cluster. You should invoke it on each node:
[[email protected] tmp]$ /usr/efm-2.1/bin/efm upgrade-conf efm
Processing efm.properties file.
Setting new property node.timeout to 40 (sec) based on existing timeout 5000 (ms) and max tries 8.
Processing efm.nodes file.
Upgrade of files is finished. Please ensure that the new file permissions match those of the template files before starting EFM.
The db.service.name property should be set before starting a non-witness agent.
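The hint about file permissions is easy to overlook. A minimal sketch of how you could align the generated files with the shipped templates, assuming the templates carry the intended ownership and mode:

cd /etc/efm-2.1
# compare the generated files with the templates
ls -l efm.properties efm.properties.in efm.nodes efm.nodes.in
# copy ownership and mode over from the templates
chown --reference=efm.properties.in efm.properties
chmod --reference=efm.properties.in efm.properties
chown --reference=efm.nodes.in efm.nodes
chmod --reference=efm.nodes.in efm.nodes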
This created new configuration files in the new directory under /etc, which was created when the new version was installed:
[[email protected] tmp]$ ls /etc/efm-2.1
efm.nodes  efm.nodes.in  efm.properties  efm.properties.in
All the values from the old EFM cluster configuration should have been carried over to the new configuration files:
[[email protected] efm-2.1]$ pwd
/etc/efm-2.1
[[email protected] efm-2.1]$ cat efm.properties | grep daniel
user.email=daniel.westermann...
Before going further, check the new configuration parameters that come with EFM 2.1:
auto.allow.hosts
auto.resume.period
db.service.name
jvm.options
minimum.standbys
node.timeout
promotable
recovery.check.period
script.notification
script.resumed
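As the upgrade-conf output above already pointed out, db.service.name should be set before a non-witness agent is started if you want EFM to control the database service. A sketch of what the relevant lines in /etc/efm-2.1/efm.properties could look like (the service name ppas-9.5 is purely hypothetical, use whatever your database service is called):

# node.timeout was derived by upgrade-conf from the old timeout/max tries settings
node.timeout=40
# name of the database service EFM should control (hypothetical value)
db.service.name=ppas-9.5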
I’ll leave everything as it was before for now. Notice that a new service was created:
[[email protected] efm-2.1]$ systemctl list-unit-files | grep efm
efm-2.0.service                             enabled
efm-2.1.service                             disabled
Let's shut down the old service on all nodes and then start the new one. Step 1 (on all nodes):
[[email protected] efm-2.1]$ systemctl stop efm-2.0.service
[[email protected] efm-2.1]$ systemctl disable efm-2.0.service
rm '/etc/systemd/system/multi-user.target.wants/efm-2.0.service'
Then enable the new service:
[[email protected] efm-2.1]$ systemctl enable efm-2.1.service
ln -s '/usr/lib/systemd/system/efm-2.1.service' '/etc/systemd/system/multi-user.target.wants/efm-2.1.service'
[[email protected] efm-2.1]$ systemctl list-unit-files | grep efm
efm-2.0.service                             disabled
efm-2.1.service                             enabled
Make sure your efm.nodes file contains all the nodes which make up the cluster, in my case:
[[email protected] efm-2.1]$ cat efm.nodes
# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.22.243:9998 192.168.22.244:9998 192.168.22.245:9998
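Since the membership information should be consistent across the cluster, one way to push the same file to the remaining nodes (assuming root ssh access between the hosts and that you are working from the witness, 192.168.22.244):

# distribute efm.nodes to the other two cluster members
for node in 192.168.22.243 192.168.22.245; do
  scp /etc/efm-2.1/efm.nodes ${node}:/etc/efm-2.1/efm.nodes
done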
Let's start the new service on the witness node first:
[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.22.244       UP     N/A

Allowed node host list:
        192.168.22.244

Membership coordinator: 192.168.22.244

Standby priority host list:
        (List is empty.)

Promote Status:

        Did not find XLog location for any nodes.
Looks good. Are we really running the new version?
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm -v
Failover Manager, version 2.1.0
Looks fine as well. Time to add the other nodes:
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm add-node efm 192.168.22.243
add-node signal sent to local agent.
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm add-node efm 192.168.22.245
add-node signal sent to local agent.
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Witness     192.168.22.244       UP     N/A

Allowed node host list:
        192.168.22.244 192.168.22.243

Membership coordinator: 192.168.22.244

Standby priority host list:
        (List is empty.)

Promote Status:

        Did not find XLog location for any nodes.
Proceed on the master:
[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ systemctl status efm-2.1.service
efm-2.1.service - EnterpriseDB Failover Manager 2.1
   Loaded: loaded (/usr/lib/systemd/system/efm-2.1.service; enabled)
   Active: active (running) since Thu 2016-09-08 12:04:11 CEST; 25s ago
  Process: 4020 ExecStart=/bin/bash -c /usr/efm-2.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 4075 (java)
   CGroup: /system.slice/efm-2.1.service
           └─4075 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/java -cp /usr/e...

Sep 08 12:04:07 ppasstandby systemd[1]: Starting EnterpriseDB Failover Manager 2.1...
Sep 08 12:04:08 ppasstandby sudo[4087]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-... efm
Sep 08 12:04:08 ppasstandby sudo[4098]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-... efm
Sep 08 12:04:08 ppasstandby sudo[4114]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/usr/... efm
Sep 08 12:04:08 ppasstandby sudo[4125]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAND=/usr/... efm
Sep 08 12:04:10 ppasstandby sudo[4165]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...9998
Sep 08 12:04:10 ppasstandby sudo[4176]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/efm-...4075
Sep 08 12:04:11 ppasstandby systemd[1]: Started EnterpriseDB Failover Manager 2.1.
Hint: Some lines were ellipsized, use -l to show in full.
And then continue on the standby:
[[email protected] efm-2.1]$ systemctl start efm-2.1.service
[[email protected] efm-2.1]$ systemctl status efm-2.1.service
efm-2.1.service - EnterpriseDB Failover Manager 2.1
   Loaded: loaded (/usr/lib/systemd/system/efm-2.1.service; enabled)
   Active: active (running) since Thu 2016-09-08 12:05:28 CEST; 3s ago
  Process: 3820 ExecStart=/bin/bash -c /usr/efm-2.1/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
 Main PID: 3875 (java)
   CGroup: /system.slice/efm-2.1.service
           └─3875 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-0.b03.el7_2.x86_64/jre/bin/jav...

Sep 08 12:05:24 edbppas systemd[1]: Starting EnterpriseDB Failover Manager 2.1...
Sep 08 12:05:25 edbppas sudo[3887]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...efm
Sep 08 12:05:25 edbppas sudo[3898]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...efm
Sep 08 12:05:25 edbppas sudo[3914]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:25 edbppas sudo[3925]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:25 edbppas sudo[3945]: efm : TTY=unknown ; PWD=/ ; USER=postgres ; COMMAN...efm
Sep 08 12:05:28 edbppas sudo[3981]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...998
Sep 08 12:05:28 edbppas sudo[3994]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/u...875
Sep 08 12:05:28 edbppas systemd[1]: Started EnterpriseDB Failover Manager 2.1.
Hint: Some lines were ellipsized, use -l to show in full.
What is the cluster status now?
[[email protected] efm-2.1]$ /usr/edb-efm/bin/efm cluster-status efm
Cluster Status: efm
VIP: 192.168.22.250
Automatic failover is disabled.

        Agent Type  Address              Agent  DB       Info
        --------------------------------------------------------------
        Master      192.168.22.245       UP     UP
        Witness     192.168.22.244       UP     N/A
        Standby     192.168.22.243       UP     UP

Allowed node host list:
        192.168.22.244 192.168.22.243 192.168.22.245

Membership coordinator: 192.168.22.244

Standby priority host list:
        192.168.22.243

Promote Status:

        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      192.168.22.245       0/3B01C7A0
        Standby     192.168.22.243       0/3B01C7A0

        Standby database(s) in sync with master. It is safe to promote.
Cool. Back in operation on the new release. Quite easy.
PS: Remember to re-point your symlinks in /etc and /usr if you created symlinks for ease of use.
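A minimal sketch of what that re-pointing could look like, assuming the symlinks are named /etc/edb-efm and /usr/edb-efm (the latter is what the efm calls above go through; your names may differ):

# replace the old symlink targets with the 2.1 directories
ln -sfn /etc/efm-2.1 /etc/edb-efm
ln -sfn /usr/efm-2.1 /usr/edb-efm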
Very nice writeup. One minor suggestion is that, when stopping the old EFM cluster, you might want to use the “/usr/efm-2.0/bin/efm stop-cluster efm” command. It will save you a little time, but more importantly will leave all of the old efm.nodes files intact instead of rewriting them as nodes leave the cluster one at a time. You don’t need those old files obviously, but they might be useful for reference.
Cheers,
Bobby
Thanks for the hint, Bobby
Cheers,
Daniel