By Mouhamadou Diaw
An observer is an OCI client that connects to the primary and target standby databases using the same SYS credentials you used when you connected to the Oracle Data Guard configuration with DGMGRL.
An observer is highly recommended in any Data Guard environment, and it is mandatory when fast-start failover is configured.
Since Oracle 12.2 we can have up to 3 observers, and since Oracle 21c the maximum is 4. One important point: even with multiple observers, only one is the master, and all the others are backup observers. Only the master observer can initiate a fast-start failover.
A question we often ask is where to host the observers. Does support for multiple observers settle this question?
In this blog I test several scenarios to get an idea of where to put the observers.
I will suppose that I have 3 datacenters:
-The primary datacenter, hosting the primary server oraadserver
-The secondary datacenter, hosting the standby server oraadserver2
-The third datacenter, where the server oraadserver3 can be used for an observer, for example
Fast-start failover is already configured, and I have 3 observers:
    DGMGRL> show configuration verbose
    Configuration - db21
      Protection Mode: MaxPerformance
      Members:
      DB21_SITE1 - Primary database
        DB21_SITE2 - (*) Physical standby database
      (*) Fast-Start Failover target
      Properties:
        FastStartFailoverThreshold = '15'
        OperationTimeout = '30'
        TraceLevel = 'USER'
        FastStartFailoverLagLimit = '30'
        CommunicationTimeout = '180'
        ObserverReconnect = '0'
        ObserverPingInterval = '0'
        ObserverPingRetry = '0'
        FastStartFailoverAutoReinstate = 'TRUE'
        FastStartFailoverPmyShutdown = 'TRUE'
        BystandersFollowRoleChange = 'ALL'
        ObserverOverride = 'FALSE'
        ExternalDestination1 = ''
        ExternalDestination2 = ''
        PrimaryLostWriteAction = 'CONTINUE'
        ConfigurationWideServiceName = 'DB21_CFG'
        ConfigurationSimpleName = 'db21'
        DrainTimeout = '0'
    Fast-Start Failover: Enabled in Potential Data Loss Mode
      Lag Limit: 30 seconds
      Threshold: 15 seconds
      Ping Interval: 3000 milliseconds
      Ping Retry: 0
      Active Target: DB21_SITE2
      Potential Targets: "DB21_SITE2"
        DB21_SITE2 valid
      Observers: (*) oraadserver1
                 oraadserver21
                 oraadserver31
      Shutdown Primary : TRUE
      Auto-reinstate: TRUE
      Observer Reconnect: (none)
      Observer Override: FALSE
    Configuration Status:
    SUCCESS
    DGMGRL>
Case 1 : The master observer is running on oraadserver so the observer is located in the primary datacenter
    DGMGRL> show observer
    Configuration - db21
      Fast-Start Failover: ENABLED
      Primary : DB21_SITE1
      Active Target: DB21_SITE2
    Observer "oraadserver1" - Master
      Host Name : oraadserver
      Last Ping to Primary : 0 seconds ago
      Last Ping to Target: 0 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    Observer "oraadserver21" - Backup
      Host Name : oraadserver2
      Last Ping to Primary : 2 seconds ago
      Last Ping to Target: 2 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    Observer "oraadserver31" - Backup
      Host Name : oraadserver3
      Last Ping to Primary : 0 seconds ago
      Last Ping to Target: 2 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    DGMGRL>
The first test simulates the loss of the first datacenter to see whether a fast-start failover happens. Losing the primary datacenter means losing both the primary database and the master observer.
OK, let's power off the primary server:
    [root@oraadserver ~]# poweroff
In the log file of an observer located in one of the remaining datacenters (oraadserver3), we can see the following lines:
    [W000 2022-04-15T12:48:16.563+02:00] Primary database cannot be reached.
    [W000 2022-04-15T12:48:16.563+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 2 seconds
    [W000 2022-04-15T12:48:17.563+02:00] Try to connect to the primary.
    [W000 2022-04-15T12:48:19.891+02:00] Primary database cannot be reached.
    [W000 2022-04-15T12:48:19.891+02:00] Fast-Start Failover threshold has expired.
    [W000 2022-04-15T12:48:19.891+02:00] Try to connect to the standby.
    [W000 2022-04-15T12:48:19.891+02:00] Check if the standby is ready for failover.
    [W000 2022-04-15T12:48:19.899+02:00] Fast-Start Failover is not possible because this observer is not the master.
    [W000 2022-04-15T12:48:20.902+02:00] Try to connect to the primary.
    [W000 2022-04-15T12:48:28.908+02:00] Primary database cannot be reached.
    [W000 2022-04-15T12:48:28.908+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 7 seconds
As expected, the fast-start failover did not happen because the master observer was down. But the question is why another observer was not promoted to master. With 3 observers, I expected a backup observer to become the master when the master crashes.
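To spot this situation without reading the log by hand, the message from the excerpt above can be counted with grep. This is only a minimal sketch: the function name refused_as_backup is hypothetical, and it assumes the message text is exactly the one shown in the excerpt.

```shell
# Hypothetical helper (not part of any Oracle tooling).
# Prints how many times an observer log reports that a failover attempt
# was refused because the observer was only a backup.
# The message text is copied from the log excerpt in this post.
refused_as_backup() {
  # $1 = path to an observer log file
  # grep -c prints the count; "|| true" keeps a zero count from being an error
  grep -cF 'Fast-Start Failover is not possible because this observer is not the master' "$1" || true
}
```

For example, refused_as_backup /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log would print a non-zero count for the log shown above.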
I then restart the primary server and confirm that DB21_SITE1 is still the primary database:
    DGMGRL> show configuration
    Configuration - db21
      Protection Mode: MaxPerformance
      Members:
      DB21_SITE1 - Primary database
        DB21_SITE2 - (*) Physical standby database
    Fast-Start Failover: Enabled in Potential Data Loss Mode
    Configuration Status:
    SUCCESS (status updated 31 seconds ago)
    DGMGRL>
OK, we restart everything, and the master observer is still in the primary datacenter:
    DGMGRL> show fast_start failover
    Fast-Start Failover: Enabled in Potential Data Loss Mode
      Protection Mode: MaxPerformance
      Lag Limit: 30 seconds
      Threshold: 15 seconds
      Ping Interval: 3000 milliseconds
      Ping Retry: 0
      Active Target: DB21_SITE2
      Potential Targets: "DB21_SITE2"
        DB21_SITE2 valid
      Observers: (*) oraadserver1
                 oraadserver21
                 oraadserver31
      Shutdown Primary : TRUE
      Auto-reinstate: TRUE
      Observer Reconnect: (none)
      Observer Override: FALSE
    Configurable Failover Conditions
      Health Conditions:
        Corrupted Controlfile YES
        Corrupted Dictionary YES
        Inaccessible Logfile NO
        Stuck Archiver NO
        Datafile Write Errors YES
      Oracle Error Conditions:
        (none)
    DGMGRL>
Now let's kill only the master observer process, without crashing the datacenter (the primary database keeps running):
    [oracle@oraadserver ~]$ ps -ef | grep -i observer
    oracle 12816 1 0 12:55 ? 00:00:01 /u01/app/oracle/product/dbhome_1/bin/dgmgrl START OBSERVER NONAME FILE IS 'fsfo.dat'
    oracle 12988 12959 0 12:57 pts/2 00:00:00 grep --color=auto -i observer
    [oracle@oraadserver ~]$
    [oracle@oraadserver ~]$ kill -9 12816
    [oracle@oraadserver ~]$
In this case, an observer located in another datacenter (oraadserver21) was promoted to master a few minutes later. A fast-start failover will now happen if we crash the primary datacenter.
    DGMGRL> show fast_start failover
    Fast-Start Failover: Enabled in Potential Data Loss Mode
      Protection Mode: MaxPerformance
      Lag Limit: 30 seconds
      Threshold: 15 seconds
      Ping Interval: 3000 milliseconds
      Ping Retry: 0
      Active Target: DB21_SITE2
      Potential Targets: "DB21_SITE2"
        DB21_SITE2 valid
      Observers: (*) oraadserver21
                 oraadserver1
                 oraadserver31
      Shutdown Primary : TRUE
      Auto-reinstate: TRUE
      Observer Reconnect: (none)
      Observer Override: FALSE
    Configurable Failover Conditions
      Health Conditions:
        Corrupted Controlfile YES
        Corrupted Dictionary YES
        Inaccessible Logfile NO
        Stuck Archiver NO
        Datafile Write Errors YES
      Oracle Error Conditions:
        (none)
    DGMGRL>
So it seems that if we lose the master observer and the primary database at the same time, no backup observer is promoted to master.
Case 2 : The master observer is running on oraadserver2 so the observer is located in the secondary datacenter
In this second test, the master observer is in the same datacenter as the standby database. Let's simulate a crash of the secondary datacenter by crashing the standby server and see what happens.
    DGMGRL> show fast_start failover;
    Fast-Start Failover: Enabled in Potential Data Loss Mode
      Protection Mode: MaxPerformance
      Lag Limit: 30 seconds
      Threshold: 15 seconds
      Ping Interval: 3000 milliseconds
      Ping Retry: 0
      Active Target: DB21_SITE2
      Potential Targets: "DB21_SITE2"
        DB21_SITE2 valid
      Observers: (*) oraadserver21
                 oraadserver1
                 oraadserver31
      Shutdown Primary : TRUE
      Auto-reinstate: TRUE
      Observer Reconnect: (none)
      Observer Override: FALSE
    Configurable Failover Conditions
      Health Conditions:
        Corrupted Controlfile YES
        Corrupted Dictionary YES
        Inaccessible Logfile NO
        Stuck Archiver NO
        Datafile Write Errors YES
      Oracle Error Conditions:
        (none)
    DGMGRL>
Let's power off the standby server:
    [root@oraadserver2 ~]# poweroff
As expected, there was no fast-start failover: I lost both the standby database and the master observer, and no backup observer was promoted.
What is also important is that my primary database was shut down by Oracle. Indeed, in the alert log of the primary database we can see the following lines:
    Thread 1 advanced to log sequence 33 (LGWR switch), current SCN: 77729099
      Current log# 1 seq# 33 mem# 0: /u01/app/oracle/oradata/DB21/onlinelog/o1_mf_1_hx1xy9yc_.log
      Current log# 1 seq# 33 mem# 1: /u01/app/oracle/fast_recovery_area/DB21/onlinelog/o1_mf_1_hx1xybv4_.log
    2022-04-15T13:09:26.832279+02:00
    ARC0 (PID:12144): Archived Log entry 907 added for B-1101901028.T-1.S-32 ID 0x465cfcd1 LAD:1 [krse.c:4912]
    2022-04-15T13:10:01.882983+02:00
    Fast-Start Failover reconfiguration in progress.
    2022-04-15T13:10:04.874482+02:00
    DMON: FSFP network call timeout. Killing process FSFP.
    2022-04-15T13:10:04.898659+02:00
    Process termination requested for pid 12003 [ source = rdbms], [info = 2] [request issued by pid: 11934, uid: 54323]
    2022-04-15T13:10:07.914848+02:00
    Starting background process FSFP
    2022-04-15T13:10:07.986554+02:00
    FSFP started with pid=7, OS id =13725
    2022-04-15T13:10:11.906564+02:00
    Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds.
    It is likely an automatic failover has already occurred. Primary is shutting down.
    2022-04-15T13:10:11.911704+02:00
    Errors in file /u01/app/oracle/diag/rdbms/db21_site1/DB21/trace/DB21_lg00_11908.trc:
    ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
    USER (ospid: 11908): terminating the instance due to ORA error 16830
    2022-04-15T13:10:12.031189+02:00
    System state dump requested by (instance=1, osid=11908 (LG00)), summary=[abnormal instance termination].
    2022-04-15T13:10:12.031406+02:00
    Memory (Avail / Total) = 792.82M / 3789.53M
    Swap (Avail / Total) = 3072.00M / 3072.00M
    2022-04-15T13:10:12.125885+02:00
    System State dumped to trace file /u01/app/oracle/diag/rdbms/db21_site1/DB21/trace/DB21_diag_11877.trc
    2022-04-15T13:10:12.699552+02:00
    Dumping diagnostic data in directory=[cdmp_20220415131012], requested by (instance=1, osid=11908 (LG00)), summary=[abnormal instance termination].
    2022-04-15T13:10:13.866769+02:00
    Instance terminated by USER, pid = 11908
    2022-04-15T13:12:58.049262+02:00
This means that if your master observer is located in the same datacenter as the standby server and the standby datacenter crashes:
-No automatic failover will happen
-Your primary database will be shut down
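The rule quoted in the alert log ("heard from neither observer nor target standby within FastStartFailoverThreshold seconds") can be written down as a tiny decision function. This is a simplified model of what the test shows, not Oracle's actual implementation: the function name and its three arguments are hypothetical, and it assumes FastStartFailoverPmyShutdown = 'TRUE' as in the configuration above.

```shell
# Simplified, assumed model of the behavior seen in the alert log.
# Arguments (hypothetical names, all in seconds):
#   $1 = time since the primary last heard from the master observer
#   $2 = time since the primary last heard from the target standby
#   $3 = FastStartFailoverThreshold
primary_action() {
  # The primary only shuts down when BOTH partners have been silent
  # for longer than the threshold.
  if [ "$1" -gt "$3" ] && [ "$2" -gt "$3" ]; then
    echo "shutdown"
  else
    echo "keep running"
  fi
}
```

With the threshold of 15 seconds used here, primary_action 20 20 15 corresponds to the crash of the secondary datacenter in Case 2, while primary_action 20 2 15 corresponds to losing only the observer.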
Case 3 : The master observer is running on oraadserver3 so the observer is located in the third datacenter
    DGMGRL> show observer
    Configuration - db21
      Fast-Start Failover: ENABLED
      Primary : DB21_SITE1
      Active Target: DB21_SITE2
    Observer "oraadserver31" - Master
      Host Name : oraadserver3
      Last Ping to Primary : 1 second ago
      Last Ping to Target: 1 second ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    Observer "oraadserver1" - Backup
      Host Name : oraadserver
      Last Ping to Primary : 1 second ago
      Last Ping to Target: 0 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    Observer "oraadserver21" - Backup
      Host Name : oraadserver2
      Last Ping to Primary : 1 second ago
      Last Ping to Target: 0 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    DGMGRL>
Now let's crash the third datacenter, which only hosts the master observer; no primary or standby database is running in this datacenter.
    [root@oraadserver3 ~]# poweroff
A few minutes later, a backup observer was automatically promoted to master:
    DGMGRL> show observer
    Configuration - db21
      Fast-Start Failover: ENABLED
      Primary : DB21_SITE1
      Active Target: DB21_SITE2
    Observer "oraadserver1" - Master
      Host Name : oraadserver
      Last Ping to Primary : 0 seconds ago
      Last Ping to Target: 2 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    Observer "oraadserver21" - Backup
      Host Name : oraadserver2
      Last Ping to Primary : 0 seconds ago
      Last Ping to Target: 2 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
    Observer "oraadserver31" - Backup
      Host Name : oraadserver3
      Last Ping to Primary : 59 seconds ago
      Last Ping to Target: 59 seconds ago
      Log File: /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
      State File: /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
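When repeating these tests, it helps to pull the name of the current master observer out of the show observer output instead of reading it by eye. Here is a minimal sketch: the function name master_observer is hypothetical, and it assumes the output has one field per line with the role shown as Observer "name" - Master, as in the listings in this post.

```shell
# Hypothetical helper: reads "show observer" output on standard input
# and prints the name of the observer tagged as Master.
# Assumes lines of the form:  Observer "oraadserver1" - Master
master_observer() {
  # Split on double quotes so that field 2 is the observer name,
  # then stop at the first line carrying the Master role.
  awk -F'"' '/Observer "[^"]+" - Master/ { print $2; exit }'
}
```

Piping the output of show observer, captured after the promotion, into master_observer would print oraadserver1.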
To summarize, we can see that:
Primary database and master observer in the same datacenter
-loss of datacenter = no automatic failover, because no backup observer is promoted to master
Standby database and master observer in the same datacenter
-loss of datacenter = no automatic failover, because no backup observer is promoted to master + shutdown of the primary database
Master observer in a third datacenter
-loss of datacenter = a backup observer is promoted to master
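The three results above can also be kept as a small lookup, for example in a runbook script. This sketch only encodes what was observed in these tests on this configuration; the function name outcome and the keywords primary, standby and third are hypothetical.

```shell
# Hypothetical lookup of the outcomes observed in this post.
# $1 = datacenter hosting the master observer when that datacenter is lost:
#      primary | standby | third
outcome() {
  case "$1" in
    primary) echo "no automatic failover" ;;
    standby) echo "no automatic failover, primary database shut down" ;;
    third)   echo "backup observer promoted to master" ;;
    *)       echo "unknown location: $1" >&2; return 1 ;;
  esac
}
```

For example, outcome standby prints the Case 2 result, and any other keyword returns a non-zero status.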
Conclusion
I will conclude with a question:
Where will you put your master observer if you have
-2 datacenters?
-3 datacenters?
I hope this blog helps.
Saurabh
01.11.2023
I will conclude with a question
Where will you put your master observer if you have
2 datacenters? Answer: oraadserver2
3 datacenters? Answer: oraadserver3