By Mouhamadou Diaw

An observer is an OCI client that connects to the primary and target standby databases using the same SYS credentials you used when you connected to the Oracle Data Guard configuration with DGMGRL.
An observer is highly recommended in a Data Guard environment, and it is mandatory when fast-start failover is configured.

Since Oracle 12.2 we can have up to 3 observers, and since Oracle 21c the maximum number of observers has been increased to 4. One important point is that even with multiple observers, only one observer is the master and all the others are backup observers. Only the master observer can initiate a fast-start failover.
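
As a mental model of the master/backup roles described above, here is a toy Python sketch. `ObserverGroup` and `MAX_OBSERVERS` are hypothetical names, not part of any Oracle API. The limits are the ones quoted above, and the rule that the first registered observer becomes the master is a simplification for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Observer limits as stated in the text (assumed: 3 from 12.2, 4 from 21c).
MAX_OBSERVERS = {"12.2": 3, "21c": 4}


@dataclass
class ObserverGroup:
    version: str
    master: Optional[str] = None
    backups: List[str] = field(default_factory=list)

    def members(self) -> List[str]:
        """All registered observers, master first."""
        return ([self.master] if self.master else []) + self.backups

    def register(self, name: str) -> None:
        """Register an observer; simplification: the first one becomes the master."""
        limit = MAX_OBSERVERS[self.version]
        if len(self.members()) >= limit:
            raise ValueError(f"at most {limit} observers are allowed in {self.version}")
        if self.master is None:
            self.master = name
        else:
            self.backups.append(name)

    def can_initiate_failover(self, name: str) -> bool:
        """Only the master observer may initiate a fast-start failover."""
        return name == self.master


group = ObserverGroup("21c")
for observer in ("oraadserver1", "oraadserver21", "oraadserver31"):
    group.register(observer)
print(group.master)                                   # oraadserver1
print(group.can_initiate_failover("oraadserver31"))   # False
```

A dataclass keeps the sketch short; the rule that matters for the tests in this post is `can_initiate_failover`: a backup observer that detects the failure is not allowed to act on it.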

A question we often ask is where to host the observers. Does the support for multiple observers settle this question?
In this blog I test several scenarios to get an idea of where to put the observers.

I will assume that I have 3 datacenters:
-The primary datacenter, hosting the primary server oraadserver
-The secondary datacenter, hosting the standby server oraadserver2
-The third datacenter, with the server oraadserver3, which I can use for an observer

Fast-start failover is already configured, and I have 3 observers:

DGMGRL> show configuration verbose
Configuration - db21
  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database
  (*) Fast-Start Failover target
  Properties:
    FastStartFailoverThreshold      = '15'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    ObserverPingInterval            = '0'
    ObserverPingRetry               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
    ConfigurationWideServiceName    = 'DB21_CFG'
    ConfigurationSimpleName         = 'db21'
    DrainTimeout                    = '0'
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver1
                      oraadserver21
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configuration Status:
SUCCESS
DGMGRL>
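
The Properties section of this output is where you can check the two settings that drive the results below: FastStartFailoverThreshold and FastStartFailoverPmyShutdown. As a minimal sketch, assuming the `show configuration verbose` output has been captured as text (for example by redirecting a non-interactive dgmgrl session to a file), a small Python helper can pull out the `Name = 'value'` pairs. `parse_properties` is a hypothetical helper, not an Oracle tool, and the sample is an excerpt of the output above.

```python
import re
from typing import Dict

# Matches broker property lines such as:  FastStartFailoverThreshold      = '15'
PROPERTY_RE = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def parse_properties(output: str) -> Dict[str, str]:
    """Collect Name = 'value' pairs from 'show configuration verbose' output."""
    props = {}
    for line in output.splitlines():
        match = PROPERTY_RE.match(line)
        if match:
            props[match.group(1)] = match.group(2)
    return props


# Excerpt of the output shown above (lines without '=' are ignored).
SAMPLE = """\
  Properties:
    FastStartFailoverThreshold      = '15'
    FastStartFailoverLagLimit       = '30'
    FastStartFailoverPmyShutdown    = 'TRUE'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Lag Limit:          30 seconds
"""

props = parse_properties(SAMPLE)
threshold = int(props["FastStartFailoverThreshold"])
pmy_shutdown = props["FastStartFailoverPmyShutdown"] == "TRUE"
print(threshold, pmy_shutdown)
```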

Case 1: The master observer runs on oraadserver, in the primary datacenter

DGMGRL> show observer
Configuration - db21
  Fast-Start Failover:     ENABLED
  Primary:            DB21_SITE1
  Active Target:      DB21_SITE2
Observer "oraadserver1" - Master
  Host Name:                    oraadserver
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          0 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver31" - Backup
  Host Name:                    oraadserver3
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
DGMGRL>
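
Since the whole question of this post is which datacenter hosts the master observer, it can help to read that directly from `show observer`. The following is a sketch under assumptions: the output is available as text, the host-to-datacenter mapping is the one described at the start of this post, and `parse_observers`, `master_datacenter` and `DATACENTER` are hypothetical names. The sample is the output above with the Log File and State File lines left out.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Host-to-datacenter mapping from the setup described at the start of this post.
DATACENTER: Dict[str, str] = {
    "oraadserver": "primary",
    "oraadserver2": "secondary",
    "oraadserver3": "third",
}

HEADER_RE = re.compile(r'^Observer "([^"]+)" - (\w+)')
FIELD_RE = re.compile(r"^\s+(Host Name|Last Ping to Primary|Last Ping to Target):\s+(.*)$")


@dataclass
class Observer:
    name: str
    role: str                              # "Master" or "Backup"
    host: str = ""
    ping_primary_s: Optional[int] = None   # seconds since last ping to the primary
    ping_target_s: Optional[int] = None    # seconds since last ping to the target


def parse_observers(output: str) -> List[Observer]:
    """Turn 'show observer' output into a list of Observer records."""
    observers: List[Observer] = []
    for line in output.splitlines():
        header = HEADER_RE.match(line)
        if header:
            observers.append(Observer(header.group(1), header.group(2)))
            continue
        match = FIELD_RE.match(line)
        if match and observers:
            key, value = match.group(1), match.group(2).strip()
            current = observers[-1]
            if key == "Host Name":
                current.host = value
            elif key == "Last Ping to Primary":
                current.ping_primary_s = int(value.split()[0])  # "2 seconds ago" -> 2
            else:
                current.ping_target_s = int(value.split()[0])
    return observers


def master_datacenter(observers: List[Observer]) -> str:
    """Return the datacenter hosting the master observer."""
    master = next(o for o in observers if o.role == "Master")
    return DATACENTER.get(master.host, "unknown")


SAMPLE = """\
Observer "oraadserver1" - Master
  Host Name:                    oraadserver
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          0 seconds ago
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          2 seconds ago
Observer "oraadserver31" - Backup
  Host Name:                    oraadserver3
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
"""

observers = parse_observers(SAMPLE)
print(master_datacenter(observers))  # primary
```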

The first test simulates the loss of the primary datacenter, to see whether a fast-start failover happens. Losing the primary datacenter means losing both the primary database and the master observer.

OK, let’s power off the primary server:

[root@oraadserver ~]# poweroff

In the log file of an observer located in one of the remaining datacenters (oraadserver3), we can see the following lines:

[W000 2022-04-15T12:48:16.563+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:16.563+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 2 seconds
[W000 2022-04-15T12:48:17.563+02:00] Try to connect to the primary.
[W000 2022-04-15T12:48:19.891+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:19.891+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-15T12:48:19.891+02:00] Try to connect to the standby.
[W000 2022-04-15T12:48:19.891+02:00] Check if the standby is ready for failover.
[W000 2022-04-15T12:48:19.899+02:00] Fast-Start Failover is not possible because this observer is not the master.
[W000 2022-04-15T12:48:20.902+02:00] Try to connect to the primary.
[W000 2022-04-15T12:48:28.908+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:28.908+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 7 seconds
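
The key sequence in this log is "threshold has expired" immediately followed by "not the master". As a sketch, assuming the log lines keep the `[Wnnn timestamp] message` layout seen here, a short Python scan can flag every point where a backup observer saw the threshold expire but refused to fail over. `parse_log` and `refused_failovers` are hypothetical helpers, and matching on message substrings is an assumption that may need adjusting for other versions.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, List, Tuple

# Observer log lines look like: [W000 2022-04-15T12:48:19.891+02:00] Message text
LOG_RE = re.compile(r"^\[\w+ (\S+)\] (.*)$")


def parse_log(lines: Iterable[str]) -> Iterator[Tuple[datetime, str]]:
    """Yield (timestamp, message) for each well-formed observer log line."""
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            yield datetime.fromisoformat(match.group(1)), match.group(2)


def refused_failovers(lines: Iterable[str]) -> List[datetime]:
    """Timestamps where the threshold expired but this observer was not the master."""
    refused: List[datetime] = []
    expired_at = None
    for ts, message in parse_log(lines):
        if "threshold has expired" in message:
            expired_at = ts
        elif "not the master" in message and expired_at is not None:
            refused.append(expired_at)
            expired_at = None
    return refused


LOG = """\
[W000 2022-04-15T12:48:16.563+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:16.563+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 2 seconds
[W000 2022-04-15T12:48:17.563+02:00] Try to connect to the primary.
[W000 2022-04-15T12:48:19.891+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:19.891+02:00] Fast-Start Failover threshold has expired.
[W000 2022-04-15T12:48:19.891+02:00] Try to connect to the standby.
[W000 2022-04-15T12:48:19.891+02:00] Check if the standby is ready for failover.
[W000 2022-04-15T12:48:19.899+02:00] Fast-Start Failover is not possible because this observer is not the master.
[W000 2022-04-15T12:48:20.902+02:00] Try to connect to the primary.
[W000 2022-04-15T12:48:28.908+02:00] Primary database cannot be reached.
[W000 2022-04-15T12:48:28.908+02:00] Fast-Start Failover threshold has not exceeded. Retry for the next 7 seconds
"""

for ts in refused_failovers(LOG.splitlines()):
    print("failover refused at", ts.isoformat())
```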

As expected, no fast-start failover happened, because the master observer was down. The question is why none of the other observers was promoted to master: with 3 observers, I would expect a backup observer to become the master when the master crashes.

I then restart the primary server and confirm that DB21_SITE1 is still the primary database:

DGMGRL> show configuration
Configuration - db21
  Protection Mode: MaxPerformance
  Members:
  DB21_SITE1 - Primary database
    DB21_SITE2 - (*) Physical standby database
Fast-Start Failover: Enabled in Potential Data Loss Mode
Configuration Status:
SUCCESS   (status updated 31 seconds ago)
DGMGRL>

OK, after restarting everything, the master observer is still in the primary datacenter:

DGMGRL> show fast_start failover
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver1
                      oraadserver21
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
DGMGRL>

Now let’s kill only the master observer process, without crashing the datacenter (the primary database keeps running):

[oracle@oraadserver ~]$ ps -ef | grep -i observer
oracle   12816     1  0 12:55 ?        00:00:01 /u01/app/oracle/product/dbhome_1/bin/dgmgrl START OBSERVER NONAME FILE IS 'fsfo.dat'
oracle   12988 12959  0 12:57 pts/2    00:00:00 grep --color=auto -i observer
[oracle@oraadserver ~]$
[oracle@oraadserver ~]$ kill -9 12816
[oracle@oraadserver ~]$

This time, a few minutes later, the observer located in the secondary datacenter (oraadserver21) was promoted to master. If we now crash the primary datacenter, a fast-start failover will happen, because the master observer is no longer located there.

DGMGRL> show fast_start failover
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver21
                      oraadserver1
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
DGMGRL>

So it seems that if we lose the master observer and the primary database at the same time, no backup observer is promoted to master.

Case 2: The master observer runs on oraadserver2, in the secondary datacenter

In this second test, the master observer is in the same datacenter as the standby database. Let’s simulate a crash of the secondary datacenter by powering off the standby server and see what happens.

DGMGRL> show fast_start failover;
Fast-Start Failover: Enabled in Potential Data Loss Mode
  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds
  Threshold:          15 seconds
  Ping Interval:      3000 milliseconds
  Ping Retry:         0
  Active Target:      DB21_SITE2
  Potential Targets:  "DB21_SITE2"
    DB21_SITE2 valid
  Observers:      (*) oraadserver21
                      oraadserver1
                      oraadserver31
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES
  Oracle Error Conditions:
    (none)
DGMGRL>

Let’s power off the standby server:

[root@oraadserver2 ~]# poweroff

As expected, no fast-start failover happened: I lost both the standby database (the failover target) and the master observer, and no backup observer was promoted.
More importantly, my primary database was shut down by Oracle. The alert log of the primary database shows the following lines:

Thread 1 advanced to log sequence 33 (LGWR switch),  current SCN: 77729099
  Current log# 1 seq# 33 mem# 0: /u01/app/oracle/oradata/DB21/onlinelog/o1_mf_1_hx1xy9yc_.log
  Current log# 1 seq# 33 mem# 1: /u01/app/oracle/fast_recovery_area/DB21/onlinelog/o1_mf_1_hx1xybv4_.log
2022-04-15T13:09:26.832279+02:00
ARC0 (PID:12144): Archived Log entry 907 added for B-1101901028.T-1.S-32 ID 0x465cfcd1 LAD:1 [krse.c:4912]
2022-04-15T13:10:01.882983+02:00
Fast-Start Failover reconfiguration in progress.
2022-04-15T13:10:04.874482+02:00
DMON: FSFP network call timeout. Killing process FSFP.
2022-04-15T13:10:04.898659+02:00
Process termination requested for pid 12003 [source = rdbms], [info = 2] [request issued by pid: 11934, uid: 54323]
2022-04-15T13:10:07.914848+02:00
Starting background process FSFP
2022-04-15T13:10:07.986554+02:00
FSFP started with pid=7, OS id=13725
2022-04-15T13:10:11.906564+02:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds.
It is likely an automatic failover has already occurred. Primary is shutting down.
2022-04-15T13:10:11.911704+02:00
Errors in file /u01/app/oracle/diag/rdbms/db21_site1/DB21/trace/DB21_lg00_11908.trc:
ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
USER (ospid: 11908): terminating the instance due to ORA error 16830
2022-04-15T13:10:12.031189+02:00
System state dump requested by (instance=1, osid=11908 (LG00)), summary=[abnormal instance termination].
2022-04-15T13:10:12.031406+02:00
Memory (Avail / Total) = 792.82M / 3789.53M
Swap (Avail / Total) = 3072.00M /  3072.00M
2022-04-15T13:10:12.125885+02:00
System State dumped to trace file /u01/app/oracle/diag/rdbms/db21_site1/DB21/trace/DB21_diag_11877.trc
2022-04-15T13:10:12.699552+02:00
Dumping diagnostic data in directory=[cdmp_20220415131012], requested by (instance=1, osid=11908 (LG00)), summary=[abnormal instance termination].
2022-04-15T13:10:13.866769+02:00
Instance terminated by USER, pid = 11908
2022-04-15T13:12:58.049262+02:00

This means that if your master observer is located in the same datacenter as the standby server and the standby datacenter crashes:
-No automatic failover will happen
-Your primary database will be shut down (ORA-16830, since FastStartFailoverPmyShutdown is TRUE in this configuration)
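
The alert log states the rule quite precisely: the primary shuts down once it "has heard from neither observer nor target standby within FastStartFailoverThreshold seconds". A toy Python model of that condition, assuming times in seconds and a primary that only tracks the last contact with each partner, looks like this. `PrimaryIsolationModel` is a hypothetical illustration of the logged condition, not Oracle's implementation, and the time values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PrimaryIsolationModel:
    """Simplified model of the ORA-16830 condition reported in the alert log."""
    threshold_s: float            # FastStartFailoverThreshold (15 in this configuration)
    pmy_shutdown: bool            # FastStartFailoverPmyShutdown ('TRUE' here)
    last_observer_s: float = 0.0  # last time the primary heard from the master observer
    last_target_s: float = 0.0    # last time the primary heard from the target standby

    def heard_from_observer(self, t: float) -> None:
        self.last_observer_s = t

    def heard_from_target(self, t: float) -> None:
        self.last_target_s = t

    def should_shut_down(self, now: float) -> bool:
        """True once neither partner has been heard from for more than the threshold."""
        silent_for = now - max(self.last_observer_s, self.last_target_s)
        return self.pmy_shutdown and silent_for > self.threshold_s


# Case 2: master observer and standby both disappear at t=0.
case2 = PrimaryIsolationModel(threshold_s=15, pmy_shutdown=True)
print(case2.should_shut_down(10), case2.should_shut_down(16))  # False True

# Hypothetical variant: the standby is lost but the master observer keeps pinging.
variant = PrimaryIsolationModel(threshold_s=15, pmy_shutdown=True)
variant.heard_from_observer(14)
print(variant.should_shut_down(16))  # False
```

The variant shows why the placement matters: as long as the master observer survives the loss of the standby datacenter, the primary is not isolated and, under this model, keeps running.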

Case 3: The master observer runs on oraadserver3, in the third datacenter

DGMGRL> show observer
Configuration - db21
  Fast-Start Failover:     ENABLED
  Primary:            DB21_SITE1
  Active Target:      DB21_SITE2
Observer "oraadserver31" - Master
  Host Name:                    oraadserver3
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          1 second ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver1" - Backup
  Host Name:                    oraadserver
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          0 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
DGMGRL>

Now let’s crash the third datacenter, which hosts only the master observer; neither the primary nor the standby database runs in this datacenter.

[root@oraadserver3 ~]# poweroff

A few minutes later, a backup observer (oraadserver1) was automatically promoted to master:

DGMGRL> show observer
Configuration - db21
  Fast-Start Failover:     ENABLED
  Primary:            DB21_SITE1
  Active Target:      DB21_SITE2
Observer "oraadserver1" - Master
  Host Name:                    oraadserver
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver21" - Backup
  Host Name:                    oraadserver2
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          2 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver2.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat
Observer "oraadserver31" - Backup
  Host Name:                    oraadserver3
  Last Ping to Primary:         59 seconds ago
  Last Ping to Target:          59 seconds ago
  Log File:                     /u01/app/oracle/admin/prod20/broker_files/config_db21/log/observer_oraadserver3.log
  State File:                   /u01/app/oracle/admin/prod20/broker_files/config_db21/dat/fsfo.dat

To summarize:

Primary database and master observer in the same datacenter
-Loss of that datacenter = no automatic failover, because no backup observer is promoted to master

Standby database and master observer in the same datacenter
-Loss of that datacenter = no automatic failover, because no backup observer is promoted to master, and the primary database is shut down

Master observer in a third datacenter
-Loss of that datacenter = a backup observer is promoted to master, and both databases keep running
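
The three observed outcomes can be written as a small decision function, which also makes it easy to look at the combinations this post did not test. This is a toy model under stated assumptions: the datacenter labels DC1/DC2/DC3 and the `outcome` function are hypothetical, the three "observed" branches encode the results above, and the remaining branches (for example, losing the primary datacenter while the master observer is in the third one) are what the model predicts, not test results.

```python
from itertools import product

# Placement from this post: DC1 = primary (oraadserver), DC2 = standby (oraadserver2),
# DC3 = observer-only (oraadserver3).
PRIMARY_DC, STANDBY_DC = "DC1", "DC2"
DATACENTERS = ("DC1", "DC2", "DC3")


def outcome(master_dc: str, lost_dc: str, pmy_shutdown: bool = True) -> str:
    """Predict what happens when lost_dc goes down (simplified, assumption-based model)."""
    primary_up = lost_dc != PRIMARY_DC
    standby_up = lost_dc != STANDBY_DC
    master_up = lost_dc != master_dc

    if master_up:
        if not primary_up:
            # Predicted: master observer and target standby survive, so it can fail over.
            return "fast-start failover"
        if not standby_up:
            # Predicted: the primary still hears from the observer, so it is not isolated.
            return "primary keeps running, standby lost"
        return "no impact"

    # From here on, the master observer is lost.
    if primary_up and standby_up:
        # Observed in Case 3 and when only the observer process was killed.
        return "backup observer promoted"
    if not primary_up:
        # Observed in Case 1: no backup observer takes over.
        return "no failover"
    # Observed in Case 2: the primary is isolated from observer and target.
    return "no failover, primary shut down" if pmy_shutdown else "no failover"


for master_dc, lost_dc in product(DATACENTERS, DATACENTERS):
    print(f"master observer in {master_dc}, lose {lost_dc}: {outcome(master_dc, lost_dc)}")
```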

Conclusion

I will conclude with a question: where would you put your master observer if you have
-2 datacenters?
-3 datacenters?

I hope this blog helps.