Infrastructure at your Service

Franck Pachot

How many members for standby redo logs?

I see lot of databases with two members for redo logs and also two members for standby redo logs. Why not, but when asking I realized that there are some mis-comprehension about it. And what was recommended 10 years ago may be different today.

Online and Stanbdy redo logs

Your transactions happen on the primary database and are written to the online redo logs before the modification is done on datafiles. And when you commit you wait to be sure that the redo is on persistence storage. If you loose the current redo log group, then your database crashes and you loose the latest transactions. This is why we multiplex the online redo logs. Even if you are 100% confident on your storage high availability the risk of human error dropping a file exists and is considerably lower if there a two files.
For additional protection, in case you loose all the primary redo members, Data Guard synchronizes the transaction to a second site by shipping the redo stream. There, on the standby site, the redo is written to the standby redo logs.

The online redo logs are used only on the primary site, and should better be named primary redo logs. You create them on the standby site only to be prepared for failover, when it will become the primary and opened read-write. But let’s be clear: online redo logs are not used when database is not online, and mount is not online.

The standby redo logs are not standby at all. They are actively used on the standby site and this is why thew are called ‘standby. On the primary, they are not used, just there to be ready when the primary becomes a standby after a failover.

Members

We have seen why we multiplex the online redo logs:

  • it protects the transactions because without multiplexing you loose transactions when loosing one group
  • it protects the instance availability because without multiplexing you crash the instance when loosing one group

But this is different with standby redo logs.

  • it is an additional protection. Transactions are still persistent on the primary even if you loose a standby log group.
  • the primary is still available even if one standby cannot be SYNC

Of course, if in Maximum Protection mode the availability of the primary is compromised when the standby cannot apply the redo in SYNC. But in this protection mode you probably have multiple standby and the loss of one standby redo log on one standby site it not a problem.

Redo transport and redo apply

I said that transactions are still persisted on the primary, but even without standby redo logs they are still shipped to standby site, but in ASYNC mode. This means that in order to loose transactions in case of the loss of a standby redo log group, you need to experience this file loss, and primary site failure and network failure at the same time. The probability for this is very low and having an additional member do not lower the risk.

Of course, I’ve tested what happens. I have two standby redo log members and I removed all of them one minute ago:

DGMGRL for Linux: Release 12.2.0.1.0 - Production on Fri Mar 17 14:47:45 2017
 
Copyright (c) 1982, 2016, Oracle and/or its affiliates. All rights reserved.
 
Welcome to DGMGRL, type "help" for information.
Connected to "ORCLA"
Connected as SYSDBA.
 
Database - orclb
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 1 minute 30 seconds (computed 0 seconds ago)
Average Apply Rate: 0 Byte/s
Active Apply Rate: 0 Byte/s
Maximum Apply Rate: 0 Byte/s
Real Time Query: ON
Instance(s):
ORCLB
 
Database Warning(s):
ORA-16853: apply lag has exceeded specified threshold
ORA-16857: member disconnected from redo source for longer than specified threshold
ORA-16826: apply service state is inconsistent with the DelayMins property
ORA-16789: standby redo logs configured incorrectly

As you can see, when there is no member remaining, the APPLY is stuck but transport still happens, in ASYNC to archived logs.
The standby alert log mentions the failure:
2017-03-17T14:51:21.209611+01:00
Errors in file /u01/app/oracle/diag/rdbms/orclb/ORCLB/trace/ORCLB_rsm0_6568.trc:
ORA-00313: open failed for members of log group 5 of thread 1
ORA-00312: online log 5 thread 1: '/u01/oradata/ORCLB/onlinelog/m5.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 2
ORA-00312: online log 5 thread 1: '/u01/oradata/ORCLB/onlinelog/o1_mf_5_dbvmxd52_.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 2

and the SYNC mode cannot continue without standby redo logs:

03/17/2017 14:58:15
Failed to open SRL files. ORA-313
Redo Apply is running without NODELAY option while DelayMins=0
Failed to open SRL files. ORA-313
Error: LogXptMode value 'SYNC' of requires this database to have standby redo logs, but they are not configured correctly.
03/17/2017 14:59:15
Failed to open SRL files. ORA-313
Redo Apply is running without NODELAY option while DelayMins=0
Failed to open SRL files. ORA-313
Error: LogXptMode value 'SYNC' of requires this database to have standby redo logs, but they are not configured correctly.

Sure, you don’t want to loose the standby redo member. But the risk is not higher than loosing any other files, and this is why there is no reason to multiplex it. Standby redo logs are not the same as the primary online redo logs. On similar hardware, you need same size and you need one more group, but no reason to multiplex the same.

Documentation

The confusion may come from old documentation. The 10g documentation says:
For increased availability, consider multiplexing the standby redo log files, similar to the way that online redo log files are multiplexed.
This documentation dates from 2005 and systems have changed about availability of files.

More recent documentation is the white paper on Best Practices for Synchronous Redo Transport which mentions: It is critical for performance that standby redo log groups only contain a single member

So what?

At the time of 10g we had a LUN for redo logs and were not concerned by the size, but more by its unavailability. Things change. Losing a file, and only one file, today is extremely rare. We are more concerned about consolidation and performance. Having 4 online groups, 200MB or 500MB, and 5 standby groups, all multiplexed, for many databases will take space. And this space you want to allocate it on the fastest disks because user commits wait on log writes (on primary and standby except in Max Performance). You don’t want to over-allocate the space here. Better have larger online redo logs. And your goal is that network shipping + standby log writing takes not longer than local write to online redo logs, so that Data Guard protection do not increase commit latency. Multiplexing standby redo logs increases the risk to get longer writes on standby site.

So if you have your standby redo logs multiplexed, it’s not wrong. But things change and today you may prefer to save space and performance overhead with only one member.

Before writing this blog post, my poll on twitter had 40 votes. Only 28% mentioned no multiplexing. But twitter poll is not exact science as you can see that 8 people answered 42 members ;)

 

Leave a Reply


6 × = forty eight

Franck Pachot
Franck Pachot

Technology Leader