
William Sescu

Oracle 12c – Why you shouldn’t do a crosscheck archivelog all in your regular RMAN backup scripts

Crosschecking in RMAN is quite cool stuff. With the RMAN crosscheck you can update an outdated RMAN repository about backups or archivelogs whose repository records do not match their physical status.

For example, if a user removes archived logs from disk with an operating system command, the repository (control file or RMAN catalog) still indicates that the logs are on disk when in fact they are not. It is important to know that the RMAN CROSSCHECK command never deletes any operating system files and never removes any repository records; it only updates the status recorded in the repository, for example by marking a missing archived log as expired. If you really want to delete something, you must use the DELETE command.

Manually removing archived logs or anything else from the fast recovery area is something you should never do. In reality, however, it still happens.

When it happens, you want to know which files are no longer at their physical location. So why not run a crosscheck archivelog all regularly in your backup scripts? Isn't that a good idea?

From my point of view it is not, for two reasons:

  • Your backup script runs slower because you do an extra step
  • But first and foremost, you will not notice if an archived log is missing

Let’s run a little test case. I simply move one archived log away and run the backup archivelog all command afterwards.

oracle@dbidg03:/u03/fast_recovery_area/CDB/archivelog/2017_03_30/ [CDB (CDB$ROOT)] mv o1_mf_1_61_dfso8r7p_.arc o1_mf_1_61_dfso8r7p_.arc.20170413a

RMAN> backup archivelog all;

Starting backup at 13-APR-2017 08:03:14
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=281 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=44 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 04/13/2017 08:03:17
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7

This is exactly what I expected. I want a clear error message when an archived log is missing. I don’t want Oracle to skip over it and just continue as if nothing had happened. But what happens if I run a crosscheck archivelog all before running my backup command?

RMAN> crosscheck archivelog all;

released channel: ORA_DISK_1
released channel: ORA_DISK_2
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=281 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=44 device type=DISK
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_28/o1_mf_1_56_dfmzywt1_.arc RECID=73 STAMP=939802622
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_28/o1_mf_1_57_dfo40o1g_.arc RECID=74 STAMP=939839542
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_29/o1_mf_1_58_dfovy7cj_.arc RECID=75 STAMP=939864041
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_29/o1_mf_1_59_dfq7pcwz_.arc RECID=76 STAMP=939908847
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_60_dfrg8f8o_.arc RECID=77 STAMP=939948334
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_31/o1_mf_1_62_dfv0kybr_.arc RECID=79 STAMP=940032607
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_31/o1_mf_1_63_dfw5s2l8_.arc RECID=80 STAMP=940070724
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_04_12/o1_mf_1_64_dgw5mgsl_.arc RECID=81 STAMP=941119119
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_04_13/o1_mf_1_65_dgy552z0_.arc RECID=82 STAMP=941184196
Crosschecked 9 objects

validation failed for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc RECID=78 STAMP=939988281
Crosschecked 1 objects
RMAN>

The crosscheck validation failed for the archived log that I moved beforehand. Perfect, the crosscheck has found the issue, and the log is now listed as expired (status X) in the repository.

RMAN> list expired backup;

specification does not match any backup in the repository

RMAN> list expired archivelog all;

List of Archived Log Copies for database with db_unique_name CDB
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - --------------------
78      1    61      X 30-MAR-2017 00:45:33
        Name: /u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc

However, if I run the backup archivelog all afterwards, RMAN skips the expired archived log (sequence 61) and continues as if nothing had ever happened. If you are not monitoring expired archived logs or backups, you will never notice it.

RMAN> backup archivelog all;

Starting backup at 13-APR-2017 08:05:01
current log archived
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=56 RECID=73 STAMP=939802622
input archived log thread=1 sequence=57 RECID=74 STAMP=939839542
input archived log thread=1 sequence=58 RECID=75 STAMP=939864041
input archived log thread=1 sequence=59 RECID=76 STAMP=939908847
input archived log thread=1 sequence=60 RECID=77 STAMP=939948334
channel ORA_DISK_1: starting piece 1 at 13-APR-2017 08:05:01
channel ORA_DISK_2: starting compressed archived log backup set
channel ORA_DISK_2: specifying archived log(s) in backup set
input archived log thread=1 sequence=62 RECID=79 STAMP=940032607
input archived log thread=1 sequence=63 RECID=80 STAMP=940070724
input archived log thread=1 sequence=64 RECID=81 STAMP=941119119
input archived log thread=1 sequence=65 RECID=82 STAMP=941184196
input archived log thread=1 sequence=66 RECID=83 STAMP=941184301
channel ORA_DISK_2: starting piece 1 at 13-APR-2017 08:05:01
channel ORA_DISK_2: finished piece 1 at 13-APR-2017 08:05:47
piece handle=/u03/fast_recovery_area/CDB/backupset/2017_04_13/o1_mf_annnn_TAG20170413T080501_dgy58fz7_.bkp tag=TAG20170413T080501 comment=NONE
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:46
channel ORA_DISK_1: finished piece 1 at 13-APR-2017 08:06:07
piece handle=/u03/fast_recovery_area/CDB/backupset/2017_04_13/o1_mf_annnn_TAG20170413T080501_dgy58fy4_.bkp tag=TAG20170413T080501 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:06
Finished backup at 13-APR-2017 08:06:07

Starting Control File and SPFILE Autobackup at 13-APR-2017 08:06:07
piece handle=/u03/fast_recovery_area/CDB/autobackup/2017_04_13/o1_mf_s_941184367_dgy5bh7w_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 13-APR-2017 08:06:08

RMAN>

But is this really what I want? Probably not. Whenever an archived log is missing, RMAN should stop right away and throw an error message. This gives me the chance to check what went wrong and to correct it.
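To make sure such an error is not overlooked, a backup wrapper could scan the RMAN output for the error stack shown in the first test. The following is only a minimal sketch: the function name find_missing_archived_logs is hypothetical and not part of RMAN, and it assumes the output contains the RMAN-06059 and ORA-19625 lines in the same form as in the transcript above.

```python
import re

# Hypothetical log scanner, not an RMAN feature. It assumes the error stack
# looks like the one in the transcript above (RMAN-06059 plus ORA-19625).
MISSING_LOG_CODE = "RMAN-06059"
FILE_RE = re.compile(r"^ORA-19625: error identifying file (\S+)$")


def find_missing_archived_logs(rman_output: str) -> list[str]:
    """Return the files named in ORA-19625 lines if RMAN-06059 was raised."""
    lines = [line.strip() for line in rman_output.splitlines()]
    # Only report files when RMAN actually complained about a missing log.
    if not any(line.startswith(MISSING_LOG_CODE) for line in lines):
        return []
    missing = []
    for line in lines:
        match = FILE_RE.match(line)
        if match:
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    sample = """\
RMAN-03002: failure of backup command at 04/13/2017 08:03:17
RMAN-06059: expected archived log not found, loss of archived log compromises recoverability
ORA-19625: error identifying file /u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc
ORA-27037: unable to obtain file status
"""
    for name in find_missing_archived_logs(sample):
        print("missing archived log:", name)
```

A wrapper script could exit with a non-zero status whenever this list is not empty, so that the scheduler or monitoring tool raises an alert.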

Conclusion

I don’t recommend running crosscheck archivelog all in your regular RMAN backup scripts. It is a command that should be run manually when it is needed. In a script it only makes your backup slower (not by much, but still), and you will probably never notice when an archived log is missing. That can leave you with a database that can only be recovered to the point just before the missing archived log.
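When a crosscheck is run manually, its output can be long, and the failed validations are easy to miss between the successful ones. As a rough sketch, the failed file names could be extracted like this. The helper name failed_crosscheck_files is hypothetical, and the parsing assumes the two-line "validation failed for archived log" / "archived log file name=... RECID=..." layout shown in the transcript above.

```python
# Hypothetical helper for summarizing a manual crosscheck run. It assumes
# each validation message is followed by one "archived log file name=" line.
FAILED_MARKER = "validation failed for archived log"
NAME_PREFIX = "archived log file name="


def failed_crosscheck_files(crosscheck_output: str) -> list[str]:
    """Return file names whose crosscheck validation failed."""
    failed = []
    previous_failed = False
    for raw in crosscheck_output.splitlines():
        line = raw.strip()
        if line.startswith(FAILED_MARKER):
            previous_failed = True
            continue
        if previous_failed and line.startswith(NAME_PREFIX):
            # Keep only the text between "file name=" and " RECID=".
            name = line[len(NAME_PREFIX):].split(" RECID=")[0]
            failed.append(name)
        previous_failed = False
    return failed


if __name__ == "__main__":
    sample = """\
validation succeeded for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_60_dfrg8f8o_.arc RECID=77 STAMP=939948334
validation failed for archived log
archived log file name=/u03/fast_recovery_area/CDB/archivelog/2017_03_30/o1_mf_1_61_dfso8r7p_.arc RECID=78 STAMP=939988281
Crosschecked 1 objects
"""
    for name in failed_crosscheck_files(sample):
        print("validation failed:", name)
```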

 

3 Comments

  • Jan Schnackenberg says:

    Hi William

    I usually only nod silently to your posts, but I’m not sure I concur with you here. I agree that I need to know about missing archivelogs, but even more I need those backups to continue. Assuming that the backups are configured as homogeneously as possible to ease maintenance, I have several reasons for running archivelog backups:

    1. Of course: to have a valid backup
    2. To be able to remove the backed up archivelogs.

    Depending on the throughput of the database the second reason may be critical if my backup jobs will crash as soon as they come across a missing archivelog. I’d much rather have a backup that’s missing one archivelog, than risking an ARCHIVER STUCK error. Especially because I usually cannot really put the missing archivelog back, anyway.

    My recommendation would be to monitor the RMAN output for the message “validation failed for”. This will allow the DBA to run an INC0/INC1 backup (to fix the problem of not being able to restore to a current time) while still making sure that the archivelog backups keep running and the archivelogs are still being removed (might be, that the poor DBA on call during the holidays has more severe problems to fix).

    Best regards,
    Jan

     
    • William Sescu says:

      Hello Jan,

      I fully understand your concerns and arguments. However, in the end, it all comes down to the SLA, RPO and RTO. How much data loss can you afford, and what is the maximum tolerable outage? And how valuable is your data? These days, an archived log can easily be 1G, 2G or much bigger than that, and losing just one of them means losing quite a lot of transactions. So I want immediate action to be taken if one of them is gone. In my experience, warning messages are often ignored because people get too many of them. However, when an error pops up, people usually react to it. That’s why I prefer to have an error, and to keep the crosschecking of the archivelogs disabled. But this is just my personal opinion.

      Cheers,
      William

       
      • Jan Schnackenberg says:

        Hi William,

        Yes, this would probably be one of those “depends” situations. Depending on the possibilities of the monitoring, I, too, would map the “validation failed for” string to a critical error. But since that’s not always possible, your way might be the better solution.

        Regards,
        Jan

         
