By Franck Pachot

.
The title of this blog post was: “Tracing DBMS_RCVMAN for reclaimable archivelogs” until I started to write the conclusion…

In a previous post I mentioned that there’s a bug with archivelog deletion policy when you want to mention both the ‘BACKED UP … TIMES TO …’ and ‘APPLIED’ or ‘SHIPPED’ as conditions for archived logs to be reclaimable. I opened a SR, they didn’t even try to reproduce it (and I can guarantee you can reproduce it in 2 minutes on any currently supported version) so I traced it myself to understand the bug and suggest the fix.

I traced the DBMS_RCVMAN with Kernel Recovery Area function SQL Tracing:


SQL> alter session set events 'trace[kra_sql] disk high, memory disable';
SQL> exec dbms_backup_restore.refreshAgedFiles;
SQL> alter session set events 'trace[kra_sql] off';

I know refreshAgedFiles checks for reclaimable file in FRA since it was an old bug where we had to run it manually on databases in mount.

I compared the traces when changing the order of ‘APPLIED’ and ‘BACKED UP’ and found the following:


< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: setRedoLogDeletionPolicy with policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: setRedoLogDeletionPolicy with policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
5340c5340
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK
5343c5343
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING setRedoLogDeletionPolicy with policy = TO BACKED UP 1 TIMES TO DISK APPLIED ON ALL STANDBY with alldest = 1
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING setRedoLogDeletionPolicy with policy = TO APPLIED ON ALL STANDBY BACKED UP 1 TIMES TO DISK with alldest = 1
5350,5351c5350,5351
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption devtype=DISK
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption backed up conf - devtype=DISK , backups=1
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption devtype=DISK
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: parseBackedUpOption backed up conf - devtype=DISK, backups=1
5363c5363
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpAl with TRUE
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpAl with key = 128 stamp = 958068130
5367c5367
< *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpFiles with: no_data_found
---
> *:KRA_SQL:kraq.c@1035:kraqgdbg(): DBGRCVMAN: EXITING getBackedUpFiles

You see at the top the difference in the way I mentioned the deletion policy. You see at the bottom that the first one (starting with ‘BACKED UP’) didn’t find archivelogs being backed up (no_data_found). But the second one (starting with ‘APPLIED’) mentioned the sequence# 128.

But if you look carefully, you see another difference in the middle: the “devtype=DISK” has an additional space before the comma in the first case.

So I traced a bit further, including SQL_TRACE and I found that the deletion policy is just using some INSTR and SUBSTR parsing on the deletion policy text to find the policy, the backup times, and the device type. For sure, looking for backups with DEVICE_TYPE=’DISK ‘ instead of ‘DISK’ will not find anything and this is the reason for the bug: no archived logs backed up means no archived log reclaimable.

If you look closer at DBMS_RCVMAN you will find that the device type is extracted with SUBSTR(:1, 1, INSTR(:1, ‘ ‘)) when the device type is followed by a space, which is the reason of this additional space. The correct extraction should be SUBSTR(:1, 1, INSTR(:1, ‘ ‘)-1) and this is what I suggested on the SR.

So what?

Writing the conclusion made me change the title. Currently, a lot of people are advocating for unstructured data. Because it is easy (which rhymes with ‘lazy’). Store information as it comes and postpone the parsing to a more structured data type until you need to process it. This seems to be how the RMAN configuration is stored: as the text we entered. And it is parsed later with simple text function as INSTR(), SUBSTR(), and LIKE. But you can see how a little bug, such as reading an additional character, has big consequences. If you look at the archivelog deletion policy syntax, you have 50% chances to run into this bug on a Data Guard configuration. The Recovery Area will fill up and your database will be blocked. The controlfile grows. Or you noticed it before and you run a ‘delete archivelog’ statement without knowing the reason. You waste space, removing some recovery files from local storage, which could have been kept for longer. If the deletion policy was parsed immediately when entered, like SQL DDL or PL/SQL APIs, the issue would have been detected a long time ago. Structure and strong typing is the way to build robust applications.