By William Sescu

Relying to much on the RMAN Data Recovery Advisor is not always the best idea. In a lot of situations,  it tells you the right things, however, sometimes it tells you not the optimal things, and sometimes, RMAN list failure does not show any failure at all, even if there is one.

So … let’s simulate quickly a loss of a datafile during the normal runtime of the database. The result is a clear error message which says that the datafile 5 is missing.

SQL> select count(*) from hr.employees;
select count(*) from hr.employees
                        *
ERROR at line 1:
ORA-01116: error in opening database file 5
ORA-01110: data file 5: '/u01/oradata/DBTEST1/hrDBTEST01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

Of course, the error message is immediately reflected in the alert.log as well where it clearly says that Oracle in unable to open file number 5.

Errors in file /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1/trace/DBTEST1_smon_17115.trc:
ORA-01116: error in opening database file 5
ORA-01110: data file 5: '/u01/oradata/DBTEST1/hrDBTEST01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory

Only the RMAN Data Recovery advisor does not know what it’s going on.

RMAN> list failure;

using target database control file instead of recovery catalog
Database Role: PRIMARY

no failures found that match specification

Of course, I could shutdown the DB, and then startup again which would trigger a Health Check, but shutting down an instance is not always so easy on production systems. Especially when only one datafile is missing, but all others are available and only a part of the application is affected.

The solution to that issue, is to run a manual health check. Quite a lot of health checks can be run manually, like show in the following documentation.

https://docs.oracle.com/database/121/ADMIN/diag.htm#ADMIN11269

I start with the DB Structure Integrity Check. This check verifies the integrity of database files and reports failures if these files are inaccessible, corrupt or inconsistent.

SQL> begin
  2  dbms_hm.run_check ('DB Structure Integrity Check','Williams Check 00000001');
  3  end;
  4  /

PL/SQL procedure successfully completed.

After running the Health Check, Oracle finds the failure and in the alter.log you will see an entry like the following:

Checker run found 1 new persistent data failures

If you want to take a look what exactly the Health check found, you can invoke the ADRCI and execute the “show hm_run” command.

oracle@vmoratest1:/oracle/workshop/bombs/ [DBTEST1] adrci

ADRCI: Release 12.1.0.2.0 - Production on Tue Feb 7 16:02:21 2017

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u00/app/oracle"
adrci> show homes
ADR Homes:
diag/clients/user_oracle/host_1833655127_82
diag/tnslsnr/vmoratest1/listener
diag/rdbms/cdb1p/CDB1P
diag/rdbms/dbtest1/DBTEST1
diag/rdbms/rcat/RCAT

adrci> set home diag/rdbms/dbtest1/DBTEST1

adrci> show hm_run

ADR Home = /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1:
*************************************************************************

...
...

**********************************************************
HM RUN RECORD 9
**********************************************************
   RUN_ID                        206
   RUN_NAME                      Williams Check 00000001
   CHECK_NAME                    DB Structure Integrity Check
   NAME_ID                       2
   MODE                          0
   START_TIME                    2017-02-07 16:03:44.431601 +01:00
   RESUME_TIME                   <NULL>
   END_TIME                      2017-02-07 16:03:44.478127 +01:00
   MODIFIED_TIME                 2017-02-07 16:03:44.478127 +01:00
   TIMEOUT                       0
   FLAGS                         0
   STATUS                        5
   SRC_INCIDENT_ID               0
   NUM_INCIDENTS                 0
   ERR_NUMBER                    0
   REPORT_FILE                   <NULL>
9 rows fetched

adrci>

However, if you take a look at the HM RUN report, is gives you an error.

adrci> show report hm_run 'Williams Check 00000001'
DIA-48415: Syntax error found in string [show report hm_run 'Williams Check 00000001'] at column [44]

This is not a bug. The HM run name must be only alphanumeric and underscore.  So … better don’t use spaces in between your name. The following would have been better.

SQL> begin
  2  dbms_hm.run_check ('DB Structure Integrity Check','WilliamsCheck');
  3  end;
  4  /

PL/SQL procedure successfully completed.

In case, the “adrci show report hm_run” does not work for you, it is not the end of the story. We still can look up the v$hm_finding view.

select RUN_ID, TIME_DETECTED, STATUS, DESCRIPTION, DAMAGE_DESCRIPTION from v$hm_finding where run_id = '206';

SQL> select RUN_ID, TIME_DETECTED, STATUS, DESCRIPTION, DAMAGE_DESCRIPTION from v$hm_finding where run_id = '206';

RUN_ID TIME_DETECTED                STATUS       DESCRIPTION                                  DAMAGE_DESCRIPTION
------ ---------------------------- ------------ -------------------------------------------- --------------------------------------------
   206 07-FEB-17 04.03.44.475000 PM OPEN         Datafile 5: '/u01/oradata/DBTEST1/hrDBTEST01 Some objects in tablespace HR might be unava
                                                 .dbf' is missing                             ilable

Now let’s check the RMAN “list failure” again.

RMAN> list failure;

Database Role: PRIMARY

List of Database Failures
=========================

Failure ID Priority Status    Time Detected        Summary
---------- -------- --------- -------------------- -------
2          HIGH     OPEN      07-FEB-2017 15:39:38 One or more non-system datafiles are missing


RMAN> advise failure;
...
Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Restore and recover datafile 5
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1/hm/reco_668410907.hm

  
RMAN> repair failure preview;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1/hm/reco_668410907.hm

contents of repair script:
   # restore and recover datafile
   sql 'alter database datafile 5 offline';
   restore ( datafile 5 );
   recover datafile 5;
   sql 'alter database datafile 5 online';

Conclusion

The Oracle Data Recovery Advisor is quite good, but sometimes you need to push it into the right direction. Besides that, take care of the naming convention that you use for your health check runs. 😉