By William Sescu
Relying to much on the RMAN Data Recovery Advisor is not always the best idea. In a lot of situations, it tells you the right things, however, sometimes it tells you not the optimal things, and sometimes, RMAN list failure does not show any failure at all, even if there is one.
So … let’s simulate quickly a loss of a datafile during the normal runtime of the database. The result is a clear error message which says that the datafile 5 is missing.
SQL> select count(*) from hr.employees; select count(*) from hr.employees * ERROR at line 1: ORA-01116: error in opening database file 5 ORA-01110: data file 5: '/u01/oradata/DBTEST1/hrDBTEST01.dbf' ORA-27041: unable to open file Linux-x86_64 Error: 2: No such file or directory Additional information: 3
Of course, the error message is immediately reflected in the alert.log as well where it clearly says that Oracle in unable to open file number 5.
Errors in file /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1/trace/DBTEST1_smon_17115.trc: ORA-01116: error in opening database file 5 ORA-01110: data file 5: '/u01/oradata/DBTEST1/hrDBTEST01.dbf' ORA-27041: unable to open file Linux-x86_64 Error: 2: No such file or directory
Only the RMAN Data Recovery advisor does not know what it’s going on.
RMAN> list failure; using target database control file instead of recovery catalog Database Role: PRIMARY no failures found that match specification
Of course, I could shutdown the DB, and then startup again which would trigger a Health Check, but shutting down an instance is not always so easy on production systems. Especially when only one datafile is missing, but all others are available and only a part of the application is affected.
The solution to that issue, is to run a manual health check. Quite a lot of health checks can be run manually, like show in the following documentation.
https://docs.oracle.com/database/121/ADMIN/diag.htm#ADMIN11269
I start with the DB Structure Integrity Check. This check verifies the integrity of database files and reports failures if these files are inaccessible, corrupt or inconsistent.
SQL> begin 2 dbms_hm.run_check ('DB Structure Integrity Check','Williams Check 00000001'); 3 end; 4 / PL/SQL procedure successfully completed.
After running the Health Check, Oracle finds the failure and in the alter.log you will see an entry like the following:
Checker run found 1 new persistent data failures
If you want to take a look what exactly the Health check found, you can invoke the ADRCI and execute the “show hm_run” command.
oracle@vmoratest1:/oracle/workshop/bombs/ [DBTEST1] adrci ADRCI: Release 12.1.0.2.0 - Production on Tue Feb 7 16:02:21 2017 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. ADR base = "/u00/app/oracle" adrci> show homes ADR Homes: diag/clients/user_oracle/host_1833655127_82 diag/tnslsnr/vmoratest1/listener diag/rdbms/cdb1p/CDB1P diag/rdbms/dbtest1/DBTEST1 diag/rdbms/rcat/RCAT adrci> set home diag/rdbms/dbtest1/DBTEST1 adrci> show hm_run ADR Home = /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1: ************************************************************************* ... ... ********************************************************** HM RUN RECORD 9 ********************************************************** RUN_ID 206 RUN_NAME Williams Check 00000001 CHECK_NAME DB Structure Integrity Check NAME_ID 2 MODE 0 START_TIME 2017-02-07 16:03:44.431601 +01:00 RESUME_TIME <NULL> END_TIME 2017-02-07 16:03:44.478127 +01:00 MODIFIED_TIME 2017-02-07 16:03:44.478127 +01:00 TIMEOUT 0 FLAGS 0 STATUS 5 SRC_INCIDENT_ID 0 NUM_INCIDENTS 0 ERR_NUMBER 0 REPORT_FILE <NULL> 9 rows fetched adrci>
However, if you take a look at the HM RUN report, is gives you an error.
adrci> show report hm_run 'Williams Check 00000001' DIA-48415: Syntax error found in string [show report hm_run 'Williams Check 00000001'] at column [44]
This is not a bug. The HM run name must be only alphanumeric and underscore. So … better don’t use spaces in between your name. The following would have been better.
SQL> begin 2 dbms_hm.run_check ('DB Structure Integrity Check','WilliamsCheck'); 3 end; 4 / PL/SQL procedure successfully completed.
In case, the “adrci show report hm_run” does not work for you, it is not the end of the story. We still can look up the v$hm_finding view.
select RUN_ID, TIME_DETECTED, STATUS, DESCRIPTION, DAMAGE_DESCRIPTION from v$hm_finding where run_id = '206'; SQL> select RUN_ID, TIME_DETECTED, STATUS, DESCRIPTION, DAMAGE_DESCRIPTION from v$hm_finding where run_id = '206'; RUN_ID TIME_DETECTED STATUS DESCRIPTION DAMAGE_DESCRIPTION ------ ---------------------------- ------------ -------------------------------------------- -------------------------------------------- 206 07-FEB-17 04.03.44.475000 PM OPEN Datafile 5: '/u01/oradata/DBTEST1/hrDBTEST01 Some objects in tablespace HR might be unava .dbf' is missing ilable
Now let’s check the RMAN “list failure” again.
RMAN> list failure; Database Role: PRIMARY List of Database Failures ========================= Failure ID Priority Status Time Detected Summary ---------- -------- --------- -------------------- ------- 2 HIGH OPEN 07-FEB-2017 15:39:38 One or more non-system datafiles are missing RMAN> advise failure; ... Automated Repair Options ======================== Option Repair Description ------ ------------------ 1 Restore and recover datafile 5 Strategy: The repair includes complete media recovery with no data loss Repair script: /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1/hm/reco_668410907.hm RMAN> repair failure preview; Strategy: The repair includes complete media recovery with no data loss Repair script: /u00/app/oracle/diag/rdbms/dbtest1/DBTEST1/hm/reco_668410907.hm contents of repair script: # restore and recover datafile sql 'alter database datafile 5 offline'; restore ( datafile 5 ); recover datafile 5; sql 'alter database datafile 5 online';
Conclusion
The Oracle Data Recovery Advisor is quite good, but sometimes you need to push it into the right direction. Besides that, take care of the naming convention that you use for your health check runs. 😉