Infrastructure at your Service

Joël Cattin

“FRA full” alerts flood the Alert Log

We discovered a strange behavior in the Alert Log when the Fast Recovery Area (FRA) is full and the database wants to write something inside it (for example an archivelog). This case concern Oracle 11.2.0.3 databases and higher.

Here is a demo with a 12c database (12.1.0.2) :
First, to reproduce the behavior, I set a very small size to the FRA :

SQL> alter system set db_recovery_file_dest_size = 1M;

System altered.

SQL>

As you can see, the FRA is full :

SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                      15871.24                         0              41          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

8 rows selected.

SQL>

Next, I execute a “switch logfile” to force a writing of an archive inside the FRA :

SQL> alter system switch logfile;

System altered.

SQL>

The command works fine but a lot of trace files are generated on the diag_dest and… take a look in the alert.log ! :

...
...
...
Tue Aug 25 16:56:34 2015
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.
************************************************************************
Tue Aug 25 16:56:34 2015
Errors in file /u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/trace/JOCDB1_arc1_2767.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 20405760 bytes disk space from 1048576 limit
ARC1: Error 19809 Creating archive log file to '/u90/fast_recovery_area/JOCDB1_SITE1/archivelog/2015_08_25/o1_mf_1_268_%u_.arc'
Tue Aug 25 16:56:34 2015
Errors in file /u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/trace/JOCDB1_arc3_2771.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 1048576 bytes is 100.00% used, and has 0 remaining bytes available.
Tue Aug 25 16:56:34 2015
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.
************************************************************************
Tue Aug 25 16:56:34 2015
Errors in file /u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/trace/JOCDB1_arc3_2771.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 20405760 bytes disk space from 1048576 limit
ARC3: Error 19809 Creating archive log file to '/u90/fast_recovery_area/JOCDB1_SITE1/archivelog/2015_08_25/o1_mf_1_268_%u_.arc'
Tue Aug 25 16:56:34 2015
Errors in file /u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/trace/JOCDB1_arc0_2765.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 1048576 bytes is 100.00% used, and has 0 remaining bytes available.
Tue Aug 25 16:56:34 2015
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.
************************************************************************
Tue Aug 25 16:56:34 2015
Errors in file /u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/trace/JOCDB1_arc0_2765.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 20405760 bytes disk space from 1048576 limit
ARC0: Error 19809 Creating archive log file to '/u90/fast_recovery_area/JOCDB1_SITE1/archivelog/2015_08_25/o1_mf_1_268_%u_.arc'
Tue Aug 25 16:56:34 2015
Errors in file /u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/trace/JOCDB1_arc1_2767.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 1048576 bytes is 100.00% used, and has 0 remaining bytes available.
...
...
...

As you can see, the “FRA full” message is written a lot of times per SECONDS !
This behavior can have important consequences :

  • Because of the high frequency of trace files generation (and in the same time the filling of the alert.log), the file system where the diag_dest is (here /u01 in my demo) can be full before that the DBA have time to clean up or resize the FRA.
  • Most of the time the diag_dest is in the same file system that the databases. And if this file system is full, databases will no longer run :-(

Below, you can see the size evolution of the alert and trace folders after 15 seconds of “FRA full” :

Before :

oracle@srvtest:/u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/ [JOCDB1] du -sh alert trace
12K alert
5.4M trace

After 15 seconds :

oracle@srvtest:/u01/app/oracle/diag/rdbms/jocdb1_site1/JOCDB1/ [JOCDB1] du -sh alert trace
5.6M alert
7.2M trace

I also monitored the system calls of the Arc process (with strace, during 2 minutes) :

oracle@srvtest:/home/oracle/ [JOCDB1] ps -aef | grep ora | grep arc
oracle    2783     1  1 10:03 ?        00:00:17 ora_arc0_JOCDB1
oracle    2785     1  0 10:03 ?        00:00:10 ora_arc1_JOCDB1
oracle    2787     1  0 10:03 ?        00:00:00 ora_arc2_JOCDB1
oracle    2789     1  0 10:03 ?        00:00:10 ora_arc3_JOCDB1
oracle    4014  3232  0 10:22 pts/0    00:00:00 grep arc
oracle@srvtest:/home/oracle/ [JOCDB1] strace -o strace.trc -tt -p 2783
Process 2783 attached - interrupt to quit
^CProcess 2783 detached
oracle@srvtest:/home/oracle/ [JOCDB1]

As you can see bellow, the ora_arc0 process writes approximately every 57 milliseconds :

oracle@srvtest:/home/oracle/ [JOCDB1] cat strace.trc | grep ORA-19815
10:12:01.884104 write(7, "ORA-19815: WARNING: db_recovery_"..., 117) = 117
10:12:01.939699 write(7, "ORA-19815: WARNING: db_recovery_"..., 117) = 117
10:12:02.000230 write(7, "ORA-19815: WARNING: db_recovery_"..., 117) = 117
10:12:02.060336 write(7, "ORA-19815: WARNING: db_recovery_"..., 117) = 117
10:12:02.120346 write(7, "ORA-19815: WARNING: db_recovery_"..., 117) = 117
...
...

After discovering this “bug” I tried to reproduce it on a 11.2.0.2 database and I noticed that this problem doesn’t exist with this version.
I also tested this case on a Windows environment with a 12.1.0.1 database and this time the “bug” is the same as on a Linux system.

I created an Oracle Service Request and Oracle decided to open a bug (currently under development).

A solution is to create a separate file system on the server for the diag_dest of each database.
Indeed, if a database can’t write on his alert.log because of the alert.log of an other database filled the file system, some critical errors will not be detected.

P.S : many thanks to Franck Pachot for his help and advises.

4 Comments

Leave a Reply

Joël Cattin
Joël Cattin

Consultant