Infrastructure at your Service

Clemens Bleile

Direct NFS, ODM 4.0 in 12.2: archiver stuck situation after a shutdown abort and restart

A customer had an interesting case recently. Since Oracle 12.2, he has been getting archiver stuck situations after a shutdown abort and restart. I reproduced the issue, and it is caused by Direct NFS running with ODM 4.0 (i.e. since 12.2). The issue also reproduces on 18.5. When Direct NFS is enabled, the archiver process writes to a file whose name starts with a dot, e.g.:


.arch_1_90_985274359.arc

Once the file has been fully copied from the online redo log, it is renamed and the leading dot is dropped. Using the previous example:


arch_1_90_985274359.arc
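The "write under a dot name, then rename" pattern can be illustrated with a small shell sketch. This is only an analogy for what the archiver does on the file system, not Oracle's implementation; the function name `publish_archive` and its arguments are hypothetical.

```shell
# Hypothetical illustration of the "write to .name, then rename" pattern.
# Not Oracle's code: it only mimics what the archiver does on the file system.
publish_archive() {
  local src="$1" destdir="$2" name="$3"
  local tmp="$destdir/.$name"     # temporary name with a leading dot
  local final="$destdir/$name"    # final name without the dot

  # 1. Copy the full content under the temporary (hidden) name.
  cp -- "$src" "$tmp" || return 1
  # 2. Only after the copy has completed, rename to the final name.
  #    A crash between step 1 and step 2 leaves the dot file behind.
  mv -- "$tmp" "$final"
}
```

For example, `publish_archive /tmp/redo.dat /arch_backup/gen183/archivelog arch_1_90_985274359.arc` would leave only `arch_1_90_985274359.arc` in the destination, unless it is interrupted between the copy and the rename.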

If I do a “shutdown abort” while the archiver is still writing to the archive file (the one with the leading dot in its name) and then restart the database, Oracle is not able to cope with that file. The alert log shows the following errors:


2019-04-17T10:22:33.190330+02:00
ARC0 (PID:12598): Unable to create archive log file '/arch_backup/gen183/archivelog/arch_1_90_985274359.arc'
2019-04-17T10:22:33.253476+02:00
Errors in file /u01/app/oracle/diag/rdbms/gen183/gen183/trace/gen183_arc0_12598.trc:
ORA-19504: failed to create file "/arch_backup/gen183/archivelog/arch_1_90_985274359.arc"
ORA-17502: ksfdcre:8 Failed to create file /arch_backup/gen183/archivelog/arch_1_90_985274359.arc
ORA-17500: ODM err:File exists
2019-04-17T10:22:33.254078+02:00
ARC0 (PID:12598): Error 19504 Creating archive log file to '/arch_backup/gen183/archivelog/arch_1_90_985274359.arc'
ARC0 (PID:12598): Stuck archiver: inactive mandatory LAD:1
ARC0 (PID:12598): Stuck archiver condition declared

The DB continues to operate normally until it has to overwrite the online redo log file that has not been fully archived yet. At that point the archiver gets stuck and modifications on the DB are no longer possible.

After I remove the incomplete archive file, the DB continues to operate normally:


rm .arch_1_90_985274359.arc
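Before removing anything, it helps to see which incomplete files are actually lying around. A minimal sketch, assuming the archive naming pattern used in this post (`.arch_<thread>_<sequence>_<resetlogs_id>.arc`) and GNU `find`; the helper name is hypothetical. It only lists the files, so the decision to remove them stays a manual one, as with the `rm` above.

```shell
# Hypothetical helper: list archive files that were left behind with a
# leading dot (i.e. incomplete copies) in a log archive destination.
# Assumes the naming pattern seen in this post: .arch_<thread>_<seq>_<resetlogs>.arc
list_incomplete_archivelogs() {
  local destdir="$1"
  # -maxdepth 1: only look at the destination itself, not subdirectories
  find "$destdir" -maxdepth 1 -type f -name '.arch_*.arc' -print | sort
}
```

Usage: `list_incomplete_archivelogs /arch_backup/gen183/archivelog`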

With a 12.1 database using ODM 3.0 I did not see that behavior. There I could also see an archived redo log file with a leading dot in its name, but after a shutdown abort and restart Oracle removed the file itself and there was no archiver problem.

Test case:

1.) Make sure you have Direct NFS enabled:


cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on
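After the next instance startup, one way to confirm that Direct NFS is in use is to look for the ODM startup message in the alert log, as it appears in the excerpt in step 5 ("Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 4.0"). A minimal sketch, assuming that exact message text; the function name is hypothetical.

```shell
# Hypothetical helper: print the Direct NFS ODM library version from the
# most recent matching startup message in an alert log.
# Assumes the message format seen in this post, e.g.
#   "Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 4.0"
odm_library_version() {
  local alertlog="$1"
  sed -n 's/.*Oracle Direct NFS ODM Library Version \([0-9.]*\).*/\1/p' "$alertlog" |
    tail -n 1   # latest startup wins if the log contains several
}
```

Usage: `odm_library_version /u01/app/oracle/diag/rdbms/gen183/gen183/trace/alert_gen183.log`. No output means the message was not found, e.g. because Direct NFS is not active.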

2.) Configure a mandatory log archive destination pointing to an NFS-mounted file system, e.g.:


[root]# mount -t nfs -o rw,bg,hard,rsize=32768,wsize=32768,vers=3,nointr,timeo=600,proto=tcp,suid,nolock,noac nfs_server:/arch_backup /arch_backup
 
SQL> alter system set log_archive_dest_1='location=/arch_backup/gen183/archivelog mandatory reopen=30';

3.) Produce some DML load on the DB

I created two tables, t3 and t4, as copies of all_objects with approx. 600’000 rows each:


SQL> create table t3 as select * from all_objects;
SQL> insert into t3 select * from t3;
SQL> -- repeat above insert until you have 600K rows in t3
SQL> commit;
SQL> create table t4 as select * from t3;
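Since every `insert into t3 select * from t3` doubles the table, the number of repetitions needed follows directly from the initial row count of all_objects, which differs between installations. A minimal sketch of that arithmetic; the helper name and the example start count of 75’000 rows are hypothetical.

```shell
# Hypothetical helper: how many "insert into t3 select * from t3" runs are
# needed to reach a target row count, given the initial row count.
# Each run doubles the table, so count doublings until count >= target.
inserts_needed() {
  local count="$1" target="$2" runs=0
  # Guard against an endless loop for an empty start table.
  if [ "$count" -le 0 ]; then
    echo "start count must be > 0" >&2
    return 1
  fi
  while [ "$count" -lt "$target" ]; do
    count=$((count * 2))
    runs=$((runs + 1))
  done
  echo "$runs"
}
```

With a hypothetical start of 75’000 rows, `inserts_needed 75000 600000` prints `3` (75’000, 150’000, 300’000, 600’000).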

Run the following PL/SQL block to produce redo:


begin
  for i in 1..20 loop
    delete from t3;
    commit;
    insert into t3 select * from t4;
    commit;
  end loop;
end;
/

4.) While the PL/SQL block from step 3.) is running, check the archive files produced in your log archive destination:


ls -ltra /arch_backup/gen183/archivelog

As soon as you see a file with a leading dot in its name, shut down the database with shutdown abort:


/arch_backup/gen183/archivelog/ [gen183] ls -ltra /arch_backup/gen183/archivelog
total 2308988
drwxr-xr-x. 3 oracle oinstall 23 Apr 17 10:13 ..
-r--r-----. 1 oracle oinstall 2136861184 Apr 24 18:24 arch_1_104_985274359.arc
drwxr-xr-x. 2 oracle oinstall 69 Apr 24 18:59 .
-rw-r-----. 1 oracle oinstall 2090587648 Apr 24 18:59 .arch_1_105_985274359.arc
 
SQL> shutdown abort
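Instead of re-running `ls` by hand, the wait for a dot-prefixed file can be scripted, so that the shutdown abort can be issued right when the archiver is writing. A minimal sketch; the function name, the polling interval and the maximum number of polls are hypothetical choices, and it assumes the `.arch_*.arc` naming seen above.

```shell
# Hypothetical helper: poll a log archive destination until a dot-prefixed
# (i.e. still being written) archive file shows up, then print its path.
# Gives up after max_polls polls (default 600) so it cannot loop forever.
wait_for_tmp_archivelog() {
  local destdir="$1" interval="${2:-1}" max_polls="${3:-600}" i=0 f
  while [ "$i" -lt "$max_polls" ]; do
    for f in "$destdir"/.arch_*.arc; do
      # An unmatched glob stays literal, so check that the file exists.
      if [ -e "$f" ]; then
        echo "$f"
        return 0
      fi
    done
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

Usage: `wait_for_tmp_archivelog /arch_backup/gen183/archivelog && echo "now run: shutdown abort"`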

5.) If the file with the leading dot is still there after the shutdown, you have reproduced the issue. Just start up the DB and “tail -f” your alert log file:


/arch_backup/gen183/archivelog/ [gen183] cd /u01/app/oracle/diag/rdbms/gen183/gen183/trace
/u01/app/oracle/diag/rdbms/gen183/gen183/trace/ [gen183] tail -f alert_gen183.log
...
2019-04-24T19:01:24.775991+02:00
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 4.0
...
2019-04-24T19:01:43.770196+02:00
ARC0 (PID:8876): Unable to create archive log file '/arch_backup/gen183/archivelog/arch_1_105_985274359.arc'
2019-04-24T19:01:43.790546+02:00
Errors in file /u01/app/oracle/diag/rdbms/gen183/gen183/trace/gen183_arc0_8876.trc:
ORA-19504: failed to create file "/arch_backup/gen183/archivelog/arch_1_105_985274359.arc"
ORA-17502: ksfdcre:8 Failed to create file /arch_backup/gen183/archivelog/arch_1_105_985274359.arc
ORA-17500: ODM err:File exists
ARC0 (PID:8876): Error 19504 Creating archive log file to '/arch_backup/gen183/archivelog/arch_1_105_985274359.arc'
ARC0 (PID:8876): Stuck archiver: inactive mandatory LAD:1
ARC0 (PID:8876): Stuck archiver condition declared
...
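To check an alert log for exactly this symptom, for example from a monitoring script, one can look for the ODM "File exists" error together with the stuck archiver message. A minimal sketch, assuming the message texts shown in the excerpt above; the function name is hypothetical.

```shell
# Hypothetical helper: report whether an alert log contains the symptom
# described in this post (ORA-17500 "File exists" plus a stuck archiver).
# Returns 0 if both messages are found, 1 otherwise.
check_stuck_archiver() {
  local alertlog="$1"
  grep -q 'ORA-17500: ODM err:File exists' "$alertlog" &&
    grep -q 'Stuck archiver condition declared' "$alertlog"
}
```

Usage: `check_stuck_archiver alert_gen183.log && echo "incomplete dot file suspected"`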

This is a serious problem, because it may cause an archiver stuck situation after a crash. I opened a Service Request with Oracle, and the SR has now been assigned to the ODM team. Once I get a resolution, I’ll update this blog.

UPDATE 14-May-2019:
Oracle Support updated the SR: the issue is caused by internal bug 27044169. A patch for Linux x86-64 is currently available for:
12.2.0.1.190115 DBJAN2019RU
12.2.0.1.190416 DBAPR2019RU
12.2.0.1.181016 DBOCT2018RU
12.2.0.1.170814 DBRU
12.2.0.1.0

See also MOS Note “The instance hangs in RAC environment with archiver issues ORA-19504,ORA-17502,ORA-17500 (Doc ID 2378546.1)”


Clemens Bleile

Technology Leader & Principal Consultant