If I had to rank my favorite Oracle-related tools and software, Dbvisit Standby would likely be at the top of the list.
You are reading this post, so you probably know that Dbvisit Standby is a Disaster/Recovery solution for Oracle Database Standard Edition (aka Data Guard for poor people 😛 ).
The reasons why I like this product are mostly related to the following points (non-exhaustive list) :
- Ease of installation and configuration
- Ease of use
- Continuous evolution (new features)
- Documentation quality
- Technical support efficiency
However, when running NFS, you have to ensure the share is always reachable for Dbvisit Standby to operate correctly.
In this blog post, I will describe a NFS issue I encountered on a customer running Dbvisit Standby version 10.1 and how it is resolved.
The following error message appeared while running the dbvctl command to transfer and the apply archive logs :
Dbvisit Standby process for preposs still running on odaprep01 (pid=53835). See trace file 53835_dbvctl_preposs_202112081932.trc for more details. Exceeded RUNNING_MAX_TIMES_TRIED=1 attempts. (if Dbvisit Standby process is no longer running, then delete lock file /u01/app/dbvisit/standby/pid/dbvisit_preposs.pid)
This seems to indicate that the archive logs transfer process did not succeed properly and got stuck on the server.
Let’s have a look to the existing processes :
[[email protected] ~]$ ps -ef | grep dbvisit oracle 33833 14053 0 Dec08 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_3_12003.3.dbvisit.202112081923.sqlplus.dbv 2>/dev/null oracle 33834 33833 0 Dec08 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_3_12003.3.dbvisit.202112081923.sqlplus.dbv oracle 34813 34812 0 Dec08 ? 00:00:00 /bin/sh -c /u01/app/dbvisit/standby/dbvctl -d preposs >/tmp/dbvisit_apply_logs_preposs.log 2>&1 oracle 34814 34813 0 Dec08 ? 00:00:00 /u01/app/dbvisit/standby/dbvctl -d preposs oracle 35239 34814 0 Dec08 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/34814.0.dbvisit.202112081924.sqlplus.dbv 2>/dev/null oracle 35240 35239 0 Dec08 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/34814.0.dbvisit.202112081924.sqlplus.dbv oracle 36929 1 0 Sep17 ? 02:54:14 /u01/app/dbvisit/standby/dbvctl -d preposs -D start oracle 38142 14053 0 Dec08 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_4_1.4.dbvisit.202112082033.sqlplus.dbv 2>/dev/null oracle 38143 38142 0 Dec08 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_4_1.4.dbvisit.202112082033.sqlplus.dbv oracle 41184 1 0 Sep03 ? 01:17:16 /u01/app/dbvisit/dbvnet/dbvnet -d start oracle 44524 14053 0 Dec08 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_5_1.5.dbvisit.202112082143.sqlplus.dbv 2>/dev/null oracle 44525 44524 0 Dec08 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_5_1.5.dbvisit.202112082143.sqlplus.dbv oracle 45780 14053 0 Dec08 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_6_1.6.dbvisit.202112082253.sqlplus.dbv 2>/dev/null oracle 45781 45780 0 Dec08 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_6_1.6.dbvisit.202112082253.sqlplus.dbv oracle 51450 14053 0 00:03 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_7_1.7.dbvisit.202112090003.sqlplus.dbv 2>/dev/null oracle 51451 51450 0 00:03 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_7_1.7.dbvisit.202112090003.sqlplus.dbv oracle 53234 14053 0 01:13 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_8_1.8.dbvisit.202112090113.sqlplus.dbv 2>/dev/null oracle 53235 53234 0 01:13 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_8_1.8.dbvisit.202112090113.sqlplus.dbv oracle 54442 14053 0 02:23 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_9_1.9.dbvisit.202112090223.sqlplus.dbv 2>/dev/null oracle 54443 54442 0 02:23 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_9_1.9.dbvisit.202112090223.sqlplus.dbv oracle 59799 14053 0 03:33 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_10_1.10.dbvisit.202112090333.sqlplus.dbv 2>/dev/null oracle 59800 59799 0 03:33 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_10_1.10.dbvisit.202112090333.sqlplus.dbv oracle 60840 14053 0 04:43 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_11_1.11.dbvisit.202112090443.sqlplus.dbv 2>/dev/null oracle 60841 60840 0 04:43 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_11_1.11.dbvisit.202112090443.sqlplus.dbv oracle 61931 14053 0 05:53 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_12_1.12.dbvisit.202112090553.sqlplus.dbv 2>/dev/null oracle 61932 61931 0 05:53 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_12_1.12.dbvisit.202112090553.sqlplus.dbv oracle 64979 14053 0 07:03 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_13_1.13.dbvisit.202112090703.sqlplus.dbv 2>/dev/null oracle 64980 64979 0 07:03 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_13_1.13.dbvisit.202112090703.sqlplus.dbv oracle 67322 14053 0 08:13 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_14_1.14.dbvisit.202112090813.sqlplus.dbv 2>/dev/null oracle 67323 67322 0 08:13 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_14_1.14.dbvisit.202112090813.sqlplus.dbv oracle 68513 14053 0 09:23 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_15_1.15.dbvisit.202112090923.sqlplus.dbv 2>/dev/null oracle 68514 68513 0 09:23 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_15_1.15.dbvisit.202112090923.sqlplus.dbv oracle 71517 14053 0 10:33 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_16_1.16.dbvisit.202112091033.sqlplus.dbv 2>/dev/null oracle 71518 71517 0 10:33 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_16_1.16.dbvisit.202112091033.sqlplus.dbv oracle 72932 14053 0 11:43 ? 00:00:00 sh -c /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_17_1.17.dbvisit.202112091143.sqlplus.dbv 2>/dev/null oracle 72933 72932 0 11:43 ? 00:00:00 /usr/sbin/fuser /u01/app/dbvisit/standby/tmp/14053_17_1.17.dbvisit.202112091143.sqlplus.dbv oracle 78878 1 0 Sep03 ? 00:09:45 /u01/app/dbvisit/dbvagent/dbvagent -d start [[email protected] ~]$
Mmmh, quite a big mess here, isn’t ?
You certainly noticed that there is a lot of blocked processes running the fuser command. Actually, before transferring an archive logs from the primary server to the standby one, fuser is executed to ensure that the concerned archive log is not being used by another process (e.g RMAN backup). This behavior can be confirmed by analyzing the trace file (<dbvisit_home>/traces directory) generated by the dbvctl command :
209 11:08:41 main::UTIL_UNIX_is_file_open: run command: /usr/sbin/fuser /u03/app/oracle/fast_recovery_area/PRODOSS_DC41/archivelog/2021_12_09/o1_mf_1_162040_jv3oc7xb_.arc 20211209 11:08:41 main::UTIL_run_command: ORACLE_HOME: /u01/app/oracle/product/184.108.40.206/dbhome_2 20211209 11:08:41 main::UTIL_run_command: PATH: /usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/sbin:/sbin:/u01/app/oracle/product/220.127.116.11/dbhome_2/bin 20211209 11:08:41 main::UTIL_run_command: LD_LIBRARY_PATH: /u01/app/dbvisit/standby/lib:/u01/app/oracle/product/18.104.22.168/dbhome_2/lib
The first action I took was to disable the Dbvisit jobs from the crontab – on both sides – to avoid some new processes to be generated.
Then I killed all the blocked processes listed above to start from a clean situation, and I deleted the PID file stored in the <dbvisit_home>/pid > directory.
Finally I executed manually the fuser command against an archive log and there I could see that the command remained stuck, without any result to return. Ctrl-C was needed to stop it. This strange behavior was also the same with some other commands like df or lsof.
By analyzing the system logs (/var/log/messages), I could observe that a NFS share mounted on the server was no longer reachable :
Dec 8 19:38:08 odaprep01 kernel: nfs: server 10.84.48.100 not responding, timed out Dec 8 19:41:08 odaprep01 kernel: nfs: server 10.84.48.100 not responding, timed out Dec 8 19:41:14 odaprep01 kernel: nfs: server 10.84.48.100 not responding, timed out
And that’s why the fuser command triggered by dbvctl never ended.
Now, you are probably wondering “the archive logs are not stored on the NFS, so why did this have an impact ?“.
Let me explain…
fuser is designed to access all processes at one time to determine if any of their files is stored on the local file system. To discover that, a stat() call is used to identify the attributes of the processes’ executable. If the executable is stored on a NFS, the stat() call result depends on the NFS availability to be successful and can hang if it’s not reachable.
To solve the issue, I had of course to unmount the unreachable NFS mount point (umount -l <mount_point>).
As Dbvisit were not working properly since several hours, the standby database was out of sync because of archive logs gap. Unfortunately, the missing archive logs where not present anymore into the Fast Recovery Area, so I had to restore them from the RMAN backup :
RMAN> restore archivelog from logseq=26160 until logseq=26190; Starting restore at 09-DEC-2021 12:26:39 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=365 device type=DISK channel ORA_DISK_1: starting archived log restore to default destination channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26169 channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26170 channel ORA_DISK_1: reading from backup piece /u03/app/oracle/fast_recovery_area/PREPOSS_DC41/backupset/2021_12_09/o1_mf_annnn_TAG20211209T002732_jv2hv54g_.bkp channel ORA_DISK_1: piece handle=/u03/app/oracle/fast_recovery_area/PREPOSS_DC41/backupset/2021_12_09/o1_mf_annnn_TAG20211209T002732_jv2hv54g_.bkp tag=TAG20211209T002732 channel ORA_DISK_1: restored backup piece 1 channel ORA_DISK_1: restore complete, elapsed time: 00:00:15 channel ORA_DISK_1: starting archived log restore to default destination channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26160 channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26161 channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26162 channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26163 ... ... ... channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26188 channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26189 channel ORA_DISK_1: restoring archived log archived log thread=1 sequence=26190 channel ORA_DISK_1: reading from backup piece /u01/app/oracle/admin/preposs/backup/20211209_111002_arc_PREPOSS_533362994_s2246_p1.bck channel ORA_DISK_1: piece handle=/u01/app/oracle/admin/preposs/backup/20211209_111002_arc_PREPOSS_533362994_s2246_p1.bck tag=ARC_20211209_111002 channel ORA_DISK_1: restored backup piece 1 channel ORA_DISK_1: restore complete, elapsed time: 00:00:07 Finished restore at 09-DEC-2021 12:27:17 RMAN>
And finally, I was able to restart the dbvctl command to resolve the gap by transferring and applying the restored archive logs :
[email protected]:/u01/app/dbvisit/standby/trace/ [preposs] dbvctl -d preposs ============================================================= Dbvisit Standby Database Technology (10.1.0_0_gba3a9e08) (pid 15459) dbvctl started on odaprep01: Thu Dec 9 12:28:33 2021 ============================================================= >>> Obtaining information from standby database (RUN_INSPECT=Y)... done Thread: 1 Archive log gap: 35. Transfer log gap: 35 >>> Transferring Log file(s) from preposs on odaprep01 to odaprep02: thread 1 sequence 26160 (o1_mf_1_26160_jv3t012k_.arc)... done thread 1 sequence 26161 (o1_mf_1_26161_jv3szzng_.arc)... done thread 1 sequence 26162 (o1_mf_1_26162_jv3t00lo_.arc)... done thread 1 sequence 26163 (o1_mf_1_26163_jv3szzpx_.arc)... done ... ... ... thread 1 sequence 26188 (o1_mf_1_26188_jv3t0jgx_.arc)... done thread 1 sequence 26189 (o1_mf_1_26189_jv3t0jpb_.arc)... done thread 1 sequence 26190 (o1_mf_1_26190_jv3t0gr6_.arc)... done >>> Dbvisit Archive Management Module (AMM) Config: number of archives to keep = 0 Config: number of days to keep archives = 1 Config: archive backup count = 1 Config: diskspace full threshold = 80% ========== Total number of archive logs : 35 Current disk percent full (/u03/app/oracle/fast_recovery_area/) = 10% ========== Current disk percent full (FRA) = 0% ========== Number of archive logs deleted = 0 ============================================================= dbvctl ended on odaprep01: Thu Dec 9 12:30:19 2021 =============================================================
That was not a Dbvisit issue. As I said at the very beginning, Dbvisit is a great tool 😉 .
Hint: if you want to get the fuser, df, lsof commands working even when a mounted NFS is temporarily unreachable, a solution would be to mount it with the soft option :
mount -o rw,soft host.server.com/share /mymountpoint
According to the documentation :
Generates a soft mount of the NFS file system. If an error occurs, the stat() function returns with an error.
If the option hard is used, stat() does not return until the file system is available.”
Hope this helps.