Infrastructure at your Service

William Sescu

Oracle – Suspending/Resuming an Oracle PID with “oradebug suspend/resume” or the OS kill command

Sometimes, if you kick off a huge load or transformation job, your Archive Destination might run full, faster than your RMAN backup job cleans up the destination. To avoid an Archiver Stuck in such situations, the “oradebug suspend/resume” can be helpful, or the UNIX kill command. Usually, the “oradebug suspend” and the UNIX kill command work quite well.

Before kicking off you SQL script, get all the information you need from your session.

select s.username as Username, 
       s.machine as Machine, 
	   s.client_info as Client_Info, 
	   s.module as Module, 
	   s.action as Action, 
	   s.sid as SessionID,
	   p.pid as ProcessID,
	   p.spid as "UNIX ProcessID"
from
v$session s, v$process p
where s.sid = sys_context ('userenv','sid')
and s.PADDR = p.ADDR;

USERNAME     MACHINE      CLIENT_INFO  MODULE                           ACTION        SESSIONID  PROCESSID UNIX ProcessID
------------ ------------ ------------ -------------------------------- ------------ ---------- ---------- ------------------------
SYS          oel001                    sqlplus@oel001 (TNS V1-V3)                           148         69 7186

In another SQL session you can now set the PID with setorapid or the Server Process ID with setospid. HINT: When the Oracle multiprocess/multithread feature is enabled, RDBMS processes are mapped to threads running in operating system processes, and the SPID identifier is not unique for RDBMS processes. When the Oracle multiprocess/multithread feature is not enabled on UNIX systems, the SPID identifier is unique for RDBMS processes.

SQL> oradebug setorapid 69
Oracle pid: 69, Unix process pid: 7186, image: oracle@oel001 (TNS V1-V3)
SQL> -- oradebug setospid 7186
SQL> oradebug suspend
Statement processed.

-- Now your session is suspended and any command executed by the suspended session is hanging, even select's
-- SQL> select * from dual;

-- Now you can take your time and clean up the archive destination e.g. by moving all archivelogs
-- to tape and delete those in the archive destination afterwards "RMAN> backup archivelog all delete all input;"
-- After the job is done, resume your operation.

SQL> oradebug resume
Statement processed.

-- Now the "select * from dual" comes back.
SQL> select * from dual;

D
-
X

In case you are running 11.2.0.2, it might happen that you see an ORA-600 after running oradebug suspend. No problem, in those cases we can achieve the same thing with the UNIX kill command as well.

On Linux you would run:

$ kill -sigstop $SPID
$ kill -sigcont $SPID

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

On AIX you would run:

$ kill -17 $SPID
$ kill -19 $SPID

$ kill -l
1) HUP                  14) ALRM                 27) MSG                  40) bad trap             53) bad trap
2) INT                  15) TERM                 28) WINCH                41) bad trap             54) bad trap
3) QUIT                 16) URG                  29) PWR                  42) bad trap             55) bad trap
4) ILL                  17) STOP                 30) USR1                 43) bad trap             56) bad trap
5) TRAP                 18) TSTP                 31) USR2                 44) bad trap             57) bad trap
6) ABRT                 19) CONT                 32) PROF                 45) bad trap             58) RECONFIG
7) EMT                  20) CHLD                 33) DANGER               46) bad trap             59) CPUFAIL
8) FPE                  21) TTIN                 34) VTALRM               47) bad trap             60) GRANT
9) KILL                 22) TTOU                 35) MIGRATE              48) bad trap             61) RETRACT
10) BUS                 23) IO                   36) PRE                  49) bad trap             62) SOUND
11) SEGV                24) XCPU                 37) VIRT                 50) bad trap             63) SAK
12) SYS                 25) XFSZ                 38) ALRM1                51) bad trap
13) PIPE                26) bad trap             39) WAITING              52) bad trap

Be very careful, different UNIX systems have different mappings between the signal number and the signal itself. Make sure you look it up first with “kill -l” to get the correct one. From my point of view, the suspend/resume feature, either with the Oracle oradebug or the UNIX kill command  is very useful.

Cheers,
William

 

Leave a Reply


five − 4 =

William Sescu
William Sescu

Consultant