Oracle Team

Oracle 12c – Is VKTM always your top process?

By William Sescu

If VKTM is always your top CPU-consuming process, then this blog might be something for you. Especially in virtual environments, I have often seen the VKTM process as the top process, even when the VM was idle. So I am burning CPU without any obvious benefit. What is the reason for the high CPU consumption? Well … it can be a combination of many things, like NTP not working correctly or missing VMware Tools, but first and foremost Oracle bugs. I really don’t know why, but quite a lot of issues have been raised regarding the VKTM process, like the following.

  • Bug 20693049 – 12C VKTM CONSUMING MORE CPU THAN IN 11GR2
  • Bug 20552573 – VKTM PROCESSES FROM ASM AND A CDB DATABASE CONSUME 12% CPU PERMANENTLY.
  • BUG 12883034 – CHANGE THE INTERVAL WHEN VKTM PROCESS WAKES U
  • Bug 20542107 – WARNING: VKTM DETECTED A TIME DRIFT
  • Bug 20138957 – VKTM PROCESS CONSUMING HIGH CPU EVEN AFTER PATCH 18499306
  • Bug 11837095 – “TIME DRIFT DETECTED” APPEARS INTERMITTENTLY IN ALERT LOG, THO’ EVENT 10795 SET.

If you search around in MOS, you will probably find even more. Usually VKTM and VKRM issues come together, at least when you are using the Resource Manager. VKTM is the Virtual Keeper of Time process. It acts as a time publisher for an Oracle instance and publishes two sets of time: a wall clock time using a seconds interval, and a higher-resolution time (which is not wall clock time) for interval measurements. The VKTM process exists in both ASM and RDBMS instances, so if you see issues with VKTM, they usually pop up on both. VKTM CPU usage is affected mainly by two hidden parameters, _timer_precision and _disable_highres_ticks, so tuning these parameters can bring down VKTM CPU consumption.

The VKRM process is the Virtual Scheduler for Resource Manager process, and it serves as a centralized scheduler for Resource Manager activity. As there is no Resource Manager on an ASM instance, you will see this process only on RDBMS instances.

Ok … my test environment is OEL 6.8 with 12.1.0.2 ASM and a 12.1.0.2 database on a virtual guest. To be more precise, it is a database with PSU 12.1.0.2.161018 and Grid Infrastructure PSU 12.1.0.2.161018. The patch level plays quite an important role. For an 11gR2 database, you might need patch 20531619.

So let’s start by fixing the ASM VKTM issue. It warns me all the time that it has detected a time drift.

$ cat alert_+ASM.log | grep -i "Warning: VKTM detected a time drift."
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
Warning: VKTM detected a time drift.
...
...
Warning: VKTM detected a time drift.
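
To compare the situation before and after a change, it can help to count the warnings instead of listing them. The following is a minimal sketch; the function name count_vktm_drift is made up for this example, and the alert log path is whatever applies on your system.

```shell
#!/bin/sh
# Hypothetical helper: count "VKTM detected a time drift" warnings in an alert log.
# Usage: count_vktm_drift /path/to/alert_+ASM.log
count_vktm_drift() {
    # grep -c prints the number of matching lines, -i ignores case.
    # grep returns non-zero when nothing matches, so mask that for "set -e" callers.
    grep -ci "VKTM detected a time drift" "$1" || true
}
```

Calling count_vktm_drift alert_+ASM.log before the change and again some time after the restart gives two numbers that are easy to compare.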

Before continuing, it is important to make sure that you really don’t have a time drift. On VMware, you might want to consult the following knowledge base article and the timekeeping PDF. Both are very good resources: KB article 1006427, “Timekeeping best practices for Linux guests”, and http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

Next, check that your NTP is in sync.

$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+aquila.init7.ne 162.23.41.10     2 u   16   64  377   16.375   25.656   3.873
*ntp0.as34288.ne 85.158.25.74     2 u   17   64  377   12.987   27.874   4.045
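
In the ntpq output, the line starting with "*" is the peer currently selected for synchronization, and the ninth column is the offset in milliseconds. If you want to script that check, a rough sketch could look like the following. The function name ntp_in_sync and the 100 ms default limit are assumptions made for this example, not an Oracle or NTP recommendation.

```shell
#!/bin/sh
# Hypothetical helper: read `ntpq -p` output on stdin and succeed only if
# a system peer is selected (line starting with "*") and its absolute
# offset is within the given limit in milliseconds.
# The default of 100 ms is an arbitrary value chosen for illustration.
ntp_in_sync() {
    limit_ms="${1:-100}"
    awk -v limit="$limit_ms" '
        /^\*/ {
            found = 1
            off = $9 + 0              # column 9 is the offset in milliseconds
            if (off < 0) off = -off   # compare the absolute value
            if (off > limit) bad = 1
        }
        END { exit ((found && !bad) ? 0 : 1) }
    '
}
```

A possible call is: ntpq -p | ntp_in_sync 50 && echo "NTP looks in sync"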

If your NTP is not in sync, you should stop right here and correct it, because in that case the warning message from Oracle is correct: VKTM has detected a time drift. Ok. Let’s continue with checking the ASM instance. The 12.1.0.2 ASM defaults regarding VKTM are the following:

Parameter                           Session_Value  Instance_Value Description
----------------------------------- -------------- -------------- --------------------------------------------
_disable_highres_ticks              FALSE          FALSE          disable high-res tick counter
_high_priority_processes            LMS*           LMS*           High Priority Process Name Mask
_highest_priority_processes         VKTM           VKTM           Highest Priority Process Name Mask
_timer_precision                    10             10             VKTM timer precision in milli-sec
_vkrm_schedule_interval             10             10             VKRM scheduling interval
_vktm_assert_thresh                 30             30             soft assert threshold VKTM timer drift

Because I don’t need high-resolution ticks on my ASM instance, I am going to disable them. Besides that, I am going to disable the excessive tracing by the VKTM process, which is done with event 10795.

SQL> alter system set "_disable_highres_ticks"=true scope=spfile;

System altered.

SQL> alter system set event="10795 trace name context forever, level 2" scope=spfile;

System altered.

Unfortunately, these changes can’t be done online, so I have to bounce my ASM instance.

$ srvctl stop asm -f
$ srvctl start asm

My alert.log does not report time drift issues anymore, and the VKTM process of my ASM instance disappeared from my top process list. As soon as the ASM VKTM process went away, the one from the database popped up. 🙂

[email protected]:/home/oracle/ [OCM121] top -c
top - 11:17:18 up  1:21,  2 users,  load average: 0.69, 0.77, 0.98
Tasks: 229 total,   2 running, 227 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us,  0.4%sy,  0.0%ni, 98.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  10021556k total,  3816024k used,  6205532k free,   209916k buffers
Swap:  6160380k total,        0k used,  6160380k free,   672856k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5696 oracle    -2   0 1880m  46m  44m S 22.7  0.5   0:23.49 ora_vktm_OCM121

Ok. Let’s fix that one as well. The RDBMS defaults regarding VKTM with 12.1.0.2 are the same as with ASM.

Parameter                           Session_Value  Instance_Value Description
----------------------------------- -------------- -------------- --------------------------------------------
_disable_highres_ticks              FALSE          FALSE          disable high-res tick counter
_high_priority_processes            LMS*           LMS*           High Priority Process Name Mask
_highest_priority_processes         VKTM           VKTM           Highest Priority Process Name Mask
_timer_precision                    10             10             VKTM timer precision in milli-sec
_vkrm_schedule_interval             10             10             VKRM scheduling interval
_vktm_assert_thresh                 30             30             soft assert threshold VKTM timer drift

Without any changes, tracing for the VKTM and VKRM background processes is enabled, and quite a lot of information goes into these trace files.
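
To get a feeling for how much is written, you can list the trace files of both processes by size. The sketch below assumes the usual trace file naming with _vktm_ and _vkrm_ in the file name and a .trc extension; the function name vk_trace_sizes is made up, and the trace directory depends on your diagnostic destination.

```shell
#!/bin/sh
# Hypothetical helper: list VKTM and VKRM trace files below a directory,
# largest first, with the size in KB in the first column.
# Assumes trace files are named like <SID>_vktm_<pid>.trc / <SID>_vkrm_<pid>.trc.
vk_trace_sizes() {
    find "$1" -type f \( -name '*_vktm_*.trc' -o -name '*_vkrm_*.trc' \) \
        -exec du -k {} + | sort -rn
}
```

It takes the trace directory as its only argument, for example the trace directory of the instance below your diagnostic destination.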

Tracing for the VKRM process can be disabled via the following event:

alter system set events '10720 trace name context forever, level 0x10000000';

Tracing for the VKTM process can be disabled via the following event:

alter system set events '10795 trace name context forever, level 2';

Because I don’t need any of those, I am going to disable both in one shot.

SQL> alter system set event='10720 trace name context forever, level 0x10000000','10795 trace name context forever, level 2' comment='Turn off VKRM tracing and turn off VKTM tracing' scope=spfile;

System altered.

And like on the ASM instance, I don’t need the high-resolution ticks here either.

SQL> alter system set "_disable_highres_ticks"=true scope=spfile;

System altered.

After the restart of the database, the extensive tracing and CPU usage went away.

[email protected]:/home/oracle/ [OCM121] srvctl stop database -d OCM121
[email protected]:/home/oracle/ [OCM121] srvctl start database -d OCM121

I am not seeing the VKTM process in my top processes anymore. Beforehand, even on an idle system, the VKTM from the ASM and the one from the RDBMS instance have always been at the top.

[email protected]:/home/oracle/ [OCM121] top -c
top - 11:29:06 up  1:33,  2 users,  load average: 0.19, 0.69, 0.69
Tasks: 233 total,   2 running, 231 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  0.5%sy,  0.0%ni, 98.7%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  10021556k total,  3839340k used,  6182216k free,   211884k buffers
Swap:  6160380k total,        0k used,  6160380k free,   686380k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3643 grid      20   0 1559m  84m  39m S  1.3  0.9   0:55.64 /u00/app/grid/12.1.0.2/bin/oraagent.bin
 3660 grid      20   0  329m  28m  22m S  1.0  0.3   0:28.15 /u00/app/grid/12.1.0.2/bin/evmd.bin
 3517 grid      20   0 1507m  69m  48m S  0.7  0.7   0:34.74 /u00/app/grid/12.1.0.2/bin/ohasd.bin reboot
 3738 grid      20   0  262m  26m  21m S  0.7  0.3   0:28.03 /u00/app/grid/12.1.0.2/bin/evmlogger.bin -o /u00/app/grid/12.1
 3757 grid      20   0  791m  32m  23m S  0.7  0.3   0:31.30 /u00/app/grid/12.1.0.2/bin/cssdagent

Conclusion

Especially in virtual environments, I have often seen quite high CPU usage by the VKTM process, so take care that your timekeeping via NTP is set up correctly. Once NTP is running smoothly, you might want to disable the high-resolution ticks and the extensive tracing by the VKTM and VKRM processes. Obviously, this is not a general recommendation. You should test it yourself.

8 Comments

  • Hi William,

    this is a really great article! I also noticed higher CPU usage on every VMware-hosted database server, caused by the VKM* processes. As soon as I get approval for the maintenance window, I’ll try your workaround.

    Thanks for your effort and investigation!

    Dejan

    • William Sescu says:

      Hello Dejan, you are very welcome. Please be aware that, in case you are using 11gR2, a one-off patch is required to activate the events that disable the extensive tracing. Cheers, William

  • dbatk says:

    Thanks for a nice summary of this issue. Saved me a lot of time culling Metalink notes!

  • Uwe Teichmann says:

    Hi William,

    Thanks for this good article.

    In our environment, the virtual machine is backed up at 18:10. One step is to take a snapshot, which can freeze the virtual machine for up to 10 seconds. After the backup of the virtual machine, Oracle reports the time drift. Oracle client sessions that are connected during this time and not actively doing something can get an ORA-12170, causing the session to abort.

    Any tips to prevent this?

    Regards,
    Uwe

    • William Sescu says:

      Hello Uwe, in case your backup strategy is based on VMware snapshots, I would create a Data Guard environment in Maximum Availability mode and do the snapshots on the standby. Or don’t do any snapshots at all, and install e.g. a Networker client inside the VM and run RMAN backups. These are some ideas that come to my mind. Cheers, William

  • Fernando says:

    Hi William,
    Many thanks for your article. In 12c r2 there’s also a “management database” (-MGMTDB). Can we set “_disable_highres_ticks”=true in that database?

  • laurent dhilly says:

    thank you so much for this clear explanation

  • Sreekanth says:

    *** 2020-05-05T22:03:24.232663+00:00 (CDB$ROOT(1))
    kstmchkdrift (kstmrmtickcntkeeper:lowres): Time jumped forward by (14000000)usec at (1588716204) whereas (5000000) is allowed
    kstmchkdrift (kstmrmtickcntkeeper:highres): Time jumped forward by (5483949)usec at (7588185546305) whereas (1000000) is allowed

    Do you suggest anything for the above alert in the alert log, and does this affect connectivity to the application servers?
