When doing a performance review I often talk with the DBA about the CPU utilization of the server. How reliable is the CPU utilization reported by tools like top, or the host CPU utilization in the AWR report? E.g. on a Linux Intel x86-64 server with 8 cores and 16 logical CPUs (Intel hyperthreading), what does a utilization of 50% mean?
As I had an ODA X7-M available in a test lab, I thought I'd run some tests on it.

In my old days at Oracle Support we used a small script to test the single-thread CPU performance of an Oracle DB server:


set echo on
set linesize 120
set timing on time on
with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 60 )
select /*+ ALL_ROWS */ count(*) from t,t,t,t,t
/

The SQL just burns a CPU core for around 20 seconds. Depending on your CPU's single-thread performance it may take a bit longer or complete faster.
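The row count the query returns is easy to verify: the WITH clause generates 60 rows, and the five-way cross join of t with itself yields 60^5 rows, which is what COUNT(*) has to walk through:

```python
# The CTE produces 60 rows; joining it with itself five times
# yields a Cartesian product of 60^5 rows.
rows = 60 ** 5
print(rows)  # 777600000
```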

On the ODA X7-M I have 16 cores enabled, and with hyperthreading enabled I see 32 CPUs in /proc/cpuinfo:


oracle@dbi-oda01:/home/oracle/cbleile/ [CBL122] grep processor /proc/cpuinfo | wc -l
32
oracle@dbi-oda01:/home/oracle/cbleile/ [CBL122] lscpu | egrep "Thread|Core|Socket|Model name"
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
Model name: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz

The CPU speed was at 2.3 GHz the whole time:


[root@dbi-oda01 ~]# for a in `ls -l /sys/devices/system/cpu/cpu*/cpufreq | grep cpufreq | cut -d "/" -f6 | cut -d "u" -f2`; do echo "scale=3;`cat /sys/devices/system/cpu/cpu${a}/cpufreq/cpuinfo_cur_freq`/1000000" | bc; done
2.301
2.301
2.301
2.301
... (2.301 reported for all 32 logical CPUs)

The CPU is capable of running at up to 3.7 GHz, but that did not happen on my machine.

Running my SQL script on the ODA X7-M took 17.49 seconds:


18:44:00 SQL> with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 60 )
18:44:00 2 select /*+ ALL_ROWS */ count(*) from t,t,t,t,t
18:44:00 3 /
 
COUNT(*)
----------
777600000
 
Elapsed: 00:00:17.49

I continued with the following tests (a job means one run of the SQL script above):
– 1 Job alone
– 2 Jobs concurrently
– 4 Jobs concurrently
– 8 Jobs concurrently
– 16 Jobs concurrently
– 24 Jobs concurrently
– 32 Jobs concurrently
– 40 Jobs concurrently
– 50 Jobs concurrently
– 60 Jobs concurrently
– 64 Jobs concurrently
– 128 Jobs concurrently
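The same kind of scaling test can be sketched without Oracle at all. The following is a minimal Python analogue (my own sketch, not the script used in this test) that starts N CPU-bound worker processes and reports min/max/avg elapsed time per worker, just like the measurements below:

```python
import time
from multiprocessing import Pool

def burn(_):
    """CPU-bound worker: busy-loop over a fixed amount of work and
    return the elapsed wall-clock time."""
    start = time.perf_counter()
    x = 0
    for i in range(10_000_000):  # tune so one worker runs a few seconds
        x += i * i
    return time.perf_counter() - start

def run_jobs(n_jobs):
    """Run n_jobs workers concurrently and return (min, max, avg) time."""
    with Pool(processes=n_jobs) as pool:
        times = pool.map(burn, range(n_jobs))
    return min(times), max(times), sum(times) / len(times)

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        lo, hi, avg = run_jobs(n)
        print(f"{n:3d} jobs: min {lo:.2f}s  max {hi:.2f}s  avg {avg:.2f}s")
```

Once the number of workers exceeds the number of physical cores, the average time per worker should start climbing, which is exactly the effect measured below.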

Here are the results:


Jobs   Min Time   Max Time   Avg Time   Jobs/Cores   Jobs/Threads   Avg/Single-Time   Thread utilization
   1      17.49      17.49      17.49         0.06           0.03              1.00                 1.00
   2      17.51      17.58      17.55         0.13           0.06              1.00                 1.00
   4      17.47      17.86      17.62         0.25           0.13              1.01                 0.99
   8      17.47      17.66      17.55         0.50           0.25              1.00                 1.00
  16      17.64      21.65      18.50         1.00           0.50              1.06                 0.95
  24      18.00      27.38      24.20         1.50           0.75              1.38                 0.72
  32      32.65      34.57      33.21         2.00           1.00              1.90                 0.53
  40      34.76      42.74      40.31         2.50           1.25              2.30                 0.54
  50      48.26      52.64      51.21         3.13           1.56              2.93                 0.53
  60      52.40      63.60      60.63         3.75           1.88              3.47                 0.54
  64      54.20      68.40      64.27         4.00           2.00              3.67                 0.54
 128     119.49     134.34     129.01         8.00           4.00              7.38                 0.54

When running with 16 jobs, top showed a utilization of around 50-52%. However, running more than 16 jobs increased the average time a job takes, i.e. with 16 jobs the 16-core server is already almost fully utilized. Running 32 jobs results in an average elapsed time 1.9 times that of running 16 jobs (or fewer) concurrently. As the factor is 1.9 and not 2, I can conclude that there is an advantage to running with hyperthreading enabled, but it's only around 5-10%.

So when calculating the utilization of your server, base it on the number of cores and not on the number of threads. And when looking at the host CPU utilization in top or in the AWR report on a hyperthreading-enabled server, it's a good idea to multiply the reported utilization by 1.9.
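As a worked example of that rule of thumb (a sketch; the 1.9 factor comes from the measurements above, and the 52% matches what top showed with 16 jobs):

```python
# top reports host CPU utilization against all 32 hardware threads.
# Since hyperthreading only adds ~5-10% throughput, a reported 52%
# already means the physical cores are effectively saturated.
reported = 0.52    # utilization as shown by top / AWR
ht_factor = 1.9    # measured: 32 jobs take 1.9x as long as 16
effective = min(reported * ht_factor, 1.0)
print(f"effective core utilization: {effective:.0%}")  # ~99%
```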