By Franck Pachot

.
Last year, I measured the CPU performance for an Oracle Database on several types of AWS instances. Just by curiosity, I’ve run the same test (SLOB cached reads) now that Amazon has applied all Spectre and Meltdown mitigation patches.

I must admit that I wanted to test this on the Oracle Cloud first. I’ve updated a IaaS instance to the latest kernel but the Oracle Unbreakable Enterprise Kernel does not include the Meltdown fix yet, and booting on the Red Hat Compatible Kernel quickly goes to a kernel panic not finding the root LVM.

This is not a benchmark you can rely on to estimate the CPU usage overhead on your application. This test is not doing system calls (so the KPTI fix should be at its minimal impact). If your application is bound on system calls (network roundtrips, physical reads) the consequences can be worse. But in that case, you have a design problem which was just masked by hardware, optimized, but insecure, by a processor running the code before testing.

Figures from last year: M4.xlarge: 4vCPU, 16GB RAM

M4 is hyper-threaded so with 2 Oracle processor licenses we can use 4 vCPU.
Here I was on Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 2 cores with 2 threads each.

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              13.1      0.00      5.46
              DB CPU(s):               1.0              13.1      0.00      5.46
  Logical read (blocks):         874,326.7      11,420,189.2
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      9.24
              DB CPU(s):               2.0              27.2      0.00      9.22
  Logical read (blocks):       1,540,116.9      21,047,307.6
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              54.6      0.00     14.46
              DB CPU(s):               4.0              54.3      0.00     14.39
  Logical read (blocks):       1,779,361.3      24,326,538.0

Jan. 2018 with Spectre and Meltdown mitigation:

Same CPU now with the latest RedHat kernel.


[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8

Here is the LIOPS result for the same runs.

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              13.7      0.00      4.69
              DB CPU(s):               1.0              13.7      0.00      4.69
  Logical read (blocks):         808,954.0      11,048,988.1
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
            DB Time(s):                2.0              27.3      0.00      8.00
              DB CPU(s):               2.0              27.1      0.00      7.96
  Logical read (blocks):       1,343,662.0      18,351,369.1
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     13.49
              DB CPU(s):               4.0              42.5      0.00     13.37
  Logical read (blocks):       1,684,204.6      18,106,823.6

Jan. 2018, with Spectre and Meltdown patches, but disabled IBRS, IBPB, KPTI

The RedHat kernel has options to disable Indirect Branch Restricted Speculation, Indirect Branch Prediction Barriers and Kernel Page Table Isolation


[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti noibrs noibpb

Here are the same runs after rebooting with nopti noibrs noibpb kernel options:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.86
              DB CPU(s):               1.0              29.8      0.00      4.80
  Logical read (blocks):         861,138.5      25,937,061.0
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              27.3      0.00      8.00
              DB CPU(s):               2.0              27.0      0.00      7.92
  Logical read (blocks):       1,493,336.8      20,395,790.6
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     13.49
              DB CPU(s):               4.0              42.4      0.00     13.34
  Logical read (blocks):       1,760,218.4      18,911,346.0
       Read IO requests:              33.5             360.2

Then with only KPTI disabled, but all Spectre mitigation enabled

Here only the page table isolation is is disabled.


[ec2-user@ip-172-31-15-31 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=3e11801e-5277-4d87-be4c-0a9a61fbc3da ro console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto LANG=en_US.UTF-8 nopti

Here are the same runs witn only nopti kernel option:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      3.91
              DB CPU(s):               1.0              29.8      0.00      3.87
  Logical read (blocks):         873,451.2      26,303,984.2
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               2.0              23.1      0.00      7.60
              DB CPU(s):               2.0              22.9      0.00      7.54
  Logical read (blocks):       1,502,151.4      17,360,883.8
Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               4.0              42.9      0.00     12.64
              DB CPU(s):               4.0              42.4      0.00     12.50
  Logical read (blocks):       1,764,293.0      18,954,682.3

Large pages

The previous tests were using small pages. I did a quick test with KPTI enabled and SGA using large pages:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.85
              DB CPU(s):               1.0              30.1      0.00      4.85
  Logical read (blocks):         854,682.1      27,672,906.8

Here is the same but with KPTI disabled:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.1      0.00      4.85
              DB CPU(s):               1.0              30.1      0.00      4.85
  Logical read (blocks):         920,129.9      27,672,906.8

So what?

This is just a test on a synthetic workload. Nothing similar to a production database situation. However, those cached SLOB runs are doing what an optimized database application should do most of the time: read blocks from the buffer cache. At least this test is much better than the graphs without explanations, or the SELECT 1, that I have seen these days on social media.

Some interesting food for thought in those numbers, by the way.

Now vs. last year: between 5% and 12% degradation, which is what people have reported those days in general. That looks high but usually when we do database performance troubleshooting we are there to address queries with x10 to x100 CPU usage doing unnecessary stuff because of bad design or suboptimal execution plan.

If disable KPTI: degradation is less than 1%, so that’s an easy way to get the same performance if you are sure that you control all software running. At least temporarily before some database tuning is done.

If disable KPTI, IBRS and IBPB: not better than when disabling only KPTI. I’ve no explanation about that… Makes me wonder if those predictive branching are always a good idea.

In all case, if you are not allocating SGA with large pages, then you should. The KPTI degradation is lowered with large pages, which makes sense as the page table is smaller. And if you are not yet using large pages, the benefit will probably balance the KPTI degradation.

This is not a benchmark and your application may see a higher degradation if doing a lot of system calls. If you upgrade from an old kernel, you may even see an improvement thanks to other features compensating the mitigation ones.