By Franck Pachot

The Exadata X5 can be installed Bare Metal or Virtualized. The latter, with Oracle VM, allows creating multiple clusters on one machine, but is more complex to install and to plan storage capacity for. In exchange, it gives some flexibility for option licensing. Those are the real challenges behind the choice. However, when we talk about virtualization, most questions are about the overhead. Last week we ran some tests on the same machine with both configurations, thanks to the Arrow Oracle Authorized Solution Center.

Comparison is not easy. Bare Metal has all the resources; Virtualized has to distribute them. And this test is very specific: all I/O hits the ‘extreme’ flash cache, because I don’t expect any virtualization overhead to be in the millisecond range. So don’t expect universal conclusions from these tests, and don’t hesitate to comment on my way of reading the numbers.

CPU

Do not expect a benchmark that shows the maximum capacity of the machine here. I’m comparing a bare metal node with 36 cores to a VM with 4 vCPUs, so I’ll compare a single-thread workload only: SLOB with one session and SCALE=100M UPDATE_PCT=0 RUN_TIME=120 WORK_UNIT=64.
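For reference, here is a minimal sketch of the slob.conf settings behind this run, limited to the parameters quoted above (the rest of the file keeps the SLOB defaults), followed by the standard SLOB 2 single-session invocation; the exact files used on the test machine may have differed:

# slob.conf extract for the CPU-bound run
UPDATE_PCT=0       # read-only workload: no block changes, pure logical I/O
RUN_TIME=120       # run for 120 seconds
SCALE=100M         # small working set that stays in the buffer cache
WORK_UNIT=64       # blocks visited per unit of work

# run SLOB with a single session (schema 1 only)
./runit.sh 1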

Bare Metal load profile:


Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.5      0.00      2.91
              DB CPU(s):               1.0              29.0      0.00      2.76
      Background CPU(s):               0.0               0.2      0.00      0.00
      Redo size (bytes):          14,172.4         432,594.0
  Logical read (blocks):         810,244.4      24,731,696.3
          Block changes:              41.7           1,271.3
 Physical read (blocks):             111.6           3,407.8
Physical write (blocks):               0.0               0.3
       Read IO requests:             111.3           3,397.3
      Write IO requests:               0.0               0.3
           Read IO (MB):               0.9              26.6
          Write IO (MB):               0.0               0.0
         Executes (SQL):          12,285.1         374,988.5

Virtualized load profile:


Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0              30.6      0.00      4.37
              DB CPU(s):               1.0              29.8      0.00      4.26
      Background CPU(s):               0.0               0.2      0.00      0.00
      Redo size (bytes):          13,316.5         410,404.0
  Logical read (blocks):         848,095.1      26,137,653.8
          Block changes:              41.1           1,266.3
 Physical read (blocks):             109.1           3,361.3
Physical write (blocks):               0.0               0.3
       Read IO requests:             103.8           3,198.5
      Write IO requests:               0.0               0.3
           Read IO (MB):               0.9              26.3
          Write IO (MB):               0.0               0.0
         Executes (SQL):          13,051.2         402,228.0

We can say that CPU and RAM performance is similar: with one session fully on CPU, both configurations sustain roughly 800,000 to 850,000 logical reads per second.

I/O

Now about IOPS on the storage cell flash cache. I’ll run SLOB with one session and SCALE=100000M UPDATE_PCT=100 RUN_TIME=120 WORK_UNIT=64: a working set far too large for the buffer cache, with 100% updates.
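Only two slob.conf parameters change compared to the CPU run (again a sketch, not the exact file used):

UPDATE_PCT=100     # every unit of work now modifies blocks, generating redo and DBWR writes
SCALE=100000M      # working set far larger than the buffer cache, so reads become physical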

Bare Metal load profile:


Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0               0.0      0.02      4.06
              DB CPU(s):               0.1               0.0      0.00      0.49
      Background CPU(s):               0.1               0.0      0.00      0.00
      Redo size (bytes):       1,652,624.9          51,700.6
  Logical read (blocks):           2,582.2              80.8
          Block changes:           4,214.5             131.9
 Physical read (blocks):           2,060.6              64.5
Physical write (blocks):           1,818.0              56.9
       Read IO requests:           2,051.0              64.2
      Write IO requests:           1,738.6              54.4
           Read IO (MB):              16.1               0.5
          Write IO (MB):              14.2               0.4
         Executes (SQL):              66.3               2.1
              Rollbacks:               0.0               0.0
           Transactions:              32.0

Virtualized load profile:


Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):               1.0               0.0      0.01      3.49
              DB CPU(s):               0.3               0.0      0.00      1.01
      Background CPU(s):               0.2               0.0      0.00      0.00
      Redo size (bytes):       2,796,963.3          51,713.3
  Logical read (blocks):           4,226.0              78.1
          Block changes:           7,107.0             131.4
 Physical read (blocks):           3,470.6              64.2
Physical write (blocks):           3,278.7              60.6
       Read IO requests:           3,462.0              64.0
      Write IO requests:           3,132.0              57.9
           Read IO (MB):              27.1               0.5
          Write IO (MB):              25.6               0.5
         Executes (SQL):              86.9               1.6
              Rollbacks:               0.0               0.0
           Transactions:              54.1

In the same two minutes, the virtualized run did more work. The timed events show statistics about ‘cell single block physical read’, which is nothing other than ‘db file sequential read’ renamed to look more ‘Exadata’. No SmartScan happens here: the blocks go to the buffer cache, and we cannot do any filtering on blocks that will be shared with other sessions.

Bare Metal:

                                           Total Wait       Wait   % DB Wait
Event                                Waits Time (sec)    Avg(ms)   time Class
------------------------------ ----------- ---------- ---------- ------ --------
cell single block physical rea     249,854      115.7       0.46   94.9 User I/O
DB CPU                                           14.6              12.0

Virtualized:

                                           Total Wait       Wait   % DB Wait
Event                                Waits Time (sec)    Avg(ms)   time Class
------------------------------ ----------- ---------- ---------- ------ --------
cell single block physical rea     425,071      109.3       0.26   89.4 User I/O
DB CPU                                           35.2              28.8

The average latency is lower here, which explains why more work was done. But no conclusion before we know where that latency comes from. Averages hide the details, and it’s the same with the ‘IO Profile’ section:

Bare Metal:

IO Profile                  Read+Write/Second     Read/Second    Write/Second
~~~~~~~~~~                  ----------------- --------------- ---------------
            Total Requests:           3,826.6         2,055.1         1,771.5
         Database Requests:           3,789.5         2,051.0         1,738.6
        Optimized Requests:           3,720.7         1,985.1         1,735.6
             Redo Requests:              32.5             0.0            32.5
                Total (MB):              32.0            16.2            15.9
             Database (MB):              30.3            16.1            14.2
      Optimized Total (MB):              29.3            15.6            13.7
                 Redo (MB):               1.7             0.0             1.7
         Database (blocks):           3,878.6         2,060.6         1,818.0
 Via Buffer Cache (blocks):           3,878.6         2,060.6         1,818.0
           Direct (blocks):               0.0             0.0             0.0

Virtualized:

IO Profile                  Read+Write/Second     Read/Second    Write/Second
~~~~~~~~~~                  ----------------- --------------- ---------------
            Total Requests:           6,652.2         3,467.0         3,185.2
         Database Requests:           6,594.0         3,462.0         3,132.0
        Optimized Requests:           6,582.7         3,461.2         3,121.5
             Redo Requests:              54.7             0.0            54.7
                Total (MB):              55.6            27.2            28.4
             Database (MB):              52.7            27.1            25.6
      Optimized Total (MB):              51.8            27.1            24.6
                 Redo (MB):               2.8             0.0             2.8
         Database (blocks):           6,749.3         3,470.6         3,278.7
 Via Buffer Cache (blocks):           6,749.3         3,470.6         3,278.7
           Direct (blocks):               0.0             0.0             0.0

And the same goes for the I/O statistics by function.
Bare Metal:

                 Reads:  Reqs    Data   Writes:  Reqs    Data    Waits:   Avg
Function Name      Data per sec per sec    Data per sec per sec   Count  Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Buffer Cache Re    1.9G  2050.9 16.093M      0M     0.0      0M  250.2K     0.5
DBWR                 0M     0.0      0M    1.7G  1740.5 14.216M       0     N/A
LGWR                 0M     0.0      0M    201M    32.5  1.648M    3914     0.3
Others               8M     4.1   .066M      1M     0.5   .008M     560     0.0
TOTAL:             1.9G  2055.0 16.159M    1.9G  1773.4 15.872M  254.6K     0.5

Virtualized:

                 Reads:  Reqs    Data   Writes:  Reqs    Data    Waits:   Avg
Function Name      Data per sec per sec    Data per sec per sec   Count  Tm(ms)
--------------- ------- ------- ------- ------- ------- ------- ------- -------
Buffer Cache Re    3.3G  3462.7  27.12M      0M     0.0      0M  425.6K     0.3
DBWR                 0M     0.0      0M    3.1G  3133.9 25.639M       0     N/A
LGWR                 0M     0.0      0M    341M    54.7  2.775M    6665     0.3
Others              10M     5.0   .081M      1M     0.5   .008M     514     0.3
TOTAL:             3.3G  3467.7 27.202M    3.4G  3189.0 28.422M  432.7K     0.3

I’ve put the physical I/O statistics side-by-side to compare:


                                 BARE METAL                        VIRTUALIZED
 
Statistic                                     Total     per Trans              Total     per Trans
-------------------------------- ------------------ ------------- ------------------ -------------
cell flash cache read hits                  242,142          62.1            425,365          64.0
cell logical write IO requests                5,032           1.3              8,351           1.3
cell overwrites in flash cache              200,897          51.5            937,973         141.1
cell physical IO interconnect by      8,145,832,448   2,089,210.7     14,331,230,720   2,156,044.9
cell writes to flash cache                  638,514         163.8          1,149,990         173.0
physical read IO requests                   250,168          64.2            425,473          64.0
physical read bytes                   2,059,042,816     528,095.1      3,494,084,608     525,663.4
physical read partial requests                    4           0.0                  0           0.0
physical read requests optimized            242,136          62.1            425,365          64.0
physical read total IO requests             250,671          64.3            426,089          64.1
physical read total bytes             2,067,243,008     530,198.3      3,504,136,192     527,175.6
physical read total bytes optimi      1,993,089,024     511,179.5      3,497,918,464     526,240.2
physical read total multi block                   0           0.0                  0           0.0
physical reads                              251,348          64.5            426,524          64.2
physical reads cache                        251,348          64.5            426,524          64.2
physical reads cache prefetch                 1,180           0.3              1,051           0.2
physical reads direct                             0           0.0                  0           0.0
physical reads direct (lob)                       0           0.0                  0           0.0
physical reads prefetch warmup                1,165           0.3              1,016           0.2
physical write IO requests                  212,061          54.4            384,909          57.9
physical write bytes                  1,816,551,424     465,901.9      3,300,933,632     496,605.0
physical write requests optimize            211,699          54.3            383,624          57.7
physical write total IO requests            216,077          55.4            391,445          58.9
physical write total bytes            2,026,819,072     519,830.5      3,656,793,600     550,142.0
physical write total bytes optim      1,755,620,352     450,274.5      3,171,875,328     477,189.0
physical write total multi block                531           0.1                942           0.1
physical writes                             221,747          56.9            402,946          60.6
physical writes direct                            0           0.0                  0           0.0
physical writes direct (lob)                      0           0.0                  0           0.0
physical writes from cache                  221,747          56.9            402,946          60.6
physical writes non checkpoint              221,694          56.9            402,922          60.6

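As a side note, the same cumulative counters can be sampled outside of AWR from V$SYSSTAT by taking a snapshot before and after each run; a minimal sketch of such a query (statistic names as shown in the table above):

select name, value
  from v$sysstat
 where name like 'cell%'
    or name like 'physical read%'
    or name like 'physical write%'
 order by name;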
We already know that more work was done on the OVM run, but the ‘per transaction’ statistics are similar, with a bit more ‘flash cache’ ‘optimized’ I/O in the second run.
Of course, even if it’s the same machine, it has been re-imaged and the database re-created, with different volume and capacity. So maybe I hit the cell flash cache more on the second run than on the first, and more reads from spinning disks could explain the difference in single block read latency.

We need to get beyond the averages with the wait event histograms. The AWR report doesn’t show buckets below one millisecond (I’ve opened an enhancement request for 12.2 about that), but I collected them from V$EVENT_HISTOGRAM_MICRO.
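A minimal sketch of the kind of query behind the following output, using only the columns shown below (the exact script I ran may have differed):

select event, wait_time_micro, wait_count, wait_time_format
  from v$event_histogram_micro
 where event = 'cell single block physical read'
 order by wait_time_micro;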

Bare Metal:

EVENT                                    WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
---------------------------------------- --------------- ---------- ------------------------------
cell single block physical read                        1          0 1 microsecond
cell single block physical read                        2          0 2 microseconds
cell single block physical read                        4          0 4 microseconds
cell single block physical read                        8          0 8 microseconds
cell single block physical read                       16          0 16 microseconds
cell single block physical read                       32          0 32 microseconds
cell single block physical read                       64          0 64 microseconds
cell single block physical read                      128        533 128 microseconds
cell single block physical read                      256     240142 256 microseconds
cell single block physical read                      512       7818 512 microseconds
cell single block physical read                     1024        949 1 millisecond
cell single block physical read                     2048        491 2 milliseconds
cell single block physical read                     4096       1885 4 milliseconds
cell single block physical read                     8192       3681 8 milliseconds
cell single block physical read                    16384       2562 16 milliseconds
cell single block physical read                    32768        257 32 milliseconds
cell single block physical read                    65536         52 65 milliseconds
cell single block physical read                   131072          3 131 milliseconds
cell single block physical read                   262144          0 262 milliseconds
cell single block physical read                   524288          1 524 milliseconds

Virtualized:

EVENT                                    WAIT_TIME_MICRO WAIT_COUNT WAIT_TIME_FORMAT
---------------------------------------- --------------- ---------- ------------------------------
cell single block physical read                        1          0 1 microsecond
cell single block physical read                        2          0 2 microseconds
cell single block physical read                        4          0 4 microseconds
cell single block physical read                        8          0 8 microseconds
cell single block physical read                       16          0 16 microseconds
cell single block physical read                       32          0 32 microseconds
cell single block physical read                       64          0 64 microseconds
cell single block physical read                      128          1 128 microseconds
cell single block physical read                      256     322113 256 microseconds
cell single block physical read                      512     105055 512 microseconds
cell single block physical read                     1024       1822 1 millisecond
cell single block physical read                     2048        813 2 milliseconds
cell single block physical read                     4096        681 4 milliseconds
cell single block physical read                     8192        283 8 milliseconds
cell single block physical read                    16384        231 16 milliseconds
cell single block physical read                    32768         64 32 milliseconds
cell single block physical read                    65536         11 65 milliseconds
cell single block physical read                   131072          3 131 milliseconds

In the first run we see more reads around 8 ms, which confirms the previous guess that we had more flash cache hits in the second run.
The waits between 128 and 512 microseconds come from the cell flash storage, and this is where I want to see whether virtualization has an overhead.
I’ve put it in a color chart where it’s easier to visualize that most of the reads are in the 128-256 microsecond range: Bare Metal in blue, OVM in orange.

[Chart CaptureX5BMVM: ‘cell single block physical read’ wait time histogram, Bare Metal in blue, OVM in orange]
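To put numbers on the chart (computed from the histogram counts above): in Bare Metal, 240,675 of 258,374 single block reads (about 93%) completed in 256 microseconds or less, versus 322,114 of 431,077 (about 75%) for OVM; including the 512-microsecond bucket brings Bare Metal to about 96% and OVM to about 99%.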

In Bare Metal, most of the reads are faster than 256 microseconds. In Virtualized, a significant share of reads is above that. This may be caused by virtualization, but in any case it’s not a big difference. I don’t think that virtualization overhead is an important criterion when choosing how to install your Exadata. Storage capacity planning is the major criterion: consolidate all storage into two diskgroups (DATA and RECO) for all databases, or partition it per cluster. The choice is about manageability and agility in provisioning vs. licence optimization.