By Franck Pachot

The Oracle Cloud free tier has been outstanding since OOW19, with two VMs and two databases that you can leave running all the time without any risk of being billed or seeing them stopped. I use them daily. And it goes to another level with the May 25th announcement of shapes based on the ARM Ampere Altra processor, with up to 80 cores running at 3.0 GHz. On your free tier, one of those two VMs can now be a 4 vCPU, 24 GB RAM machine, under the same conditions: always free, never shut down, can even be patched online with Autonomous Linux, free support… And there is no risk: the credit card you validate when you open the trial account will never be charged. Using non-free services requires an explicit upgrade to a paid account, with a new credit card validation. Oracle partnered with Ampere, which provides the Altra processor. This directly competes with AWS Graviton2 (also an ARM v8.2 Neoverse N1), for which the free offer is a trial t4g.micro burstable instance where the first 750 hours per month are free until the end of June 2021 and additional usage is billed.

Once you have created a free trial account on https://www.oracle.com/cloud/free/ with a phone number and credit card number, you get $300 in credits to try any service, and you can create an always free VM by choosing the Ampere processor in “Edit Shape”.

You don’t have to choose a specific ARM image. The following ones are available when you choose the Ampere shape:

  • Oracle Linux 7.9 and 8 (RHEL binary compatible)
  • Ubuntu 20.04 and 18.04 (not the “minimal” versions)
  • Oracle Linux Cloud Developer 8 (the Oracle Linux image that contains Java, GraalVM, Oracle Database Instant Client, SQL Developer and SQLcl and many other development tools)

Autonomous Linux is not available at the time I’m writing this, but I think you have everything you need in Oracle Linux Cloud Developer 8.

I have chosen Oracle Linux 7.9, which is in my opinion the best for a database.


[opc@arm-lon ~]$ lscpu

Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Model:                 1
BogoMIPS:              50.00
NUMA node0 CPU(s):     0-3
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

[opc@arm-lon gcc]$ grep ^CPU /proc/cpuinfo | sort -u

CPU architecture: 8
CPU implementer : 0x41
CPU part        : 0xd0c
CPU revision    : 1
CPU variant     : 0x3

lscpu doesn’t tell you a lot, but /proc/cpuinfo identifies an ARM (implementer 0x41) Neoverse N1 (part 0xd0c), revision r3p1 (variant 0x3, revision 1), which is an ARM v8.2 processor. I install the latest GCC to be able to use -march=armv8.2-a with the default -moutline-atomics (more about it in this previous post). However, please note that the right way to compile for this processor is to use “-mcpu=neoverse-n1” with the GCC 10 that is packaged with devtoolset-10, as in this post: https://www.dbi-services.com/blog/postgresql-on-arm-oci/
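As an illustration of how the /proc/cpuinfo fields map to a processor name, here is a minimal sketch in C. The function names are hypothetical, and the lookup only knows the single implementer and part codes shown in the output above; anything else is reported as unknown.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: only knows implementer 0x41, the one seen above. */
const char *implementer_name(unsigned implementer)
{
    return implementer == 0x41 ? "ARM" : "unknown";
}

/* Hypothetical helper: only knows part 0xd0c from implementer 0x41. */
const char *part_name(unsigned implementer, unsigned part)
{
    if (implementer == 0x41 && part == 0xd0c)
        return "Neoverse N1";
    return "unknown";
}

/* ARM writes core revisions as rNpM: N comes from the "CPU variant"
   field and M from the "CPU revision" field. */
void format_revision(unsigned variant, unsigned revision,
                     char *buf, size_t len)
{
    snprintf(buf, len, "r%up%u", variant, revision);
}
```

Called with the values from this VM (variant 0x3, revision 1), format_revision fills the buffer with r3p1.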

I’ll compile the latest GCC on this Oracle Linux 7.9:


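# build GCC 11 from source unless it is already installed in /usr/local
/usr/local/bin/gcc --version | grep "gcc (GCC) 11" || (

# build dependencies (the distribution gcc is needed to bootstrap the new one)
sudo yum -y install bzip2 git gcc gcc-c++ gmp-devel mpfr-devel libmpc-devel make flex bison
git clone --branch releases/gcc-11 https://github.com/gcc-mirror/gcc.git
cd gcc
# clean up a previous build, if any (fails harmlessly on a fresh clone)
make distclean
./configure --enable-languages=c,c++ --disable-multilib
make
sudo make install

# remove the distribution gcc now that the new one is installed
sudo yum remove -y gcc
)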

This installs the latest GCC 11 release (it takes a while to compile…)


[opc@arm-lon gcc]$ /usr/local/bin/gcc -march=native -dM -E - <<<"" | awk '/ARM/{printf " "$2"="$3}'
 __ARM_SIZEOF_WCHAR_T=4 __ARM_FEATURE_ATOMICS=1 __ARM_FEATURE_AES=1 __ARM_FEATURE_IDIV=1 __ARM_FP=14 __ARM_SIZEOF_MINIMAL_ENUM=4 __ARM_FEATURE_DOTPROD=1 __ARM_FEATURE_CRYPTO=1 __ARM_ALIGN_MAX_PWR=28 __ARM_FP16_FORMAT_IEEE=1 __ARM_FEATURE_FP16_SCALAR_ARITHMETIC=1 __ARM_FP16_ARGS=1 __ARM_FEATURE_CLZ=1 __ARM_FEATURE_QRDMX=1 __ARM_64BIT_STATE=1 __ARM_FEATURE_FMA=1 __ARM_ARCH_PROFILE=65 __ARM_PCS_AAPCS64=1 __ARM_FEATURE_FP16_VECTOR_ARITHMETIC=1 __ARM_ARCH=8 __ARM_FEATURE_UNALIGNED=1 __ARM_ARCH_8A=1 __ARM_FEATURE_SHA2=1 __ARM_FEATURE_CRC32=1 __ARM_NEON=1 __ARM_ALIGN_MAX_STACK_PWR=16 __ARM_FEATURE_NUMERIC_MAXMIN=1 __ARM_ARCH_ISA_A64=1

The features are exactly the same as on AWS Graviton2, and ATOMICS is the one that really matters to me for running databases (LSE, the Large System Extensions: atomic instructions that optimize spinlocks, latches and other synchronization).
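To make the ATOMICS feature more concrete, here is a small sketch in C11, with hypothetical function names. When the compiler targets LSE (for example with -march=armv8.2-a or -mcpu=neoverse-n1), __ARM_FEATURE_ATOMICS is defined and an atomic fetch-and-add like the one below can be compiled to a single atomic instruction instead of a load-exclusive/store-exclusive retry loop. The counter is only a stand-in for the kind of shared word a latch or spinlock would update.

```c
#include <stdatomic.h>

/* Returns 1 when this file was compiled for a target with LSE atomics,
   0 otherwise (for example on x86, or on ARM without -march=armv8.2-a). */
int built_with_lse(void)
{
#ifdef __ARM_FEATURE_ATOMICS
    return 1;
#else
    return 0;
#endif
}

/* Adds 1 to the shared counter n times and returns the final value.
   Each iteration is one atomic read-modify-write on *counter. */
long add_n(atomic_long *counter, long n)
{
    for (long i = 0; i < n; i++)
        atomic_fetch_add_explicit(counter, 1, memory_order_acq_rel);
    return atomic_load(counter);
}
```

Compiling this with gcc -S, with and without the -march option, and comparing the generated assembly is one way to see the difference.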

The processor instructions are the same as on AWS Graviton2 because both implement the ARM v8.2-a specification, but there are some differences, like:


$ ls /sys/kernel/mm/hugepages

hugepages-16777216kB  hugepages-2048kB  hugepages-524288kB

$ grep ^Hugepagesize /proc/meminfo

Hugepagesize:     524288 kB

Yes, the huge page size can be larger than the 2MB ones on Intel, and the default here is 512MB. This is perfect for databases where shared buffers are allocated by the gigabyte at instance start and are mapped by hundreds of processes. Remember that this processor has 80 cores. Note that this is for Oracle Linux. The Ubuntu 20.04 default huge page size is 2MB.
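A quick way to see why the huge page size matters is to count how many pages are needed to map a shared memory area. The sketch below is only arithmetic; the function name and the 8 GB example size are assumptions for illustration, while the page sizes in kB are the ones listed in /sys/kernel/mm/hugepages above (524288 kB is 512 MB).

```c
/* Number of huge pages needed to cover `bytes` of shared memory,
   rounding up, for a huge page size given in kB as in /proc/meminfo. */
unsigned long long pages_needed(unsigned long long bytes,
                                unsigned long long page_kb)
{
    unsigned long long page_bytes = page_kb * 1024ULL;
    return (bytes + page_bytes - 1) / page_bytes;
}
```

For a hypothetical 8 GB buffer cache, pages_needed(8ULL << 30, 2048) gives 4096 pages, while pages_needed(8ULL << 30, 524288) gives only 16, which means far fewer page table entries for each of the processes attaching to it.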

Talking about processor cores, there is no multi-threading there. This is very good for a database where you want predictable performance: when the OS schedules a process to run, it is guaranteed to have a physical core available for its instructions, and no other hardware thread will interfere with its branch prediction or share its core caches. Each core has 64KB of L1 cache and 1MB of L2 cache.

Compared with AWS Graviton2, the CPU frequency is higher.

[opc@arm-lon gcc]$ sudo  dmidecode -t processor | grep "Speed"

        Max Speed: 2000 MHz
        Current Speed: 2000 MHz

lscpu doesn’t give any info about the processor speed, and dmidecode is not reliable here.

I’ll use James Price’s program (see http://uob-hpc.github.io/2017/11/22/arm-clock-freq.html)


cc -DITRS=1e10 -xc - -o /var/tmp/freq <<'CAT'

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#ifndef ITRS
#define ITRS 1000000000
#endif
int main(int argc, char *argv[])
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  double start = tv.tv_sec + tv.tv_usec*1e-6;
  long instructions;
  for (instructions = 0; instructions < ITRS; )
  {
#define INST0 "add  %[i], %[i], #1\n\t"
#define INST1 INST0 INST0 INST0 INST0   INST0 INST0 INST0 INST0 INST0 INST0 INST0 INST0   INST0 INST0 INST0 INST0
#define INST2 INST1 INST1 INST1 INST1   INST1 INST1 INST1 INST1 INST1 INST1 INST1 INST1   INST1 INST1 INST1 INST1
#define INST3 INST2 INST2 INST2 INST2   INST2 INST2 INST2 INST2 INST2 INST2 INST2 INST2   INST2 INST2 INST2 INST2
    asm volatile (
      INST3
      : [i] "+r" (instructions)
      :
      : "cc"
      );
  }
  gettimeofday(&tv, NULL);
  double end = tv.tv_sec + tv.tv_usec*1e-6;
  double runtime = end-start;
  printf("Runtime (seconds)     = %lf\n", runtime);
  printf("Instructions executed = %ld\n", instructions);
  printf("Estimated frequency   = %.2lf MHz\n", (instructions/runtime)*1e-6);
  return 0;

}
CAT
/var/tmp/freq 

This just runs a bunch of ‘add’ assembler instructions and estimates the frequency from the run time.


[opc@arm-lon ~]$ ./freq -DITRS=1e9

Runtime (seconds)     = 0.334466
Instructions executed = 1000001536
Estimated frequency   = 2989.85 MHz

This is 3.0 GHz (as a comparison, AWS Graviton2 gives 2.5 GHz).
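The estimate relies on each add depending on the result of the previous one, so that, under the assumption that the core retires one such dependent add per cycle, instructions per second approximate cycles per second. The final computation can be isolated as follows; the function name is hypothetical, and the formula is the one from the printf in the program above.

```c
/* Estimated clock in MHz from a count of dependent add instructions and
   the elapsed time, assuming one dependent add retires per cycle. */
double estimated_mhz(long instructions, double runtime_seconds)
{
    return ((double)instructions / runtime_seconds) * 1e-6;
}
```

With the figures printed above, estimated_mhz(1000001536L, 0.334466) comes out at roughly 2990 MHz.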

Another way to look at it is to run something CPU-bound with perf stat:


[opc@ampere-altra] timeout -s INT 60 perf stat yes >/dev/null

yes: Interrupt

 Performance counter stats for 'yes':

         59,977.98 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
                35      page-faults:u             #    0.001 K/sec
   179,146,129,962      cycles:u                  #    2.987 GHz
   660,120,368,732      instructions:u            #    3.68  insn per cycle
         branches:u
         1,444,040      branch-misses:u

      59.980961953 seconds time elapsed

      59.770686000 seconds user
       0.159955000 seconds sys

However, on AWS Graviton2 I got incorrect cycle measurements.
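The GHz and “insn per cycle” figures in the perf stat output are simply derived from the raw counters, which makes them easy to cross-check. A minimal sketch, with hypothetical function names:

```c
/* Clock rate in GHz: cycles counted divided by the task-clock time,
   which perf stat reports in milliseconds. */
double clock_ghz(double cycles, double task_clock_msec)
{
    return cycles / (task_clock_msec * 1e-3) * 1e-9;
}

/* Average number of instructions retired per cycle. */
double insn_per_cycle(double instructions, double cycles)
{
    return instructions / cycles;
}
```

With the counters above, clock_ghz(179146129962.0, 59977.98) is about 2.987 and insn_per_cycle(660120368732.0, 179146129962.0) is about 3.68, which matches what perf printed on this Ampere Altra VM.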

The processor is the Ampere Altra 80c (https://amperecomputing.com/altra/), with 80 cores at 3.0 GHz, which is quite impressive. But there is more than that. It implements all of ARM v8.2, including the atomic instructions: atomic operations that used to be implemented with load and store loops now have their own instructions. I’ll show that with PostgreSQL in the next post.