AWS: PostgreSQL on Graviton2 with newer GCC

By Franck Pachot

In the previous post I ran PostgreSQL on an AWS m6gd.2xlarge instance (ARM Graviton2 processor).
I didn't specify the compilation options there, so this post gives more details in response to feedback on that post.

First, the PostgreSQL ./configure script correctly detected ARM, and the build used the following flag: -march=armv8-a+crc
This targets the base ARMv8 architecture. However, LSE (Large System Extensions), which provides atomic instructions, was only added in ARMv8.1, and it can make a huge difference for PostgreSQL, especially with spinlocks under high CPU usage.
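
Whether the CPU of an instance implements LSE can be checked from the Features line of /proc/cpuinfo, where the Linux kernel reports the capability as "atomics". Here is a minimal sketch of that check. The function name has_lse and its optional file argument are made up for illustration and are not part of the original test:

```shell
# Report whether the CPU advertises LSE atomics.
# On arm64 Linux the capability shows up as "atomics" on the Features line.
# The optional argument (a cpuinfo-format file) is an assumption added so the
# function can also be tried on saved output; it defaults to /proc/cpuinfo.
has_lse() {
  cpuinfo="${1:-/proc/cpuinfo}"
  if grep -m1 '^Features' "$cpuinfo" | grep -qw atomics; then
    echo "LSE supported"
  else
    echo "LSE not supported"
  fi
}
```

Calling has_lse without an argument checks the machine it runs on.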

I followed the information in https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md to check the binaries after compilation.


for i in $(find postgres/src/backend -name "*.o") ; do objdump -d "$i" | awk '/:$/{w=$2}/aarch64_(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{printf "%-27s %-20s %-30s %-60s\n","(LSE instructions)",$NF,w,f}' f="$i" ; done | sort | uniq -c | sort -rnk1,4


      8 (LSE instructions)          <__aarch64_swp4_acq> <StartupXLOG>:                 postgres/src/backend/access/transam/xlog.o
      7 (LSE instructions)          <__aarch64_swp4_acq> <BitmapHeapNext>:              postgres/src/backend/executor/nodeBitmapHeapscan.o
      6 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockDequeueSelf>:           postgres/src/backend/storage/lmgr/lwlock.o
      6 (LSE instructions)          <__aarch64_cas8_acq_rel> <shm_mq_send_bytes>:           postgres/src/backend/storage/ipc/shm_mq.o
      5 (LSE instructions)          <__aarch64_swp4_acq> <WalReceiverMain>:             postgres/src/backend/replication/walreceiver.o
      5 (LSE instructions)          <__aarch64_cas8_acq_rel> <shm_mq_receive_bytes.isra.0>: postgres/src/backend/storage/ipc/shm_mq.o
      4 (LSE instructions)          <__aarch64_swp4_acq> <ProcessRepliesIfAny>:         postgres/src/backend/replication/walsender.o
      4 (LSE instructions)          <__aarch64_swp4_acq> <hash_search_with_hash_value>: postgres/src/backend/utils/hash/dynahash.o
      4 (LSE instructions)          <__aarch64_swp4_acq> <copy_replication_slot>:       postgres/src/backend/replication/slotfuncs.o
      4 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <parallel_vacuum_index>:       postgres/src/backend/access/heap/vacuumlazy.o
      4 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockAcquire>:               postgres/src/backend/storage/lmgr/lwlock.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <xlog_redo>:                   postgres/src/backend/access/transam/xlog.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <XLogInsertRecord>:            postgres/src/backend/access/transam/xlog.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <SaveSlotToPath>:              postgres/src/backend/replication/slot.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <RequestCheckpoint>:           postgres/src/backend/postmaster/checkpointer.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <LogicalRepSyncTableStart>:    postgres/src/backend/replication/logical/tablesync.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <LogicalConfirmReceivedLocation>: postgres/src/backend/replication/logical/logical.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <InvalidateObsoleteReplicationSlots>: postgres/src/backend/replication/slot.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <CreateInitDecodingContext>:   postgres/src/backend/replication/logical/logical.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <CreateCheckPoint>:            postgres/src/backend/access/transam/xlog.o
      3 (LSE instructions)          <__aarch64_swp4_acq> <CheckpointerMain>:            postgres/src/backend/postmaster/checkpointer.o
      3 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockQueueSelf>:             postgres/src/backend/storage/lmgr/lwlock.o
      3 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <tbm_prepare_shared_iterate>:  postgres/src/backend/nodes/tidbitmap.o
      3 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <tbm_free_shared_area>:        postgres/src/backend/nodes/tidbitmap.o
      3 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcessProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      3 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashIncreaseNumBatches>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <XLogWrite>:                   postgres/src/backend/access/transam/xlog.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <XLogSendPhysical>:            postgres/src/backend/replication/walsender.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <XLogBackgroundFlush>:         postgres/src/backend/access/transam/xlog.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvStreaming>:             postgres/src/backend/replication/walreceiverfuncs.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvRunning>:               postgres/src/backend/replication/walreceiverfuncs.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvDie>:                   postgres/src/backend/replication/walreceiver.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <TransactionIdLimitedForOldSnapshots>: postgres/src/backend/utils/time/snapmgr.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <StrategyGetBuffer>:           postgres/src/backend/storage/buffer/freelist.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_wait_internal>:        postgres/src/backend/storage/ipc/shm_mq.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotReserveWal>:   postgres/src/backend/replication/slot.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotRelease>:      postgres/src/backend/replication/slot.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ProcKill>:                    postgres/src/backend/storage/lmgr/proc.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <process_syncing_tables>:      postgres/src/backend/replication/logical/tablesync.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <pg_get_replication_slots>:    postgres/src/backend/replication/slotfuncs.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <exec_replication_command>:    postgres/src/backend/replication/walsender.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <CreateRestartPoint>:          postgres/src/backend/access/transam/xlog.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableBroadcast>:  postgres/src/backend/storage/lmgr/condition_variable.o
      2 (LSE instructions)          <__aarch64_swp4_acq> <BarrierArriveAndWait>:        postgres/src/backend/storage/ipc/barrier.o
      2 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockWaitListLock>:          postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockWaitForVar>:            postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockUpdateVar>:             postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <vacuum_delay_point>:          postgres/src/backend/commands/vacuum.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <StrategyGetBuffer>:           postgres/src/backend/storage/buffer/freelist.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <LWLockRelease>:               postgres/src/backend/storage/lmgr/lwlock.o
      2 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <lazy_parallel_vacuum_indexes>: postgres/src/backend/access/heap/vacuumlazy.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <WalReceiverMain>:             postgres/src/backend/replication/walreceiver.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <WaitForProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <shm_mq_receive>:              postgres/src/backend/storage/ipc/shm_mq.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ResolveRecoveryConflictWithLock>: postgres/src/backend/storage/ipc/standby.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcSignalInit>:              postgres/src/backend/storage/ipc/procsignal.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashTableInsert>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashTableInsertCurrentBatch>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelHashIncreaseNumBuckets>: postgres/src/backend/executor/nodeHash.o
      2 (LSE instructions)          <__aarch64_cas4_acq_rel> <TransactionIdSetTreeStatus>:  postgres/src/backend/access/transam/clog.o
      2 (LSE instructions)          <__aarch64_cas4_acq_rel> <ProcArrayEndTransaction>:     postgres/src/backend/storage/ipc/procarray.o
      2 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockAcquireOrWait>:         postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogWalRcvFlush.part.4>:      postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogSetReplicationSlotMinimumLSN>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogSetAsyncXactLSN>:         postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogSendLogical>:             postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogPageRead>:                postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogNeedsFlush>:              postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogGetLastRemovedSegno>:     postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <XLogFlush>:                   postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <worker_freeze_result_tape>:   postgres/src/backend/utils/sort/tuplesort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndWakeup>:                postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndWaitStopping>:          postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndSetState>:              postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndRqstFileReload>:        postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndKill>:                  postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalSndInitStopping>:          postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WalRcvForceReply>:            postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <WaitXLogInsertionsToFinish>:  postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <UpdateMinRecoveryPoint.part.10>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <tuplesort_performsort>:       postgres/src/backend/utils/sort/tuplesort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <tuplesort_begin_common>:      postgres/src/backend/utils/sort/tuplesort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <table_block_parallelscan_startblock_init>: postgres/src/backend/access/table/tableam.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SyncRepInitConfig>:           postgres/src/backend/replication/syncrep.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SyncRepGetCandidateStandbys>: postgres/src/backend/replication/syncrep.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <StrategySyncStart>:           postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <StrategyNotifyBgWriter>:      postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <StrategyFreeBuffer>:          postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SnapshotTooOldMagicForTest>:  postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <s_lock>:                      postgres/src/backend/storage/lmgr/s_lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SIInsertDataEntries>:         postgres/src/backend/storage/ipc/sinvaladt.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SIGetDataEntries>:            postgres/src/backend/storage/ipc/sinvaladt.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ShutdownWalRcv>:              postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_toc_insert>:              postgres/src/backend/storage/ipc/shm_toc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_toc_freespace>:           postgres/src/backend/storage/ipc/shm_toc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_toc_allocate>:            postgres/src/backend/storage/ipc/shm_toc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_set_sender>:           postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_set_receiver>:         postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_sendv>:                postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_get_sender>:           postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_get_receiver>:         postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <shm_mq_detach_internal>:      postgres/src/backend/storage/ipc/shm_mq.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ShmemAllocRaw>:               postgres/src/backend/storage/ipc/shmem.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SharedFileSetOnDetach>:       postgres/src/backend/storage/file/sharedfileset.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SharedFileSetAttach>:         postgres/src/backend/storage/file/sharedfileset.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetWalWriterSleeping>:        postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetRecoveryPause>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetPromoteIsTriggered>:       postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <SetOldSnapshotThresholdTimestamp>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RequestXLogStreaming>:        postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsDropDBSlots>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsCountDBSlots>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsComputeRequiredXmin>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsComputeRequiredLSN>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotsComputeLogicalRestartLSN>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotPersist>:      postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotMarkDirty>:    postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotDropPtr>:      postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotCreate>:       postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotCleanup>:      postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReplicationSlotAcquireInternal>: postgres/src/backend/replication/slot.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RemoveOldXlogFiles>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RemoveLocalLock>:             postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RecoveryRestartPoint>:        postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <RecoveryIsPaused>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ReadRecord>:                  postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <PublishStartupProcessInformation>: postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <PromoteIsTriggered>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ProcSendSignal>:              postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ProcessWalSndrMessage>:       postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <PhysicalReplicationSlotNewXmin>: postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <pg_stat_get_wal_senders>:     postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <pg_stat_get_wal_receiver>:    postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <pg_replication_slot_advance>: postgres/src/backend/replication/slotfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ParallelWorkerReportLastRecEnd>: postgres/src/backend/access/transam/parallel.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <MaintainOldSnapshotTimeMapping>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LWLockNewTrancheId>:          postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LogicalIncreaseXminForSlot>:  postgres/src/backend/replication/logical/logical.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LogicalIncreaseRestartDecodingForSlot>: postgres/src/backend/replication/logical/logical.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <lock_twophase_recover>:       postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LockRefindAndRelease>:        postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <LockAcquireExtended>:         postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KnownAssignedXidsSearch>:     postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KnownAssignedXidsGetAndSetXmin>: postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KnownAssignedXidsAdd>:        postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <KeepLogSeg>:                  postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <InitWalSender>:               postgres/src/backend/replication/walsender.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <InitProcess>:                 postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <InitAuxiliaryProcess>:        postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <HotStandbyActive>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <HaveNFreeProcs>:              postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetXLogWriteRecPtr>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetXLogReplayRecPtr>:         postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetXLogInsertRecPtr>:         postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetWalRcvFlushRecPtr>:        postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetSnapshotCurrentTimestamp>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetReplicationTransferLatency>: postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetReplicationApplyDelay>:    postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetRedoRecPtr>:               postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetRecoveryState>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetOldSnapshotThresholdTimestamp>: postgres/src/backend/utils/time/snapmgr.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetLatestXTime>:              postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetInsertRecPtr>:             postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetFlushRecPtr>:              postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetFakeLSNForUnloggedRel>:    postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <GetCurrentChunkReplayStartTime>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <FirstCallSinceLastCheckpoint>: postgres/src/backend/postmaster/checkpointer.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <element_alloc>:               postgres/src/backend/utils/hash/dynahash.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <do_pg_stop_backup>:           postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <do_pg_start_backup>:          postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <DecodingContextFindStartpoint>: postgres/src/backend/replication/logical/logical.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableTimedSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableSignal>:     postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariablePrepareToSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ConditionVariableCancelSleep>: postgres/src/backend/storage/lmgr/condition_variable.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <ComputeXidHorizons>:          postgres/src/backend/storage/ipc/procarray.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <CheckXLogRemoved>:            postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <CheckRecoveryConsistency.part.11>: postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_seize>:          postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_scan_and_sort>:  postgres/src/backend/access/nbtree/nbtsort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <btparallelrescan>:            postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_release>:        postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_done>:           postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <_bt_parallel_advance_array_keys>: postgres/src/backend/access/nbtree/nbtree.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <btbuild>:                     postgres/src/backend/access/nbtree/nbtsort.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierParticipants>:         postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierDetach>:               postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierAttach>:               postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierArriveAndDetach>:      postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <BarrierArriveAndDetachExceptLast>: postgres/src/backend/storage/ipc/barrier.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <AuxiliaryProcKill>:           postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <AdvanceXLInsertBuffer>:       postgres/src/backend/access/transam/xlog.o
      1 (LSE instructions)          <__aarch64_swp4_acq> <AbortStrongLockAcquire>:      postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <ProcessProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockWaitForVar>:            postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockQueueSelf>:             postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockDequeueSelf>:           postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LWLockAcquire>:               postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <LockBufHdr>:                  postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_ldset4_acq_rel> <EmitProcSignalBarrier>:       postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_ldclr4_acq_rel> <LWLockReleaseClearVar>:       postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_ldadd8_acq_rel> <table_block_parallelscan_nextpage>: postgres/src/backend/access/table/tableam.o
      1 (LSE instructions)          <__aarch64_ldadd8_acq_rel> <EmitProcSignalBarrier>:       postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <find_or_make_matching_shared_tupledesc>: postgres/src/backend/utils/cache/typcache.o
      1 (LSE instructions)          <__aarch64_ldadd4_acq_rel> <ExecParallelHashJoin>:        postgres/src/backend/executor/nodeHashjoin.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <table_block_parallelscan_reinitialize>: postgres/src/backend/access/table/tableam.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcWakeup>:                  postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <ProcSleep>:                   postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <pg_stat_get_wal_receiver>:    postgres/src/backend/replication/walreceiver.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <InitProcess>:                 postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <InitAuxiliaryProcess>:        postgres/src/backend/storage/lmgr/proc.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <GetWalRcvWriteRecPtr>:        postgres/src/backend/replication/walreceiverfuncs.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <GetLockStatusData>:           postgres/src/backend/storage/lmgr/lock.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <ExecParallelScanHashBucket>:  postgres/src/backend/executor/nodeHash.o
      1 (LSE instructions)          <__aarch64_cas8_acq_rel> <CleanupProcSignalState>:      postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <UnpinBuffer.constprop.11>:    postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <StrategySyncStart>:           postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <StrategyGetBuffer>:           postgres/src/backend/storage/buffer/freelist.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <ProcessProcSignalBarrier>:    postgres/src/backend/storage/ipc/procsignal.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <PinBuffer>:                   postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <MarkBufferDirty>:             postgres/src/backend/storage/buffer/bufmgr.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockRelease>:               postgres/src/backend/storage/lmgr/lwlock.o
      1 (LSE instructions)          <__aarch64_cas4_acq_rel> <LWLockConditionalAcquire>:    postgres/src/backend/storage/lmgr/lwlock.o

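So, this confirms that it was compiled with -march=armv8-a and with outline atomics (-moutline-atomics, which is the default in GCC >= 10 and also in the GCC 7 shipped with Amazon Linux 2). The __aarch64_* symbols are calls to outline-atomics helper functions, whose names encode the operation, the operand size in bytes and the memory ordering, and which use the LSE (Large System Extensions) instructions when the CPU supports them. We can also see where the atomic operations are used: mostly in the WAL code and in the buffer and lightweight locks that protect access to shared memory.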
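
To see more quickly which parts of the code rely on atomics, the listing can be condensed to one line per object file. This is a minimal sketch and assumes the exact column layout printed above (count first, "(LSE instructions)" next, object file path last). The function name calls_per_object is made up for illustration:

```shell
# Sum the leading "uniq -c" count per object file (last column),
# keeping only the "(LSE instructions)" lines of the listing.
calls_per_object() {
  awk '$2 == "(LSE" { total[$NF] += $1 }
       END { for (f in total) printf "%d %s\n", total[f], f }' | sort -rn
}
```

It reads the listing on standard input, for example by appending | calls_per_object to the command used above.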

for i in /usr/local/pgsql/bin/postgres $(find postgres/src/backend -name "*.o") ; do objdump -d "$i" | awk '/:$/{w=$2}/aarch64_(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{printf "%-27s %-40s %-40s %-60s\n","(LSE instructions)",$NF,w,f}/\t(ldxr|ldaxr|stxr|stlxr)\t/{printf "%-27s %-40s %-40s %-60s\n","(load and store exclusives)",$3,w,f}' f="$i" ; done | sort | uniq -c | sort -rn

      1 (load and store exclusives) stxr                                     <__aarch64_swp4_acq>:                    /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldset4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldclr4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldadd8_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_ldadd4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_cas8_acq_rel>:                /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) stlxr                                    <__aarch64_cas4_acq_rel>:                /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_swp4_acq>:                    /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldset4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldclr4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldadd8_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_ldadd4_acq_rel>:              /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_cas8_acq_rel>:                /usr/local/pgsql/bin/postgres
      1 (load and store exclusives) ldaxr                                    <__aarch64_cas4_acq_rel>:                /usr/local/pgsql/bin/postgres

This confirms that the PostgreSQL binary also contains load and store exclusives, inside the outlined __aarch64_ helpers, so that the same binary can run on Graviton (no LSE) as well as on Graviton2.
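
As a rough sketch, the same counting can be condensed into a small helper that reads `objdump -d` output on standard input and reports which situation applies. The function name `classify_atomics` is my own, and the LSE mnemonic list is shortened for readability, so take it as an illustration rather than a complete check.

```shell
# classify_atomics: hypothetical helper, reads "objdump -d" text on stdin.
# Instruction lines are tab-separated: "  addr:<TAB>encoding <TAB>mnemonic<TAB>operands"
classify_atomics() {
  awk -F'\t' '
    # the third tab-separated field is the mnemonic on instruction lines
    $3 ~ /^(ldxr|ldaxr|stxr|stlxr)$/ { excl++ }
    # LSE mnemonics carry ordering/size suffixes (casal, swpa, ldaddal, ...)
    $3 ~ /^(cas|casp|swp|ldadd|ldclr|ldeor|ldset)[a-z]*$/ { lse++ }
    END {
      if (excl && lse) print "both (run-time detection possible)"
      else if (lse)    print "LSE only"
      else if (excl)   print "load/store exclusives only"
      else             print "no atomics found"
    }'
}
```

It would be used as `objdump -d /usr/local/pgsql/bin/postgres | classify_atomics`.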


$ nm /usr/local/pgsql/bin/postgres | grep -E "aarch64(_have_lse_atomics)?"

00000000008fb460 t __aarch64_cas4_acq_rel
00000000008fb490 t __aarch64_cas8_acq_rel
0000000000bbe640 b __aarch64_have_lse_atomics
00000000008fb4f0 t __aarch64_ldadd4_acq_rel
00000000008fb580 t __aarch64_ldadd8_acq_rel
00000000008fb520 t __aarch64_ldclr4_acq_rel
00000000008fb550 t __aarch64_ldset4_acq_rel
00000000008fb4c0 t __aarch64_swp4_acq

This is the run-time detection: the __aarch64_have_lse_atomics variable tells the helpers which path to take. As it was compiled for ARM v8, with atomics outlined, the same binary can run on v8 or on >= v8.1.
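
To see which path the outlined helpers are likely to take on a given host, one can look at what the kernel reports for the CPU. This is a minimal sketch, assuming Linux on aarch64, where the Features line of /proc/cpuinfo lists "atomics" when LSE is available. The function name `cpu_has_lse` is hypothetical.

```shell
# cpu_has_lse: hypothetical helper; succeeds when the Features line
# of a cpuinfo file lists "atomics" (the kernel's name for LSE support).
cpu_has_lse() {
  # the optional argument allows checking a saved copy of /proc/cpuinfo
  grep -m1 -E '^Features' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw atomics
}

if cpu_has_lse; then
  echo "LSE reported: outlined helpers should take the LSE path"
else
  echo "no LSE reported: outlined helpers fall back to load/store exclusives"
fi
```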


$ gcc --version
gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This is GCC 7, but on Amazon Linux 2 it has been patched to enable -moutline-atomics by default.

Install latest version of GCC (version 11 experimental)

Here is how I compiled the latest GCC available:


gcc --version
sudo yum -y install bzip2 git gcc gcc-c++ gmp-devel mpfr-devel libmpc-devel make flex bison
git clone https://github.com/gcc-mirror/gcc.git
cd gcc
make distclean
./configure --enable-languages=c,c++
make
sudo make install

This basically gets the latest GCC from source, compiles it and installs it (please remember this is a lab: use stable versions elsewhere).

$ gcc --version
gcc (GCC) 11.0.1 20210309 (experimental)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Here we are: gcc 11.0.1 20210309 (experimental)

PGIO LIOPS

I’m running the same PGIO as in the previous post:


Date: Wed Mar 10 14:39:38 UTC 2021
Database connect string: "pgio".
Shared buffers: 8500MB.
Testing 4 schemas with 1 thread(s) accessing 1024M (131072 blocks) of each schema.
Running iostat, vmstat and mpstat on current host--in background.
Launching sessions. 4 schema(s) will be accessed by 1 thread(s) each.
pg_stat_database stats:
          datname| blks_hit| blks_read|tup_returned|tup_fetched|tup_updated
BEFORE:  pgio    | 38262338086 |    562443 |  37644815538 | 37635763756 |          24
AFTER:   pgio    | 49691750429 |    562449 |  48890461241 | 48878858651 |          49
DBNAME:  pgio. 4 schemas, 1 threads(each). Run time: 3600 seconds. RIOPS >793709<

This is a little higher than what I had: 793709 LIOPS per CPU, compared with 780651 with GCC 7, but that’s still lower than the 896280 I had on x86.
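
The per-thread figure can be cross-checked against the pg_stat_database counters in the output above. This is only a sketch: I am assuming that the rate is the blks_hit difference divided by the run time and by the number of threads, which happens to match the number that pgio printed. The helper name `liops_per_thread` is my own.

```shell
# liops_per_thread: hypothetical helper, integer arithmetic only.
# args: blks_hit before, blks_hit after, run time in seconds, thread count
liops_per_thread() {
  echo $(( ($2 - $1) / $3 / $4 ))
}

# values taken from the BEFORE/AFTER lines of the run above
liops_per_thread 38262338086 49691750429 3600 4   # prints 793709
```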

Of course, there can be more optimisations as mentioned in https://github.com/aws/aws-graviton-getting-started/blob/master/c-c++.md
I’ll recompile with the recommended flags

(
cd postgres
CFLAGS="-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 -fsigned-char" ./configure
make clean
make
make install
)

It didn’t make any difference in the PGIO run. Of course, this may change with a read-write workload (more spinlocks) with checksums.

Note that I first compiled with the default (empty) CFLAGS, so gcc was called with -march=armv8-a+crc, and -moutline-atomics is the default. I was therefore already in the situation with run-time detection, because the GCC >= 10 behaviour has been backported by Amazon to the GCC 7 in Amazon Linux 2. This was not clear to me initially (I got this clarified here).
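
One way to check what a given gcc does by default is to ask it with `gcc -Q --help=target`, which lists the target options and their state. The sketch below only parses that text from standard input. The helper name `outline_atomics_default` is hypothetical, and the "-mflag [enabled]" layout is an assumption about the output format that may differ between GCC builds.

```shell
# outline_atomics_default: hypothetical helper; reads the text printed by
# "gcc -Q --help=target" on stdin and looks for "-moutline-atomics ... [enabled]".
outline_atomics_default() {
  if grep -Eq '^[[:space:]]*-moutline-atomics[[:space:]]+\[enabled\]'; then
    echo "outline atomics enabled by default"
  else
    echo "outline atomics not enabled by default"
  fi
}

# usage on the build host:
#   gcc -Q --help=target | outline_atomics_default
```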

By the way, Aurora on Graviton2 is still compiled with GCC 7.4

Update 15-MAY-2021: I have rephrased a few things here which were not clear (even to myself), and I’ll write more on PostgreSQL on ARM and on benchmarks in general. http://blog.pachot.net should redirect to the right place (or @FranckPachot on Twitter, of course).

8 Comments

  • Alexander Korotkov says:

    Did you try a simple pgbench with LSE?
    I don’t know what pgio is (the project readme isn’t very descriptive). I guess it is intended to exercise the instance I/O, but LSE helps with CPU-bound workloads, not I/O.
    You can see my research, LSE gives dramatic effect in pgbench.
    https://www.postgresql.org/message-id/CAPpHfdsGqVd6EJ4mr_RZVE5xSiCNBy4MuSvdTrKmTpM0eyWGpg%40mail.gmail.com

  • Hi Alexander, I’ll read your tests, that’s interesting.
    pgio is not only storage i/o. As long as the working set fits in shared buffers, or in filesystem cache, you know exactly what you measure. I didn’t test with concurrent sessions on a small working set, but that’s also possible to focus on spin lock.
    The problem with pgbench, default workload: many components are involved, lot of context switches, and that may be what you want to measure… or not.
    I’ve written about this here:
    https://franckpachot.medium.com/do-you-know-what-you-are-measuring-with-pgbench-d8692a33e3d6
    For example, I recently ran some benchmarks on other RISC architectures, like Power9, and got very different results with pgbench in simple or prepared protocol. It is hard to make decisions based on that.

  • Linking Alexander’s tests on the latest optimizations of PostgreSQL 14 for ARM:
    https://akorotkov.github.io/blog/2021/04/30/arm/

  • Andrii Mandybura says:

    Hi, Franck!
    I followed your guide but can’t get the same result:
    – AWS m6g instance on Amazon Linux
    – build gcc
    – build postgres with parameters "-march=armv8.2-a+fp16+rcpc+dotprod+crypto -mtune=neoverse-n1 -fsigned-char"
    But after this, the objdump and nm commands with grep display nothing: empty output.
    Where could the problem be?

  • Hi Andrii,
    Yes, this is because with those flags the compiler generates only LSE instructions, inline. The objdump I’m doing here looks for those instructions when they are outlined in __aarch64_ functions (GCC does that to build a binary that can also run on older ARM versions). Sorry for the confusion. I realized that and wrote another post about it ( https://blog.dbi-services.com/postgresql-on-aws-graviton2-cflags/ ) which looks at the atomic instructions in all cases ( = v8.1 with LSE only, >= v8 with both and run-time detection). This should clear the doubts. If not, please tell me.
    Franck.

  • Igor Rudyk says:

    Hello Frank,
    Thank you for sharing this information, it is very helpful.
    It seems that objdump command shows us Large-system extensions are not present on Ubuntu 20.04.
    Could you please write down a pack of commands to enable LSE on Ubuntu OS Graviton2 with GCC 9.3 or GCC 10+? Can you please describe the flow how to identify whether LSE is enabled?

    Example,
    # objdump -d /usr/local/pgsql/bin/postgres | awk '/\t(ldxr|ldaxr|stxr|stlxr)/{print $3"\t(load and store exclusives)"}/\t(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{print $3"\t(large-system extensions)"}' | sort -k2 | uniq -c
    67 ldaxr (load and store exclusives)
    277 ldxr (load and store exclusives)
    113 stlxr (load and store exclusives)
    231 stxr (load and store exclusives)

    Thanks
    Igor

  • Igor Rudyk says:

    Hello Frank,
    Hopefully you can assist us with the issue and an output of the command on Ubuntu 20.04 Graviton2 is provided below:

    # nm /usr/local/pgsql/bin/postgres | grep __aarch64_have_lse_atomics | wc -l
    0

    # objdump -d /usr/local/pgsql/bin/postgres | awk '/\t(ldxr|ldaxr|stxr|stlxr)/{print $3"\t(load and store exclusives)"}/\t(cas|casp|swp|ldadd|stadd|ldclr|stclr|ldeor|steor|ldset|stset|ldsmax|stsmax|ldsmin|stsmin|ldumax|stumax|ldumin|stumin)/{print $3"\t(large-system extensions)"}' | sort -k2 | uniq -c
    8 casal (large-system extensions)
    7 ldaddal (large-system extensions)
    1 ldclral (large-system extensions)
    2 ldsetal (large-system extensions)
    31 swpa (large-system extensions)

  • Igor,
    Your first example has no LSE. Your second example has only LSE. You need to compile with CFLAGS="-moutline-atomics" to get both, with the __aarch64_ run-time detection. Ubuntu GCC has this flag but not by default for GCC before 11.
