
PostgreSQL Tuning Guide

for Ampere Altra Processors on Oracle Cloud Infrastructure

Overview

This document provides setup and tuning details for PostgreSQL running on Ampere Altra and Altra Max systems.

Note: This tuning guide shows how to build and tune PostgreSQL for Ampere processors. The settings below are a starting point; there is scope to stress the system further, for example with additional PostgreSQL client threads or more memory, to get the best performance for your workload.

This test was done on an OCI A1 instance with 4 OCPUs and 40 GB of RAM. We used pgbench to stress PostgreSQL, and ran it on the same system as PostgreSQL to avoid network latency. However, if you find that pgbench consumes significant CPU resources, it is recommended to run it on a separate client system or VM.

  • Operating System: Oracle Linux 8.6
  • Kernel: 5.4.17-2136.308.9.el8uek.aarch64
  • Storage: 1 TB - default balanced SSD (use iSCSI commands to attach the disk; an example appears in Setup and Tuning below)
  • THP is set to always (see the command after this list).
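THP can be set to always with the following command; the sysfs path shown is the standard one on Oracle Linux:

echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
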
Setup and Tuning

Follow the steps below to set up the system:


sudo yum update
sudo yum groupinstall 'Development Tools'
sudo yum install readline-devel

The test used an external disk formatted with the XFS file system to store the PostgreSQL data.
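As a sketch, attaching the block volume and creating the XFS file system might look like the following. The IQN and IP address are placeholders taken from the volume's attachment details in the OCI console, and /dev/sdb and /pgdata are assumptions for the device name and mount point:

sudo iscsiadm -m node -o new -T <volume IQN> -p <iSCSI IP>:3260
sudo iscsiadm -m node -o update -T <volume IQN> -n node.startup -v automatic
sudo iscsiadm -m node -T <volume IQN> -p <iSCSI IP>:3260 -l
sudo mkfs.xfs /dev/sdb        # assumes the volume appears as /dev/sdb
sudo mkdir -p /pgdata
sudo mount /dev/sdb /pgdata   # hypothetical mount point for PostgreSQL data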

The test built PostgreSQL from the latest source code. GCC 10.2 or later is recommended for building PostgreSQL, as it includes LSE (Large System Extensions) based optimizations.

wget https://ftp.postgresql.org/pub/source/v14.4/postgresql-14.4.tar.gz
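After downloading, extract the archive and enter the source directory:

tar xzf postgresql-14.4.tar.gz
cd postgresql-14.4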

This is an example of building PostgreSQL with the recommended GCC flags.

CC=${path to GCC bin} CFLAGS="-mcpu=neoverse-n1 -march=armv8.2-a+crypto+fp16+rcpc+dotprod+crc+lse -O3" ./configure --enable-arm-crc32 --prefix=${path to postgresql build} && make clean && make -j && make install
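Assuming the install prefix above, the new binaries can be put on PATH and the build verified; the prefix value is the same placeholder used in the configure step:

# <path to postgresql build> is the --prefix passed to configure above
export PATH=<path to postgresql build>/bin:$PATH
postgres --version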

Initialize and start the newly built PostgreSQL.

pg_ctl -D <postgres data dir> initdb
pg_ctl -D <postgres data dir> -l logfile start

Here is an example of postgresql.conf used for this benchmarking exercise:

listen_addresses = 'localhost'
port = 5432                             # (change requires restart)
max_connections = 100                   # (change requires restart)
shared_buffers = 25GB                   # min 128kB
huge_pages = try                        # on, off, or try
max_wal_senders = 0
max_wal_size = 200GB
maintenance_work_mem = 512MB            # min 1MB
autovacuum_work_mem = -1                # min 1MB, or -1 to use maintenance_work_mem
max_stack_depth = 7MB                   # min 100kB
dynamic_shared_memory_type = posix      # the default is the first option
max_files_per_process = 4000            # min 25
effective_io_concurrency = 32           # 1-1000; 0 disables prefetching
wal_level = minimal                     # minimal, archive, hot_standby, or logical
synchronous_commit = off                # synchronization level
wal_buffers = -1                        # min 32kB, -1 sets based on shared_buffers
checkpoint_timeout = 1h                 # range 30s-1h
checkpoint_completion_target = 1        # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 0                  # 0 disables
log_min_messages = error
log_min_error_statement = error
log_timezone = 'GB'
log_checkpoints = on
log_autovacuum_min_duration = 0
autovacuum = off                        # enable autovacuum subprocess?
datestyle = 'iso, dmy'
timezone = 'GB'
lc_messages = 'C.UTF-8'                 # locale for system error messages
lc_monetary = 'C.UTF-8'                 # locale for monetary formatting
lc_numeric = 'C.UTF-8'                  # locale for number formatting
lc_time = 'C.UTF-8'                     # locale for time formatting
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 64          # min 10
max_pred_locks_per_transaction = 64     # min 10

For the recoverable-mode runs, we removed the settings above that impact recoverability, such as wal_level = minimal and synchronous_commit = off.
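Note that huge_pages = try only takes effect if explicit huge pages are reserved at the OS level. A minimal sizing sketch, assuming the default 2 MB huge page size; the exact page count is an estimate and should cover shared_buffers plus some headroom:

# 25GB of shared_buffers needs about 12800 x 2MB huge pages; reserve extra headroom
sudo sysctl -w vm.nr_hugepages=13000
grep HugePages /proc/meminfo    # verify the reservation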

Scale max_connections, shared_buffers, and max_wal_size adequately to avoid bottlenecks and allow the throughput delivered by PostgreSQL to scale. Next, we create a database, populate it using pgbench, and run a test with 8 clients and 8 threads to stress the database.

createdb pg50                    # the target database must exist before pgbench can populate it
pgbench -i -s 50 pg50            # populate with a scale factor of 50 <use the size you need>
pgbench -c 8 -j 8 -T 60 pg50     # run the test; adjust -c and -j as needed

Following the above steps, we measured a TPS of 6146 with a latency of 1.3 ms on the above-mentioned OCI A1 instance based on Ampere Altra. The same steps on a comparable OCI E4 instance resulted in a TPS of 5547 with a latency of 1.4 ms. There is further scope to push CPU utilization with additional pgbench clients and threads on this system, as seen below.

Scaling of -c (clients) and -j (threads)

This table shows the scaling of -c (clients) and -j (threads) within pgbench on a database of scale factor 50.

-c, -j    A1 - tps    A1 - latency (ms)    E4 - tps    E4 - latency (ms)
8         6,041       1.3                  4,963       1.6
16        6,723       2.3                  5,664       2.8
32        7,946       4                    6,010       5.3
48        8,351       5.7                  6,462       7.4
64        8,624       7.4                  6,545       9.8

In general, it is good practice to issue a manual checkpoint between runs, so that automatic checkpoints do not fire in the middle of a run, as shown below.
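A manual checkpoint can be issued from the shell; this assumes the pg50 database created above:

psql -d pg50 -c "CHECKPOINT;"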
