For Ampere Altra family of Arm64-based Cloud Native Processors
This guide describes techniques for running Redis optimally on the Ampere® Altra® family of Cloud Native Processors.
Redis is an open-source, in-memory, key-value data store that is typically used as a fast cache. It keeps its dataset in memory, but data can be persisted through periodic writes or appends to disk. Because it operates in memory, Redis is very fast and can deliver high throughput at sub-millisecond latencies. It continues to rank highly in popularity among key-value stores in the cloud, according to DB-Engines.
We have used memtier_benchmark (developed by Redis Labs) as a load generator for benchmarking Redis. Each test was configured to run with multiple threads, multiple clients per thread, and with pipelining enabled.
Compile open source Redis on the server
Install dependencies
sudo dnf groupinstall "Development Tools" -y; \
sudo dnf install libnsl libevent-devel pcre-devel numactl -y; \
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm -y
Clone and build Redis from source with GCC
git clone https://github.com/antirez/redis.git && cd redis && git checkout 6.2.1
make CFLAGS="-march=native"
Compile open source Memtier Benchmark on the client and server
Install dependencies
sudo dnf groupinstall "Development Tools" -y; \
sudo dnf install libnsl libevent-devel pcre-devel numactl -y; \
sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm -y
Clone and build Memtier from source with GCC
cd $HOME && git clone https://github.com/RedisLabs/memtier_benchmark; \
cd memtier_benchmark && git checkout 793d74dbc09395dfc241342d847730a6197d7c0c
autoreconf -ivf && PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:${PKG_CONFIG_PATH} ./configure && sudo make install
Tune System Level Settings
On the server: optimize overcommit_memory and socket connections
echo -e "vm.overcommit_memory=1\nnet.core.somaxconn=65535\n" | sudo tee -a /etc/sysctl.conf; \
sudo /usr/sbin/sysctl -p
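Because `tee -a` appends, re-running the command above adds duplicate entries to /etc/sysctl.conf. The following is a minimal sketch of an idempotent alternative. The helper name `ensure_sysctl` and its optional file argument are hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: write "key=value" to a sysctl config file,
# replacing any existing line for that key instead of appending a duplicate.
# $1 = key, $2 = value, $3 = config file (defaults to /etc/sysctl.conf).
ensure_sysctl() {
  local key=$1 value=$2 file=${3:-/etc/sysctl.conf}
  local tmp
  tmp=$(mktemp)
  touch "$file"
  # Copy every line except those that already set this key (exact name match).
  awk -F= -v k="$key" '{ name = $1; gsub(/[[:space:]]/, "", name); if (name != k) print }' "$file" > "$tmp"
  echo "${key}=${value}" >> "$tmp"
  # Overwrite in place so the original file's ownership and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (as root):
#   ensure_sysctl vm.overcommit_memory 1
#   ensure_sysctl net.core.somaxconn 65535
#   /usr/sbin/sysctl -p
```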
Ensure that the firewall is configured to allow the test traffic, either by disabling it or by opening the ports used by the Redis server processes. When running in the cloud, also open these ports in the network security group.
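If the firewall stays enabled, the Redis ports need to be opened explicitly. The sketch below assumes firewalld is the active firewall and that the instances listen on 14 consecutive ports starting at 6379, as in the Application Setup section. The helper `redis_port_range` is hypothetical.

```shell
# Hypothetical helper: print a firewalld port spec covering COUNT
# consecutive TCP ports starting at FIRST.
# $1 = first port, $2 = number of ports.
redis_port_range() {
  local first=$1 count=$2
  echo "${first}-$((first + count - 1))/tcp"
}

# Usage (needs root and a running firewalld):
#   sudo firewall-cmd --permanent --add-port="$(redis_port_range 6379 14)"
#   sudo firewall-cmd --reload
```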
Network Tuning
Redis is a network bandwidth and latency-sensitive application. Tuning network settings can have a large impact on performance, both throughput and latency. We recommend the following network tunings:
INTERFACE=""
systemctl stop irqbalance.service
tuned-adm profile throughput-performance
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag
echo 1 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo 262144 > /proc/sys/net/core/somaxconn
echo 262144 > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo 50 > /proc/sys/net/core/busy_read
echo 75 > /proc/sys/net/core/busy_poll
echo "none" > /sys/kernel/debug/sched/preempt
ethtool -C ${INTERFACE} adaptive-rx off adaptive-tx off ;:
ethtool -K ${INTERFACE} lro on ;:
ethtool -K ${INTERFACE} gro on
ethtool -C ${INTERFACE} rx-usecs 1 tx-usecs 1 rx-frames 8192 tx-frames 8192;:
We have also set the affinities of the IRQs to the first 2 cores of the server. This removes any unpredictable network bottlenecks while pushing the Redis server processes on the remaining cores to 100% CPU utilization.
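The guide does not list the commands used for the IRQ pinning. One possible approach is sketched below, assuming the NIC's IRQs appear in /proc/interrupts with the interface name in the last column. This is driver-dependent, and some drivers use other names. The helper `plan_irq_affinity` is hypothetical. It only prints a plan, which is applied separately through /proc/irq/<IRQ>/smp_affinity_list.

```shell
# Hypothetical helper: print "IRQ CORE" pairs that spread the IRQs of one
# network interface round-robin over cores 0 and 1.
# $1 = interface name, $2 = interrupts table (defaults to /proc/interrupts).
plan_irq_affinity() {
  local iface=$1 table=${2:-/proc/interrupts}
  awk -v ifc="$iface" '
    index($NF, ifc) > 0 {     # last column names the IRQ owner
      irq = $1
      sub(/:$/, "", irq)      # "45:" -> "45"
      print irq, n % 2        # alternate between core 0 and core 1
      n++
    }' "$table"
}

# Usage (as root, with irqbalance stopped so it does not undo the pinning):
#   plan_irq_affinity "$INTERFACE" | while read -r irq core; do
#     echo "$core" > "/proc/irq/$irq/smp_affinity_list"
#   done
```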
Application Setup
On the server: create a customized Redis configuration file by adapting the default configuration from source
cp ./redis/redis.conf ./redis_custom.conf; \
sed -i 's/bind 127.0.0.1 -::1/# bind 127.0.0.1 -::1/' redis_custom.conf; \
sed -i 's/protected-mode yes/protected-mode no/' redis_custom.conf; \
sed -i 's/tcp-backlog 511/tcp-backlog 4096/' redis_custom.conf; \
sed -i 's/daemonize no/daemonize yes/' redis_custom.conf; \
sed -i 's/# supervised auto/supervised no/' redis_custom.conf; \
sed -i 's/loglevel notice/loglevel warning/' redis_custom.conf; \
sed -i 's/# save ""/save ""/' redis_custom.conf; \
sed -i 's/dbfilename dump.rdb/# dbfilename dump.rdb/' redis_custom.conf; \
sed -i 's;dir ./;dir /tmp;' redis_custom.conf; \
sed -i 's/repl-disable-tcp-nodelay no/repl-disable-tcp-nodelay yes/' redis_custom.conf; \
sed -i 's/# maxmemory <bytes>/maxmemory 1gb/' redis_custom.conf; \
sed -i 's/# maxmemory-policy noeviction/maxmemory-policy allkeys-random/' redis_custom.conf; \
sed -i 's/# maxmemory-samples 5/maxmemory-samples 5/' redis_custom.conf; \
sed -i 's/# io-threads 4/io-threads 1/' redis_custom.conf; \
sed -i 's/disable-thp yes/disable-thp no/' redis_custom.conf; \
sed -i 's/hz 10/hz 50/' redis_custom.conf; \
sed -i 's/rdb-save-incremental-fsync yes/# rdb-save-incremental-fsync yes/' redis_custom.conf; \
sed -i 's/# ignore-warnings ARM64-COW-BUG/ignore-warnings ARM64-COW-BUG/' redis_custom.conf
On the server: start the Redis server processes on each core from 2-15 (leaving room for IRQs pinned to cores 0-1 in the previous step). Since Redis is single-threaded, we are pinning each Redis server process to one core/port.
Note: we’ve found that setting io-threads to roughly 75% of the total vCPU count gives the best performance.
PORT=6379
for CORE in $(seq 2 15); do
  nohup sudo numactl -C $CORE ./redis/src/redis-server ./redis_custom.conf \
    --port $PORT --protected-mode no --ignore-warnings ARM64-COW-BUG --save "" \
    --io-threads 12 --maxmemory-policy noeviction --maxmemory 1536mb &> /dev/null &
  ((PORT+=1))
done
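The `--io-threads 12` value above is 75% of 16, which matches the guideline for a system whose cores are numbered 0 to 15. A small sketch of that arithmetic follows. The helper name `io_threads_for` is hypothetical, and rounding down with a floor of 1 is an assumption rather than something the guide specifies.

```shell
# Hypothetical helper: 75% of a vCPU count, rounded down, never below 1.
# $1 = total vCPU count.
io_threads_for() {
  local vcpus=$1
  local n=$((vcpus * 3 / 4))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# Usage: derive the value from the machine instead of hard-coding it.
#   IO_THREADS=$(io_threads_for "$(nproc)")
#   ... redis-server ... --io-threads "$IO_THREADS" ...
```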
On the server: warm up the Redis processes that have been started by running memtier_benchmark against localhost
PORT=6379
for CORE in $(seq 2 15); do
  numactl -C $CORE ./memtier_benchmark/memtier_benchmark --server localhost --port $PORT \
    --protocol redis --clients 1 --threads 1 --ratio 1:0 --data-size 32 --pipeline 100 \
    --key-minimum 1 --key-maximum 10000000 --requests allkeys \
    --print-percentile 50,90,95,99,99.9 &
  ((PORT+=1))
done
On the client: run memtier benchmark against Redis using the server’s internal IP address. We are aiming for 100% CPU utilization for all Redis server processes under SLA. The clients, threads, and pipeline command line options can be scaled up gradually to find this performance sweet spot.
SERVER_INTERNAL_IP=""
PORT=6379
for CORE in $(seq 2 15); do
  numactl -C $CORE ./memtier_benchmark/memtier_benchmark --server $SERVER_INTERNAL_IP --port $PORT \
    --protocol redis --clients 1 --threads 1 --ratio 1:9 --data-size 32 --pipeline 894 \
    --key-minimum 1 --key-maximum 10000000 --key-pattern R:R --run-count 1 --test-time 30 \
    --out-file /tmp/memtier_results_$PORT --print-percentile 50,90,95,99,99.9 --random-data &
  ((PORT+=1))
done
wait
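Each client instance writes its own result file, so the per-port throughput has to be added up to get the system total. The sketch below assumes each file contains a single row starting with `Totals` whose second column is Ops/sec, which is the layout memtier_benchmark prints for a single run. The helper name `total_ops_per_sec` is hypothetical.

```shell
# Hypothetical helper: sum the Ops/sec column of the "Totals" row
# across one or more memtier_benchmark output files.
total_ops_per_sec() {
  awk '$1 == "Totals" { sum += $2 } END { printf "%.2f\n", sum }' "$@"
}

# Usage, after the benchmark loop has finished:
#   total_ops_per_sec /tmp/memtier_results_*
```

Latency percentiles cannot be summed the same way. They need to be inspected per instance.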