Memcached tests were performed on bare-metal, single-socket servers with equivalent memory, networking, and storage configurations for each of the platforms shown. The processors tested were: AMD EPYC 7763 "Milan"; Intel Xeon 8380 "Ice Lake"; Ampere Altra Q80-30; and Ampere Altra Max M128-30.
The memtier_benchmark tool (developed by Redis Labs) was used as the load generator for benchmarking Memcached. Each test was configured to run with multiple threads, multiple clients per thread, and pipelining enabled.
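As an illustration, a memtier_benchmark run like the one described in this section could be assembled as shown below. This is a hedged Python sketch, not the exact command used for these results. The helper name memtier_memcached_cmd is hypothetical, the server address and the thread and client counts are placeholder assumptions (the source does not state them), and the remaining values (Memcached text protocol, 120-second run, 1:10 set:get ratio, 128-byte values, pipeline depth of 126) mirror the test parameters given in this section. Verify the flags against `memtier_benchmark --help` for your installed version.

```python
import shlex


def memtier_memcached_cmd(server="127.0.0.1", port=11211, threads=4,
                          clients=50, pipeline=126, test_time_s=120,
                          ratio="1:10", data_size=128):
    """Build a memtier_benchmark argument list for a Memcached text-protocol run.

    Hypothetical helper: threads, clients, and server are placeholders,
    not the values used in the published tests.
    """
    return [
        "memtier_benchmark",
        f"--server={server}",
        f"--port={port}",              # 11211 is Memcached's default port
        "--protocol=memcache_text",    # Memcached ASCII (text) protocol
        f"--threads={threads}",        # load-generator threads
        f"--clients={clients}",        # connections per thread
        f"--pipeline={pipeline}",      # requests in flight per connection
        f"--test-time={test_time_s}",  # run duration in seconds
        f"--ratio={ratio}",            # set:get ratio
        f"--data-size={data_size}",    # value size in bytes
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for the default parameters.
    print(shlex.join(memtier_memcached_cmd()))
```

Building the arguments as a list keeps the command easy to pass to `subprocess.run` without shell quoting issues, and `shlex.join` gives a copy-pasteable version for manual runs.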
We recommend compiling the Memcached server with GCC (GNU Compiler Collection) 10.2 or newer, as recent compiler releases generate better-optimized code that can improve performance for AArch64 applications.
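One way to confirm that a build host meets the GCC 10.2 recommendation is to compare the output of `gcc -dumpfullversion` against that minimum. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes a GCC new enough to support `-dumpfullversion` (GCC 7 and later).

```python
import re
import shutil
import subprocess

MIN_GCC = (10, 2)  # minimum GCC version recommended in this guide


def parse_gcc_version(text):
    """Turn version text such as '10.2.1' into an int tuple like (10, 2, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"unrecognized GCC version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_recommendation(version, minimum=MIN_GCC):
    # Tuples compare element by element, so (10, 2, 1) >= (10, 2) is True
    # and (9, 4, 0) >= (10, 2) is False.
    return version >= minimum


def installed_gcc_version(gcc="gcc"):
    """Return the local GCC version tuple, or None if the compiler is missing.

    Assumes a GCC that supports -dumpfullversion (GCC 7 and later).
    """
    if shutil.which(gcc) is None:
        return None
    result = subprocess.run([gcc, "-dumpfullversion"],
                            capture_output=True, text=True, check=True)
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    try:
        version = installed_gcc_version()
    except (subprocess.CalledProcessError, ValueError):
        version = None
    if version is None:
        print("could not determine the GCC version")
    else:
        status = "OK" if meets_recommendation(version) else "older than 10.2"
        print(".".join(map(str, version)), status)
```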
The tests used CentOS 8.4 (kernel 4.18) and Memcached 1.6.9 compiled with GCC 10.2. We compared Ampere Altra Max M128-30, AMD EPYC 7763, and Intel Xeon 8380 "Ice Lake" (refer to the chart below for results). For each test, similar clients were used to generate requests for the Memcached server.
Each test ran for 2 minutes with a 1:10 set:get ratio (one key/value write for every 10 key/value reads) and a 128-byte payload, which is common for in-memory caches. The number of clients and threads per client was chosen to fully load a single Memcached server instance while keeping p99 latency at or below 1 ms. Memcached supports pipelining, which lets a client send multiple requests in one batch without waiting for each response, reducing packet-processing overhead. This can substantially reduce response times; the tests on Ampere Altra Max M128-30 used 126 concurrent pipelined requests.
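To make the pipelining idea concrete, the sketch below packs a batch of Memcached text-protocol requests into a single buffer that a client could write to the socket in one call, rather than waiting for a reply after every request. It is a simplified illustration, not how memtier_benchmark is implemented: the helper names are hypothetical, and the defaults simply follow the 1:10 set:get mix, 128-byte values, and pipeline depth of 126 described above.

```python
def set_request(key, value, flags=0, exptime=0):
    # Text protocol: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    return b"set %s %d %d %d\r\n%s\r\n" % (key, flags, exptime, len(value), value)


def get_request(key):
    # Text protocol: "get <key>\r\n"
    return b"get %s\r\n" % key


def pipelined_batch(keys, value, depth=126, sets_per_cycle=1, gets_per_cycle=10):
    """Concatenate `depth` requests into one buffer using a set:get mix.

    With the defaults, every 11 requests contain 1 set followed by 10 gets.
    """
    cycle = sets_per_cycle + gets_per_cycle
    parts = []
    for i in range(depth):
        key = keys[i % len(keys)]
        if i % cycle < sets_per_cycle:
            parts.append(set_request(key, value))
        else:
            parts.append(get_request(key))
    return b"".join(parts)


if __name__ == "__main__":
    keys = [b"key:%d" % i for i in range(1000)]
    batch = pipelined_batch(keys, b"x" * 128)  # 128-byte payload
    print(len(batch), "bytes in one pipelined batch")
```

With a connected socket, a client would send the whole batch with a single `sock.sendall(batch)` and then read the responses, which Memcached returns in request order. Amortizing the system-call and packet overhead over many requests is the source of the savings; a real client must also parse the interleaved `STORED`, `VALUE ... END`, and `END` replies.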