
Memcached on AmpereOne

Memcached In-Memory Key-Value Store

Overview

Memcached is an open-source, in-memory, key-value data store that is typically used for small chunks of arbitrary data (strings, objects) from the results of database calls, API calls, or page rendering. Because it keeps data in RAM, Memcached is intended to speed up dynamic web applications by caching data and objects, which reduces the load on the backing database.
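The caching pattern described above is commonly implemented as cache-aside: check the cache first, and only on a miss query the database and store the result. The sketch below is a minimal illustration under stated assumptions: a plain Python dict stands in for a Memcached server, and `slow_db_query` is a hypothetical placeholder for a real database call. A production application would use a Memcached client library instead of the dict.

```python
from typing import Callable, Dict


class CacheAside:
    """Minimal cache-aside wrapper; the dict is a stand-in for a Memcached server."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}  # hypothetical stand-in for Memcached
        self._fetch = fetch               # loads a value from the backing database
        self.db_calls = 0                 # counts how often the database is hit

    def get(self, key: str) -> str:
        if key in self._cache:            # cache hit: no database work
            return self._cache[key]
        self.db_calls += 1                # cache miss: fall through to the database
        value = self._fetch(key)
        self._cache[key] = value          # populate the cache for later reads
        return value


def slow_db_query(key: str) -> str:
    """Hypothetical database lookup used only for illustration."""
    return f"row-for-{key}"


if __name__ == "__main__":
    store = CacheAside(slow_db_query)
    for _ in range(10):
        store.get("user:42")
    print(store.db_calls)  # 1: only the first read reached the database
```

Ten reads of the same key cost one database query; the other nine are served from memory, which is the load reduction the paragraph describes.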

It continues to be ranked as one of the most popular key-value stores in the cloud, according to DB-Engines. In this workload brief, we compare AmpereOne® A192-32X with AMD EPYC 9654 and AMD EPYC 9754 processors running Memcached, measuring throughput and latency on each of these processors.

Memcached on AmpereOne® A192-32X

AmpereOne® A192-32X is designed to deliver exceptional performance for cloud native applications like Memcached. This is accomplished through an innovative architectural design, consistent operating frequencies, and single-threaded cores that make applications more resistant to noisy-neighbor issues. As a result, workloads run predictably, with minimal variance under increasing load.

The processor is also designed for exceptional energy efficiency, which translates into industry-leading performance per watt and a lower carbon footprint.


Benefits of running Memcached on AmpereOne® A192-32X

  • Cloud Native: Designed from the ground up for "born in the cloud" workloads like Memcached, AmpereOne® A192-32X can deliver up to 15% higher performance than the x86 servers tested (AMD EPYC 9654 "Genoa").

  • Power Efficient: With up to 192 energy-efficient Arm cores, AmpereOne® A192-32X can consume less power while maintaining competitive levels of performance.

  • Lower Carbon Footprint: Industry-leading performance and high energy efficiency result in AmpereOne® A192-32X demonstrating up to 1.79x higher performance/watt, leading to lower TCO and a smaller carbon footprint.

  • Consistency & Predictability: Single-threaded cores running at fixed maximum frequencies ensure linear scaling at high loads and under stringent SLAs while running Memcached.

Benchmarking Configuration

We used memtier_benchmark (developed by Redis Labs) as the load generator for benchmarking Memcached. Each test was configured to run with multiple threads, multiple clients per thread, and pipelining enabled.
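A run like the ones described in this brief can be launched with memtier_benchmark in Memcached text-protocol mode. The helper below is a minimal sketch that assembles such a command line. The server address, port, and the thread, client, and pipeline counts are placeholders, since the brief does not state the values used; the 2-minute duration, 1:10 set:get ratio, and 64-byte payload come from the test description in this brief. Verify the flags against `memtier_benchmark --help` for your installed version.

```python
import shlex
from typing import List


def memtier_cmd(server: str, port: int, threads: int, clients: int,
                pipeline: int, test_time_s: int = 120,
                ratio: str = "1:10", data_size: int = 64) -> List[str]:
    """Build a memtier_benchmark command line for a Memcached text-protocol target."""
    return [
        "memtier_benchmark",
        f"--server={server}",
        f"--port={port}",
        "--protocol=memcache_text",    # talk to Memcached rather than Redis
        f"--threads={threads}",        # load-generator threads
        f"--clients={clients}",        # connections per thread
        f"--pipeline={pipeline}",      # requests kept in flight per connection
        f"--test-time={test_time_s}",  # run for a fixed duration, in seconds
        f"--ratio={ratio}",            # set:get ratio
        f"--data-size={data_size}",    # value size in bytes
        "--print-percentiles=50,99,99.9",
    ]


if __name__ == "__main__":
    # Placeholder values: the brief does not publish its thread, client, or pipeline counts.
    print(shlex.join(memtier_cmd("10.0.0.2", 11211, threads=4, clients=50, pipeline=16)))
```

Returning a list rather than a single string lets the command be passed directly to `subprocess.run` without shell quoting issues.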

We recommend compiling the Memcached server with GCC 13.2.1 or newer, because recent compiler releases generate better-optimized code that can improve performance.

We used Fedora 38 Server Edition (kernel 6.4.13-200.fc38.aarch64) with Memcached 1.6.21 compiled with GCC 13.2.1 for our tests. We compared AmpereOne® A192-32X, AMD EPYC 9654 "Genoa", and AMD EPYC 9754 "Bergamo" (refer to the charts below for results). For each test, we used similar clients to generate requests to the Memcached server.

Because throughput is most meaningful when measured under a specified Service Level Agreement (SLA), we used a 99th percentile latency (p.99) target of 1 millisecond. This means that at least 99 percent of requests must complete within 1 ms.
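To make the SLA check concrete, the sketch below computes p.99 from a list of per-request latencies and compares it with the 1 ms limit. It uses the nearest-rank percentile method, which is our assumption; memtier_benchmark and other tools may derive percentiles from histograms or interpolate between samples, so their reported values can differ slightly.

```python
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """99th percentile latency using the nearest-rank method."""
    n = len(latencies_ms)
    if n == 0:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float], limit_ms: float = 1.0) -> bool:
    """True if the p.99 latency is within the SLA limit."""
    return p99(latencies_ms) <= limit_ms
```

For example, with 100 samples where 99 take 0.5 ms and one takes 5 ms, p.99 is 0.5 ms and the run meets the SLA; a second 5 ms outlier pushes p.99 to 5 ms and the run fails.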

Each test ran for 2 minutes with a 1:10 set:get ratio (one key/value write for every 10 key/value reads) and a 64-byte payload, which is common for in-memory caches. We first chose a number of threads and clients per thread sufficient to load a single Memcached server instance while keeping the p.99 latency at or below 1 ms.

Pipelining, which Memcached supports, allows the client to pack multiple requests into a single request packet, which can reduce packet-processing overhead. This feature can dramatically reduce response times.
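As an illustration of pipelining, the sketch below builds one request buffer in the Memcached text protocol containing a single `set` followed by ten `get` commands, mirroring the 1:10 set:get ratio and 64-byte payload described above. The helper names are hypothetical, and the code only constructs bytes; it does not open a connection. A client would write the buffer with one send and then read the responses, which Memcached returns in request order. memtier_benchmark's own `--pipeline` option works differently, keeping a configurable number of requests in flight per connection rather than sending fixed batches like this.

```python
from typing import List


def set_cmd(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Text-protocol storage command: set <key> <flags> <exptime> <bytes>, then the data block."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"


def get_cmd(key: str) -> bytes:
    """Text-protocol retrieval command for one key."""
    return f"get {key}\r\n".encode()


def pipelined_batch(keys: List[str], value: bytes, gets_per_set: int = 10) -> bytes:
    """Concatenate one set followed by `gets_per_set` gets per key into one buffer.

    Sending the whole buffer with a single socket write lets the commands share
    packets instead of paying a network round trip for each command.
    """
    batch = bytearray()
    for key in keys:
        batch += set_cmd(key, value)
        for _ in range(gets_per_set):
            batch += get_cmd(key)
    return bytes(batch)


if __name__ == "__main__":
    payload = b"x" * 64  # 64-byte value, matching the brief's payload size
    buf = pipelined_batch(["key:1"], payload)
    print(buf.count(b"\r\n"))  # 12: set header, set data block, and 10 get lines
```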

Next, we increased the number of Memcached instances step by step until one or more instances violated the p.99 latency SLA. The aggregate throughput across all instances was used as the primary performance metric. We ran the test three times and observed minimal run-to-run variation.
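The scaling procedure can be summarized as a loop. The sketch below assumes a hypothetical `run(n)` hook that starts `n` Memcached instances with their load generators and returns each instance's throughput and p.99 latency. It reports the aggregate throughput at the largest instance count that kept every instance within the 1 ms SLA, which is our reading of how the metric was derived rather than a confirmed detail of Ampere's harness.

```python
from typing import Callable, List, Tuple

# Each per-instance result is (throughput in ops/s, p.99 latency in ms).
InstanceResult = Tuple[float, float]


def scale_until_sla_violation(
    run: Callable[[int], List[InstanceResult]],
    max_instances: int,
    sla_ms: float = 1.0,
) -> Tuple[int, float]:
    """Add instances one at a time and stop when any instance exceeds the SLA.

    Returns the largest passing instance count and its aggregate throughput.
    `run(n)` is a hypothetical hook that benchmarks n instances concurrently.
    """
    best_n, best_total = 0, 0.0
    for n in range(1, max_instances + 1):
        results = run(n)
        if any(p99_ms > sla_ms for _, p99_ms in results):
            break  # at least one instance violated the p.99 SLA
        best_n, best_total = n, sum(ops for ops, _ in results)
    return best_n, best_total


if __name__ == "__main__":
    # Synthetic stand-in: 100k ops/s per instance, p.99 grows 0.25 ms per instance.
    fake_run = lambda n: [(100_000.0, 0.25 * n)] * n
    print(scale_until_sla_violation(fake_run, 16))  # (4, 400000.0)
```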

AmpereOne® A192-32X Industry-leading Performance and Energy Efficiency on Memcached
Fig. 1: Throughput (Higher is Better)
Fig. 2: Performance/Watt (Higher is Better)
Benchmarking Results and Conclusions

As shown in Fig. 1, we observed up to 1.15x higher throughput on AmpereOne® A192-32X than on AMD EPYC 9654 "Genoa".

For large-scale cloud deployments, performance/watt (i.e., energy efficiency) is an important metric in addition to raw performance. Under the specified SLA, AmpereOne® A192-32X delivered 1.79x the performance/watt of AMD Genoa servers and 1.41x the performance/watt of AMD Bergamo servers (Fig. 2).
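Performance/watt is throughput divided by power draw, so a performance/watt ratio combines a throughput ratio with a power ratio. The sketch below shows that arithmetic. Applying it to the Genoa figures assumes the 1.15x throughput and 1.79x performance/watt results come from the same runs, which the brief does not state explicitly; under that assumption the Genoa system would draw roughly 1.56x the power of the AmpereOne system. Treat this as an illustration of the formula, not a reported measurement.

```python
def perf_per_watt(throughput_ops: float, power_w: float) -> float:
    """Energy efficiency: operations per second delivered per watt consumed."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return throughput_ops / power_w


def implied_power_ratio(throughput_ratio: float, ppw_ratio: float) -> float:
    """Power of the baseline system relative to the candidate system.

    ppw_ratio = (T_a / P_a) / (T_b / P_b) = throughput_ratio * (P_b / P_a),
    so P_b / P_a = ppw_ratio / throughput_ratio.
    """
    return ppw_ratio / throughput_ratio


if __name__ == "__main__":
    # Assumes both Genoa comparison figures were measured in the same runs.
    print(round(implied_power_ratio(1.15, 1.79), 2))  # 1.56
```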

Fast in-memory caches are used in most cloud deployments today. Memcached is a popular high-throughput, in-memory key-value store that is well suited to low-latency applications in scale-out configurations. AmpereOne is designed to deliver exceptional performance and energy efficiency for cloud native applications like Memcached. In Ampere’s testing, the processor delivered up to 1.15x higher performance and up to 1.79x better energy efficiency. For more information on this workload or other workloads our engineers have been working on, please visit the Ampere Solutions Center.

Footnotes

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

©2024 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com

Created At : September 6th 2024, 6:20:12 pm
Last Updated At : September 20th 2024, 5:20:50 pm