Not too long ago, Arm-based solutions were generally regarded as mobile solutions for cell phones or limited-purpose laptops. That has changed with the move to power some of the world’s most demanding data center workloads using Arm-based servers. The Arm-based Ampere Altra processor is changing conventional thinking about accelerating storage workloads. With this new entrant come new possibilities! Imagine cloud-native storage built with an army of Arm-based dual-socket servers, each consuming the input/output (I/O) of twenty-four Micron® 7300 MAX SSDs with NVMe™.
The Micron Storage Solutions Engineering team worked with the Ampere team to examine dense storage use cases and system performance under file and object storage loads. The Ampere Altra processor has a high capacity for I/O expansion, with 128 PCIe Gen4 lanes per processor and 192 lanes available for expansion in two-socket system configurations. In addition, for the all-flash NVMe storage market, Altra supports up to 24x NVMe devices using four PCIe lanes each, directly connected to the processor.
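To make the lane arithmetic concrete, here is a minimal sketch that checks how much of the two-socket expansion budget the 24 drives consume. It uses only the figures quoted above; the variable names are ours, and what the remaining lanes are used for on a given platform is not something this post specifies.

```python
# Lane-budget check using only the figures quoted in the text.
LANES_PER_DRIVE = 4          # each NVMe drive uses a x4 link
DRIVES = 24                  # drives in the two-socket configuration
EXPANSION_LANES_2S = 192     # lanes available for expansion with two sockets

lanes_used = DRIVES * LANES_PER_DRIVE
lanes_free = EXPANSION_LANES_2S - lanes_used

print(f"lanes used by drives: {lanes_used}")
print(f"lanes left for other devices: {lanes_free}")
```

By this count the drives use half of the available expansion lanes, leaving the rest for network adapters or other devices.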
As a global leader in high-capacity and high-performance NVMe storage drives, Micron collaborated with Ampere to showcase the performance of this new compute platform for storage use cases. For this study, the teams focused on the Micron 7300 SSD with NVMe, which delivers six times the performance of a typical SATA SSD at a comparable price. The 7300 SSD delivers balanced performance for mixed read/write data center workloads.
Results of FIO testing
We first ran a series of flexible I/O (FIO) tests to characterize the performance of the Altra processor under high storage loads. We used Mt. Jade reference platforms, each equipped with two Ampere Altra 80-core processors and supporting up to twenty-four U.2 form-factor drives. This test used the Ampere Altra Q80-30 processor, which supports a 3.0GHz operating frequency. The focus was on storage operational performance relative to server processor use. This testing allowed us to gauge various processor selection criteria for a variety of storage-intensive workloads.
Working with a full complement of 24x Micron 7300 MAX 800GB SSDs with NVMe, the team was able to drive sustained peak load into the drives. The FIO storage tests showed that the Ampere Altra can saturate the drives at their maximum specification, scaling linearly from one to 24 drives. The processing overhead required to drive this load remained low, with the top utilization for read workloads at 17.5%. This low CPU utilization for read workloads reflects an impressive level of kernel performance, given that there is no degradation in performance as the load scales. The latency for these I/O operations remained consistent at ~100µs, effectively tracking the sum of the drive latency specification and the software overhead of the stack. The processing overhead for the write workloads was, as expected, even lower.
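For readers who want to try a similar one-to-24-drive scaling sweep, the sketch below builds a fio command line with one job per drive. The job settings used in our testing are not listed in this post, so everything here is an assumption for illustration: the device paths, the queue depth, the runtime, and the helper names `build_fio_cmd` and `scaling_steps` are hypothetical. Only the 4K block size for random workloads comes from the configuration note at the end of this post. The sketch only prints the command; it does not run fio.

```python
def build_fio_cmd(devices, rw="randread", bs="4k", iodepth=32, runtime_s=60):
    """Build a fio argument list with one job per device.

    iodepth and runtime_s are illustrative assumptions, not the tested values.
    """
    cmd = [
        "fio",
        "--ioengine=libaio",      # Linux asynchronous I/O engine
        "--direct=1",             # bypass the page cache
        f"--rw={rw}",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]
    # Options before the first --name are global; each --name starts a new job.
    for index, device in enumerate(devices):
        cmd += [f"--name=drive{index}", f"--filename={device}"]
    return cmd


def scaling_steps(devices):
    """Yield growing device subsets: 1 drive, 2 drives, ... all drives."""
    for count in range(1, len(devices) + 1):
        yield devices[:count]


# Hypothetical device paths; adjust to the namespaces present on the host.
DEVICES = [f"/dev/nvme{i}n1" for i in range(24)]

# Show the command for the first step of the sweep (a single drive).
first_step = next(scaling_steps(DEVICES))
print(" ".join(build_fio_cmd(first_step)))
```

Running each step in turn and recording throughput and CPU utilization gives the kind of scaling curve described above. Note that switching `rw` to a write pattern against raw devices destroys the data on them.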
Finally, these tests indicate that there is ample headroom for other value-added software and system tasks needed in the typical storage target workload. Alternatively, if the overall storage workload demand is lower, then lower-performing SKUs of Altra that use less power, such as the Q64-30 or the Q64-26, are excellent candidates.
The graphs below show the raw throughput data of the FIO tests on the Ampere Mt. Jade reference platform with the Ampere Altra Q80-30 processors installed.
Results of SPDK testing
Another popular software methodology to drive high-performance and high-density storage solutions is the Storage Performance Development Kit (SPDK). SPDK provides a set of tools and libraries for writing high-performance, scalable user-mode storage applications. It achieves high performance through various optimizations in the software stack, such as zero copy and interrupt elimination via poll-mode drivers (PMDs), as well as other techniques.
In this test case, our teams ran SPDK tests using 24x Micron 7300 MAX 3.2TB SSDs with NVMe on a similar two-socket Mt. Jade reference platform with Q80-30 Ampere Altra processors installed. Each NVMe drive uses four PCIe Gen3 lanes to deliver 520,000 4KiB random read I/O operations per second (IOPS), 160,000 4KiB random write IOPS, 3 GB/s of sequential 128KiB read throughput, and 1.9 GB/s of sequential 128KiB write throughput.
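As a rough guide to what 24 of these drives can deliver together, the sketch below multiplies the per-drive figures above by the drive count. This is simple arithmetic on the quoted per-drive numbers, assuming perfectly linear scaling; it is an upper bound for comparison, not a measured result, and the names are ours.

```python
# Per-drive figures quoted in the text for the 7300 MAX 3.2TB.
DRIVES = 24
RAND_READ_IOPS = 520_000      # 4KiB random read
RAND_WRITE_IOPS = 160_000     # 4KiB random write
SEQ_READ_GBPS = 3.0           # 128KiB sequential read, GB/s
SEQ_WRITE_GBPS = 1.9          # 128KiB sequential write, GB/s


def aggregate(per_drive, drives=DRIVES):
    """Ideal aggregate, assuming every drive reaches its per-drive figure."""
    return per_drive * drives


print(f"random read : {aggregate(RAND_READ_IOPS) / 1e6:.2f} M IOPS")
print(f"random write: {aggregate(RAND_WRITE_IOPS) / 1e6:.2f} M IOPS")
print(f"seq read    : {aggregate(SEQ_READ_GBPS):.1f} GB/s")
print(f"seq write   : {aggregate(SEQ_WRITE_GBPS):.1f} GB/s")
```

The ideal random read aggregate works out to about 12.5 million IOPS, which is the ceiling to keep in mind when reading the SPDK results.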
The NVMe drives were connected directly to the Ampere Altra processors in the test system, maximizing the number of NVMe disks while eliminating the need for expensive interconnect devices such as PCIe switches. While the current platform only supports 24x NVMe U.2 drives, the data in this blog implies that the processor can support additional drives given the proper lane breakout and drive enclosure solution.
The linear scalability of this dense storage configuration is the theme again. In the SPDK testing, we dedicated one Altra core per socket to run SPDK and an additional core to drive the load, effectively requiring pairs of cores to complete the test within a single server. This behavior mirrors the scalability of the FIO test but at a higher transactional performance that uses the higher individual drive specifications of the Micron SSDs and the overall efficiency of SPDK.
The SPDK efficiency exhibited on the 24x NVMe SSD configuration using random 4K reads required only six Altra cores running SPDK, or 7.5% of the available cores, to deliver nearly 13 million IOPS! Random 4K I/O is far more CPU-intensive than sequential workloads. And remember that Altra has 80 cores!
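The core-efficiency figures can be reproduced with a few lines of arithmetic. In this sketch the 7.5% figure corresponds to six cores out of one 80-core processor; we also show the share relative to all 160 cores in the two-socket system for comparison. The IOPS value is the rounded "nearly 13 million" from the text, so the per-core result is approximate, and the names are ours.

```python
CORES_PER_SOCKET = 80        # Altra Q80-30, from the text
SOCKETS = 2                  # two-socket Mt. Jade platform
SPDK_CORES = 6               # cores running SPDK, from the text
APPROX_IOPS = 13_000_000     # "nearly 13 million"; rounded, not an exact measurement


def core_share(cores_used, cores_total):
    """Cores consumed as a percentage of the total."""
    return 100.0 * cores_used / cores_total


share_one_socket = core_share(SPDK_CORES, CORES_PER_SOCKET)
share_system = core_share(SPDK_CORES, CORES_PER_SOCKET * SOCKETS)
iops_per_core = APPROX_IOPS / SPDK_CORES

print(f"share of one 80-core processor: {share_one_socket:.2f}%")
print(f"share of the two-socket system: {share_system:.2f}%")
print(f"approx. IOPS per SPDK core    : {iops_per_core / 1e6:.2f} M")
```

Either way the share is counted, each SPDK core is handling on the order of two million random 4K reads per second.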
The pair of dense storage tests that Ampere and Micron conducted demonstrates the high performance and excellent efficiency of the Ampere Altra processors combined with the Micron 7300 SSDs with NVMe.
Given this data, we can conclude that the Altra Q80-30 is an extremely proficient I/O processor that can handle vast amounts of dedicated IOPS in storage target applications and leave substantial headroom for value-added storage software stacks.
Micron and Ampere are planning future testing to further characterize common storage workloads. We will build on this foundational performance, eventually extending to cluster storage solutions, like Ceph, to investigate real-world use cases on these systems.
Note: This blog is available on both micron.com and amperecomputing.com
All information herein is provided on an “AS IS” basis without warranties of any kind, including any implied warranties, warranties of merchantability or warranties of fitness for a particular purpose. Micron, the Micron orbit logo, the M orbit logo, Intelligence Accelerated™ and all other Micron trademarks are the property of Micron Technology, Inc. The Altra and Ampere are marks, logos and designs of Ampere Computing LLC, and are used by permission. All other trademarks are the property of their respective owners. Micron assumes no liability for lost, stolen or corrupted data arising from the use of any Micron product, including those products that incorporate any of the mentioned security features. Products are warranted only to meet Micron’s production data sheet specifications. Products, programs and specifications are subject to change without notice.
Mt. Jade node composition: (GCC 8.3.1, -O3), two sockets, 80 cores/socket, 1 node per socket (1 NPS), 3.0GHz, DDR = 32GB x 16 = 512GB @3200MHz, CentOS 8.0.1905, page size = 64K, kernel 4.18.0-80.11.2.el8.20200716+amp.opt.aarch64. Random workloads used 4K block size to maximize IOPS (I/O operations per second).