Ampere claims the top CephFS storage cluster on Arm in IO500’s 10 Node Challenge


Sean Varley

Real-world storage clusters just got a shot in the Arm! Sorry for the pun, but in this case it is apt. Notable events in the HPC world for Arm-based compute platforms, such as the Arm-powered Fugaku supercomputer taking the #1 spot in the Top500 announced in June, continue to compound today with the release of the IO500 10 Node Challenge list, a ranked comparison of storage systems that work in tandem with the world's largest supercomputers.

The IO500 benchmark highlights the storage design and performance tuning that benefit HPC workloads through a competitive list compiled twice per year to coincide with the ISC and SC trade shows. In 2020 this process has been disrupted like everything else in our world, but the list goes on, and we are excited to see the first Arm-based CephFS cluster ranking at #27 in the 10 Node Challenge list, the list that catalogues the best storage performance in a constrained environment of 10 client nodes. With the many new entrants this year, the list's curators are debating how to fairly evaluate systems built purely for the benchmark versus real-world implementations that can be built off the shelf on a budget any university could afford! For the collaboration between the partners that published this benchmark, we were definitely aiming for the latter.
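For readers new to the list, an IO500 score is essentially the geometric mean of a bandwidth score (GiB/s, from the IOR phases) and a metadata score (kIOPS, from the mdtest/find phases). The sketch below illustrates that composition with made-up phase numbers; it is not the Hammerhead result, which is published on io500.org.

```python
# Illustration of how an IO500 score is composed. The phase numbers below
# are made up for illustration; they are NOT the Hammerhead results.
from math import prod

def geomean(values):
    """Geometric mean of a list of positive numbers."""
    return prod(values) ** (1.0 / len(values))

# Bandwidth phases (GiB/s) -- IOR easy/hard, write/read.
bw_phases = [5.2, 0.9, 7.8, 1.4]

# Metadata phases (kIOPS) -- mdtest easy/hard create/stat/delete, find, etc.
md_phases = [48.0, 21.0, 95.0, 60.0, 33.0, 40.0, 27.0]

bw_score = geomean(bw_phases)      # GiB/s
md_score = geomean(md_phases)      # kIOPS
io500_score = (bw_score * md_score) ** 0.5

print(f"BW score:    {bw_score:6.2f} GiB/s")
print(f"MD score:    {md_score:6.2f} kIOPS")
print(f"IO500 score: {io500_score:6.2f}")
```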

In a six-way collaboration (codenamed Hammerhead) between Ampere, Arm, SUSE, nVidia, Micron, and Broadcom, we demonstrated that the first Ceph distributed filesystem running on Ampere's Armv8-compatible cores achieved the top Ceph-based score on Arm on the IO500 10 Node Challenge list. Even as the list grew this year with systems tuned specifically for the benchmark, the aim of this group was to improve the performance of a production-oriented storage solution that can supply any organization with a high-performance cluster tuned for low power at an attractive cost point.

Ceph storage has always been a workhorse for truly unified, distributed, reliable, high-performance, and, most importantly, highly scalable storage beyond the exabyte level. Yet while Ceph is certainly not a new entrant to the storage market, it has taken a more tentative path toward the upper echelons of high-performance HPC storage. In recent years, the pace of innovation in Ceph has been building, with contributions coming from a variety of Linux vendors and others, including some of the partners in this collaboration. Indeed, contributions to the project by SUSE and Arm have considerably improved efficiency and made it possible for Ceph-based solutions to provide a complete end-to-end storage platform, especially on Arm-native platforms like Ampere's. All of these developments in Ceph technology have given rise to increasingly performant storage clusters.

The improved performance, along with all the reliability, versatility, and scalability features, has made Ceph a trusted solution and the preferred backend for major open source cloud platforms such as OpenStack, CloudStack, and OpenNebula. Inroads in the cloud have been mirrored in telecommunications, media, and unified enterprise storage environments across a variety of industries.

Role: Admin, Monitor, Gateway, MDS Nodes
Qty: 6
Node: Ampere 1U Server (Lenovo HR330A)
Components:
  • 1x Ampere eMAG 8180, 32 cores @ 3.3GHz
  • 32GB DRAM (4x 8GB DDR4-2667 DIMMs)
  • 2x Micron 7300 PRO 480GB NVMe M.2
  • 1x Mellanox MCX516A-CCAT dual-port 100GbE

Role: OSD Nodes
Qty: 10
Node: Ampere 2U Server (Lenovo HR350A)
Components:
  • 1x Ampere eMAG 8180, 32 cores @ 3.3GHz
  • 128GB DRAM (8x 16GB DDR4-2667 DIMMs)
  • 2x Micron 5300 240GB NVMe M.2
  • 4x Micron 7300 PRO 3.84TB NVMe U.2 SSD
  • 1x Broadcom 9500-16i HBA
  • 1x Mellanox MCX516A-CCAT dual-port 100GbE

Role: Network Switch
Qty: 1
Node: Mellanox Spectrum-2 100GbE Switch

Table 1: Hammerhead Cluster Configuration

 

The Cluster

For those interested in the innards of the storage cluster, the configuration is listed in Table 1 above. With no major updates to the software stack SUSE used for the 2019 IO500 benchmark (the operating system was SLES 15 Service Pack 1 with SUSE Enterprise Storage 6, based on the Ceph "Nautilus" release), the major difference in this benchmark was the cluster configuration shown.

The partners sought to beat the score of 12.43 from last year's SC19 submission (TigerShark) to the 10 Node Challenge. That full configuration can be seen in the SUSE blog from the time. This year, the theory was that the combined performance of the Ampere eMAG® Arm-based server, the 100G networking provided by nVidia (Mellanox NIC and switch), and the storage subsystem provided by Micron (NVMe drives) and Broadcom (HBA) could improve on that score. The cluster was set up to emulate a production system that a customer could deploy, so all appropriate Ceph security and data protection features were enabled during the benchmark runs.
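As a sanity check of that production posture, a few queries against the cluster can confirm that data protection and overall health are intact before the runs begin. The sketch below assumes the standard ceph CLI on an admin node and uses a placeholder pool name (cephfs_data); it is illustrative, not the exact procedure used for the submission.

```python
# Minimal sketch: confirm a Ceph (Nautilus) cluster still has its normal
# data-protection settings before running a benchmark. Assumes the standard
# `ceph` CLI on the admin node; the pool name "cephfs_data" is an
# illustrative placeholder, not taken from the published configuration.
import json
import subprocess

def ceph_json(*args):
    """Run a ceph CLI command with JSON output and parse the result."""
    out = subprocess.check_output(["ceph", *args, "--format", "json"], text=True)
    return json.loads(out)

# The CephFS data pool should keep its replication (size >= 2, typically 3),
# i.e. data protection was not traded away for benchmark numbers.
size = ceph_json("osd", "pool", "get", "cephfs_data", "size")
min_size = ceph_json("osd", "pool", "get", "cephfs_data", "min_size")
print("cephfs_data size/min_size:", size["size"], "/", min_size["min_size"])

# The cluster should be healthy before the IO500 run starts.
health = ceph_json("health")
print("cluster health:", health["status"])

# And all daemons should be on the same (Nautilus) release.
versions = ceph_json("versions")
print("daemon versions:", versions.get("overall", versions))
```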

This configuration was designed by the partners in this collaboration to be not only blazingly fast but also very cost efficient and deployable! In fact, the performance-per-dollar metric for this configuration over the previous x86-based TigerShark configuration is a whopping 209% better¹. This is based on 26% higher performance combined with a roughly 40% lower cost¹ in a street-price comparison of the two configurations.
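To make that arithmetic concrete, here is a quick back-of-the-envelope sketch using the two published IO500 scores; the ~40% cost delta is the rounded street-price figure from the footnote, so the result is approximate.

```python
# Back-of-the-envelope performance-per-dollar comparison using the two
# published IO500 scores. The ~40% lower street price for Hammerhead is
# the rounded figure from the footnoted July 2020 comparison, treated here
# as an approximate assumption rather than an exact number.
tigershark_score = 12.43   # SC19 10 Node Challenge submission (TigerShark)
hammerhead_score = 15.67   # this submission (Hammerhead)

relative_performance = hammerhead_score / tigershark_score  # ~1.26x
relative_cost = 1.0 - 0.40                                   # ~40% cheaper -> ~0.60x

# Perf/$ scales with performance and inversely with cost.
relative_perf_per_dollar = relative_performance / relative_cost  # ~2.1x the baseline

print(f"Performance ratio:      {relative_performance:.2f}x")
print(f"Cost ratio:             {relative_cost:.2f}x")
print(f"Perf/$ vs. TigerShark:  {relative_perf_per_dollar:.2f}x the baseline")
```

At roughly 2.1x the TigerShark's performance per dollar, the sketch lines up with the roughly 200% figures quoted in this post.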

 

The Results

Tuning and configuring a large cluster for benchmarks like the IO500 can be time consuming. Our collaboration of partners was nothing but tenacious in working through the small and large issues that come up in an effort like the Hammerhead project. The difference this time around was the hardware configuration: the Hammerhead team put together the configuration above to compare head-to-head with the Intel-based TigerShark cluster. In the end, the results speak for themselves:

  • Performance: A score of 15.67, 26% higher than the TigerShark cluster.
  • Power: The cluster nodes peaked at about 152W during full-throttle testing, more than 50% lower than the x86 cluster, whose nodes typically ran at over 310W.
  • Performance/$: An amazing 200% advantage over the comparable cluster configuration.

 

If you would like more information on the specific setup of the SUSE Enterprise Storage stack and additional details of the cluster, please see the reference architecture document here.

 

Strong Collaboration and More to Come!   

The great results on this Hammerhead I cluster demonstrate that high-performance Ceph-based storage can come in highly power-efficient and cost-effective packages. These parameters are important to prospective customers in cloud, enterprise, HPC, or any other market looking for solid distributed storage solutions to answer the ever-growing data needs of today, let alone tomorrow. We are very grateful to all our partners at SUSE, Arm, nVidia, Micron, and Broadcom for their contributions to this Hammerhead I storage cluster. It really would not have been possible without everyone contributing, not just to this cluster but to all the engineering behind every software and hardware component needed to stand up a cluster of this magnitude.

While we at Ampere are excited to share the results of this collaborative benchmark with you today, we are not resting on this effort! We have plans to up the ante with a Hammerhead II cluster based on our forthcoming Ampere® Altra™ processor platforms, featuring 80 Arm N1 cores per socket, 8 channels of DDR4-3200, and up to 192 PCIe Gen4 lanes across 2 sockets for loads of drives, networking, and other high-speed I/O. Stay tuned at the end of this year for more on the IO500 and the Hammerhead cluster project.

Read more from our partners here.

Arm

SUSE

 

¹ Cluster price is estimated "street pricing" for both clusters, obtained from public sources such as CDW.com, Lenovo.com, and Newegg.com during July 2020.

 


