Spark Workload Brief
with Ampere Altra Max Processors
In this solution brief, we provide an analysis of various benchmarks run on Spark using Ampere Altra® Max® processors and Intel x86 processors. The benchmarks Spark TeraSort and Spark TPC-DS encompass power and performance metrics as well as rack-level efficiency in a datacenter.
We observed that servers equipped with Ampere processors outperformed x86 systems in both raw TeraSort throughput and Performance/Watt (Perf/Watt) at the rack level by 18% and 95% respectively. This performance advantage results in fewer racks needed to operate Big Data services, such as Spark, at scale. By using fewer racks, the overall datacenter footprint is reduced, leading to fewer infrastructure components like servers, switches, and cables. This reduction results in a decrease in the square footage, cooling, water, and other supporting resources required to maintain the datacenter. This efficiency improvement enables datacenter operations to achieve their PUE (Power Usage Effectiveness), carbon footprint and other SLA (Service Level Agreement) objectives.
Ampere processors are engineered to offer a higher number of cores per socket, thereby maximizing the number of cores per rack. Clusters leveraging Ampere Altra processors gain an advantage from the power-optimized design, which reduces power consumption and ensures predictable performance for big data applications and other data lake technologies. Additionally, the processors are designed to deliver exceptional energy efficiency, thus resulting in industry-leading Perf/Watt at the individual server level and Performance per Rack (Perf/Rack) at scale. This not only reduces operating costs but also results in a significantly lower carbon footprint.
Apache Spark is an open-source distributed processing system designed for big data workloads. Spark overcomes the limitations of Hadoop by performing in-memory processing and using Resilient Distributed Datasets (RDD). It offers APIs in Java, Scala, and Python, supporting a variety of real-time analytic workloads, batch processing, interactive queries, and machine learning. As such, Apache Spark is ideally complemented by the Ampere Altra Family of processors and lends itself well to the density and performance advantages of Cloud Native Processors.
To evaluate the performance and scalability of Spark, we used both HiBench and TPC-DS benchmarking tools. We compared the performance data collected from two three-node clusters, one equipped with Ampere Altra Max processors and the other with Intel Ice Lake processors. We assessed the rack-level performance to gain a better understanding of the overall system efficiency as operators scale out their infrastructure.
TPC-DS is a decision support benchmark that models several aspects of a decision support system, including data maintenance and queries. It is a widely accepted industry standard benchmark that helps organizations make informed decisions about their technology choices for decision support systems.
On the other hand, HiBench is a comprehensive big data benchmark suite that helps to evaluate different big data frameworks in terms of speed, throughput, and system resource utilization. The benchmarks in HiBench cover a range of tasks, including data generation, data preprocessing, data analytics, and machine learning, making it a useful tool for evaluating the performance of big data processing systems in different scenarios.
Architecture | Ampere Altra Max | x86_64 |
---|---|---|
Make & Model | HPE RL300 | Dell PowerEdge R650 |
Cluster Nodes | 3 | 3 |
CPU | Ampere M128-30 | Intel Ice Lake Xeon SP 6342 |
Sockets/Node | 1 | 2 |
Cores/Socket | 128 | 24 |
Threads/Socket | 128 | 48 |
CPU Speed | 3.0 GHz | 2.8 GHz / 3.5 GHz (turbo) |
Memory | 512GB, DDR4, 3200 MHz | 512GB, DDR4, 3200 MHz |
Network Card | 1 x Mellanox CD-6 Dx | 1 x Mellanox CX-6 Dx |
Storage | 4 x Micron 7450 Gen 4 NVME | 4 x ScaleFlux CSD 3010 Gen 4 NVME |
Kernel Version | 4.18.0-348.7.1 | 5.15.0-60-generic |
Operating System | CentOS 8.5 | Ubuntu 22.04 LTS |
Spark Version | 3.3.1 | 3.3.1 |
To compare the performance of the two clusters, two three-node clusters were set up using Dell PowerEdge R650 servers with Intel Ice Lake processors and HPE RL300 with Altra Max processors. Both clusters were running Spark version 3.3.1 and were configured to use Apache Hadoop Yarn 3.3.4. A data set of 3 TB was used to perform benchmarking using both HiBench and TPC-DS benchmarking tools on these clusters. The results were then analyzed to determine the performance and scalability of each cluster under the given workload.
To evaluate cluster performance, we collected the throughput data from HiBench for TeraSort and the total time taken to execute all the 99 SQL queries from TPC-DS on the clusters. We also measured the total power consumption of the clusters using IPMI and Redfish tools. This data was used to compare the performance and power efficiency of the two clusters running Spark with Altra Max and Ice Lake processors.
During our testing, we found that the TeraSort throughput was 18% higher on the Altra Max systems compared to the Intel Ice Lake systems. Additionally, we observed that the TPC-DS queries completed 21% faster for the same dataset size of 3TB on the Altra Max systems. These results indicate that the Altra Max processors provide better performance for big data workloads like Spark, compared to the Intel Ice Lake processors.
The Perf/Watt ratio is an important metric for measuring energy efficiency in data centers. To measure the energy efficiency of each cluster, we calculated the Perf/Watt ratio by dividing TeraSort cluster throughput (MBPS) by cluster power consumed (watts) during the benchmarking interval. We took the time taken and the power consumed to calculate the relative Perf/Watt ratio while running TPC-DS benchmark.
The Altra Max system was observed to have a superior Perf/Watt ratio of around 195% while running TeraSort and 177% while running TPC-DS workloads when compared to Intel Ice Lake systems.
We extended the 3-node cluster data to calculate efficiency at the rack level (42U with 12kW power budget, leaving room for network and other equipment). We found that the HPE RL300 servers with Altra Max processors delivered 89% higher TeraSort throughput compared to Dell PowerEdge R650 servers with Intel Ice Lake processors under the same power budget. In addition, to achieve the same TeraSort throughput, Intel Ice Lake systems required 76% more rack space than Altra Max systems.
Pairing scalability and sustainability are essential tasks for operators of Big Data processing infrastructure.
Our benchmarking efforts indicate that Ampere’s Altra Max CPUs offer notably better performance and consume less power while scaling out infrastructure. While extrapolating the three node cluster data to rack-level data, we found that in order to achieve comparable performance, we would require 64 Intel Ice Lake sockets as opposed to only 30 Altra Max sockets. More importantly, given the rack-level power budget (12kW), the Intel-based deployment would result in a 2nd rack needing to be deployed, which in turn requires its own network, cooling and power management infrastructure. This translates to a significant power savings of 90% with the use of Altra Max servers compared to Intel servers while running Spark workloads.
The key benefits of running Hadoop on Ampere Altra Max processors are:
Increased Throughput: Based on the type of workload, Ampere Cloud Native Processors running Spark exhibit around 20% improvement in throughput compared to legacy x86 servers.
Conserved Rack Space: When it comes to scaling out workloads, Ampere Altra Max's combination of performance and power efficiency enables exceptional performance per rack, especially for high-demand workloads like Spark. Our observations showed that x86 systems required 76% more rack space for delivering the same throughput as Altra Max.
Lower Power Consumption: The use of Ampere Altra Max processors in Spark workloads can result in significant power savings and higher scalability due to their superior performance and power efficiency. In the study conducted, a power savings of 90% was observed at scale compared to traditional x86 servers.
As part of performance benchmarking, we observed run to run variations in the measured throughput. In order to minimize the effects of these variations, we ran each test 3 times and used the geomean of the measured throughput in MBPS and power consumption in watts for our final calculations.
Disclaimer: All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere Computing LLC, and its affiliates (“Ampere”), is under no obligation to update or otherwise correct this information. Ampere makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.