Customer reference board (CRB) platforms from Ampere

Spark Workload Brief

with Ampere Altra 128 Core Processors

Greater Efficiency on Spark with Ampere® Altra® Processors

In this solution brief, we provide an analysis of various benchmarks run on Spark using Ampere Altra 128 core processors and Intel x86 processors. The benchmarks Spark TeraSort and derived from Spark TPC-DS* encompass power and performance metrics as well as rack-level efficiency in a datacenter.

We observed that servers equipped with Ampere processors outperformed x86 systems in both raw TeraSort throughput and Performance/Watt (Perf/Watt) at the rack level by 18% and 95% respectively. This performance advantage results in fewer racks needed to operate Big Data services, such as Spark, at scale. By using fewer racks, the overall datacenter footprint is reduced, leading to fewer infrastructure components like servers, switches, and cables. This reduction results in a decrease in the square footage, cooling, water, and other supporting resources required to maintain the datacenter. This efficiency improvement enables datacenter operations to achieve their PUE (Power Usage Effectiveness) and other SLA (Service Level Agreement) objectives.

Spark with Ampere Altra 128 core processors

Ampere processors are engineered to offer a higher number of cores per socket, thereby maximizing the number of cores per rack. Clusters leveraging Ampere Altra processors gain an advantage from the power-optimized design, which reduces power consumption and ensures predictable performance for big data applications and other data lake technologies. Additionally, the processors are designed to deliver exceptional energy efficiency, thus resulting in industry-leading Perf/Watt at the individual server level and Performance per Rack (Perf/Rack) at scale.

Apache Spark is an open-source distributed processing system designed for big data workloads. Spark overcomes the limitations of Hadoop by performing in-memory processing and using Resilient Distributed Datasets (RDD). It offers APIs in Java, Scala, and Python, supporting a variety of real-time analytic workloads, batch processing, interactive queries, and machine learning. As such, Apache Spark is ideally complemented by the Ampere Altra family of processors and lends itself well to the density and performance advantages of Cloud Native Processors.

Benchmarking Configuration

To evaluate the performance and scalability of Spark, we used both HiBench and derived from TPC-DS benchmarking tools. We compared the performance data collected from two three-node clusters, one equipped with Ampere Altra processors and the other with Intel Ice Lake processors. We assessed the rack-level performance to gain a better understanding of overall system efficiency as operators scale out their infrastructure.

The Spark benchmark derived from TPC-DS is a decision support benchmark that models several aspects of a decision support system, including data maintenance and queries. All the 99 queries were derived from TPC-DS 2.4. The tpcds-kit was cloned from data bricks github site. Spark 3.3.1 was run in yarn mode with a Scale Factor of 3000 (SF 3000), and the time taken to execute all the 99 SQL queries were captured between the two systems.

On the other hand, HiBench is a comprehensive big data benchmark suite that helps to evaluate different big data frameworks in terms of speed, throughput, and system resource utilization. The benchmarks in HiBench cover a range of tasks, including data generation, data preprocessing, data analytics, and machine learning, making it a useful tool for evaluating the performance of big data processing systems in different scenarios.

Equipment Under Test

Architecture	Ampere Altra 128 cores	x86_64
Make & Model	HPE RL300	Dell PowerEdge R650
Cluster Nodes	3	3
CPU	Ampere M128-30	Intel Ice Lake Xeon SP 6342
Sockets/Node	1	2
Cores/Socket	128	24
Threads/Socket	128	48
CPU Speed	3.0 GHz	2.8 GHz / 3.5 GHz (turbo)
Memory	512GB, DDR4, 3200 MHz	512GB, DDR4, 3200 MHz
Network Card	1 x Mellanox CD-6 Dx	1 x Mellanox CX-6 Dx
Storage	4 x Micron 7450 Gen 4 NVME	4 x ScaleFlux CSD 3010 Gen 4 NVME
Kernel Version	4.18.0-348.7.1	5.15.0-60-generic
Operating System	CentOS 8.5	Ubuntu 22.04 LTS
Spark Version	3.3.1	3.3.1

Performance tests on a 3 Node cluster (Ampere Altra vs Intel Ice Lake)

Spark TeraSort Relative Performance (3 Node Cluster)

Spark TeraSort Relative Perf/Watt (3 Node Cluster)

Relative Performance of 3 Node Cluster (Derived from TPC-DS Workload)

Relative Perf/Watt of 3 Node Cluster (Derived from TPC-DS Workload)

To compare the performance of the two clusters, two three-node clusters were set up using Dell PowerEdge R650 servers with Intel Ice Lake processors and HPE RL300 with Ampere Altra 128 core processors. Both clusters were running Spark version 3.3.1 and were configured to use Apache Hadoop Yarn 3.3.4. A data set of 3 TB was used to perform benchmarking using both HiBench and derived from TPC-DS benchmarking tools on these clusters. The results were then analyzed to determine the performance and scalability of each cluster under the given workload.

To evaluate cluster performance, we collected the throughput data from HiBench for TeraSort and the total time taken to execute all the 99 SQL queries derived from TPC-DS on the clusters. We also measured the total power consumption of the clusters using IPMI and Redfish tools. This data was used to compare the performance and power efficiency of the two clusters running Spark with Amper Altra and Ice Lake processors.

During our testing, we found that the TeraSort throughput was 18% higher on the Ampere Altra 128 core systems compared to the Intel Ice Lake systems. Additionally, we observed that the queries derived from TPC-DS completed 21% faster for the same dataset size of 3TB on the Ampere Altra 128 core systems. These results indicate that the Ampere Altra 128 core processors provide better performance for big data workloads like Spark, compared to the Intel Ice Lake processors.

The Perf/Watt ratio is an important metric for measuring energy efficiency in data centers. To measure the energy efficiency of each cluster, we calculated the Perf/Watt ratio by dividing TeraSort cluster throughput (MBPS) by cluster power consumed (watts) during the benchmarking interval. We took the time taken and the power consumed to calculate the relative Perf/Watt ratio while running the benchmark derived from TPC-DS.

The Ampere Altra 128 core system was observed to have a superior Perf/Watt ratio of around 195% while running TeraSort and 177% while running workloads derived from TPC-DS when compared to Intel Ice Lake systems.

Simulated Spark TeraSort at Rack Level

Spark: Relative Performance at Rack Level

We extended the 3-node cluster data to calculate efficiency at the rack level (42U with 12kW power budget, leaving room for network and other equipment). We found that the HPE RL300 servers with Ampere Altra 128 core processors delivered 89% higher TeraSort throughput compared to Dell PowerEdge R650 servers with Intel Ice Lake processors under the same power budget. In addition, to achieve the same TeraSort throughput, Intel Ice Lake systems required 76% more rack space than Ampere Altra 128 core systems.

Key Findings and Conclusions

Spark TeraSort.jpg

Pairing scalability and sustainability are essential tasks for operators of Big Data processing infrastructure.

Our benchmarking efforts indicate that Ampere’s Altra 128 core CPUs offer notably better performance and consume less power while scaling out infrastructure. While extrapolating the three node cluster data to rack-level data, we found that in order to achieve comparable performance, we would require 64 Intel Ice Lake sockets as opposed to only 30 Ampere Altra 128 core sockets. More importantly, given the rack-level power budget (12kW), the Intel-based deployment would result in a 2nd rack needing to be deployed, which in turn requires its own network, cooling and power management infrastructure. This translates to a significant power savings of 90% with the use of Ampere Altra 128 cores servers compared to Intel servers while running Spark workloads.

About Ampere and Spark on Ampere processors

The key benefits of running Hadoop on Ampere Altra 128 core processors are:

Increased Throughput: Based on the type of workload, Ampere Cloud Native Processors running Spark exhibit around 20% improvement in throughput compared to legacy x86 servers.
Conserved Rack Space: When it comes to scaling out workloads, Ampere Altra 128 core combination of performance and power efficiency enables exceptional performance per rack, especially for high-demand workloads like Spark. Our observations showed that x86 systems required 76% more rack space for delivering the same throughput as Ampere Altra 128 core.
Lower Power Consumption: The use of Ampere Altra 128 core processors in Spark workloads can result in significant power savings and higher scalability due to their superior performance and power efficiency. In the study conducted, a power savings of 90% was observed at scale compared to traditional x86 servers.

Footnotes

As part of performance benchmarking, we observed run to run variations in the measured throughput. In order to minimize the effects of these variations, we ran each test 3 times and used the geomean of the measured throughput in MBPS and power consumption in watts for our final calculations.

*The workload in this brief is derived from the TPC-DS Benchmark and as such is not comparable to published TPC-DS results, as the workload results do not comply with the TPC-DS Specification.

Disclaimer: All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere Computing LLC, and its affiliates (“Ampere”), is under no obligation to update or otherwise correct this information. Ampere makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.

© 2026 Ampere Computing LLC. All rights reserved. Ampere, Ampere Computing, Altra, Altra Max, AmpereOne and the Ampere logo are all trademarks of Ampere Computing LLC or its affiliates. TPC Benchmark is a trademark of the TPC, and copying is by permission of the Transaction Processing Performance Council. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.