Apache Kafka Solution Brief
with Ampere Altra Max Processors
In this solution brief, we present an analysis of performance tests conducted on Kafka using Ampere Altra® Max® processors and Intel x86 processors. The tests include both Producer and Consumer scenarios, evaluating power and performance metrics and rack-level efficiency within a datacenter.
Our findings revealed that servers equipped with Ampere processors outperformed x86 systems in terms of Throughput and Performance/Watt (Perf/Watt) at the rack level. Specifically, the Ampere processors demonstrated an 8% higher throughput per server and a notable 97% improvement in Perf/Watt. This performance advantage translates to a significantly lower number of racks required to operate Kafka effectively. With fewer racks needed, the overall footprint of the datacenter is minimized, resulting in fewer infrastructure components such as servers, switches, and cables. Consequently, this reduction leads to decreased square footage, cooling, water, and other resources necessary for maintaining the datacenter, which drives Capex and Opex savings. By improving efficiency, datacenter operators can achieve their objectives regarding Power Usage Effectiveness (PUE), carbon footprint, and other Service Level Agreement (SLA) requirements.
Ampere technology packs more cores per socket, maximizing the number of cores per rack. This engineering approach provides servers utilizing Ampere Altra Max processors with a distinct advantage. The power-optimized design of these processors not only reduces power consumption but also ensures consistent and reliable performance for applications like Kafka. Additionally, Ampere processors excel in energy efficiency, resulting in industry-leading Performance/Watt at the individual server level and outstanding Performance per Rack (Perf/Rack) at scale.
Apache Kafka is a powerful real-time data streaming technology that excels in handling high volumes of events per second. It serves as an ideal solution for applications requiring high-performance data pipelines, streaming analytics, data integration, and mission-critical operations. Kafka boasts impressive throughput, built-in partitioning, replication capabilities, and inherent fault-tolerance, making it highly suitable for large-scale message processing applications. Functioning as a distributed publish-subscribe messaging system, Kafka efficiently manages and processes vast amounts of data. It caters to both online and offline message consumption scenarios, persisting messages on disk and replicating them within the cluster to prevent data loss.
To evaluate the performance and scalability of Kafka, we conducted tests using the Producer and Consumer scripts provided with Kafka. The performance data was collected from two nodes: a Dell PowerEdge R650 server equipped with Intel Ice Lake processors, and an HPE ProLiant RL300 server equipped with Altra Max processors. We assessed the rack-level performance to gain a better understanding of the overall system efficiency as operators scale out their infrastructure.
To measure the throughput and latency of writing and reading events in Kafka, we employed single producer tests. These tests were conducted on a bare-metal setup with brokers running in containers. For a single producer, we recorded throughput together with average and 99th percentile latency. Additionally, we recorded the throughput of multiple Producers on different containers while maintaining an average latency of 2-3 milliseconds.
| Make & Model | HPE ProLiant RL300 | Dell PowerEdge R650 |
| --- | --- | --- |
| CPU | Ampere Altra Max M128-30 | Intel Ice Lake Xeon SP 6342 |
| CPU Speed | 3.0 GHz | 2.8 GHz / 3.5 GHz (turbo) |
| Memory | 512 GB, DDR4, 3200 MHz | 512 GB, DDR4, 3200 MHz |
| Network Card | 1 x Mellanox Cx-6 Dx | 1 x Mellanox Cx-6 Dx |
| Storage | 4 x Micron 7450 Gen 4 NVMe | 4 x ScaleFlux CSD 3010 Gen 4 NVMe |
| Operating System | CentOS 8.5 | Ubuntu 22.04 LTS |
| Client Nodes | 2 x HPE ProLiant RL300 | 2 x Dell PowerEdge R650 |
Single Producer Tests
During the single producer tests, we varied the record size from 100 to 800 bytes and measured the throughput in MB/s at each size. Notably, the average latency remained below the 2-millisecond Service Level Agreement (SLA). As we increased the record size, we observed a corresponding increase in the producer's throughput, reaching close to 100 MB/s.
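A record-size sweep of this kind can be sketched as follows. This is a minimal illustration, not the scripts used for the brief: the intermediate record sizes, the topic name, the record count, and the bootstrap address are all assumptions, since the brief states only the 100 to 800 byte range. The code only builds the argument lists for kafka-producer-perf-test.sh and does not run them.

```python
from typing import List

# Assumed steps: the brief gives only the 100-800 byte range, not the sizes used.
RECORD_SIZES_BYTES = [100, 200, 400, 800]


def producer_perf_command(topic: str, record_size: int, num_records: int,
                          bootstrap: str) -> List[str]:
    """Build the argument list for one kafka-producer-perf-test.sh run."""
    return [
        "kafka-producer-perf-test.sh",
        "--topic", topic,
        "--num-records", str(num_records),
        "--record-size", str(record_size),
        "--throughput", "-1",  # -1 means no throttling (an assumed setting)
        "--producer-props", f"bootstrap.servers={bootstrap}",
    ]


def sweep_commands(topic: str, num_records: int,
                   bootstrap: str) -> List[List[str]]:
    """Return one command per record size in the sweep."""
    return [producer_perf_command(topic, size, num_records, bootstrap)
            for size in RECORD_SIZES_BYTES]


if __name__ == "__main__":
    # Hypothetical topic name, record count, and broker address.
    for cmd in sweep_commands("perf-test", 1_000_000, "localhost:9092"):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell on a client node; keeping command construction separate from execution makes the sweep easy to review before any load is generated.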
Multiple Producer Tests
We deployed multiple containers on both systems, with each container serving as a broker. We first created numerous topics using the kafka-topics.sh tool, with a replication factor of 2 and 32 partitions across each broker. We then ran the kafka-producer-perf-test.sh script to generate 100 million records, each 100 bytes in size. To determine the total throughput of the machine, we summed the throughput reported for all brokers while holding average latency within an SLA of around 2-3 milliseconds. To find the optimal balance, we conducted experiments using 8-16 containers. While the above graphs are plotted for 8 containers, the system's throughput remained constant throughout the experiments.
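The aggregation step might look like the sketch below. It assumes each producer run ends with the summary line that kafka-producer-perf-test.sh prints, and that a run counts only if its average latency stays within the SLA; both the rejection rule and the 3 ms default are our reading of the text, not a documented procedure. The sample line uses invented numbers purely to show the format.

```python
import re
from typing import Iterable, NamedTuple


class ProducerResult(NamedTuple):
    mb_per_sec: float
    avg_latency_ms: float


# Extracts throughput and average latency from the tool's final summary line.
SUMMARY_RE = re.compile(r"\(([\d.]+) MB/sec\), ([\d.]+) ms avg latency")

# Invented example values, shown only to illustrate the expected line format.
SAMPLE_LINE = ("100000000 records sent, 1000000.0 records/sec (95.37 MB/sec), "
               "2.10 ms avg latency, 150.00 ms max latency, "
               "2 ms 50th, 4 ms 95th, 9 ms 99th, 20 ms 99.9th.")


def parse_summary(line: str) -> ProducerResult:
    """Parse one summary line into (MB/s, average latency in ms)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return ProducerResult(float(match.group(1)), float(match.group(2)))


def aggregate_throughput(results: Iterable[ProducerResult],
                         max_avg_latency_ms: float = 3.0) -> float:
    """Sum MB/s across brokers; fail if any run exceeds the latency SLA."""
    results = list(results)
    for result in results:
        if result.avg_latency_ms > max_avg_latency_ms:
            raise ValueError(
                f"average latency {result.avg_latency_ms} ms exceeds SLA")
    return sum(result.mb_per_sec for result in results)


if __name__ == "__main__":
    one_broker = parse_summary(SAMPLE_LINE)
    print(aggregate_throughput([one_broker] * 8))
```

Failing loudly on an SLA violation, rather than silently dropping the run, keeps a mis-tuned container count from inflating the system total.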
During our observations, we found that HPE RL300 systems equipped with Ampere Altra Max processors exhibited an 8% improvement in raw throughput compared to Dell PowerEdge R650 servers equipped with Intel Ice Lake processors. The Perf/Watt ratio is a crucial metric for evaluating energy efficiency in data centers. We calculated it by dividing the producer's throughput (MB/s) across all brokers by the total platform power consumed (watts) during the benchmarking interval. When running the Producer Throughput tests, the Altra Max system delivered approximately 197% of the Perf/Watt of the Intel Ice Lake systems, that is, a 97% improvement.
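The Perf/Watt calculation described above reduces to a single division. The sketch below shows it along with the relative comparison between two systems; the function names are ours, and the numbers in the example are round placeholders chosen for easy arithmetic, not measurements from the test beds.

```python
from typing import Sequence


def perf_per_watt(broker_throughputs_mb_s: Sequence[float],
                  platform_power_watts: float) -> float:
    """Total producer throughput (MB/s) divided by platform power (W)."""
    if platform_power_watts <= 0:
        raise ValueError("platform power must be positive")
    return sum(broker_throughputs_mb_s) / platform_power_watts


def relative_percent(candidate: float, baseline: float) -> float:
    """Express the candidate's Perf/Watt as a percentage of the baseline's."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * candidate / baseline


if __name__ == "__main__":
    # Placeholder inputs: 8 brokers at 100 MB/s each, at two power levels.
    system_a = perf_per_watt([100.0] * 8, 400.0)  # 2.0 MB/s per watt
    system_b = perf_per_watt([100.0] * 8, 800.0)  # 1.0 MB/s per watt
    print(relative_percent(system_a, system_b))   # 200.0
```

A result of 200% here corresponds to a 100% improvement, which is the same relationship as the 197% ratio and 97% improvement quoted in this brief.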
Producer and Consumer Tests
To simulate real-life scenarios, we ran tests involving both Kafka Producers and Consumers, balancing the number of each to maximize the system's performance. In these tests, the Altra Max systems consistently outperformed the x86 systems, performing better by 11% on Producer tests and 18% on Consumer tests when compared to Intel Ice Lake systems.
We extended the results obtained from both test beds to calculate efficiency at the rack level (42U with a 12 kW power budget, leaving room for network and other equipment). We found that the HPE RL300 servers with Altra Max processors delivered 71% higher throughput compared to Dell PowerEdge R650 servers with Intel Ice Lake processors under the same power budget and limited by the same latency SLAs.
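One way to reason about a rack-level extrapolation like this is to cap the server count by whichever limit binds first, power or space, and multiply by per-server throughput. The sketch below assumes that model; the brief does not publish per-server power draw or the number of rack units reserved for other equipment, so those are parameters here and the example values are placeholders.

```python
def servers_per_rack(server_power_watts: float, server_height_u: int,
                     rack_power_budget_watts: float = 12_000,
                     usable_rack_units: int = 42) -> int:
    """Number of servers that fit within both the power and space limits."""
    if server_power_watts <= 0 or server_height_u <= 0:
        raise ValueError("server power and height must be positive")
    by_power = int(rack_power_budget_watts // server_power_watts)
    by_space = usable_rack_units // server_height_u
    return min(by_power, by_space)


def rack_throughput_mb_s(per_server_mb_s: float, server_power_watts: float,
                         server_height_u: int,
                         rack_power_budget_watts: float = 12_000,
                         usable_rack_units: int = 42) -> float:
    """Per-server throughput scaled to a full rack under the same limits."""
    count = servers_per_rack(server_power_watts, server_height_u,
                             rack_power_budget_watts, usable_rack_units)
    return count * per_server_mb_s


if __name__ == "__main__":
    # Placeholder server: 500 W, 1U, 800 MB/s. Power is the binding limit.
    print(servers_per_rack(500.0, 1))             # 24
    print(rack_throughput_mb_s(800.0, 500.0, 1))  # 19200.0
```

Under a fixed power budget, a server that draws less power for similar throughput fits more units per rack, which is how a modest per-server advantage can become a much larger per-rack one.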
Furthermore, it is worth noting that the Intel Ice Lake systems required 58% more rack space than the Altra Max systems to achieve the same level of throughput. This finding emphasizes the density of the HPE RL300 servers and their ability to maximize performance within a given rack configuration.
When it comes to scaling Data Center Infrastructure, striking a balance between scalability and sustainability is crucial. Based on our benchmarking efforts, we have observed that the Ampere Altra Max CPUs offer remarkable performance improvements while consuming less power during infrastructure scaling.
By extrapolating our data to the rack level, we discovered that achieving comparable performance would necessitate the deployment of 66 Intel Ice Lake sockets, whereas only 30 Altra Max sockets would be required. This difference highlights the superior scalability of Altra Max CPUs.
Moreover, considering the power budget of the rack (12kW), deploying Intel-based infrastructure would require an additional rack with its own network, cooling, and power management infrastructure. In contrast, the use of Altra Max servers running Kafka Producer workloads results in significant power savings of 56%. This not only reduces power consumption but also eliminates the need for additional rack deployment, thereby promoting sustainability and optimizing resource utilization.
The key benefits of running Kafka on Ampere Altra Max processors are:
Increased Throughput: Ampere Cloud Native Processors running Kafka exhibit around 8% improvement in raw throughput compared to legacy x86 servers.
Conserved Rack Space: Ampere Altra Max processors offer a compelling combination of performance and power efficiency, resulting in exceptional performance per rack for resource-intensive workloads like Kafka. Our observations indicate that x86 systems require 58% more rack space to achieve the same level of throughput as Altra Max.
Lower Power Consumption: The use of Ampere Altra Max processors in Kafka workloads can result in significant power savings and higher scalability due to their superior performance and power efficiency. In the study conducted, a power savings of 56% was observed at scale compared to traditional x86 servers.
As part of performance benchmarking, we observed run-to-run variations in the measured throughput. To minimize the effects of these variations, we ran each test 3 times and used the geomean of the measured throughput in MB/s and power consumption in watts for our final calculations.
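The geomean step can be sketched with Python's standard library, which provides statistics.geometric_mean from version 3.8 onward. The function name and the three-run inputs below are illustrative placeholders, not the measured values.

```python
from statistics import geometric_mean
from typing import Sequence


def geomean_perf_per_watt(throughput_runs_mb_s: Sequence[float],
                          power_runs_watts: Sequence[float]) -> float:
    """Perf/Watt computed from the geomean of each metric across runs."""
    throughput = geometric_mean(throughput_runs_mb_s)
    power = geometric_mean(power_runs_watts)
    return throughput / power


if __name__ == "__main__":
    # Placeholder measurements from three hypothetical runs.
    throughput_runs = [780.0, 800.0, 820.0]
    power_runs = [395.0, 400.0, 405.0]
    print(geometric_mean(throughput_runs))
    print(geomean_perf_per_watt(throughput_runs, power_runs))
```

The geometric mean is less sensitive than the arithmetic mean to a single unusually high run, which suits ratio-style metrics such as Perf/Watt.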
Disclaimer: All data and information contained in or disclosed by this document are for informational purposes only and are subject to change. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere Computing LLC, and its affiliates (“Ampere”), is under no obligation to update or otherwise correct this information. Ampere makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability or fitness for a particular purpose, regarding the information contained in this document and assumes no liability of any kind. Ampere is not responsible for any errors or omissions in this information or for the results obtained from the use of this information. All information in this presentation is provided “as is”, with no guarantee of completeness, accuracy, or timeliness.