Hadoop on Azure Brief

Solution Brief Jan 2023

Ampere - Empowering What’s Next

Microsoft Azure offers Ampere® Altra®-based general purpose Dplsv5 and Dpldsv5 and memory optimized Epsv5 virtual machines on the new Cloud Native Ampere platform. Azure VMs based on Ampere processors are available with up to 64 Arm cores and 208 GB of memory, and they offer several key benefits, such as deterministic performance, linear scalability, and the best price-performance in the market. These virtual machines have been engineered to run scale-out and cloud-native workloads efficiently.

The Apache Hadoop software framework is designed for distributed processing of large data sets and scales out from a single server to thousands of machines, each offering local computation, storage, or both. To optimize cluster deployments, the software has built-in resiliency to handle failures of individual servers or components (PCIe cards, SSDs, etc.). It consists of four main modules: HDFS, YARN, MapReduce and Hadoop Common. Applications collect data in various formats and load it into the cluster, where HDFS splits it into chunks (blocks) spread across the data nodes. The name node holds the metadata for all these chunks of data. A MapReduce job then runs against this data in HDFS across the data nodes.

All the above tasks are computationally intensive. The data must be (1) pulled from HDFS, which demands high-performance storage; (2) coordinated across different computers, which demands a high-speed network; (3) processed quickly by thousands of tasks; and finally (4) aggregated by reducers to organize the final output.
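
The four steps above can be sketched on a single machine. The following is a toy illustration in Python, not Hadoop code: the function names, the sample records and the split points are all made up for this example. It mimics the TeraSort idea of range-partitioning keys so that sorting each partition independently yields a globally sorted result.

```python
from bisect import bisect_right
from collections import defaultdict


def map_phase(records, split_points):
    """Map: tag each record with the partition (reducer) owning its key range."""
    for key, value in records:
        yield bisect_right(split_points, key), (key, value)


def shuffle_phase(tagged_records):
    """Shuffle: group records by partition, standing in for the network transfer."""
    partitions = defaultdict(list)
    for partition, record in tagged_records:
        partitions[partition].append(record)
    return partitions


def reduce_phase(partitions):
    """Reduce: sort each partition locally, then emit partitions in key-range order."""
    output = []
    for partition in sorted(partitions):
        output.extend(sorted(partitions[partition]))
    return output


def toy_terasort(records, split_points):
    """Run the three phases back to back on in-memory data."""
    return reduce_phase(shuffle_phase(map_phase(records, split_points)))


# Hypothetical sample data; a real job reads its input from HDFS.
records = [("pear", 1), ("apple", 2), ("zebra", 3), ("mango", 4), ("fig", 5)]
result = toy_terasort(records, split_points=["g", "p"])
print(result)
```

Because the split points define ordered key ranges, no final merge across partitions is needed; in a real cluster each phase would run as many parallel tasks on different nodes.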

Ampere Altra-powered Azure VMs provide the optimal platform for tackling the ever-growing big data challenges found in modern enterprise environments.

Key Benefits

Cloud Native: Designed from the ground up for ‘born in the cloud’ workloads, Ampere Altra can deliver much higher price-performance than its x86 peers.

Consistency and Predictability: Ampere Altra processors provide consistent and predictable performance for Hadoop solutions, including burstable workloads.

Scalable: With an innovative scale-out architecture, Ampere Altra processors offer industry-leading core counts with compelling single-threaded performance. All cores run at a consistent frequency, allowing big data workloads to scale up and scale out efficiently.

Power Efficient: Industry-leading energy efficiency allows Ampere Altra processors to reach competitive levels of raw performance while consuming much less power than the competition.

What it Enables

  • Consistent, predictable high performance, especially at high loads
  • Much higher resistance to noisy neighbors in multitenant environments
  • TCO Savings and reduced carbon footprint

Ampere Altra

  • 80 64-bit CPU cores up to 3.00 GHz
  • 64 KB L1 I-cache, 64 KB L1 D-cache per core
  • 1 MB L2 cache per core
  • 32 MB System Level Cache (SLC)
  • 2x full-width (128b) SIMD
  • Coherent mesh-based interconnect

Memory

  • 8x 72-bit DDR4-3200 channels
  • ECC and DDR4 RAS
  • Up to 16 DIMMs and 4 TB addressable memory
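
As a rough worked example of what the memory figures above imply, the theoretical peak bandwidth per socket can be estimated from the channel count and transfer rate. This sketch assumes that each 72-bit channel carries 64 data bits plus 8 ECC bits (the usual DDR4 ECC layout); the function name is made up for illustration, and sustained bandwidth in practice will be lower than this peak.

```python
def peak_ddr_bandwidth_gb_s(channels, transfers_per_s, data_bits_per_channel):
    """Theoretical peak bandwidth in GB/s (decimal), ignoring all overheads."""
    bytes_per_transfer = data_bits_per_channel / 8
    return channels * transfers_per_s * bytes_per_transfer / 1e9


# 8 channels of DDR4-3200: 3200 mega-transfers per second per channel.
# Assumption: 64 of the 72 bits per channel are data, the rest is ECC.
peak = peak_ddr_bandwidth_gb_s(
    channels=8,
    transfers_per_s=3_200_000_000,
    data_bits_per_channel=64,
)
print(f"Theoretical peak: {peak:.1f} GB/s")
```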

Connectivity

  • 128 lanes of PCIe Gen4
  • Coherent multi-socket support
  • 4 x16 CCIX lanes

Technology & Functionality

  • Arm v8.2+, SBSA Level 4
  • Advanced Power Management

Performance

  • SPECrate®2017 Integer Estimated: 300

Hadoop on Microsoft Azure Ampere VMs

Ampere's Arm-based processors offer higher core density per socket, maximizing the number of cores per rack. Their power-efficient design reduces power consumption, while their single-threaded cores provide consistent, predictable performance for large data processing tasks.

The Azure Ampere Arm-based virtual machine families include:

  1. Dpsv5 series, with up to 64 vCPUs and up to 208 GiB of memory
  2. Dplsv5 series, with up to 64 vCPUs and up to 128 GiB of memory
  3. Epsv5 series, with up to 32 vCPUs and up to 208 GiB of memory

All these virtual machine sizes support up to 40 Gbps of networking bandwidth, and standard HDDs, standard or premium SSDs, and Ultra Disk storage can be attached to them.

Ampere-based Azure VMs provide superior price-performance for big data applications when compared to their x86 peers. These VMs are the perfect choice for Hadoop applications due to the predictable and highly scalable nature of the architecture.

In this Solution Brief, we compare three Azure VMs, each featuring a comparable CPU from Intel, AMD or Ampere.

Benchmarking Configuration

We used the Intel HiBench benchmarking suite to run the Hadoop TeraSort benchmark on the following three Azure VMs:

  1. D16sv5 (Intel Xeon Platinum 8370C, Ice Lake)
  2. D16adsv5 (AMD EPYC 7763v, Milan)
  3. D16psv5 (Ampere Altra Q80)

TeraGen was used to generate a 250 GB dataset, which was then sorted using TeraSort, with throughput captured in MB/s.
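
For reference, a throughput figure of this kind is simply the amount of data sorted divided by the elapsed time. The sketch below is an illustration under stated assumptions, not HiBench's own reporting code: the function name is made up, the 500-second run time is a hypothetical value rather than a measurement, and whether the tool counts a megabyte as 10^6 or 2^20 bytes is left as a parameter.

```python
def throughput_mb_per_s(dataset_bytes, elapsed_seconds, bytes_per_mb=1_000_000):
    """Sort throughput in MB/s: bytes processed divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return dataset_bytes / elapsed_seconds / bytes_per_mb


# 250 GB dataset, read here as decimal gigabytes (an assumption).
DATASET_BYTES = 250 * 10**9

# Hypothetical elapsed time, used only to show the arithmetic.
example = throughput_mb_per_s(DATASET_BYTES, elapsed_seconds=500.0)
print(f"{example:.1f} MB/s")
```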

  • All the virtual machines had an identical configuration on CPU cores/threads, memory and storage.

  • The storage size was chosen to limit the bandwidth to 1000 MB/s across all the VMs.

  • Transparent huge pages were disabled on the guest operating system.

  • A few Hadoop configuration parameters were tuned to maximize the utilization of CPU, memory and storage.
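
One way to confirm the transparent huge pages setting on a Linux guest is to read the kernel's sysfs entry, where the active mode is shown in square brackets. The following is a minimal sketch of such a check; the helper names are made up for illustration, and the brief does not state how the setting was applied or verified in the original tests.

```python
from pathlib import Path

# Standard Linux sysfs location for the transparent huge pages mode.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(sysfs_text):
    """Return the bracketed mode, e.g. 'never' from 'always madvise [never]'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {sysfs_text!r}")


def thp_disabled(path=THP_ENABLED_PATH):
    """True when the kernel reports transparent huge pages as 'never'."""
    return active_thp_mode(path.read_text()) == "never"


if __name__ == "__main__":
    # Only meaningful on Linux hosts that expose the sysfs entry.
    if THP_ENABLED_PATH.exists():
        print("THP disabled:", thp_disabled())
    else:
        print("THP sysfs entry not found on this system")
```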

VM and Hadoop Configuration

            D16s v5        D16ads v5      D16ps v5
vCPU        16             16             16
Cores       8              8              16
Mem         64G            64G            64G
Arch        x86_64         x86_64         aarch64
OS          Ubuntu 22.04   Ubuntu 22.04   Ubuntu 22.04
Storage     4 x 1024 GB (P40 Performance tier), totaling 1000 MB/s throughput (identical on all three VMs)
JDK         Oracle JDK 8u345 (identical on all three VMs)

YARN Configuration

Parameter                                   Value
dfs.block.size                              256M
yarn.scheduler.minimum-allocation-mb        1024
yarn.scheduler.maximum-allocation-mb        59392
yarn.scheduler.minimum-allocation-vcores    1
yarn.scheduler.maximum-allocation-vcores    15
yarn.nodemanager.resource.cpu-vcores        16
yarn.nodemanager.resource.memory-mb         63488
mapreduce.map.memory.mb                     2048
mapreduce.reduce.memory.mb                  3072
mapred.reduce.parallel.copies               16
mapreduce.reduce.shuffle.parallelcopies     14
mapreduce.map.java.opts                     2048M
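
To see how these settings interact, the number of map or reduce containers that can run at once on a node is bounded by both the memory and the vcores given to the NodeManager. The sketch below works through that arithmetic with the values listed above. It is a simplification under stated assumptions: one vcore per container (not stated in the brief), a scheduler that accounts for both memory and vcores, and no allowance for the ApplicationMaster container. The function name is made up for illustration.

```python
def container_bound(node_memory_mb, node_vcores, container_memory_mb,
                    container_vcores=1):
    """Upper bound on concurrent containers of one size on a single node."""
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    return {
        "by_memory": by_memory,
        "by_vcores": by_vcores,
        "bound": min(by_memory, by_vcores),
    }


NODE_MEMORY_MB = 63488  # yarn.nodemanager.resource.memory-mb
NODE_VCORES = 16        # yarn.nodemanager.resource.cpu-vcores

# Assumption: each map or reduce container requests a single vcore.
map_bound = container_bound(NODE_MEMORY_MB, NODE_VCORES, 2048)     # mapreduce.map.memory.mb
reduce_bound = container_bound(NODE_MEMORY_MB, NODE_VCORES, 3072)  # mapreduce.reduce.memory.mb

print("map containers:", map_bound)
print("reduce containers:", reduce_bound)
```

Under these assumptions memory alone would allow 31 map or 20 reduce containers, so the 16 vcores are the tighter limit, which is consistent with sizing the containers to keep all 16 vCPUs busy.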

Performance Data

The relative performance data captured on the Azure VMs running Hadoop on YARN is shown below.

  • For illustration purposes, the Intel VM (D16sv5) was used as the reference point (100%).

[Figure: Relative Hadoop TeraSort Performance on Azure]

[Figure: Relative Hadoop TeraSort Price Performance on Azure]

Observations

  • Ampere VMs delivered about 5% higher TeraSort performance than the Intel and AMD VMs.
  • Ampere VMs delivered over 22% better price-performance than the Intel and AMD VMs.

(VM pricing calculated with Azure’s public pricing calculator)
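
The relative figures discussed above can be derived from raw throughput and hourly VM price as sketched below. This is one plausible way to do the normalization, not necessarily the exact method used for this brief. The VM labels, throughputs and prices are placeholders chosen to show the arithmetic; they are not the measured results or actual Azure prices.

```python
def relative_scores(throughput_mb_s, price_per_hour, reference):
    """Normalize performance and price-performance to a reference VM (100%)."""
    perf_per_dollar = {
        vm: throughput_mb_s[vm] / price_per_hour[vm] for vm in throughput_mb_s
    }
    return {
        vm: {
            "performance_pct": 100 * throughput_mb_s[vm] / throughput_mb_s[reference],
            "price_performance_pct": 100 * perf_per_dollar[vm] / perf_per_dollar[reference],
        }
        for vm in throughput_mb_s
    }


# Placeholder inputs only: NOT measured throughput and NOT real pricing.
throughput = {"vm_reference": 400.0, "vm_candidate": 450.0}
price = {"vm_reference": 1.00, "vm_candidate": 0.75}

scores = relative_scores(throughput, price, reference="vm_reference")
for vm, score in scores.items():
    print(vm, score)
```

A VM that is both faster and cheaper per hour gains on both axes, which is why a modest performance lead can translate into a much larger price-performance lead.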

Conclusions

Azure VMs featuring Ampere Altra processors provide better performance for Big Data applications like Hadoop. The Hadoop and MapReduce frameworks benefit from the linear scale-out architecture: Ampere Altra processors scale linearly with workload demands, making them an ideal choice for Big Data projects.

Ampere-based Azure VMs deliver 5% more performance and over 22% better price-performance than their x86 competitors for Hadoop workloads.

We look forward to discussing our customers' unique needs.

For more information, please visit:

Footnotes

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

Price performance was calculated using Microsoft's Virtual Machines Pricing, in September of 2022. Refer to individual tests for more information.

©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com