Hadoop on OCI Workload Brief
Big Data Hadoop on Oracle Cloud Ampere A1 instance
Oracle Cloud Infrastructure (OCI) offers Ampere® Altra® compute instances on the new Cloud Native Ampere A1 platform. The Ampere A1 platform can be deployed as bare metal servers or flexible VM shapes, giving customers full control of their entire cloud stack. The Ampere A1 VM shapes provide flexible sizing from 1-80 cores and 1-64 GB of memory per core, along with several key benefits such as deterministic performance, linear scalability, and a secure architecture with the best price-performance in the market.
The Apache Hadoop framework is designed for distributed processing of large data sets and scales out from a single server to thousands of machines, each offering local computation, storage, or both. When implemented in a cluster, the software has built-in resiliency to handle a failed server or a failed component in a server. Hadoop consists of four main modules: HDFS (Hadoop Distributed File System), YARN (Yet Another Resource Negotiator), MapReduce, and Hadoop Common. Applications collect data in various formats and feed it to the cluster. The NameNode, the centerpiece of HDFS, holds the metadata for every chunk of data: it keeps the directory tree of all files in the file system and tracks where across the cluster each file's data is stored. A MapReduce job then runs against this data in HDFS across the DataNodes.
All of the above tasks are computationally intensive, so the cluster benefits from high-performance components throughout. Data pulled from HDFS demands high-performance storage; it is coordinated across the servers in the cluster, which demands a high-speed network; and it must be processed quickly by thousands of tasks before reducers aggregate it into the final output.
Oracle Cloud Infrastructure uses Ampere Altra processors, with an industry-leading 80 cores per CPU, for the Ampere A1 shapes. All cores can run consistently at the maximum frequency of 3.0 GHz. Combining Ampere's low-power design with OCI's high-performance infrastructure, Ampere A1 shapes offer the best price-performance in the cloud.
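To make the map, shuffle, and reduce flow concrete, the following is a minimal in-memory sketch of a TeraSort-style sort. It is a toy illustration, not Hadoop's implementation: the function names are hypothetical, and everything runs in one process rather than across data nodes. The idea it shows is range partitioning: boundary keys are sampled, each record is routed to the reducer that owns its key range, and each reducer sorts its own bucket, so concatenating the reducer outputs in order yields a globally sorted result.

```python
from bisect import bisect_right
from collections import defaultdict


def choose_split_points(sample_keys, num_reducers):
    """Pick num_reducers - 1 boundary keys from a sample of the input keys."""
    ordered = sorted(sample_keys)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def map_phase(records, split_points):
    """Map: tag each record with the reducer whose key range contains it."""
    for key, value in records:
        yield bisect_right(split_points, key), (key, value)


def shuffle(tagged_records):
    """Shuffle: group the tagged records by reducer id."""
    buckets = defaultdict(list)
    for reducer_id, record in tagged_records:
        buckets[reducer_id].append(record)
    return buckets


def reduce_phase(buckets, num_reducers):
    """Reduce: sort each bucket, then concatenate buckets in reducer order."""
    output = []
    for reducer_id in range(num_reducers):
        output.extend(sorted(buckets.get(reducer_id, [])))
    return output


def toy_terasort(records, num_reducers=3):
    """Run the three phases in sequence on a list of (key, value) records."""
    split_points = choose_split_points([key for key, _ in records], num_reducers)
    return reduce_phase(shuffle(map_phase(records, split_points)), num_reducers)


records = [("pear", 1), ("apple", 2), ("fig", 3), ("kiwi", 4), ("date", 5), ("plum", 6)]
print(toy_terasort(records))
```

In a real cluster the map and reduce steps run as separate tasks on different nodes, and the shuffle moves data over the network, which is why the storage and network characteristics of the cluster matter for this workload.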
OCI’s A1 compute provides superior price-performance for big data applications when compared to its x86 peers. A1 shapes with Ampere Arm processors are a recommended choice for Hadoop applications due to the predictable and highly scalable nature of the architecture.
In this solution brief, we compare the performance of OCI A1 (Ampere Altra) VMs with OCI's S3 Standard (Intel Ice Lake), E3 (AMD Rome), and E4 (AMD Milan) flex VMs running Hadoop TeraSort.
Consistency and Predictability: Ampere Altra processors are designed for cloud native usage, providing consistent and predictable performance for Hadoop solutions.
Scalable: With an innovative scale-out architecture, Ampere Altra processors combine a high core count and compelling single-threaded performance with consistent frequency on all cores, delivering up to 15-20% better performance on OCI Ampere A1 compute shapes and making them ideal for big data workloads.
Power Efficient: Industry-leading energy efficiency allows Ampere Altra processors to reach competitive levels of raw performance while consuming much less power than the competition, resulting in a lower carbon footprint.
Technology & Functionality
Virtual machines were provisioned in a private network space as depicted above. Hadoop 3.3.1 (with aarch64 binaries) was installed on the test bed. We used the Intel HiBench benchmark tool on each of these VMs to generate a 250 GB dataset. The Hadoop TeraSort benchmark was then run on these VMs to capture throughput, measured in MB/s.
VM and Hadoop Configuration
|Kernel|Oracle Linux 8.5|
|Storage|iSCSI, 2 x 500 GB LUNs, VPU 50, 2 x 480 MB/s|
|JDK|Oracle JDK 8 EPP|
Hadoop and YARN Configuration
dfs.block.size - 256M
yarn.scheduler.minimum-allocation-mb - 1024
yarn.scheduler.maximum-allocation-mb - 65536
yarn.scheduler.minimum-allocation-vcores - 1
yarn.scheduler.maximum-allocation-vcores - 15
yarn.nodemanager.resource.cpu-vcores - 16
yarn.nodemanager.resource.memory-mb - 94208
mapreduce.map.memory.mb - 1024
mapreduce.reduce.memory.mb - 3072
mapred.reduce.parallel.copies - 16
mapreduce.reduce.shuffle.parallelcopies - 16
mapreduce.map.java.opts - 2048M
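As a rough sanity check on the settings above, the sketch below estimates how many task containers one node could run at a time and how many HDFS blocks the 250 GB dataset occupies. The constants are copied from the listed properties. Two things are assumptions rather than statements from this brief: each task requests one vcore, and the scheduler enforces both memory and vcore limits. The helper names are hypothetical.

```python
import math

# Values copied from the configuration listed above.
NODE_MEMORY_MB = 94208    # yarn.nodemanager.resource.memory-mb
NODE_VCORES = 16          # yarn.nodemanager.resource.cpu-vcores
MAP_MEMORY_MB = 1024      # mapreduce.map.memory.mb
REDUCE_MEMORY_MB = 3072   # mapreduce.reduce.memory.mb
BLOCK_SIZE_MB = 256       # dfs.block.size
DATASET_GB = 250          # size of the HiBench-generated dataset

# Assumption (not stated in the brief): every task asks for one vcore.
VCORES_PER_TASK = 1


def max_containers(container_mb, vcores_per_container=VCORES_PER_TASK):
    """Concurrent containers per node, limited by whichever resource runs out first."""
    by_memory = NODE_MEMORY_MB // container_mb
    by_vcores = NODE_VCORES // vcores_per_container
    return min(by_memory, by_vcores)


def block_count(dataset_gb=DATASET_GB, block_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks needed for the dataset, taking 1 GB as 1024 MB."""
    return math.ceil(dataset_gb * 1024 / block_mb)


print("map containers per node:   ", max_containers(MAP_MEMORY_MB))
print("reduce containers per node:", max_containers(REDUCE_MEMORY_MB))
print("HDFS blocks for dataset:   ", block_count())
```

Under these assumptions, memory alone would allow 92 map or 30 reduce containers, so the 16 vcores are the binding limit and each node runs at most 16 tasks concurrently. The 250 GB dataset corresponds to 1000 blocks of 256 MB. The estimate ignores the ApplicationMaster container and any operating system overhead.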
The Intel HiBench benchmark tool was used on each of the VMs to generate a 250 GB dataset. The Hadoop TeraSort benchmark was run on these VMs, and the TeraSort throughput in MB/s was captured.
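For readers reproducing the measurement, throughput in MB/s is simply the input size divided by the elapsed sort time. The helper below is a minimal sketch of that arithmetic. The function name is hypothetical, 1 MB is taken as 1,000,000 bytes (an assumption, since the brief does not say which definition was used), and the 1000-second duration in the example is a placeholder, not a measured result.

```python
def throughput_mb_per_s(input_bytes, duration_s):
    """Return throughput in MB/s, taking 1 MB as 1,000,000 bytes."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return input_bytes / duration_s / 1_000_000


# Placeholder example: a 250 GB input sorted in a hypothetical 1000 seconds.
print(throughput_mb_per_s(250 * 10**9, 1000))  # 250.0
```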
CPU utilization hovered around 80%, making this a fair comparison under high-load conditions.
Disk utilization of the iSCSI LUNs was around 90%, also near capacity.
A1 VMs performed well compared to the legacy x86 shapes. The above graphs were plotted using s3flex as the baseline reference point.
The price-performance of Ampere A1 instances was observed to be 60% better than that of the Intel shapes and 10-15% better than that of the AMD shapes.
Note: Price-performance was calculated from the OCI Compute pricing list for 16-core VMs with 96 GB of memory (Oct 2022). Storage costs were calculated from the OCI Storage pricing sheet for 2 x 500 GB iSCSI LUNs at 50 VPU (480 MB/s).
OCI A1 instances with Ampere Altra processors provide high performance for big data solutions like Hadoop. The performance advantage of the Ampere shapes, combined with their price advantage, provides up to 60% higher value when using OCI Ampere A1 shapes for Hadoop workloads.
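One plausible way to compute a price-performance figure of this kind is to divide measured throughput by the combined hourly compute and storage cost, then normalize against a baseline shape. The sketch below assumes that method; the brief does not spell out its exact formula. The class, function, and shape names are hypothetical, and every number is a placeholder chosen for easy arithmetic, not an OCI price or a measured result.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    name: str
    throughput_mb_s: float       # TeraSort throughput for this shape
    compute_usd_per_hour: float  # hourly cost of the VM
    storage_usd_per_hour: float  # hourly cost of the attached block volumes

    @property
    def perf_per_dollar(self):
        """Throughput delivered per dollar of hourly compute plus storage cost."""
        return self.throughput_mb_s / (self.compute_usd_per_hour + self.storage_usd_per_hour)


def relative_to_baseline(shapes, baseline_name):
    """Express each shape's price-performance as a multiple of the baseline's."""
    baseline = next(s for s in shapes if s.name == baseline_name)
    return {s.name: s.perf_per_dollar / baseline.perf_per_dollar for s in shapes}


# PLACEHOLDER inputs for illustration only; not taken from the brief or from OCI pricing.
shapes = [
    Shape("shape_x", throughput_mb_s=100.0, compute_usd_per_hour=1.0, storage_usd_per_hour=0.25),
    Shape("shape_y", throughput_mb_s=120.0, compute_usd_per_hour=0.5, storage_usd_per_hour=0.25),
]
print(relative_to_baseline(shapes, "shape_x"))  # {'shape_x': 1.0, 'shape_y': 2.0}
```

Substituting real throughput measurements and the published hourly prices for each shape gives a comparison in the same form as the one reported here.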
For More Information
All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.
©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com