
Elasticsearch on OCI Workload Brief

Benefits of running Elasticsearch on OCI with Ampere
September 2022

Ampere—Empowering What’s Next

The Ampere Altra processors are complete system-on-chip (SoC) solutions built for cloud native applications. Ampere Altra supports up to 80 aarch64 cores. In addition to incorporating many high-performance cores, the innovative architecture delivers predictable high performance, linear scaling, and high energy efficiency.

Oracle Cloud Infrastructure (OCI) offers Ampere Altra at an attractive price point of $0.01 per core hour, with flexible sizing from 1-80 OCPUs and 1-64 GB of memory per core. The OCI Ampere Altra A1 Compute Platform provides deterministic performance, linear scalability, and a secure architecture with the best price-performance in the market.
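
As an illustration of this flexible sizing, an A1 Flex VM can be launched with an explicit OCPU and memory configuration through the OCI CLI. The sketch below is a hypothetical example: the OCIDs are placeholders for values from your own tenancy, and the 4-OCPU/16 GB shape is arbitrary rather than the configuration used in our tests.

# Hypothetical example: launch an Ampere A1 Flex instance sized to 4 OCPUs / 16 GB.
# Replace the placeholder OCIDs with values from your own tenancy.
oci compute instance launch \
    --availability-domain "<availability-domain>" \
    --compartment-id "<compartment-ocid>" \
    --subnet-id "<subnet-ocid>" \
    --image-id "<oracle-linux-aarch64-image-ocid>" \
    --shape "VM.Standard.A1.Flex" \
    --shape-config '{"ocpus": 4, "memoryInGBs": 16}'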

Elasticsearch is a distributed, open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene by Elastic, which publishes pre-built aarch64 binaries on its downloads page.
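
For reference, fetching and unpacking the pre-built aarch64 tarball looks like the following; the 7.17.0 version in the URL is only an example and should be replaced with the release you intend to run.

# Download and unpack the pre-built aarch64 distribution from Elastic.
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.0-linux-aarch64.tar.gz
tar -xzf elasticsearch-7.17.0-linux-aarch64.tar.gz
cd elasticsearch-7.17.0/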

How does Elasticsearch work? Raw data such as geospatial data, logs, or web data is inserted into Elasticsearch. The data is then formatted, enriched, and indexed so that it can be retrieved efficiently. Once the data is indexed, users can run complex queries against it to retrieve specific documents or summaries. Given the vast amount of data generated today, optimized indexing and searching with Elasticsearch is critical to surfacing meaningful data.
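
For example, indexing a document and querying it back over Elasticsearch's REST API looks roughly like the following; the logs index and its fields are made-up names used purely for illustration.

# Index a sample document into a hypothetical "logs" index.
curl -X POST "localhost:9200/logs/_doc" -H 'Content-Type: application/json' -d'
{ "timestamp": "2022-09-01T12:00:00Z", "status": 200, "message": "GET /index.html" }'

# Run a full-text query against the indexed data.
curl -X GET "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{ "query": { "match": { "message": "index.html" } } }'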

In this workload brief, we compare Elasticsearch on Ampere Altra-based OCI instances against the latest-generation Intel and AMD instances.

Elasticsearch on Ampere Altra instances in OCI

Ampere Altra processors are designed to deliver exceptional performance for cloud native applications like Elasticsearch. They do so by using an innovative architectural design, operating at consistent frequencies, and using single-threaded cores that make applications more resistant to noisy neighbor issues. This allows workloads to run in a predictable manner with minimal variance under increasing loads.

The processors are also designed to deliver exceptional energy efficiency. This translates to industry leading performance/watt capabilities and a lower carbon footprint.

Benefits of Running Elasticsearch on Ampere Altra
  • Cloud Native: Designed from the ground up for cloud customers, Ampere Altra processors are ideal for cloud native workloads such as Elasticsearch.

  • Scalable: With an innovative scale-out architecture, Ampere Altra processors combine a high core count and compelling single-threaded performance with consistent frequency across all cores, delivering greater performance at the socket level.

  • Power Efficient: Industry-leading energy efficiency allows Ampere Altra processors to hit competitive levels of raw performance while consuming much less power than the competition.

Ampere Altra
  • 80 64-bit CPU cores up to 3.30 GHz
  • 64 KB L1 I-cache, 64 KB L1 D-cache per core
  • 1 MB L2 cache per core
  • 32 MB System Level Cache (SLC)
  • 2x full-width (128b) SIMD
  • Coherent mesh-based interconnect

Memory

  • 8x 72-bit DDR4-3200 channels
  • ECC and DDR4 RAS
  • Up to 16 DIMMs and 4 TB addressable memory

Connectivity

  • 128 lanes of PCIe Gen4
  • Coherent multi-socket support
  • 4 x16 CCIX lanes

Technology & Functionality

  • Arm v8.2+, SBSA Level 4
  • Advanced Power Management

Performance

  • SPECrate®2017 Integer Estimated: 300
Benchmarking Configuration

We used Rally, Elastic's benchmarking tool for Elasticsearch, as the load generator. Rally provides several tracks, each a benchmarking scenario built around a different dataset type; for example, http_logs consists of HTTP server log data and nyc_taxis consists of taxi rides taken in New York in 2015. After selecting the track that matches the application's data type, we can choose a challenge, such as append-only or index-and-append, which defines the operations Rally runs against the cluster.
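
To see which tracks and challenges a given Rally installation provides, Rally can be installed via pip and queried directly:

# Install Rally (assumes Python 3 and pip are available on the load generator).
pip3 install esrally

# List the benchmark tracks and challenges bundled with this Rally version.
esrally list tracks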

We recommend using the latest Elasticsearch build that Elastic provides for aarch64. We used Oracle Linux 7.9 on OCI (kernel 5.4) with Elasticsearch 7.17 for our tests. For each test, we used similarly sized client machines as load generators for Elasticsearch.
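
Because the load generator runs on a separate client machine, the Elasticsearch node must listen on its network interface rather than loopback only. A minimal sketch of that configuration for the single-node tests, assuming the tarball layout shown earlier, is:

# Append minimal settings so a remote load generator can reach the node.
# Binding to all interfaces is convenient for benchmarking; restrict it in production.
cat >> config/elasticsearch.yml <<'EOF'
network.host: 0.0.0.0
discovery.type: single-node
EOF

# Start Elasticsearch as a daemon, writing its PID to a file.
./bin/elasticsearch -d -p elasticsearch.pid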

We recommend using the latest JDK compiled with GCC (GNU Compiler Collection) 10.2 or newer, as recent compilers have made significant progress in generating optimized code for aarch64 processors. For these tests we used JDK 17, built with GCC 10.2.
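
Elasticsearch 7.x ships with a bundled JDK, so running a specific JDK 17 build means pointing the ES_JAVA_HOME environment variable at it before starting the node; the JDK path below is a placeholder.

# Confirm the compiler and JDK versions available on the system.
gcc --version
java -version

# Point Elasticsearch at a specific JDK instead of its bundled one (path is an example).
# Export this in the environment that starts the Elasticsearch process.
export ES_JAVA_HOME=/usr/lib/jvm/jdk-17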

We ran the tests under two system setups. First, we compared a single-node Elasticsearch server on Ampere Altra, AMD Milan, and Intel Icelake virtual machines on OCI. Second, we compared a 3-node Elasticsearch cluster on the same three virtual machine types. For the single-node tests, we used two VM sizes, one with 2 logical threads and one with 4 logical threads.

In the tests with 2 logical threads, the virtual machine had 8 GB of RAM, of which 4 GB was allocated to the JVM. In the tests with 4 logical threads, the virtual machine had 16 GB of RAM, of which 8 GB was allocated to the JVM. We used G1GC as the garbage collector for all tests. A block volume rated at 75,000 IOPS was used to store the data in all tests.
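
As an illustration of the heap and collector settings for the 2-logical-thread case, the JVM options can be supplied through a jvm.options.d override such as the following; the file name is arbitrary, and the heap values become 8g for the 4-logical-thread case.

# Pin the JVM heap to 4 GB and select G1GC for the 2-logical-thread configuration.
cat > config/jvm.options.d/heap.options <<'EOF'
-Xms4g
-Xmx4g
-XX:+UseG1GC
EOF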

Here is an example esrally command line; this one runs the pmc track –

esrally race --track=pmc --target-hosts=<private ip of instance on OCI>:9200 --pipeline=benchmark-only --challenge=append-no-conflicts-index-only

We used several tracks to stress Elasticsearch; each track exercises a different dataset and data type. Below is a summary of each dataset –

  • http_logs – contains HTTP server log data

  • pmc – a full-text benchmark comprising academic papers from PubMed Central® (PMC)

  • nested – contains nested documents from StackOverflow Q&A

  • geonames – consists of points of interest from the Geonames geographical database

  • nyc_taxis – consists of taxi rides taken in New York in 2015

Each test was run three times, and the median result is reported below.
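
One simple way to drive the full set of runs from the load generator is a small shell loop over the tracks. This is only a sketch of the approach, not the exact harness used for these results; the target host is a placeholder, and some tracks name their challenges differently, so the challenge argument may need adjusting per track.

# Run each track three times against the Elasticsearch node under test.
TARGET="<private ip of instance on OCI>:9200"
for track in http_logs pmc nested geonames nyc_taxis; do
    for run in 1 2 3; do
        esrally race --track="$track" --target-hosts="$TARGET" \
            --pipeline=benchmark-only --challenge=append-no-conflicts-index-only
    done
done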

Benchmarking Results
Fig 1. Single Node - Throughput @ 2 Logical Threads
Fig 2. Single Node - Throughput @ 4 Logical Threads
Fig 3. Elasticsearch 3 Node Cluster - Throughput
Benchmarking Conclusions

As seen in Figure 1, we observed up to 43% higher throughput on Ampere Altra virtual machines compared to AMD Milan virtual machines and 30% higher compared to Intel Icelake on single node Elasticsearch with 2 logical threads.

As seen in Figure 2, we observed up to 43% higher throughput on Ampere Altra virtual machines compared to AMD Milan virtual machines and 25% higher compared to Intel Icelake virtual machines on single node Elasticsearch with 4 logical threads.

We also tested Elasticsearch in a 3-node cluster. As seen in Figure 3, we observed up to 58% higher throughput on Ampere Altra virtual machines compared to AMD Milan virtual machines and 27% higher compared to Intel Icelake virtual machines.

Scalable search engines are used in many cloud workflows today, and Elasticsearch is a popular, efficient search engine typically deployed in a scale-out configuration. Ampere Altra processors are designed to deliver exceptional performance and energy efficiency for cloud native applications like Elasticsearch. In Ampere's testing, these processors demonstrated compelling performance and outstanding energy efficiency compared to the best x86 processors on the market. For more information on this workload or other workloads our engineers have been working on, please visit https://solutions.amperecomputing.com/.

Footnotes

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

Price-performance was calculated from the OCI Compute pricing list for A1 Flex VMs in March 2022. Refer to individual tests for core counts. Memory and storage are the same across all VMs and hence are not considered.

©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com
