
Unlocking Java Performance on Ampere® Altra® Family Processors

Overview

Over the last decade, Java has become one of the most popular programming languages for the cloud. Popular cloud applications like Hadoop, Cassandra, and Kafka are written in Java and run on the Java platform. Java is a general-purpose, object-oriented language designed to “Write Once, Run Anywhere,” so it relies on a platform-dependent Java Virtual Machine (JVM) to translate bytecode into machine code specific to the architecture on which the application runs. The quality of the code generated by the JVM at runtime is therefore critical to application performance.

This guide describes Java support status on Ampere Altra Family processors, provides a method to build OpenJDK, and compares the performance of different OpenJDK versions and binary sources.

Is Java supported on Ampere Altra Family processors and AArch64?

OpenJDK is the official reference JVM implementation. OpenJDK is Free and Open-Source Software (FOSS), is used by most Java developers, and is the default JVM for most Linux distributions. The AArch64 port has been part of the OpenJDK project for several years. Today, OpenJDK is well supported on AArch64 from Java Development Kit 8 (JDK8) onwards.

Ampere Altra and Ampere® Altra® Max processors are designed from the ground up to deliver predictable performance, high scalability, and power efficiency for Cloud Native usages. Ampere Altra cores implement the ARMv8 Instruction Set Architecture (ISA) and support the AArch64 and AArch32 instruction sets. JDKs included in various Linux distributions today support Ampere Altra Family processors, but newer Long Term Support (LTS) versions like JDK17 can provide noticeably better performance.

Where is OpenJDK available?

OpenJDK binaries for Ampere Altra Family processors are available from several sources. Linux distributions make OpenJDK available through their respective package repositories. Adoptium is another source for prebuilt OpenJDK AArch64 binaries.

OpenJDK has many release versions, but only the versions listed in Table 1 carry the LTS release qualifier. End of Life (EOL) dates can differ between OpenJDK distributions; Table 1 shows the dates used in this guide.


Table 1: OpenJDK LTS
Version          First Availability    End of Availability
Java 8 (LTS)     Mar 2014              Nov 2026
Java 11 (LTS)    Sep 2018              Oct 2024
Java 17 (LTS)    Sep 2021              Oct 2027

For more information, refer to https://access.redhat.com/articles/1299013 and https://adoptopenjdk.net/support.html.

How does one build OpenJDK?

Linux distributions provide different ways to install OpenJDK: yum repositories for RHEL and CentOS, and apt repositories for Ubuntu and Debian, for example.
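
As a rough sketch of the package-manager route, the helper below prints the install command for a given distribution ID and JDK feature version. The function name `openjdk_install_cmd` is hypothetical. The package names follow the usual RHEL-family (`java-17-openjdk-devel`) and Debian-family (`openjdk-17-jdk`) naming, and whether a given JDK version is actually packaged depends on the distribution release.

```shell
# Print the package-manager command that installs an OpenJDK LTS release.
# $1: distribution ID as found in /etc/os-release (centos, rhel, ubuntu, debian)
# $2: JDK feature version (8, 11, or 17)
# Note: openjdk_install_cmd is an illustrative name, not an existing tool.
openjdk_install_cmd() {
    case "$1" in
        rhel|centos)
            # RHEL-family repositories name the JDK 8 package "java-1.8.0-..."
            if [ "$2" = "8" ]; then
                echo "sudo yum install -y java-1.8.0-openjdk-devel"
            else
                echo "sudo yum install -y java-$2-openjdk-devel"
            fi
            ;;
        ubuntu|debian)
            echo "sudo apt-get install -y openjdk-$2-jdk"
            ;;
        *)
            echo "unsupported distribution: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `openjdk_install_cmd centos 17` prints the yum command for JDK17; the command is printed rather than executed so it can be reviewed first.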

For custom OpenJDK builds, this section lists the recommended steps to build OpenJDK from source code.

GCC is recommended for building OpenJDK. Different GCC versions have different AArch64 options as shown in Table 2.


Table 2: GCC Options
GCC version    Option               Description
>=10.1         -moutline-atomics    Detect atomic instructions at run time; Large System Extensions (LSE) atomic instructions are generated if the processor supports them; enabled by default
>=8.4          -mcpu=neoverse-n1    Generate optimized code for Ampere Altra Family processors; LSE atomic instructions are generated
>=8.1          -march=armv8.2-a     Generate optimized code for the armv8.2-a ISA; LSE atomic instructions are generated

These configurations and options are used to build OpenJDK:
bash configure \
  --with-alsa=/usr \
  --with-alsa-lib=/usr/lib64 \
  --with-cacerts-file=/etc/pki/java/cacerts \
  --with-cups=/usr \
  --with-debug-level=release \
  --with-native-debug-symbols=none \
  --with-extra-cflags="-pipe -fPIC -DPIC -Wl,-rpath=/usr/lib64 -L/usr/lib64 -mcpu=neoverse-n1" \
  --with-extra-cxxflags="-pipe -fPIC -DPIC -Wl,-rpath=/usr/lib64 -L/usr/lib64 -mcpu=neoverse-n1" \
  --with-extra-ldflags="-Wl,-rpath=/usr/lib64 -L/usr/lib64" \
  --with-stdc++lib=dynamic \
  --with-target-bits=64 \
  --with-zlib=system \
  --x-includes=/usr/include \
  --x-libraries=/usr/lib64 \
  --with-boot-jdk=<jdk-home-directory> \
  --prefix=<jdk-install-directory>
make images
make install

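
The configure line above passes -mcpu=neoverse-n1, which according to Table 2 needs GCC 8.4 or later. As a sketch, the hypothetical helper `gcc_aarch64_options` below lists which Table 2 options a given GCC version accepts. It encodes only the version thresholds from that table and does not probe the compiler itself.

```shell
# List the Table 2 options accepted by a GCC version, one per line.
# $1: GCC version string such as "8.5.0" or "10.3.0"
# Note: gcc_aarch64_options is an illustrative name, not part of GCC.
gcc_aarch64_options() {
    major=${1%%.*}
    case "$1" in
        *.*) rest=${1#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;
    esac
    # Encode major.minor as a single integer so versions compare numerically.
    v=$((major * 100 + minor))
    if [ "$v" -ge 1001 ]; then printf '%s\n' "-moutline-atomics"; fi
    if [ "$v" -ge 804 ];  then printf '%s\n' "-mcpu=neoverse-n1"; fi
    if [ "$v" -ge 801 ];  then printf '%s\n' "-march=armv8.2-a"; fi
}
```

On a build host this could be called as `gcc_aarch64_options "$(gcc -dumpfullversion)"`. For the GCC 7.5.0 toolchain listed in Table 4 it prints nothing, since none of the Table 2 options apply to that version.
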
Performance Implications

Let’s evaluate the performance improvements that are possible with basic tuning. We use SPECjbb2015, a popular standardized Java benchmark, in Composite mode on an Ampere Altra Q80-30-based server. Table 3 summarizes the system configuration:


Table 3: System Configuration
Operating System                CentOS Linux release 8.4.2105
Kernel                          4.18.0-305.12.1.el8_4
gcc version                     8.5.0 20210514
BIOS settings                   1 NUMA per Socket (NPS), Max Performance
p-state governor                Performance
Transparent Hugepages           Always
Kernel scheduling parameters    kernel.sched_latency_ns=400000
                                kernel.sched_migration_cost_ns=40000
                                kernel.sched_min_granularity_ns=400000000
                                kernel.sched_nr_migrate=128
                                kernel.sched_wakeup_granularity_ns=40000
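
One way to apply the kernel scheduling parameters from Table 3 is through a sysctl configuration fragment. The sketch below only prints that fragment; the function name `sched_sysctl_conf` is hypothetical. These sysctl keys exist on the 4.18 kernel listed in Table 3, but other kernel versions may not expose them under the same names.

```shell
# Print the Table 3 scheduler settings in sysctl.conf format.
# Note: sched_sysctl_conf is an illustrative name; the values are copied
# unchanged from Table 3.
sched_sysctl_conf() {
    cat <<'EOF'
kernel.sched_latency_ns = 400000
kernel.sched_migration_cost_ns = 40000
kernel.sched_min_granularity_ns = 400000000
kernel.sched_nr_migrate = 128
kernel.sched_wakeup_granularity_ns = 40000
EOF
}
```

The output could be saved to a file of your choosing and loaded with `sysctl -p <file>` as root. The Transparent Hugepages setting is controlled separately through /sys/kernel/mm/transparent_hugepage/enabled.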

These OpenJDK options are used in this evaluation:
-Xms130560m -Xmx130560m -Xmn123g -XX:SurvivorRatio=39 -XX:ObjectAlignmentInBytes=32 -XX:TargetSurvivorRatio=95 -XX:+UseCompressedOops -XX:+UseCompressedClassPointers -XX:MetaspaceSize=64m -server -XX:+AlwaysPreTouch -XX:-UseAdaptiveSizePolicy -XX:-UseCountedLoopSafepoints -XX:-UsePerfData -XX:+PrintFlagsFinal -XX:+UseTransparentHugePages -XX:+UseParallelGC -XX:ParallelGCThreads=80 -XX:AllocatePrefetchDistance=512 -XX:AllocatePrefetchLines=4 -XX:InlineSmallCode=2k -XX:TypeProfileWidth=4 -XX:SoftwarePrefetchHintDistance=128 -XX:+AvoidUnalignedAccesses -XX:BlockZeroingLowLimit=64K -XX:+UseBlockZeroing -XX:-UseSIMDForArrayEquals -XX:+UseSIMDForMemoryOps
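
To see what the heap options above imply, the sketch below derives the generation sizes from -Xmx, -Xmn, and -XX:SurvivorRatio. It assumes the usual HotSpot meaning of SurvivorRatio (eden size divided by the size of one survivor space) and ignores any alignment the JVM applies, so the real sizes may differ slightly. The function name `heap_layout_mb` is hypothetical.

```shell
# Derive generation sizes (in MiB) from -Xmx, -Xmn and -XX:SurvivorRatio.
# $1: maximum heap in MiB, $2: young generation in MiB, $3: SurvivorRatio
# Note: heap_layout_mb is an illustrative name, not a JDK tool.
heap_layout_mb() {
    # The young generation holds eden plus two survivor spaces, and
    # SurvivorRatio is eden divided by one survivor space.
    survivor=$(( $2 / ($3 + 2) ))
    eden=$(( survivor * $3 ))
    old=$(( $1 - $2 ))
    echo "old=${old} eden=${eden} survivor=${survivor}"
}

# Values from the option list: -Xmx130560m -Xmn123g -XX:SurvivorRatio=39
heap_layout_mb 130560 $((123 * 1024)) 39
# prints: old=4608 eden=119808 survivor=3072
```

Under these assumptions, almost all of the 127.5 GiB heap is young generation: 117 GiB of eden, two 3 GiB survivor spaces, and 4.5 GiB of old generation.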

These are the SPECjbb properties:
-Dspecjbb.customerDriver.threads=64 -Dspecjbb.customerDriver.threads.service=64 -Dspecjbb.customerDriver.threads.probe=64 -Dspecjbb.customerDriver.threads.saturate=96 -Dspecjbb.forkjoin.workers=80 -Dspecjbb.forkjoin.workers.Tier1=80 -Dspecjbb.forkjoin.workers.Tier2=1 -Dspecjbb.forkjoin.workers.Tier3=16 -Dspecjbb.comm.connect.selector.runner.count=4 -Dspecjbb.controller.type=HBIR_RT -Dspecjbb.controller.port=24000
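
The JVM options and the SPECjbb properties are both passed on the java command line ahead of the benchmark jar. A minimal sketch of how a Composite-mode launch line could be assembled is shown below. It assumes the stock `specjbb2015.jar` from the benchmark kit in the current directory, and the function name `specjbb_composite_cmd` is hypothetical.

```shell
# Assemble a SPECjbb2015 Composite-mode launch line.
# $1: JVM options, $2: SPECjbb -D properties (each a space-separated string)
# Note: specjbb_composite_cmd is an illustrative name; the jar location is
# an assumption about where the benchmark kit was unpacked.
specjbb_composite_cmd() {
    echo "java $1 $2 -jar specjbb2015.jar -m COMPOSITE"
}
```

The helper prints the command instead of running it, which makes it easy to log the exact options used for each run.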

*Note*: Our testing did not target absolute best performance; it was intended to study the performance differences between compiler options and JDK versions.

Here are three sources for the same OpenJDK version:

  • CentOS repository
  • Adoptium prebuilt binary
  • Self-built binary

The self-built binary was built from Adoptium source code with the method described in the “How does one build OpenJDK?” section. Table 4 lists the JDK provider and GCC version used for these binaries.


Table 4: JDK Providers and GCC Versions
JDK              Provider      GCC Version
1.8.0_312-b07    CentOS        GCC 8.5.0-4
1.8.0_342-b07    Adoptium      GCC 7.5.0
1.8.0_342-b07    Self-build    GCC 8.5.0-4 with “-mcpu=neoverse-n1”
11.0.13+8        CentOS        GCC 8.5.0-4
11.0.16+8        Adoptium      GCC 7.5.0
11.0.16+8        Self-build    GCC 8.5.0-4 with “-mcpu=neoverse-n1”
17.0.1+12        CentOS        GCC 8.5.0-4
17.0.4_8         Adoptium      GCC 10.3.0

Using the SPECjbb2015 Composite Max-jOPS as the performance metric and CentOS-provided JDK8 data as a baseline, Figure 1 shows JDK8 and JDK11 performance from various sources.

Figure 1: SPECjbb2015 JDK8 and JDK11 Performance from Various Sources

For a given OpenJDK version, the latest CentOS-provided binary, the Adoptium binary, and the self-built binary perform similarly. This suggests that upstream OpenJDK is already well tuned for AArch64 and Ampere Altra Family processors.

Comparing different OpenJDK versions from the same source (the CentOS-provided binary), OpenJDK17 is the most performant version for Ampere Altra Family processors. Figure 2 shows that the Max-jOPS improved by 6% from JDK8 to JDK11 and 12% from JDK8 to JDK17.

Figure 2: Performance Across OpenJDK Versions (SPECjbb2015 Max-jOPS)

Table 5 lists and compares AArch64-specific OpenJDK options on different JDK versions. Use this command to obtain these options:

java -XX:+PrintFlagsFinal -version

Table 5: OpenJDK AArch64 options
Option                          1.8.0_312    1.8.0_342    11.0.13+8    11.0.16+8    17.0.1+12    Description
UseLSE                          T            T            T            T            T            Use LSE instructions
UseCRC32                        T            T            T            T            T            Use CRC32 instructions for CRC32 computation
UseNeon                         F            F            F            F            F            Use Neon for CRC32 computation
UseSHA                          T            T            T            T            T            Control whether SHA instructions are used when available
UseSIMDForArrayEquals           NA           NA           T            T            T            Use SIMD instructions in generated array equals code
UseSIMDForMemoryOps             F            F            T            T            T            Use SIMD instructions in generated memory move code
SoftwarePrefetchHintDistance    NA           NA           192          192          192          Use prfm hint with specified distance in compiled code
T = True, F = False, NA = not available
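
The full -XX:+PrintFlagsFinal output runs to hundreds of lines, so it helps to filter it down to the options in Table 5. The sketch below does that with a whole-word grep on standard input; the function name `filter_aarch64_flags` is hypothetical.

```shell
# Keep only the Table 5 options from `java -XX:+PrintFlagsFinal` output
# read on stdin. -w matches whole words, so UseSHA does not also match
# longer names such as UseSHA1Intrinsics.
# Note: filter_aarch64_flags is an illustrative name.
filter_aarch64_flags() {
    grep -Ew 'UseLSE|UseCRC32|UseNeon|UseSHA|UseSIMDForArrayEquals|UseSIMDForMemoryOps|SoftwarePrefetchHintDistance'
}
```

A typical invocation would be `java -XX:+PrintFlagsFinal -version 2>/dev/null | filter_aarch64_flags`. Options that a JDK version does not define are simply absent from the output.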

Because Ampere Altra Family processors report the atomics and crc32 features in their CPU feature list, the UseLSE and UseCRC32 options are enabled automatically on all evaluated versions when running on Altra and Altra Max processors.

This means that even without compile options like "-march=armv8.2-a" or "-mcpu=neoverse-n1", OpenJDK can generate optimized code for Ampere Altra Family processors.
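
On Linux, the CPU feature list is visible on the "Features" lines of /proc/cpuinfo. The sketch below checks for a named feature in cpuinfo-style text read from standard input; the function name `has_cpu_feature` is hypothetical.

```shell
# Succeed if the CPU feature named in $1 appears on a "Features" line of
# /proc/cpuinfo-style text read from stdin.
# Note: has_cpu_feature is an illustrative name, not an existing tool.
has_cpu_feature() {
    # Strip the "Features :" prefix, put one feature per line, then
    # require an exact whole-line match.
    sed -n 's/^Features[[:space:]]*:[[:space:]]*//p' \
        | tr ' ' '\n' \
        | grep -qx -- "$1"
}
```

For example, `has_cpu_feature atomics < /proc/cpuinfo && echo "LSE atomics reported"` prints the message only when the kernel reports the atomics feature.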

Conclusions

OpenJDK is a FOSS implementation of the Java platform and is the de facto JDK used in the cloud. This white paper shows that OpenJDK is well supported and performs well on Ampere Altra Family processors. The latest prebuilt binaries provided by OS distributions and Adoptium perform as well as a JDK built from source. That said, using the latest OpenJDK LTS version 17 can result in even higher performance. When building OpenJDK from source, we recommend using a newer version of the GCC compiler together with the architecture-specific options.

Footnotes

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com

Created At : November 23rd 2022, 4:45:23 am
Last Updated At : January 12th 2023, 8:31:33 am