Unlocking Java Performance on Ampere® Altra® Family Processors
Over the last decade, Java has become one of the most popular programming languages for the cloud. Popular cloud applications like Hadoop, Cassandra, and Kafka run on the Java platform. Because Java is a general-purpose, object-oriented language designed to Write Once, Run Anywhere, it relies on a platform-dependent Java Virtual Machine (JVM) to translate bytecode into machine code specific to the architecture on which the application runs. The quality of the code that the JVM generates at runtime is therefore critical to application performance.
This guide describes Java support status on Ampere Altra Family processors, provides a method to build OpenJDK, and compares the performance of different OpenJDK versions and binary sources.
OpenJDK is the official reference JVM implementation. It is Free Open-Source Software (FOSS), is used by most Java developers, and is the default JVM for most Linux distributions. The AArch64 port is a long-standing part of the OpenJDK project, and OpenJDK is well supported on AArch64 from Java Development Kit 8 (JDK8) onwards.
Ampere Altra and Ampere® Altra® Max processors are designed from the ground up to deliver predictable performance, high scalability, and power efficiency for Cloud Native usages. Ampere Altra cores implement the ARMv8 Instruction Set Architecture (ISA) and support the AArch64 and AArch32 instruction sets. JDKs included in various Linux distributions today support Ampere Altra Family processors, but newer Long Term Support (LTS) versions like JDK17 can provide noticeably better performance.
OpenJDK binaries for Ampere Altra Family processors are available from several sources. Linux distributions make OpenJDK available through their respective package repositories. Adoptium is another source for prebuilt OpenJDK AArch64 binaries.
OpenJDK has many release versions but only the versions listed in Table 1 have the LTS release qualifier. Different OpenJDK distributions may provide End of Life (EOL) dates as shown in Table 1.
Version | First Availability | End of Availability |
---|---|---|
Java 8 (LTS) | Mar 2014 | Nov 2026 |
Java 11 (LTS) | Sep 2018 | Oct 2024 |
Java 17 (LTS) | Sep 2021 | Oct 2027 |
For more information, refer to https://access.redhat.com/articles/1299013 and https://adoptopenjdk.net/support.html.
Linux distributions provide different ways to install OpenJDK: for example, yum repositories on RHEL and CentOS, and apt repositories on Ubuntu and Debian.
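As a rough illustration of the package-manager route, the sketch below prints (rather than runs) an install command for OpenJDK 17. The helper name `openjdk_install_cmd` is hypothetical, and the package names `java-17-openjdk-devel` and `openjdk-17-jdk` are assumptions that should be checked against your distribution's repositories.

```shell
# Print (do not run) a package-manager command that installs OpenJDK 17.
# Package names are assumptions; confirm them for your distribution release.
openjdk_install_cmd() {
  local distro_id="$1"   # the ID value from /etc/os-release, e.g. "centos"
  case "$distro_id" in
    rhel|centos)   echo "sudo yum install -y java-17-openjdk-devel" ;;
    ubuntu|debian) echo "sudo apt-get install -y openjdk-17-jdk" ;;
    *) echo "unsupported distribution: $distro_id" >&2; return 1 ;;
  esac
}
```

On a running system, `openjdk_install_cmd "$(. /etc/os-release; echo "$ID")"` would show the command for the detected distribution. Printing instead of executing keeps the sketch safe to try without root access.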
For custom OpenJDK builds, this section lists the recommended steps to build OpenJDK from source code.
GCC is recommended for building OpenJDK. Different GCC versions have different AArch64 options as shown in Table 2.
GCC version | Options | Description |
---|---|---|
>=10.1 | -moutline-atomics | Detect atomic instructions at run time; Large System Extensions (LSE) atomic instructions are generated if the processor supports them; Enabled by default |
>=8.4 | -mcpu=neoverse-n1 | Generate optimized code for Ampere Altra Family processors; LSE atomic instructions are generated |
>=8.1 | -march=armv8.2-a | Generate optimized code for armv8.2-a ISA; LSE atomic instructions are generated |
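To see which rows of Table 2 apply to an installed compiler, a small helper can compare the GCC version against each minimum. This is a minimal sketch: the function names `version_at_least` and `gcc_aarch64_options` are hypothetical, and the version thresholds are taken directly from Table 2.

```shell
# Succeed when MAJOR.MINOR (args 1, 2) is at least REQ_MAJOR.REQ_MINOR (args 3, 4).
version_at_least() {
  [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}

# Print every Table 2 option whose minimum GCC version is met, one per line.
gcc_aarch64_options() {
  local ver="$1" major rest minor
  major=${ver%%.*}                 # text before the first dot
  rest=${ver#*.}                   # text after the first dot
  if [ "$rest" = "$ver" ]; then minor=0; else minor=${rest%%.*}; fi
  if version_at_least "$major" "$minor" 10 1; then echo "-moutline-atomics"; fi
  if version_at_least "$major" "$minor" 8 4; then echo "-mcpu=neoverse-n1"; fi
  if version_at_least "$major" "$minor" 8 1; then echo "-march=armv8.2-a"; fi
}
```

For example, `gcc_aarch64_options "$(gcc -dumpfullversion)"` lists the options available to the local compiler. The helper only reports which options exist for a version; which one to pass to the build remains a judgment call.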
    bash configure \
        --with-alsa=/usr \
        --with-alsa-lib=/usr/lib64 \
        --with-cacerts-file=/etc/pki/java/cacerts \
        --with-cups=/usr \
        --with-debug-level=release \
        --with-native-debug-symbols=none \
        --with-extra-cflags="-pipe -fPIC -DPIC -Wl,-rpath=/usr/lib64 -L/usr/lib64 -mcpu=neoverse-n1" \
        --with-extra-cxxflags="-pipe -fPIC -DPIC -Wl,-rpath=/usr/lib64 -L/usr/lib64 -mcpu=neoverse-n1" \
        --with-extra-ldflags="-Wl,-rpath=/usr/lib64 -L/usr/lib64" \
        --with-stdc++lib=dynamic \
        --with-target-bits=64 \
        --with-zlib=system \
        --x-includes=/usr/include \
        --x-libraries=/usr/lib64 \
        --with-boot-jdk=<jdk-home-directory> \
        --prefix=<jdk-install-directory>
    make images
    make install
Let’s evaluate some performance improvements that are possible with basic tuning. We use SPECjbb2015, a popular standardized Java benchmark, in Composite mode on an Ampere Altra Q80-30-based server. Table 3 summarizes the system configuration:
Component | Setting |
---|---|
Operating System | CentOS Linux release 8.4.2105 |
Kernel | 4.18.0-305.12.1.el8_4 |
gcc version | 8.5.0 20210514 |
BIOS settings | 1 NUMA per Socket (NPS), Max Performance |
p-state governor | Performance |
Transparent Hugepages | Always |
Kernel scheduling parameters | kernel.sched_latency_ns=400000 kernel.sched_migration_cost_ns=40000 kernel.sched_min_granularity_ns=400000000 kernel.sched_nr_migrate=128 kernel.sched_wakeup_granularity_ns=40000 |
JVM options:
-Xms130560m -Xmx130560m -Xmn123g -XX:SurvivorRatio=39 -XX:ObjectAlignmentInBytes=32 -XX:TargetSurvivorRatio=95 -XX:+UseCompressedOops -XX:+UseCompressedClassPointers -XX:MetaspaceSize=64m -server -XX:+AlwaysPreTouch -XX:-UseAdaptiveSizePolicy -XX:-UseCountedLoopSafepoints -XX:-UsePerfData -XX:+PrintFlagsFinal -XX:+UseTransparentHugePages -XX:+UseParallelGC -XX:ParallelGCThreads=80 -XX:AllocatePrefetchDistance=512 -XX:AllocatePrefetchLines=4 -XX:InlineSmallCode=2k -XX:TypeProfileWidth=4 -XX:SoftwarePrefetchHintDistance=128 -XX:+AvoidUnalignedAccesses -XX:BlockZeroingLowLimit=64K -XX:+UseBlockZeroing -XX:-UseSIMDForArrayEquals -XX:+UseSIMDForMemoryOps
SPECjbb2015 options:
-Dspecjbb.customerDriver.threads=64 -Dspecjbb.customerDriver.threads.service=64 -Dspecjbb.customerDriver.threads.probe=64 -Dspecjbb.customerDriver.threads.saturate=96 -Dspecjbb.forkjoin.workers=80 -Dspecjbb.forkjoin.workers.Tier1=80 -Dspecjbb.forkjoin.workers.Tier2=1 -Dspecjbb.forkjoin.workers.Tier3=16 -Dspecjbb.comm.connect.selector.runner.count=4 -Dspecjbb.controller.type=HBIR_RT -Dspecjbb.controller.port=24000
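The kernel scheduling parameters from Table 3 could be applied with `sysctl`. The sketch below only prints the commands so they can be reviewed first; the variable `SCHED_SETTINGS` and the function `print_sched_sysctl` are hypothetical names, and the values are copied unchanged from Table 3. These keys are exposed on the 4.18 kernel used here; newer kernels may not offer all of them through `sysctl`.

```shell
# Kernel scheduler settings copied verbatim from Table 3.
SCHED_SETTINGS="kernel.sched_latency_ns=400000
kernel.sched_migration_cost_ns=40000
kernel.sched_min_granularity_ns=400000000
kernel.sched_nr_migrate=128
kernel.sched_wakeup_granularity_ns=40000"

# Print one "sysctl -w" command per setting without executing anything.
print_sched_sysctl() {
  echo "$SCHED_SETTINGS" | while IFS= read -r setting; do
    echo "sysctl -w $setting"
  done
}
```

Running `print_sched_sysctl | sudo sh` would apply the settings until the next reboot.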
We compare binaries from three sources for the same OpenJDK version: the CentOS-provided package, the Adoptium prebuilt binary, and a self-built binary.
The self-built binary was built from Adoptium source code with the method described in the “How does one build OpenJDK” section. Table 4 lists the JDK provider and GCC version used for these binaries.
JDK | Provider | GCC Version |
---|---|---|
1.8.0_312-b07 | CentOS | GCC 8.5.0-4 |
1.8.0_342-b07 | Adoptium | GCC 7.5.0 |
1.8.0_342-b07 | Self-build | GCC 8.5.0-4 with “-mcpu=neoverse-n1” |
11.0.13+8 | CentOS | GCC 8.5.0-4 |
11.0.16+8 | Adoptium | GCC 7.5.0 |
11.0.16+8 | Self-build | GCC 8.5.0-4 with “-mcpu=neoverse-n1” |
17.0.1+12 | CentOS | GCC 8.5.0-4 |
17.0.4+8 | Adoptium | GCC 10.3.0 |
For a given OpenJDK version, the latest CentOS-provided binary, the Adoptium binary, and the self-built binary perform similarly, which shows that upstream OpenJDK is already well optimized for AArch64 and Ampere Altra Family processors.
Comparing different OpenJDK versions from the same source (the CentOS-provided binary), OpenJDK17 is the most performant version for Ampere Altra Family processors. Figure 2 shows that the Max-jOPS improved by 6% from JDK8 to JDK11 and 12% from JDK8 to JDK17.
Table 5 lists and compares AArch64-specific OpenJDK options on different JDK versions. Use this command to obtain these options:
java -XX:+PrintFlagsFinal -version
Options | Description | 1.8.0_312 | 1.8.0_342 | 11.0.13+8 | 11.0.16+8 | 17.0.1+12 |
---|---|---|---|---|---|---|
UseLSE | Use LSE instructions | True (T) | T | T | T | T |
UseCRC32 | Use CRC32 instructions for CRC32 computation | T | T | T | T | T |
UseNeon | Use Neon for CRC32 computation | False (F) | F | F | F | F |
UseSHA | Control whether SHA instructions are used when available | T | T | T | T | T |
UseSIMDForArrayEquals | Use SIMD instructions in generated array equals code | NA | NA | T | T | T |
UseSIMDForMemoryOps | Use SIMD instructions in generated memory move code | F | F | T | T | T |
SoftwarePrefetchHintDistance | Use prfm hint with specified distance in compiled code | NA | NA | 192 | 192 | 192 |
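The full `-XX:+PrintFlagsFinal` listing is long, so it can help to filter it down to the options compared in the table above. The sketch below assumes the usual column layout of that output (type, name, `=` or `:=`, value); the function name `aarch64_flags` is hypothetical.

```shell
# Read "java -XX:+PrintFlagsFinal -version" output on stdin and print
# name=value for the AArch64-related options compared in this section.
aarch64_flags() {
  awk '
    BEGIN {
      n = split("UseLSE UseCRC32 UseNeon UseSHA UseSIMDForArrayEquals UseSIMDForMemoryOps SoftwarePrefetchHintDistance", names, " ")
      for (i = 1; i <= n; i++) wanted[names[i]] = 1
    }
    # Column 2 is the flag name, column 3 is "=" or ":=", column 4 is the value.
    ($2 in wanted) && ($3 == "=" || $3 == ":=") { print $2 "=" $4 }
  '
}
```

A typical invocation would be `java -XX:+PrintFlagsFinal -version 2>/dev/null | aarch64_flags`. Options that a given JDK version does not define simply do not appear in the result.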
Because Ampere Altra Family processors report atomics and crc32 in their CPU feature list, the JVM enables the UseLSE and UseCRC32 options automatically on all evaluated versions running on Altra and Altra Max processors.
This means that even when the JDK itself is built without compile options like "-march=armv8.2-a" or "-mcpu=neoverse-n1", OpenJDK can generate optimized code for Ampere Altra Family processors at runtime.
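One way to confirm the CPU features that drive this automatic enablement is to inspect the `Features` line that Linux prints in `/proc/cpuinfo` on AArch64. The helper below is a minimal sketch with a hypothetical name, `has_cpu_feature`; it matches whole feature names only, so `asimd` does not match `asimdhp`.

```shell
# Succeed when FEATURE (arg 1) appears as a whole word in a
# "Features : ..." line (arg 2) taken from /proc/cpuinfo.
has_cpu_feature() {
  local feature="$1" features_line="$2"
  # Drop everything up to the first colon, then pad with spaces so that
  # only complete feature names can match.
  case " ${features_line#*:} " in
    *" $feature "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, after `line=$(grep -m1 '^Features' /proc/cpuinfo)`, the check `has_cpu_feature atomics "$line" && echo "LSE atomics reported"` indicates whether the kernel reports LSE support.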
OpenJDK is a FOSS implementation of the Java platform and is the de facto JDK used in the cloud. This white paper shows that OpenJDK is well supported and performs well on Ampere Altra Family processors. The latest prebuilt binaries from OS distributions and Adoptium perform as well as a JDK built from source. Moving to the latest OpenJDK LTS version, JDK17, can deliver even higher performance. When building OpenJDK from source, we recommend a recent GCC version together with the architecture-specific options.
All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.
©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com