
Optimizing Java Applications for Arm64 in the Cloud

Java remains one of the most popular languages for enterprise applications running in the cloud. While languages like Go, Rust, JavaScript, and Python have a high profile among cloud application developers, the RedMonk language rankings have placed Java in the top three most popular languages throughout the history of the ranking.

When deploying applications to the cloud, there are a few key differences between deployment environments and development environments. Whether you’re spinning up a microservice application on Kubernetes or launching virtual machine instances, it is important to tune your Java Virtual Machine (JVM) to ensure that you are getting your money’s worth from your cloud spend. It pays to know how the JVM allocates resources, and to ensure that you are using the resources efficiently.

Most of the information and advice in this series is platform independent and will work just as well on x86_64 and Arm64 CPUs. As Java was designed as a platform-independent language, this is not surprising. Because the Java community has spent considerable effort optimizing the JVM for Arm64 (which you will also see called aarch64, for “64-bit Arm architecture”), Java developers should see the performance of their applications improve on that architecture without doing anything special. However, we will point out some areas where the Arm64 and x86_64 architectures differ, and how to take advantage of those differences for your applications. Additionally, we will generally only refer to long-term supported (LTS) versions of Java tooling. For example, G1GC became the default garbage collector in the Java 9 development cycle but was not the default in a long-term supported JDK (Java Development Kit) until Java 11. As most enterprise Java developers are using LTS versions of the JDK, we will limit version references to those (at the time of writing, Java 8, 11, 17, 21, and 25).

In this two-part series on tuning Java applications for the cloud, we come at the problem from two different perspectives. In part 1 (this article), we concentrate on how the JVM allocates resources and identify some of the options and operating system configurations that can result in higher performance on Ampere powered instances in the cloud or on dedicated bare metal hardware. In part 2, we will look more closely at the infrastructure side, in particular Kubernetes and Linux kernel configuration. We will walk through some of the architectural differences between Arm64 and x86, and how to ensure that your Kubernetes cluster, operating system, and JVM are all tuned to maximize the bang per buck you get from your Java application.

Part 1: Optimizing the JVM

When running Java applications in the cloud, tuning the JVM is not necessarily front of mind for deployment teams, but getting things wrong, or running with just the default options, can hurt the performance and increase the cost of your cloud applications.

In this article, we will walk through some of the more helpful tunable elements in the JVM, covering:

  • Performance benefits of using recent Java versions
  • Key differences between cloud instances and developer environments
  • Setting the right heap size and choosing the right Garbage Collector for your application
  • JVM options that may boost price/performance for Ampere powered instances

Keeping Up with the Times

Arm64 support was first introduced into the Java ecosystem with Java 8, and it has been steadily improving since then. If you are still using Java 8, your Java applications can run up to 30% slower than if you are using a more recent version of Java like Java 21 or the recently released Java 25. The reason is two-fold:

  • The performance of Java has been steadily improving across all architectures
  • There are a number of initiatives that have specifically improved performance on Arm64

It is worth noting that it is possible to develop applications with the Java 8 language syntax, while taking advantage of the performance improvements of a more recent JVM, using Oracle’s Java SE Enterprise Performance Pack. This is (simplifying slightly) a distribution of tools that compiles Java 8 applications to run on a JVM from the Java 17 JDK. That said, the language has included a great many improvements over the past 10 years, and we recommend updating your Java applications to run on a more recent distribution of Java.

What is the Difference between Cloud Instances and Developer Desktops?

The JVM’s default ergonomics were designed with the assumption that your Java application is just one of many processes running on a shared host. On a developer laptop or a multi-tenant server, the JVM intentionally plays nice, limiting itself to a relatively small percentage of system memory and leaving headroom for everything else. That works fine on a workstation where the JVM is competing with your IDE, your browser, and background services, but in cloud environments your Java application will typically be the only application you care about in that VM or Docker (more generally OCI) container instance.

By default, if you don’t explicitly set the initial and maximum heap sizes, the JVM uses a tiered formula to size the heap based on “available memory”. You can see what the default heap sizes are for your cloud instances using JVM logging:

java -Xlog:gc+heap=debug

[0.005s][debug][gc,heap] Minimum heap 8388608  Initial heap 524288000  Maximum heap 8342470656

The defaults for heap sizing, based on system RAM available, are:

  • On small systems (≤ 384 MB RAM), the default max heap is set to 50% of available memory.
  • On systems with memory between 384 MB and 768 MB, the max heap is fixed at 192 MB, no matter how much memory the system actually has in that range.
  • For systems with available memory over 768 MB, the max heap is 25% of available memory.
  • The initial heap (-Xms) is much smaller: around 1/64th of available memory, capped at 1 GB.
  • Since Java 11, when running in OCI containers, the JVM bases these calculations on the container’s memory limit (cgroup) rather than host memory, but the percentages and thresholds remain the same. We will talk about the JVM’s container awareness in our next article.

So, for a VM with 512 MB RAM, the JVM will still only allow 192 MB for the heap. On a laptop with 16 GB RAM, the default cap is ~4 GB. On a container with a 2 GB memory limit, the heap defaults to ~512 MB.
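You can check what the JVM will do under a specific memory limit before deploying. As a quick sanity check (assuming you have Docker and a JDK container image such as eclipse-temurin available), the following prints the heap sizes the JVM computes inside a container with a 2 GB limit:

docker run --rm --memory=2g eclipse-temurin:21 \
    java -XX:+PrintFlagsFinal -version | grep -E 'InitialHeapSize|MaxHeapSize'

With a 2 GB limit, you should see a MaxHeapSize of roughly 512 MB, matching the 25% rule above.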

That’s a perfectly reasonable choice if your JVM is sharing a machine with dozens of other processes. But in the cloud, when you spin up a dedicated VM or a container instance, the JVM is often the only significant process running. Instead of trying to be a good neighbor and leave resources for other applications, you want it to use the majority of the resources you’ve provisioned — otherwise, you’re paying for idle memory and under-utilized CPU.


JVM Heap Defaults vs. Cloud Recommendations

This shift has two key implications:

  • Memory allocation: Instead of defaulting to 25–50% of RAM, cloud workloads should usually allocate 80–85% of available memory to the heap. This ensures you get the most out of the memory you’re paying for while leaving room for JVM internals (metaspace, thread stacks, code cache) and OS overhead.
  • CPU utilization: Cloud instances nearly always run on multiple cores, but Kubernetes resource limits can confuse the JVM’s view of the world. If your container requests 1 CPU, the scheduler enforces that limit with time slices across multiple cores. However, the JVM will believe that it is running on a single core and can make inefficient choices as a result, such as a poor garbage collector choice or thread pool sizing. For this reason, cloud developers should explicitly set -XX:ActiveProcessorCount to a number greater than 1 and explicitly choose a garbage collector that can work with multiple garbage collection threads.
Scenario | Default Ergonomics (no flags) | Recommended for Cloud Workloads
Initial heap (-Xms or -XX:InitialRAMPercentage) | ~1/64th of memory, capped at 1 GB | Match the initial heap closely to the max heap for stable, long-lived services: -XX:InitialRAMPercentage=80
Max heap (-Xmx or -XX:MaxRAMPercentage) | ≤ 384 MB RAM → 50% of RAM; 384–768 MB → fixed 192 MB; ≥ 768 MB → 25% of RAM | Set the heap to 80–85% of the container/VM limit: -XX:MaxRAMPercentage=80
GC choice | G1GC (default in Java 11+) or Parallel GC (Java 8) with 2 or more processors; SerialGC with fewer than 2 | G1GC (-XX:+UseG1GC) is a sensible default for most cloud services
CPU count | JVM detects host cores and may overshoot the container quota | -XX:ActiveProcessorCount=<cpu limit, with a minimum of 2>
Cgroup awareness | Java 11+ detects container limits | Set explicit percentages as you would for VMs

Regardless of your target architecture, if you only tweak a few JVM options for cloud workloads, start here. These settings prevent the most common pitfalls and align the JVM with the resources you’ve explicitly provisioned:


Garbage Collector: Use G1GC (-XX:+UseG1GC) for most cloud services. It balances throughput and latency, scales well with heap sizes in the multi-GB range, and is the JVM’s default in recent releases when you have more than one CPU core.


Active Processor Count:

-XX:ActiveProcessorCount=<cpu_limit with minimum 2>

Match this value to the number of CPUs or millicores assigned to the underlying compute hosting your container. For example, even if Kubernetes allocates a quota of 1024 millicores to your container, if it is running in a 16-core virtual machine, you should set ActiveProcessorCount to 2 or more. This allows the JVM to size thread pools appropriately and to choose a garbage collector like G1GC instead of SerialGC, which halts your application entirely during GC runs. The optimal value will depend on what else is running in the virtual machine – if you set the number too high, you will create noisy neighbor impacts for other applications running on the same compute node.
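To confirm what the JVM actually sees, you can print the detected processor count from inside your application. Here is a minimal sketch using the standard Runtime API (CpuCheck is just an illustrative class name):

public class CpuCheck {
    public static void main(String[] args) {
        // Reflects -XX:ActiveProcessorCount if set; otherwise the cores the JVM detected
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
    }
}

Running java -XX:ActiveProcessorCount=2 CpuCheck prints 2 regardless of how many cores the host has, and any thread pools sized from availableProcessors() will follow suit.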


Heap Sizing:

-XX:InitialRAMPercentage=80 -XX:MaxRAMPercentage=85

These options tell the JVM to scale its heap based on the container’s memory limits rather than host memory, and to claim a larger fraction than the desktop defaults. Use 80% as a safe baseline; push closer to 85% if your workload is steady-state.


Consistency Between Init and Max: For long-lived services, set InitialRAMPercentage equal to or slightly smaller than MaxRAMPercentage. This avoids the performance penalty of gradual heap expansion under load.

With these few knobs, most Java applications running in Kubernetes or cloud VMs will achieve predictable performance and avoid out-of-memory crashes.
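Putting the recommendations together, a typical launch line for a long-lived cloud service might look like this (app.jar is a placeholder for your application, and the processor count should match your CPU quota):

java -XX:+UseG1GC \
     -XX:ActiveProcessorCount=2 \
     -XX:InitialRAMPercentage=80 \
     -XX:MaxRAMPercentage=85 \
     -jar app.jar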

JVM Options That Can Improve Performance on Arm64

Beyond heap sizing and CPU alignment, a handful of JVM options can give you measurable improvements for servers running Ampere’s Arm64 CPUs. These are not “one size fits all”. They depend on workload characteristics like RAM usage, latency vs throughput trade-offs, and network I/O, but they’re worth testing to see if they improve your application performance.


Enabling Huge Pages Transparent Huge Pages (THP) allocates a large, contiguous block of memory consisting of multiple kernel pages and treats it as a single memory page from the application’s perspective. Enabling huge pages in the Linux kernel and starting your JVM with -XX:+UseTransparentHugePages, so that it allocates large, contiguous blocks of memory, can offer a significant performance boost for workloads that can take advantage of it.
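As an illustration (these are the standard Linux sysfs paths, but the right policy is workload-dependent and best agreed with your ops team), you can inspect and set the THP mode before starting the JVM:

cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
java -XX:+UseTransparentHugePages -jar app.jar

The madvise mode lets applications that explicitly ask for huge pages, as the JVM does with this flag, receive them without forcing THP on every process.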


Using a 64K-page kernel Booting your host OS with a 64K kernel page size ensures that memory is allocated and managed by the kernel in larger blocks than the 4K default. This reduces TLB misses and speeds up memory access for workloads that use large, contiguous blocks of memory. Note that booting a kernel with a specific page size and configuring Transparent Huge Pages both require OS support and configuration, so they are best handled in coordination with your ops team.
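A quick way to confirm which page size your kernel is using on any Linux host:

getconf PAGESIZE

This prints 4096 on a default kernel and 65536 on a 64K-page kernel.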


Memory Prefetch Some workloads benefit from pre-touching memory pages on startup. By default, virtual memory pages are not mapped to physical memory until they are needed. The first time a physical memory page is needed, the operating system generates a page fault, which fetches a physical memory page, maps the virtual address to the physical address, and stores the pair of addresses in the kernel page table. Pre-touch maps virtual memory addresses to physical memory addresses at start-up time, which makes the first access of those memory pages at run-time faster. Adding the option:

-XX:+AlwaysPreTouch

forces the JVM to commit and map all heap pages at startup, avoiding page faults later under load. The tradeoff: slightly longer startup time, but more consistent latency once running. This option is a good fit for latency-sensitive services that stay up for a long time. It has the additional benefit of ensuring a fast failure at start-up time if you are requesting more memory than can be made available to your application.
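For example, a long-lived, latency-sensitive service with a fixed heap might be launched like this (the 8 GB sizes are illustrative):

java -Xms8g -Xmx8g -XX:+AlwaysPreTouch -jar app.jar

Setting -Xms equal to -Xmx keeps the heap at a fixed size, so all of its pages can be committed once at startup rather than incrementally under load.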


Tiered Compilation vs. Ahead-of-Time Compilation The JVM normally compiles hot code paths incrementally at runtime. Options like -XX:+TieredCompilation (enabled by default) balance startup speed with steady-state performance. For cloud workloads where startup time is less important than throughput, you can bias toward compiling more aggressively up front. In some cases, compiling ahead of time (using jaotc or Class Data Sharing archives) can further reduce runtime CPU overhead. However, ahead-of-time compilation comes with both risks and constraints. Just-In-Time (JIT, or runtime) compilation takes advantage of profiling information gathered while running the application to identify hot methods, method calls that do not need to be virtual, calls that can be inlined, hot loops within methods, constant parameters, branch frequencies, and so on. An Ahead-Of-Time (AOT) compiler is missing all of that information and may produce code with sub-optimal performance. In addition, language features that rely on dynamic class loading, where class definitions are not available ahead of time or are generated at run-time, cannot be used with ahead-of-time compilation.
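A low-risk way to experiment with this is Application Class Data Sharing, available in recent LTS JDKs. This sketch records an archive of loaded classes on a representative run and reuses it afterwards (app.jar and app.jsa are placeholder names):

# First run: write the classes loaded during this run to an archive at exit
java -XX:ArchiveClassesAtExit=app.jsa -jar app.jar

# Subsequent runs: map the archive to cut class loading work at startup
java -XX:SharedArchiveFile=app.jsa -jar app.jar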


Vectorization and Intrinsics Modern JVMs on Arm64 include optimized intrinsics for math, crypto, and vector operations. No flags are needed to enable these, but it is worth validating that you are running at least Java 17 to take advantage of these optimizations.


Guideline for adoption:

  • For short-lived batch jobs, avoid options that slow startup (AlwaysPreTouch, aggressive JIT).
  • For long-running services (APIs, web apps), favor memory pretouch and consistent heap sizing.
  • For memory-intensive services, configure Transparent Huge Pages, consider a kernel with a larger page size than the default 4K, and monitor TLB performance.

Conclusion

The JVM has a long history of making conservative assumptions, tuned for developer laptops and multi-tenant servers rather than dedicated cloud instances. On Ampere®-powered VMs and containers, those defaults often leave memory and CPU cycles unused. By explicitly setting heap percentages and processor counts, and by choosing the right garbage collector, you can ensure your applications take full advantage of the hardware beneath them. By using a more recent version of the JVM, you benefit from the incremental improvements that have been made since Arm64 support was first added in Java 8.

That’s just the beginning, though. JVM flags and tuning deliver real wins, but the bigger picture includes the operating system and Kubernetes itself. How Linux allocates memory pages, how Kubernetes enforces CPU and memory quotas, and how containers perceive their share of the host all have a direct impact on JVM performance.

In the next article in this series, we’ll step outside the JVM and look at the infrastructure layer:

  • How container awareness in the JVM and Kubernetes resource requests and limits interact,
  • What happens if you don’t set quotas explicitly,
  • How kernel- and cluster-level tuning (kernel-level tuning options, memory page sizes, core pinning) can unlock even more efficiency.

Part 1 provided guidance on getting the JVM to “use what you’ve paid for”; part 2 will ensure your OS and container platform are tuned for optimal performance.

We invite you to learn more about Ampere developer efforts, find best practices, insights, and give us feedback at: https://developer.amperecomputing.com and https://community.amperecomputing.com/
