
Empowering Developers and Fueling AI Adoption with AmpereOne® Design Innovations

Dave Neary, Developer Relations
Jan 10th, 2025

Introduction


The Ampere® roadmap has been meticulously designed to address the evolving demands of data centers and cloud computing environments, especially with the ramp of AI inference applications in virtually every domain. Our Cloud Native Processors offer a compelling combination of performance, scalability, and power efficiency, making them ideal for modern workloads. In our 2024 Annual Roadmap Video we showed you our product plan beyond 192 cores and 8 channels of memory, and later in July we revealed even more of our roadmap to the press and shared various architectural details of AmpereOne®.

For software developers, however, innovations in CPUs can sometimes feel abstract and not directly relevant to them. In this article, we want to share how these hardware-level features are directly relevant to application developers and operators for cloud native application development. Let’s explore how some of the architectural innovations of the AmpereOne processor show up to developers in user space, and what the implications are for new and seasoned developers alike.


Memory Tagging


The Memory Tagging Extension (MTE) is an Arm architecture feature, now available on AmpereOne CPUs. It is implemented in silicon as a defense mechanism to detect both spatial memory safety violations, such as buffer overflows, and temporal violations, such as use-after-free.


Benefits of Memory Tagging on Arm64

Memory tagging is designed to improve memory safety and reliability by helping detect and mitigate memory errors such as:

  • Buffer Overflows: Identifies and blocks out-of-bounds memory accesses.
  • Use-After-Free (UAF): Detects when a program attempts to access freed memory blocks, reducing security vulnerabilities.
  • Memory Corruption Bugs: Helps catch subtle issues early in the development cycle, improving software stability.
  • Debugging Efficiency: Offers hardware-level tagging, which reduces performance overhead compared to software-based tools that perform a similar function.

By tagging memory regions and enforcing pointer compliance with these tags, MTE allows developers to identify and resolve bugs more effectively, helping build more robust applications.


Taking Advantage of Memory Tagging


Memory tagging is a hardware feature, and it requires your operating system and system C library to support this feature and expose it to user applications. If your system C library implementation (e.g., glibc) supports memory tagging, you can enable and use it as follows:

Ensure Hardware and Kernel Support

  • Memory tagging requires an AmpereOne CPU, or other CPUs supporting Armv8.5-A or later, and a Linux kernel compiled with memory tagging support (CONFIG_ARM64_MTE).
  • You can check if your Linux kernel already supports memory tagging by checking for MTE support in /proc/cpuinfo and the kernel configuration.
  • Memory tagging is available by default in some Linux distributions, including Fedora 39 or later.
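The checks above can be scripted. This is a minimal sketch, assuming a Linux system; it only reports what the CPU and running kernel advertise:

```shell
# Check whether the CPU advertises MTE (the "mte" flag in /proc/cpuinfo)
if grep -qw mte /proc/cpuinfo; then
    echo "MTE: supported by this CPU"
else
    echo "MTE: not advertised by this CPU"
fi

# Check whether the running kernel was built with MTE support
# (the config file path varies by distribution)
if [ -r "/boot/config-$(uname -r)" ]; then
    grep CONFIG_ARM64_MTE "/boot/config-$(uname -r)" || echo "CONFIG_ARM64_MTE not set"
fi
```

On an AmpereOne system with a recent kernel you would expect to see both the CPU flag and CONFIG_ARM64_MTE=y.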

Enable Memory Tagging in glibc

  • In addition to kernel support, your memory allocation library (usually the system C library) should support memory tagging at allocation time. Support for memory tagging in the GNU C library (glibc) was introduced in version 2.38, included in recent Linux distributions including Fedora 39 and later, and Ubuntu 24.04 and later.
  • Use the glibc.mem.tagging tunable to enable memory tagging – it is disabled by default. For example, export GLIBC_TUNABLES=glibc.mem.tagging=3 enables all of the memory tag checks that the system supports.
  • This applies tagging to memory allocated through malloc and related functions. A memory access through a pointer whose tag does not match the allocation's tag causes the program to stop with a memory access error.

Debug and Test

  • Running applications with memory tagging enabled can be a useful tool to identify memory access violations during development.
  • You can use debugging tools such as gdb to inspect memory tags during runtime.

Run in Production (Optional)

  • Memory tagging can be used in production to detect and mitigate memory safety issues without significant performance penalties, though the use case will depend on application requirements.

Real-World Use Cases


End-users and developers can leverage memory tagging for:

  • Building more secure applications by catching memory errors proactively.
  • Running large-scale deployments with enhanced memory safety.
  • Complementing other memory safety tools with lower runtime overhead.

Memory Tagging Extensions are game-changing for application development and deployment where memory safety and debugging efficiency are critical. Potential application areas can be found in domains such as automotive, medical, telecommunications, and others.

Check out our MTE blog and explainer video.

Quality of Service Enforcement for System Level Cache


System Level Cache (SLC) is a single pool of cache memory with higher latency than L2 cache, but lower latency than going to system RAM. Quality of Service enforcement (QoS Enforcement, also known as Memory Partitioning and Monitoring) allows the system user to declare that a specific tenant may only access a capped amount of that SLC.

Using this feature helps application operators and system administrators manage how different processes and applications access memory, offering fine-grained control over memory partitioning and isolation.


Benefits of QoS Enforcement on Arm64

  • Security: By providing the ability to partition memory access, memory partitioning can help mitigate risks from side-channel attacks or malicious applications that attempt to access other processes' memory. This improves the overall security of systems running untrusted or varied workloads, as each application’s memory can be securely isolated from others.
  • Resource Management: Memory partitioning provides tools for managing and tracking memory access at a granular level. Operators can set policies for how different parts of the system access memory, ensuring that critical applications always have access to the resources they need, while lower-priority tasks are constrained in their memory usage.
  • Multi-Tenant Systems: In cloud computing and multi-tenant environments, QoS enforcement is particularly useful. It enables operators to enforce memory access boundaries between different virtual machines (VMs) or containers, improving overall system stability and preventing one tenant's processes from affecting others, which is critical in shared-resource environments.
  • Support for Large-Scale Systems: Memory partitioning enhances the scalability of systems by enabling more effective memory management, especially in large-scale systems with complex workloads. Developers can create more efficient applications by understanding and controlling how their programs interact with memory, which can lead to better performance in high-demand systems like databases, or AI inference.

Taking Advantage of Memory Partitioning for QoS Enforcement


Ensure Hardware and Kernel Support

  • To use memory partitioning and SLC QoS enforcement on Linux, the Linux kernel must first be compiled with support for the Arm MPAM feature (CONFIG_ARM_MPAM). This feature is available in Linux kernel version 5.12 and later.
  • To check if your kernel was built with support for this feature, run grep CONFIG_ARM_MPAM /boot/config-$(uname -r)

Enable Memory Partitioning in the Kernel

If your kernel supports the feature, but it is not enabled by default, you might need to enable it manually.

  • First, mount the resctrl virtual filesystem: mount -t resctrl resctrl /sys/fs/resctrl
  • You now have a top-level resource group in /sys/fs/resctrl and can create additional resource groups by making additional directories in this folder.
  • Once you create a custom resource group, this directory is populated with files that you can use to configure resource policy constraints to apply to this resource group.
  • The cpus file contains a bitmask of which CPU cores belong to this resource group, and the schemata file defines the policy controls that apply to this resource group. For example, you can ensure that cores dedicated to a latency-sensitive application get 100% bandwidth to 75% of the SLC, and restrict other resource groups to share the remaining 25%.
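Putting those steps together, a configuration session might look like the following sketch. It is a configuration fragment, not a runnable test: it requires root privileges and MPAM-capable hardware with resctrl support, and the group name latency_sensitive and the core range are illustrative only:

```shell
# Mount the resctrl virtual filesystem (once per boot)
mount -t resctrl resctrl /sys/fs/resctrl

# Create a resource group for a latency-sensitive application
mkdir /sys/fs/resctrl/latency_sensitive

# Assign CPU cores to the group (cpus_list accepts a human-readable range;
# the cpus file accepts the equivalent bitmask)
echo 4-7 > /sys/fs/resctrl/latency_sensitive/cpus_list

# Inspect the policy controls available for this group
cat /sys/fs/resctrl/latency_sensitive/schemata
```

Editing the schemata file is then how you express constraints like the 75%/25% SLC split described above; the exact schemata syntax depends on the resources your kernel exposes.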

Real-World Use Cases

QoS enforcement with memory partitioning is beneficial for cloud operators, virtualization technologies, and developers building high-performance, memory-intensive applications. It enables service providers to offer better performance guarantees and operators to manage resources more effectively, particularly in scenarios where multiple applications share the same underlying hardware.

Check out our QoS enforcement blog or the explainer video.

Nested Virtualization


Nested virtualization allows a virtual machine (VM) running under a hypervisor to act as a host for additional VMs. With AmpereOne, Ampere CPUs now support the hardware feature FEAT_NV2 (included in Armv8.4-A and beyond), which enables efficient nested virtualization and advanced workloads on Ampere infrastructure. This is particularly useful in several scenarios.


Benefits of Nested Virtualization on Arm64

  • Cloud Platform Enablement: Nested virtualization enables cloud service providers to offer their customers the ability to run hypervisors in VMs. Customers can deploy custom hypervisors or further virtualize their workloads for more flexibility.
  • Enhanced Efficiency for Testing: Developers and testers of applications requiring an operating system kernel, including hypervisor and kernel developers, or developers of eBPF-related projects, can validate their software in a nested environment without requiring direct access to physical hardware, reducing resource costs.
  • Isolation and Security: Nested virtualization can improve security by allowing applications to run in isolated, deeply nested environments. This is valuable in confidential computing or scenarios requiring workload isolation through microVMs.
  • Application Sandboxing: Running complex applications in VMs enables applications using different operating systems to run together in the same cloud environment, with minimal interaction and attack surface.

Availability of Software Support in Linux

As with all new hardware features, there is a period between the availability of the feature in hardware, and its support in software. Ampere engineers are working with ecosystem partners to ensure that Nested Virtualization on Ampere CPUs is available to all customers as soon as possible. The current state of support for Nested Virtualization in the Linux kernel is incomplete, but patches to complete support of the feature are in progress.

Once the feature is fully supported upstream by the Linux kernel and QEMU, future releases of Linux distributions will automatically include software support for this feature.


Real-World Use Cases

Nested Virtualization on Ampere CPUs has a variety of real-world applications across industries, particularly as Ampere and other ARM CPUs gain prominence in cloud, edge, and high-performance computing environments.

  • Managing VMs in hosted Kubernetes clusters: Cloud Service Provider customers can use hosted Kubernetes services to manage their VM-based applications, in addition to their container applications.
  • Testing operating system features in the cloud: Developers and testers of hypervisors, operating system kernels, and eBPF-related projects can validate their software in a nested environment without requiring direct access to physical hardware, reducing resource costs.
  • Increased security for container workloads: MicroVM monitors such as Firecracker enable container workloads to reduce the resources shared with other tenants in a multi-tenant cloud environment, providing a higher level of security.
  • Cloud-based embedded application development: Nested virtualization enables developers of ARM-based embedded applications like automotive software to run and test their software using cloud infrastructure software, running custom operating systems in VMs.

AmpereOne’s innovative design builds upon the success of the Ampere Altra® family of processors. The new features we described here collectively empower developers to build more secure, efficient, and scalable applications on Arm64, leveraging the latest advancements in hardware capabilities. As these technologies mature, they will further enhance the development and deployment of applications across various industries. Particularly in the age of AI compute, providing top-notch tools to developers is key to fueling AI adoption and data center modernization.


Check out our nested virtualization blog for a deeper dive.

Learn more about AmpereOne’s potential to shape the future of compute here.



Disclaimer

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

©2024 Ampere Computing LLC. All Rights Reserved. Ampere, Ampere Computing, AmpereOne and the Ampere logo are all registered trademarks or trademarks of Ampere Computing LLC or its affiliates. All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
