
Ampere AI

The best GPU-Free alternative for AI Inferencing workloads.


New: GPU-Free AI Inferencing Data

GPU-Free AI Inference Servers

Switch to GPU-Free AI inference platforms built on Ampere Cloud Native Processors and Ampere Optimized AI Frameworks to maximize the performance, energy efficiency, and affordability of AI inference.

Now shipping through the following partners:

AI Efficiency

Reduce power consumption without sacrificing performance and build a sustainable future.

Computer Vision

Natural Language Processing

Recommender Engines

Right-Sizing AI Compute

Get the best price/performance in the cloud and better value for AI inference compute.
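Price/performance for inference is often expressed as cost per unit of work. As a minimal sketch, assuming you know an instance's hourly price and have measured its sustained throughput (both are hypothetical inputs here, not Ampere figures), cost per million inferences can be computed like this:

```python
def cost_per_million_inferences(price_per_hour: float, inferences_per_second: float) -> float:
    """Return the cost of serving one million inferences.

    price_per_hour: hypothetical hourly instance price (any currency).
    inferences_per_second: sustained throughput measured on that instance.
    """
    if inferences_per_second <= 0:
        raise ValueError("throughput must be positive")
    inferences_per_hour = inferences_per_second * 3600
    return price_per_hour / inferences_per_hour * 1_000_000


# Hypothetical comparison of two instance types (placeholder numbers only).
candidates = {
    "instance_a": (1.20, 400.0),  # (price per hour, inferences per second)
    "instance_b": (0.60, 300.0),
}
for name, (price, throughput) in candidates.items():
    cost = cost_per_million_inferences(price, throughput)
    print(f"{name}: {cost:.3f} per million inferences")
```

In this made-up comparison the slower instance is cheaper per inference because its price falls faster than its throughput, which is the trade-off that right-sizing compute aims to exploit.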


FP16 vs FP32

FP16 data format boosts AI inference performance in the cloud

Computer Vision

Natural Language Processing

Recommender Engines

Customer Testimonials

"Using Ampere A1 instances on OCI with integrated Ampere Optimized AI library, we managed to right-size compute providing price-performance advantage on deep learning inferencing relative to GPUs and to other CPUs. We found an order of magnitude or more reduction in cloud resource costs, measured at 4 operating points for 2 cloud vendors, while avoiding operational complexity for changes in model serving resource needs and cloud offerings."

-Madhuri Yechuri, CEO, Elotl


"Switching to Ampere-optimized Tensorflow running on OCI A1 instances has enabled us to achieve a 75 percent cost saving for the training of the algorithms for our plastics and fabrics identification machines, while lowering our CO2 emissions - thanks to Ampere Altra’s high energy efficiency."

-Martin Holicky, CEO, Matoha


“This breakthrough Wallaroo/Ampere solution allows enterprises to improve inference performance, increase energy efficiency, and balance their ML workloads across available compute resources much more effectively, all of which is critical to meeting the huge demand for AI computing resources today also while addressing the sustainability impact of the explosion in AI.”

-Vid Jain, CEO, Wallaroo.AI


AI Benchmarking

Ampere Optimized AI Frameworks deliver a significant inference performance improvement to applications developed on all major AI frameworks. Ampere AI currently supports the following frameworks:

  • PyTorch
  • TensorFlow
  • ONNX Runtime

All Docker images can be conveniently downloaded from the Ampere Computing AI Docker Hub. The software is free and runs seamlessly on all Ampere products.

Ampere Optimized AI Frameworks + Ampere processors deliver disruptive value for AI inference:

  • High Performance: Up to 4x performance advantage compared to CPUs built on the x86 architecture

  • Energy Efficiency: 2.8x lower power consumption than x86 processors and a 3x smaller footprint.

  • Scalability: An optimized architecture, core counts that surpass AMD and Intel x86 processors, and improved memory bandwidth, further accelerated by Ampere Optimized AI Frameworks.

  • Compatibility: Robust ecosystem offering extensive support from leading AI frameworks, libraries, and software tools, facilitating effortless integration.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may produce measurements that differ from those obtained by Ampere. The system configurations and components used in our testing are detailed here.

Computer Vision Relative Performance: ResNet-50 Throughput/Socket
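Because results depend heavily on configuration, it can help to measure inference throughput on your own system before comparing against a chart like this one. The sketch below is a generic stdlib timing harness, not Ampere's benchmark methodology; `run_batch` is a hypothetical stand-in for a real model's inference call, and the default warm-up and iteration counts are arbitrary assumptions.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       warmup_iters: int = 5, timed_iters: int = 20) -> float:
    """Return samples per second for a callable that processes one batch."""
    if batch_size <= 0 or timed_iters <= 0:
        raise ValueError("batch_size and timed_iters must be positive")
    # Warm-up runs let caches, JIT compilation, and thread pools settle
    # so they do not distort the timed measurement.
    for _ in range(warmup_iters):
        run_batch()
    start = time.perf_counter()
    for _ in range(timed_iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * timed_iters / elapsed


def dummy_batch() -> None:
    # Hypothetical placeholder workload; replace with a real inference call.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{measure_throughput(dummy_batch, batch_size=16):.1f} samples/s")
```

For a per-socket figure, either pin the run to the cores of a single socket or divide total system throughput by the number of sockets used.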

Ampere Optimized AI Frameworks

Ampere Altra and Ampere Altra Max, combined with the high-performance Ampere Optimized AI Frameworks, offer best-in-class AI inference performance for AI applications developed in the most popular frameworks, including PyTorch, TensorFlow, and ONNX Runtime. The Ampere Model Library (AML) offers pretrained models to help accelerate AI development.

Ampere helps customers achieve superior performance for AI workloads by integrating optimized inference layers into common AI frameworks.


Main Components

  • Framework Integration Layer: Provides full compatibility with popular developer frameworks. The software works with trained networks “as is”; no conversions are required.

  • Model Optimization Layer: Implements techniques such as structural network enhancements, changes to the processing order for efficiency, and data flow optimizations, without accuracy degradation.

  • Hardware Acceleration Layer: Includes a just-in-time optimizing compiler that uses a small number of microkernels optimized for Ampere processors. This approach allows the inference engine to deliver high performance across all supported frameworks.
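The idea of covering many operations with a small number of microkernels can be illustrated with blocked matrix multiplication. One small routine updates a fixed-size tile of the output, and an outer driver tiles the full problem onto it. The sketch below is a generic pure-Python illustration of that technique under our own assumptions (a 4x4 output tile, a reduction block of 8, row-major nested lists). It is not Ampere's JIT compiler or its microkernels, which would emit vectorized machine code for the target CPU.

```python
from typing import List

Matrix = List[List[float]]

TILE = 4  # assumed tile size; real microkernels size tiles to fit vector registers


def microkernel(a: Matrix, b: Matrix, c: Matrix, i0: int, j0: int,
                k0: int, k1: int) -> None:
    """Accumulate A[i0:i0+TILE, k0:k1] @ B[k0:k1, j0:j0+TILE] into C in place.

    Tiles at the matrix edge are clipped so any matrix shape works.
    """
    i1 = min(i0 + TILE, len(c))
    j1 = min(j0 + TILE, len(c[0]))
    for i in range(i0, i1):
        row_c = c[i]
        for k in range(k0, k1):
            a_ik = a[i][k]
            row_b = b[k]
            for j in range(j0, j1):
                row_c[j] += a_ik * row_b[j]


def blocked_matmul(a: Matrix, b: Matrix, k_block: int = 8) -> Matrix:
    """Compute A @ B by mapping every tile of the problem onto one microkernel."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for k0 in range(0, m, k_block):  # block the reduction dimension
        k1 = min(k0 + k_block, m)
        for i0 in range(0, n, TILE):
            for j0 in range(0, p, TILE):
                microkernel(a, b, c, i0, j0, k0, k1)
    return c


print(blocked_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

Only `microkernel` touches individual elements. Tuning it for one processor therefore speeds up every matrix shape the driver produces, and this is what makes a small microkernel set practical.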

Native FP16 Support

Ampere hardware uniquely offers native support for the FP16 data format, providing a nearly 2x speedup over FP32 with almost no accuracy loss for most AI models.

FP16, or "half-precision floating point," represents each number with 16 bits (1 sign bit, 5 exponent bits, and 10 mantissa bits). It therefore needs half the memory of the 32-bit FP32 (single-precision) format and allows faster computation. FP16 is widely adopted in AI applications, particularly for inference workloads such as neural network inference, which require intensive computation and real-time responsiveness. Using FP16 accelerates the processing of AI models and improves performance, memory usage, and energy efficiency with little or no loss of accuracy for most models.
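The precision, range, and memory trade-offs can be seen with Python's standard library alone. Its `struct` module packs values in IEEE 754 half precision (format `e`) and single precision (format `f`). The sketch below rounds values through each format and estimates weight storage for a hypothetical 25-million-parameter model. It illustrates the data formats only and does not measure inference speed on any hardware.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Round a Python float through a packed format ('e' = FP16, 'f' = FP32)."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, value))[0]


# Bytes per value: FP16 uses half the storage of FP32.
print("FP16 bytes:", struct.calcsize("<e"))  # 2
print("FP32 bytes:", struct.calcsize("<f"))  # 4

# FP16 keeps 10 explicit mantissa bits (about 3 decimal digits); FP32 keeps 23.
x = 1 / 3
print("FP16:", round_trip(x, "e"))  # 0.333251953125
print("FP32:", round_trip(x, "f"))

# FP16 also has a much narrower range: its largest finite value is 65504.
print("FP16 max:", round_trip(65504.0, "e"))
try:
    round_trip(70000.0, "e")
except OverflowError:
    print("70000 does not fit in FP16")

# Weight storage for a hypothetical model with 25 million parameters.
params = 25_000_000
for name, fmt in (("FP16", "e"), ("FP32", "f")):
    print(f"{name} weights: {params * struct.calcsize('<' + fmt) / 1e6:.0f} MB")
```

The narrow FP16 range is one reason the accuracy impact is described as small for most models rather than zero for all: values that overflow or lose too many digits in 16 bits need care, such as scaling or keeping some operations in FP32.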

Learn more about the difference between FP16 and FP32 data formats

Benefits of FP16 for Computer Vision

Benefits of FP16 for Natural Language Processing

Benefits of FP16 for Recommender Engines


Downloads: Ampere Optimized AI Software


Ampere Optimized PyTorch

Ampere's inference acceleration engine is fully integrated with the PyTorch framework. PyTorch models and software written with the PyTorch API can run as-is, without any modifications.


Ampere Optimized TensorFlow

Ampere's inference acceleration engine is fully integrated with the TensorFlow framework. TensorFlow models and software written with the TensorFlow API can run as-is, without any modifications.

Ampere Optimized ONNX Runtime

Ampere's inference acceleration engine is fully integrated with the ONNX Runtime framework. ONNX models and software written with the ONNX Runtime API can run as-is, without any modifications.


Ampere Model Library (AML)

Ampere Model Library (AML) is a collection of AI model architectures that handle the industry's most demanding workloads. Access the AML open GitHub repository to validate the excellent performance of the Ampere Optimized AI Frameworks on our Ampere Altra family of cloud-native processors.

  • Wallaroo and Ampere Accelerate AI Inference by 7X
  • AI Inference on Ampere Altra Max
  • AI: Ampere vs. Graviton
  • AI Inference on Azure
  • Ampere AI Efficiency: Computer Vision
  • Ampere AI Efficiency: Natural Language Processing
  • Ampere AI Efficiency: Recommender Engine

Ampere Ready Software

See the world's most popular workloads running on Ampere.

Created At: November 9th 2023, 9:16:25 am
Last Updated At: April 1st 2024, 10:22:22 pm

Ampere Computing LLC

4655 Great America Parkway Suite 601

Santa Clara, CA 95054

© 2023 Ampere Computing LLC. All rights reserved. Ampere, Altra and the A and Ampere logos are registered trademarks or trademarks of Ampere Computing.
This site is running on Ampere Altra Processors.