Ampere AI

Best performance for your AI workloads

AI Efficiency

Reduce power consumption without sacrificing performance and build a sustainable future.

Computer Vision

Natural Language Processing

Recommender Engines

Right-Sizing AI Compute

Get the best price/performance benefits in the cloud and better value for AI inferencing compute

Read our Blog
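Price/performance for inference is usually compared as cost per inference rather than cost per instance. The sketch below is a minimal illustration. It assumes you have already measured throughput for your own model on each candidate instance. The instance labels, prices, and throughput figures are hypothetical placeholders, not Ampere or cloud-provider data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceOption:
    name: str              # hypothetical label, not a real instance type
    price_per_hour: float  # USD per hour (illustrative placeholder)
    throughput: float      # measured inferences per second for your model

    def cost_per_million(self) -> float:
        """USD to serve one million inferences, assuming full utilization."""
        inferences_per_hour = self.throughput * 3600
        return self.price_per_hour / inferences_per_hour * 1_000_000


def right_size(options: List[InstanceOption], min_throughput: float) -> InstanceOption:
    """Return the cheapest option per inference that still meets the throughput target."""
    eligible = [o for o in options if o.throughput >= min_throughput]
    if not eligible:
        raise ValueError("no option meets the throughput target")
    return min(eligible, key=lambda o: o.cost_per_million())


options = [
    InstanceOption("small-cpu", price_per_hour=0.40, throughput=500.0),
    InstanceOption("large-cpu", price_per_hour=1.20, throughput=900.0),
]
for o in options:
    print(f"{o.name}: ${o.cost_per_million():.3f} per million inferences")
print("pick for >=400 inf/s:", right_size(options, 400.0).name)
print("pick for >=600 inf/s:", right_size(options, 600.0).name)
```

In this sketch, right-sizing means choosing the lowest cost per inference that still meets the throughput target. In practice you would also check tail latency, not only average throughput.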

FP16 vs FP32

FP16 data format boosts AI inference performance in the cloud

Computer Vision

Natural Language Processing

Recommender Engines

Customer Testimonials

"Using Ampere A1 instances on OCI with integrated Ampere Optimized AI library, we managed to right-size compute providing price-performance advantage on deep learning inferencing relative to GPUs and to other CPUs. We found an order of magnitude or more reduction in cloud resource costs, measured at 4 operating points for 2 cloud vendors, while avoiding operational complexity for changes in model serving resource needs and cloud offerings."

-Madhuri Yechuri, CEO, Elotl

"Switching to Ampere-optimized Tensorflow running on OCI A1 instances has enabled us to achieve a 75 percent cost saving for the training of the algorithms for our plastics and fabrics identification machines, while lowering our CO2 emissions - thanks to Ampere Altra’s high energy efficiency."

-Martin Holicky, CEO, Matoha

“This breakthrough Wallaroo/Ampere solution allows enterprises to improve inference performance, increase energy efficiency, and balance their ML workloads across available compute resources much more effectively, all of which is critical to meeting the huge demand for AI computing resources today also while addressing the sustainability impact of the explosion in AI.”

-Vid Jain, CEO, Wallaroo.AI

AI Inference Servers

Save big today on our AI Inference Servers with Ampere's AI Frameworks pre-installed.

AI Benchmarking

Ampere Optimized AI Frameworks deliver a significant inference performance improvement to applications developed on all major AI frameworks. Ampere AI currently supports the following frameworks:

PyTorch
TensorFlow
ONNX Runtime

All Docker images can be conveniently downloaded from the Ampere Computing AI Docker Hub. The software is free and runs seamlessly on all Ampere products.

Ampere AI optimized frameworks + Ampere processors deliver disruptive value for AI inference:

High Performance: Up to 4x performance advantage compared to CPUs built on the x86 architecture.
Energy Efficiency: 2.8x lower power consumption than x86 processors and a 3x smaller footprint.
Scalability: Optimized architecture, core counts that surpass AMD and Intel x86 processors, and improved memory bandwidth, accelerated by Ampere Optimized AI Frameworks.
Compatibility: Robust ecosystem offering extensive support from leading AI frameworks, libraries, and software tools, facilitating effortless integration.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may produce measurements that differ from Ampere's. The system configurations and components used in our testing are detailed here.

Computer Vision Relative Performance: ResNet-50 Throughput/Socket
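The chart compares ResNet-50 throughput normalized per socket. Below is a minimal, framework-agnostic sketch of how such a number can be measured and normalized. `infer` and `dummy_infer` are hypothetical stand-ins for a real model call, such as a PyTorch module or an ONNX Runtime session. The warm-up and iteration counts are arbitrary assumptions and do not reflect Ampere's test methodology.

```python
import time
from typing import Callable, Sequence


def measure_throughput(infer: Callable[[Sequence], object], batch: Sequence,
                       warmup: int = 3, iters: int = 20) -> float:
    """Return samples per second for a callable that processes one batch."""
    for _ in range(warmup):          # untimed runs: warm caches, trigger any JIT work
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    elapsed = time.perf_counter() - start
    return len(batch) * iters / elapsed


def per_socket(throughput: float, sockets: int) -> float:
    """Normalize a whole-system throughput to a single CPU socket."""
    return throughput / sockets


def relative_performance(candidate: float, baseline: float) -> float:
    """Ratio behind 'Nx' comparisons: candidate throughput over baseline throughput."""
    return candidate / baseline


def dummy_infer(batch):
    # Hypothetical placeholder workload standing in for a model forward pass.
    return [sum(i * i for i in range(2000)) + x for x in batch]


tput = measure_throughput(dummy_infer, batch=list(range(16)))
print(f"{tput:.0f} samples/s, {per_socket(tput, sockets=2):.0f} samples/s/socket")
```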

Ampere Optimized AI

Ampere Altra and Ampere Altra Max processors, combined with high-performance Ampere Optimized Frameworks, offer best-in-class AI inference performance for applications developed in the most popular frameworks, including PyTorch, TensorFlow, and ONNX Runtime. The Ampere Model Library (AML) offers pretrained models to help accelerate AI development.

Ampere helps customers achieve superior performance for AI workloads by integrating optimized inference layers into common AI frameworks.

Main Components

Framework Integration Layer: Provides full compatibility with popular developer frameworks. The software works with trained networks “as is”; no conversions are required.
Model Optimization Layer: Implements techniques such as structural network enhancements, changes to the processing order for efficiency, and data flow optimizations, without accuracy degradation.
Hardware Acceleration Layer: Includes a “just-in-time” optimizing compiler that uses a small number of microkernels optimized for Ampere processors. This approach allows the inference engine to deliver high performance across all supported frameworks.
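The layers above are not described in implementation detail. Operator fusion is one generic kind of data flow optimization. The sketch below illustrates it by merging three elementwise steps (scale, bias-add, ReLU) into a single pass, so no intermediate lists are built. The function names are hypothetical, and this is not Ampere's implementation. The same floating-point operations run in the same order, so the fused output matches the unfused output exactly. That is consistent with the "without accuracy degradation" goal.

```python
from typing import List


def scale(xs: List[float], s: float) -> List[float]:
    return [x * s for x in xs]


def add_bias(xs: List[float], b: float) -> List[float]:
    return [x + b for x in xs]


def relu(xs: List[float]) -> List[float]:
    return [x if x > 0.0 else 0.0 for x in xs]


def unfused(xs: List[float], s: float, b: float) -> List[float]:
    # Three passes over the data; two intermediate lists are allocated.
    return relu(add_bias(scale(xs, s), b))


def fused(xs: List[float], s: float, b: float) -> List[float]:
    # One pass, no intermediates; the same operations run in the same order,
    # so results are bit-identical to the unfused version.
    out = []
    for x in xs:
        y = x * s + b
        out.append(y if y > 0.0 else 0.0)
    return out


activations = [-2.0, -0.5, 0.0, 0.3, 1.7]
print(fused(activations, 1.5, 0.25))
print(fused(activations, 1.5, 0.25) == unfused(activations, 1.5, 0.25))
```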

Native FP16 Support

Ampere hardware uniquely offers native support for the FP16 data format, providing a nearly 2x speedup over FP32 with almost no accuracy loss for most AI models.

FP16, or "half-precision floating point," represents numbers using 16 bits (1 sign bit, 5 exponent bits, and 10 fraction bits). This makes computations faster and halves memory use compared to the 32-bit FP32 (single-precision) data format. The FP16 data format is widely adopted in AI applications, particularly for inference workloads such as neural network inference, which require intensive computation and real-time responsiveness. Using FP16 accelerates the processing of AI models, improving performance, memory usage, and energy efficiency with minimal accuracy loss for most models.
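The precision and memory trade-off can be seen with Python's standard `struct` module. Its `'e'` and `'f'` format codes pack IEEE 754 half-precision (FP16) and single-precision (FP32) values. The sketch below round-trips a few sample values through each format and estimates weight storage for a hypothetical 25-million-parameter model. It illustrates the number formats only and says nothing about how Ampere's FP16 kernels work.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def round_trip(value: float, fmt: str) -> float:
    """Pack a float into an IEEE 754 format ('<e' = FP16, '<f' = FP32) and unpack it."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def fits_fp16(value: float) -> bool:
    """True if a finite value can be packed as FP16 without overflowing."""
    try:
        struct.pack("<e", value)
        return True
    except (OverflowError, struct.error):
        return False


samples = [0.1, 1.0 / 3.0, 3.14159265, 1234.5678]
for v in samples:
    h = round_trip(v, "<e")   # 1 sign, 5 exponent, 10 fraction bits
    s = round_trip(v, "<f")   # 1 sign, 8 exponent, 23 fraction bits
    print(f"{v:<12.8g} fp32={s:<14.10g} fp16={h:<10.6g} fp16 rel. error={abs(h - v) / abs(v):.1e}")

n_params = 25_000_000  # hypothetical model size
print("FP32 weights:", n_params * struct.calcsize("<f") / 1e6, "MB")
print("FP16 weights:", n_params * struct.calcsize("<e") / 1e6, "MB")
print("70000 fits in FP16:", fits_fp16(70000.0))
```

The relative error of roughly 1e-4 to 1e-3 per value is usually tolerable for inference. The much smaller range (maximum 65504) is why some models need care when converted to FP16.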

Learn more about the difference between FP16 and FP32 data formats

Benefits of FP16 for Computer Vision

Benefits of FP16 for Natural Language Processing

Benefits of FP16 for Recommender Engines

Applications

Downloads: Ampere Optimized AI Software

Ampere Optimized PyTorch

Ampere's inference acceleration engine is fully integrated with the PyTorch framework. PyTorch models and software written with the PyTorch API run as-is, without any modifications.

> Docker Image

Ampere Optimized TensorFlow

Ampere's inference acceleration engine is fully integrated with the TensorFlow framework. TensorFlow models and software written with the TensorFlow API run as-is, without any modifications.

> Docker Image > TensorFlow Serving

Ampere Optimized ONNX Runtime

Ampere's inference acceleration engine is fully integrated with the ONNX Runtime framework. ONNX models and software written with the ONNX Runtime API run as-is, without any modifications.

> Docker Image

Ampere Model Library (AML)

The Ampere Model Library (AML) is a collection of AI model architectures that handle the industry's most demanding workloads. Access the open AML GitHub repository to validate the excellent performance of Ampere Optimized AI Frameworks on the Ampere Altra family of cloud-native processors.

> Access AML

FAQs

What is Ampere AI?

What is an Ampere Optimized Framework?

What is Ampere Model Library (AML)?

How can I use an Ampere Optimized Framework?

What version of Python is supported?

Which operating system are the Docker images based on?

What control does the optimized framework software offer?

Do Ampere optimized frameworks support training?

I’m having issues using Ampere AI, who can I contact?

Resources

Briefs

Wallaroo and Ampere Accelerate AI Inference by 7X

AI Inference on Ampere Altra Max

AI: Ampere vs. Graviton

AI Inference on Azure

Ampere AI Efficiency: Computer Vision

Ampere AI Efficiency: Natural Language Processing

Ampere AI Efficiency: Recommender Engine

Tutorials

Running AI on Scaleway COP-ARM Instances

Creating Ampere AI Virtual Machines on Google Cloud

Creating Ampere AI Virtual Machines on Microsoft Azure

Running AI On OCI Ampere A1 Instance

Ampere Optimized Frameworks

Documentation

TensorFlow Serving User Guide

Ampere Optimized ONNX Runtime Documentation

Ampere Optimized PyTorch Documentation

Ampere Optimized TensorFlow Documentation

Publications

FP16 vs. FP32 Data Formats

CPU AI Inference in the Cloud

Ampere AI Optimized Frameworks

Podcast

Evolution of Edge AI with Tony Rigoni

Ampere Ready Software

See the world's most popular workloads running on Ampere

Created At : December 7th 2023, 6:32:47 pm

Last Updated At : September 25th 2024, 6:49:46 pm

Ampere Computing

4655 Great America Parkway

Suite 601 Santa Clara, CA 95054


This site runs on Ampere Processors.
