
LLM Inference with Ampere-based OCI A1


LLM Inference on OCI

Meet Your Performance Needs While Minimizing TCO

Ampere Cloud Native Processors with Ampere Optimized AI Frameworks are uniquely positioned to deliver Large Language Model (LLM) inference at performance levels that meet client needs for both tokens per second (tps) and time to first token, while providing the lowest cost per million tokens.
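As a rough illustration of how the cost-per-million-tokens metric relates to throughput, the sketch below derives it from an hourly instance price and a sustained token rate. The function name and the input values are hypothetical placeholders chosen for the arithmetic, not published OCI prices or measured Ampere results.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Return the USD cost of generating one million tokens at a sustained rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Placeholder inputs (assumptions for illustration only):
# an instance billed at 1.00 USD per hour sustaining 100 tokens per second.
example_cost = cost_per_million_tokens(hourly_price_usd=1.0, tokens_per_second=100.0)
print(f"{example_cost:.2f} USD per million tokens")
```

Substituting the actual instance price and a throughput figure measured on your own workload gives a number that can be compared across instance types.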

Deploy an AI Chatbot on an Ampere A1 Flex Compute Instance Using Minikube


Choosing CPUs for Efficient Generative AI Deployments


Democratizing Generative AI with CPU-based Inference


Serge Chat

This demo shows Serge, an Ampere-developed chatbot, running Llama 2 7B on Ampere-based OCI A1 and matching the user experience of ChatGPT based on the GPT-3.5 model. Serge, a simple chatbot built solely for showcase purposes, rivals the performance and output quality of ChatGPT 3.5 while running GPU-free on efficient and scalable Ampere-based OCI A1 cloud instances.


LLM Chat Demo

Test a fine-tuned version of the open-source Llama 3 model running on an Ampere-based OCI A1 cloud instance with Ampere® Optimized AI Frameworks. The Ampere-developed Serge Chat runs in real time, with latency and a token generation rate that meet user needs.
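The two figures mentioned above, latency to the first token and token generation rate, can be measured for any streaming chatbot. The sketch below is a minimal, generic timer over an iterator of tokens; `measure_stream` and `fake_stream` are hypothetical helpers written for this illustration and are not part of Serge or the Ampere Optimized AI Frameworks.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (time_to_first_token_s, tokens_per_second, token_count)."""
    start = clock()
    first_token_time = None
    count = 0
    for _ in tokens:
        now = clock()
        if first_token_time is None:
            first_token_time = now  # timestamp of the first token received
        count += 1
    end = clock()
    if count == 0:
        raise ValueError("stream produced no tokens")
    total = end - start
    # Rate is averaged over the whole response, including the wait for the first token.
    rate = count / total if total > 0 else float("inf")
    return first_token_time - start, rate, count


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Stand-in for a model's streaming output (assumption for this demo)."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


ttft, tps, n_tokens = measure_stream(fake_stream(5, 0.01))
print(f"tokens={n_tokens} time_to_first_token={ttft:.3f}s rate={tps:.1f} tps")
```

To time a real deployment, replace `fake_stream` with whatever iterator your client library yields tokens from.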

Deploy On OCI

Access OCI Marketplace Listing

Try OCI Free

Ampere Optimized llama.cpp

Docker Hub

This Docker image can be run on bare-metal Ampere® CPUs and on Ampere®-based VMs available in the cloud.


GitHub

Release notes and binary executables are available on our GitHub.


Resources

Developer Resource

RAG Examples with Vector Embeddings

Ampere

Developer Resource

Python bindings for llama.cpp

Ampere


Created At : June 14th 2024, 6:12:58 pm
Last Updated At : September 26th 2024, 3:46:56 pm

Ampere Computing LLC

4655 Great America Parkway Suite 601

Santa Clara, CA 95054

© 2024 Ampere Computing LLC. All rights reserved. Ampere, Altra and the A and Ampere logos are registered trademarks or trademarks of Ampere Computing.
This site is running on Ampere Altra Processors.