Amid the recent hype around AI at tech events like Computex, it’s easy to lose sight of the practical, real-world applications of AI in digital services. This year, HPE Discover offers a different perspective on the common narrative.
The hype often overshadows practical considerations for implementing AI-enabled services, which must balance general-purpose tasks with AI inference – where trained models serve predictions at high query volumes. Consumers of compute facing high costs and energy-efficiency barriers to scaling AI inference can turn to a concept we call AI compute.
The ongoing narrative obscures a very real situation: demand for compute capable of running AI has led to accelerator shortages, long waitlists at service providers, high cloud bills, and insatiable power requirements for large GPU-accelerated systems.
As AI moves into mainstream production, organizations in every industry are weighing their options and are rightly skeptical of megawatt-fed racks and the costly price tag of this game-changing technology. In this environment, it’s clear that the status quo is unsustainable. AI compute addresses skyrocketing power consumption and intense demand for limited data center capacity worldwide by offering a far more efficient and balanced approach to AI-enabled services.
It is important to separate AI training from inference workloads so the hardware can be optimized for each. While AI training thrives on the parallel processing power of GPU-accelerated compute, AI inference requires fewer compute resources. Ampere’s innovations in CPU architecture provide an AI compute alternative, with a proven ability to lower cost and power consumption compared to legacy x86 processors paired with GPUs for AI inference. These processors offer a clear path to efficient computing for all the digital services required to implement a modern AI-enabled service.
As a digital enterprise or cloud service provider, you can leverage AI compute to deliver competitively superior services without compromising AI inference performance. This approach significantly reduces infrastructure costs by elastically allocating compute resources to meet the dynamic needs of a modern AI service. Ampere partners with both cloud providers and on-premises server suppliers such as HPE, Supermicro, Giga Computing, ASRock Rack, and others to build highly energy-efficient AI compute infrastructure for private cloud or hybrid cloud deployments.
This year at HPE Discover in Las Vegas, Ampere and HPE are showing a variety of AI use cases where the HPE ProLiant RL300 server delivers AI inference performance without GPUs. It is a flexible, highly efficient server, purpose-built for service providers and cloud-native enterprises. The platform is an excellent example of AI compute, matching the requirements of modern, scale-out, AI-enabled applications. It is tuned for target workloads such as computer vision at the edge, speech-to-text for transcription or translation services, video processing and CDNs, generative visual AI, and other natural language processing use cases such as digital assistants and chatbots.
The Ampere® Altra® family of processors supports up to 128 high-performance cores with an innovative architecture that delivers predictable high performance, linear scaling, high energy efficiency, and two vector computational units per core, making it ideal for running AI inference alongside the many other workloads a modern digital service requires. Industry-leading energy efficiency eliminates underutilized rack space in the data center and enables power-constrained edge environments.
AI compute is also powered by Ampere Optimized AI Frameworks. Combined with Ampere-based servers like the HPE ProLiant RL300, they supply all the critical components for efficient AI inference processing.
Ampere AI-enabled servers provide:
Optimized frameworks that run models native to the framework without any code changes or recompilation (see the sketch after this list).
Frameworks available free of charge from a variety of sources, or pre-installed with the purchase of an Ampere GPU-free inference server from our partners.
A choice of optimized AI frameworks for turnkey solutions, including PyTorch, TensorFlow, ONNX Runtime, and now llama.cpp for leading generative AI workloads.
The flexibility to handle both general-purpose and AI inference workloads directly on a scalable server resource, using standard orchestration tooling like Kubernetes.
Significantly better performance than legacy x86 processors for AI inference, and better price-performance than common GPU-enabled cloud instances.
Triple the data center efficiency, using 2.8X less power and 3X less space than x86 architectures for AI compute infrastructures with mixed application-tiered services.
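To make the “no code changes” point concrete, here is a minimal sketch of a standard PyTorch inference path. Nothing in it is Ampere-specific: the model choice, batch size, and input are illustrative placeholders, and the premise is simply that unmodified framework-native code like this is what runs as-is on an optimized build.

```python
# Minimal sketch: a standard PyTorch inference path. Nothing here is
# Ampere-specific -- the point is that framework-native code like this
# needs no changes or model recompilation to run on an optimized build.
# Model choice and input batch are illustrative placeholders.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()  # switch to inference mode

batch = torch.randn(8, 3, 224, 224)  # placeholder input batch

with torch.no_grad():  # no gradients needed for inference
    logits = model(batch)

print(logits.argmax(dim=1))  # predicted class index per image
```

The same drop-in pattern applies to the other supported frameworks in the list above, such as TensorFlow and ONNX Runtime.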
AI compute delivers the power of AI without the complexity, cost, and environmental impact of large GPU-accelerated systems. Whether you are optimizing budgets, working toward sustainable technology goals, or maximizing the flexibility and elasticity of your compute infrastructure, running AI compute on GPU-free servers with Ampere Altra processors unlocks innovation and energy efficiency – a stark contrast to the GPU-centric hype of late.
If you are attending HPE Discover this year, please visit us at booth #2460 to see AI compute in action with a voice transcription service running Whisper in real time. You can also stop by to see our partner HPE demonstrate YOLOv8 object detection live on the show floor, running on the HPE ProLiant RL300.
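If you want to try the demoed workloads on your own CPU-based hardware, the open-source whisper and ultralytics packages are a reasonable starting point. The sketch below is not the booth demo code; the file paths and model sizes are placeholders.

```python
# Sketch of the two demoed workloads running on a CPU, using the
# open-source openai-whisper and ultralytics packages. Not the booth
# demo code; file paths and model sizes are illustrative placeholders.
import whisper
from ultralytics import YOLO

# Voice transcription with Whisper (falls back to CPU when no GPU is present)
asr = whisper.load_model("base")
result = asr.transcribe("meeting_audio.wav")  # placeholder audio file
print(result["text"])

# Object detection with YOLOv8
detector = YOLO("yolov8n.pt")  # nano model; weights download on first use
detections = detector("street_scene.jpg")  # placeholder image
for box in detections[0].boxes:
    print(box.cls, box.conf)  # class id and confidence per detection
```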