Performance benefit of running vbench with x264 on AmpereOne
Online video usage continues to expand rapidly, driving demand for video encoding that compresses video while maintaining quality. The H.264/MPEG-4 AVC standard is the most widely used video format today, and we use FFmpeg with x264 to evaluate H.264 encoding performance. The input videos and x264 configurations come from vbench, “a benchmark for the emerging video-as-a-service workload,” described in “vbench: Benchmarking Video Transcoding in the Cloud” [1]. vbench consists of 15 input videos that were algorithmically selected to represent a large commercial corpus of millions of videos based on resolution, framerate, and complexity.
AmpereOne® A192-26X is designed to deliver exceptional performance for cloud native applications such as video encoding with x264. This is accomplished through an innovative architectural design, consistent operating frequencies, and single-threaded cores that make applications more resistant to noisy-neighbor issues. As a result, workloads run predictably, with minimal variance under increasing load.
We use the vbench “Upload” and “Video on Demand” configurations to evaluate performance and power usage. Upload uses single-pass transcoding that does not degrade the input video quality; it represents the initial encode performed when a video is uploaded to a video service, which must be both fast and high quality. Video on Demand uses two-pass transcoding, which requires speed and improved compression without degrading video quality. The first Video on Demand pass collects statistics that the second pass uses to allocate more bits to complex frames and fewer bits to simple ones.
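To make the difference between the two configurations concrete, the sketch below builds FFmpeg/libx264 argument lists for a single-pass encode and a two-pass encode, each limited to one encoder thread. It is a minimal illustration, not the vbench command lines: the function names, the CRF value, the target bitrate, and the pass-log prefix are hypothetical placeholders. The FFmpeg options themselves (`-c:v libx264`, `-threads`, `-crf`, `-b:v`, `-pass`, `-passlogfile`, `-f null`) are standard.

```python
from typing import List

# Hypothetical helpers: the CRF and bitrate defaults are placeholders,
# not the settings published by vbench.

def upload_cmd(src: str, dst: str, crf: int = 18) -> List[str]:
    """Single-pass libx264 encode with one encoder thread (Upload-style)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "libx264", "-threads", "1",
        "-crf", str(crf),          # constant-quality mode, placeholder value
        "-an", dst,                # encode video only
    ]

def vod_cmds(src: str, dst: str, bitrate: str = "2M",
             log_prefix: str = "x264_2pass") -> List[List[str]]:
    """Two-pass libx264 encode (VoD-style): pass 1 writes stats, pass 2 reads them."""
    common = ["-c:v", "libx264", "-threads", "1",
              "-b:v", bitrate,                 # target bitrate, placeholder value
              "-passlogfile", log_prefix]      # both passes share the stats file
    first = ["ffmpeg", "-y", "-i", src, *common,
             "-pass", "1", "-an", "-f", "null", "/dev/null"]  # discard pass-1 video
    second = ["ffmpeg", "-y", "-i", src, *common,
              "-pass", "2", "-an", dst]
    return [first, second]
```

The two VoD commands must run in order (for example, `subprocess.run(cmd, check=True)` for each list returned by `vod_cmds`), because the second pass reads the statistics file written by the first.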
To maximize ffmpeg throughput, we run as many ffmpeg instances as there are CPU cores on the socket, with one ffmpeg thread per instance. All instances run on one socket, and numactl pins each instance to a dedicated CPU core. We calculate aggregate frames per second (FPS) from the average time each ffmpeg process takes to transcode the 15 vbench input files, and we measure socket-level power usage. We compare performance per watt (FPS/Watt), which is equivalent to Frames/Joule because 1 Watt = 1 Joule/second. We compare the AmpereOne® A192-26X processor to the AMD EPYC 9654 96-Core Processor (Genoa) and the AMD EPYC 9754 128-Core Processor (Bergamo), all running Fedora Server 38 with the Linux 6.4 kernel. To minimize OS overhead, the ffmpeg binary and all input and output files are stored on a ramdisk; as a result, less than 2% of time is spent in the kernel for the Upload configuration and less than 3.5% for Video on Demand on all three processors. We built recent versions of ffmpeg and libx264 [2] with gcc on all platforms, following the FFmpeg Compilation Guide.
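The methodology above can be sketched as a small launch-and-measure helper. This is an illustration under stated assumptions, not Ampere's harness: the function names are hypothetical, `--membind` (keeping memory on the socket's NUMA node) is an assumption beyond the stated CPU affinity, and the aggregate-FPS formula assumes every instance transcodes the same set of frames (all 15 inputs) so that the mean wall-clock time can be used.

```python
from statistics import mean
from typing import List, Sequence

def pinned_cmd(core: int, node: int, ffmpeg_args: Sequence[str]) -> List[str]:
    """Prefix an ffmpeg command with numactl so it runs on exactly one CPU core.

    --membind is an assumption; the text only states that numactl sets affinity.
    """
    return ["numactl", f"--physcpubind={core}", f"--membind={node}", *ffmpeg_args]

def launch_plan(cores: Sequence[int], node: int,
                ffmpeg_args: Sequence[str]) -> List[List[str]]:
    # One single-threaded ffmpeg instance per core on the socket.
    return [pinned_cmd(core, node, ffmpeg_args) for core in cores]

def aggregate_fps(frames_per_instance: int, elapsed_s: Sequence[float]) -> float:
    """Aggregate FPS across instances, assuming each instance transcodes
    the same frames and using the mean wall-clock time per instance."""
    return len(elapsed_s) * frames_per_instance / mean(elapsed_s)

def frames_per_joule(fps: float, avg_socket_watts: float) -> float:
    # 1 W = 1 J/s, so (frames/s) / (J/s) = frames/J.
    return fps / avg_socket_watts
```

In practice each command from `launch_plan` would be started concurrently (for example with `subprocess.Popen`) and timed individually, while socket power is sampled for the duration of the run.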
Fig. 1 shows the leading performance/watt of AmpereOne® A192-26X for the vbench Upload and Video on Demand (VoD) configurations. Compared to the AMD EPYC 9654 (Genoa), AmpereOne® A192-26X delivers 34% and 35% higher Frames/Joule (equivalent to FPS/Watt) for Upload and VoD, respectively. Compared to the AMD EPYC 9754 (Bergamo), it delivers 7% and 12% higher Frames/Joule for Upload and VoD, respectively.
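The unit equivalence and the relative improvements can be written out as follows, assuming “X% higher” means the ratio of the two processors' Frames/Joule minus one:

```latex
\frac{\text{frames}/\text{s}}{\text{W}}
  = \frac{\text{frames}/\text{s}}{\text{J}/\text{s}}
  = \frac{\text{frames}}{\text{J}},
\qquad
\text{improvement}
  = \left(\frac{(\text{frames}/\text{J})_{\text{AmpereOne}}}
               {(\text{frames}/\text{J})_{\text{x86}}} - 1\right)\times 100\%
```

Under this reading, 35% higher Frames/Joule is a ratio of 1.35, so encoding the same frames takes about 1/1.35 ≈ 0.74 of the energy, roughly 26% less energy per frame.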
H.264 is the most widely used video format, and with online video viewing ever increasing, encoders such as x264 are critical for compressing video while maintaining quality, reducing network bandwidth and storage requirements. AmpereOne® A192-26X processors are designed to deliver exceptional performance and energy efficiency for cloud native applications. In Ampere’s testing, AmpereOne® A192-26X provides industry-leading performance/Watt for video encoding with x264, with up to 35% higher Frames/Joule (equivalent to FPS/Watt) than leading x86 processors.
Benefits of running x264 on AmpereOne® A192-26X
Cloud Native: Designed from the ground up for 'born in the cloud' workloads such as video encoding with x264, AmpereOne® A192-26X delivers up to 35% higher performance/Watt than the leading x86 processors tested (AMD EPYC 9654 and 9754).
Energy Efficiency: With up to 192 energy-efficient Arm cores, AmpereOne® A192-26X can consume less power while maintaining competitive performance.
Lower Carbon Footprint: With up to 35% higher performance/Watt, AmpereOne® A192-26X combines high performance with high energy efficiency, resulting in lower TCO and a smaller carbon footprint.
Consistency & Predictability: Single-threaded cores running at fixed maximum frequencies are designed to provide linear scaling under stringent SLAs and at high loads when running x264.
[1] Andrea Lottarini, Alex Ramirez, Joel Coburn, Martha A. Kim, Parthasarathy Ranganathan, Daniel Stodolsky, and Mark Wachsler. vbench: Benchmarking Video Transcoding in the Cloud. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’18). ACM, New York, NY, USA, 2018.
[2] We built x264 from the code.videolan.org/videolan/x264 repository at a git commit dated Jan 28, 2023.
All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.
©2024 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com