
VP9 Video Encoding on Google Cloud Workload Brief

Tau T2A Machines Powered by Ampere Altra Processors

Overview
Results and Key Findings
Benchmarking Configuration
Key Findings and Conclusions
Overview

Ampere® Altra® processors are designed from the ground up to deliver exceptional performance for Cloud Native applications such as video encoding. With an innovative architecture that delivers high performance, linear scalability and high energy efficiency, Ampere Altra lets workloads run predictably, with minimal variance under increasing load. This enables industry-leading performance per watt and a smaller carbon footprint for real-world workloads like video encoding.

Google Cloud offers the cost-optimized Tau T2A VMs, powered by Ampere Altra processors, for scale-out Cloud Native workloads in multiple predefined VM shapes: up to 48 vCPUs per VM, 4 GB of memory per vCPU, up to 32 Gbps of networking bandwidth, and a wide range of network-attached storage options. These VMs are suitable for scale-out workloads like web servers, containerized microservices, data-logging processing, media transcoding, and Java applications.

This workload brief focuses on VP9, the open and royalty-free video coding format developed by Google. Advanced video codecs such as VP9 achieve greater compression than H.264 encoding with x264, at the cost of more computing resources and power.

Results and Key Findings

The Google Cloud Tau T2A VMs powered by Ampere Altra processors offer great performance in a variety of video encoding workloads, including VP9. Our previous studies highlight the performance of running x264 and x265 on Google Cloud Tau T2A VMs powered by Ampere Altra processors.

We evaluated VP9 performance using input videos from the UGC Dataset, which were uploaded to YouTube and are distributed under the Creative Commons license, at resolutions of 360p, 480p, 720p, 1080p and 2160p. We used the Live, Upload and Video on Demand (VOD) configurations from “vbench: a benchmark for video transcoding in the cloud, a benchmark for the emerging video-as-a-service workload,” available here. The Live configuration is a single-pass, low-latency encode. The Upload configuration represents the initial upload to a video-sharing service, using a single-pass encode that does not degrade the input video quality. The VOD configuration is a 2-pass encode for visually lossless output.

We ran on 8-vCPU VMs, comparing Google Cloud Tau T2A, which uses Ampere Altra processors, with the legacy x86-based Google Cloud N2 (Intel Ice Lake) and N2D (AMD Milan) VMs. For VP9 encoding, we set the output target bitrate to 1 bit per pixel per second (frame width × height bits per second) for videos at 30 FPS or less, and 1.5 bits per pixel per second for videos above 30 FPS. For example, the Sports_1080P-0063.mkv input is 1920x1080 at 25 FPS, so its target bitrate is 1920 × 1080 = 2,073,600 bits per second. For the Live configuration, we encoded all frames of each input video. To reduce runtime for the VOD configuration, we encoded 200 frames of each input video. For the Upload configuration, we encoded 300, 100, 50, 30 and 10 frames for the 360p, 480p, 720p, 1080p and 2160p inputs, respectively, giving a runtime of roughly 400-500 s for every input. The VP9 command lines and additional details are listed in the Benchmarking Configuration section below. We first present performance results for the Live, Upload and VOD configurations, followed by price-performance results.
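The bitrate and frame-count rules above can be written as small shell helpers. This is a sketch, not the scripts used for this brief: the function names are hypothetical, the 30 FPS threshold is evaluated with awk so fractional rates such as 29.97 also work, and the kbps conversion is our addition, because vpxenc's --target-bitrate option is specified in kilobits per second.

```shell
#!/bin/sh
# Hypothetical helpers reproducing the brief's encoding parameters (not the authors' scripts).

# Target bitrate in bits per second: 1 bit per pixel of frame area at <= 30 FPS,
# 1.5 bits per pixel above 30 FPS.
target_bitrate_bps() {
  # $1 = width, $2 = height, $3 = frame rate (may be fractional, e.g. 29.97)
  awk -v w="$1" -v h="$2" -v fps="$3" \
    'BEGIN { factor = (fps + 0 > 30) ? 1.5 : 1; printf "%d\n", w * h * factor }'
}

# Assumption: vpxenc's --target-bitrate is in kbps, so round the bps value to kbps.
target_bitrate_kbps() {
  awk -v bps="$(target_bitrate_bps "$1" "$2" "$3")" \
    'BEGIN { printf "%d\n", bps / 1000 + 0.5 }'
}

# Frames encoded per input for the Upload configuration, keyed by frame height.
upload_frames() {
  case "$1" in
    360) echo 300 ;;
    480) echo 100 ;;
    720) echo 50 ;;
    1080) echo 30 ;;
    2160) echo 10 ;;
    *) echo "unknown resolution: $1" >&2; return 1 ;;
  esac
}

# Example: Sports_1080P-0063.mkv is 1920x1080 at 25 FPS.
target_bitrate_bps 1920 1080 25
target_bitrate_kbps 1920 1080 25
upload_frames 1080
```

For the Sports_1080P-0063.mkv example, the three calls print 2073600, 2074 and 30.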

Fig.1: VP9 Live Configuration Video Encoding Performance of Google Cloud Tau T2A Virtual Machines powered by Ampere Altra Processors

Ampere Altra-based Google Cloud Tau T2A VMs outperform the x86 VMs on raw performance for the VP9 Live configuration. The figure plots relative performance averaged over 5 inputs. The T2A VM delivers 40% higher performance than the N2 VM and 24% higher performance than the N2D VM.
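The brief does not state how the per-input results are combined. As a hedged illustration only, the sketch below assumes each VM encodes the same fixed workload per input, takes relative performance as the x86 wall-clock time divided by the T2A wall-clock time, and averages those ratios with an arithmetic mean. The function name and the choice of mean are our assumptions.

```shell
#!/bin/sh
# Hypothetical: average relative performance of T2A vs. an x86 VM over several inputs.
# Arguments come in pairs: <x86 seconds> <T2A seconds> for each input video.
relative_perf() {
  if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
    echo "usage: relative_perf x86_secs t2a_secs [x86_secs t2a_secs ...]" >&2
    return 1
  fi
  # One "x86 t2a" pair per line; for the same work, speedup = x86 time / T2A time.
  printf '%s %s\n' "$@" |
    awk '{ sum += $1 / $2; n++ } END { printf "%.2f\n", sum / n }'
}
```

Under this reading, a result of 1.40 corresponds to the "40% higher performance" quoted above.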

Fig.2: VP9 Upload Configuration Video Encoding Performance of Google Cloud Tau T2A Virtual Machines powered by Ampere Altra Processors

Ampere Altra-based Google Cloud Tau T2A VMs outperform the x86 VMs on raw performance for the VP9 Upload configuration. The figure plots relative performance averaged over 5 inputs. The T2A VM delivers 45% higher performance than the N2 VM and 44% higher performance than the N2D VM.

Fig.3: VP9 VOD Configuration Video Encoding Performance of Google Cloud Tau T2A Virtual Machines powered by Ampere Altra Processors

Ampere Altra-based Google Cloud Tau T2A VMs outperform the x86 VMs on raw performance for the VP9 VOD configuration. The figure plots relative performance averaged over 5 inputs. The T2A VM delivers 34% higher performance than the N2 VM and 15% higher performance than the N2D VM.

Fig.4: VP9 Live Video Encoding Price-Performance of Google Cloud Tau T2A Virtual Machines powered by Ampere Altra Processors

The T2A VM's lead grows further on price-performance, because it also has the lowest hourly cost of the three VMs (see Benchmarking Configuration). For the VP9 Live configuration, the T2A VM delivers 76% better price-performance than the N2 VM and 36% better than the N2D VM.

Fig.5: VP9 Upload Video Encoding Price-Performance of Google Cloud Tau T2A Virtual Machines powered by Ampere Altra Processors

For the VP9 Upload configuration, the Ampere Altra-based T2A VM delivers 83% better price-performance than the N2 VM and 58% better than the N2D VM.

Fig.6: VP9 VOD Video Encoding Price-Performance of Google Cloud Tau T2A Virtual Machines powered by Ampere Altra Processors

For the VP9 VOD configuration, the T2A VM delivers 70% better price-performance than the N2 VM and 26% better than the N2D VM.
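The price-performance figures follow from the performance ratios and the hourly costs in the Benchmarking Configuration table: gain = (relative performance × x86 hourly cost ÷ T2A hourly cost) − 1. The sketch below (function name hypothetical) recomputes the gains from the rounded performance ratios quoted above. It reproduces the reported values to within about one percentage point; the small differences presumably come from rounding of the published ratios.

```shell
#!/bin/sh
# Hourly prices from the Benchmarking Configuration table (USD).
N2_COST=0.388472
N2D_COST=0.337968
T2A_COST=0.308

# Hypothetical helper: percentage price-performance gain of T2A over an x86 VM.
# $1 = T2A performance relative to the x86 VM (1.40 means "40% better")
# $2 = x86 hourly cost, $3 = T2A hourly cost
price_perf_gain() {
  awk -v r="$1" -v cx="$2" -v ct="$3" \
    'BEGIN { printf "%.1f\n", (r * cx / ct - 1) * 100 }'
}

# Relative performance quoted in the brief: configuration, vs. N2, vs. N2D.
for row in "Live 1.40 1.24" "Upload 1.45 1.44" "VOD 1.34 1.15"; do
  set -- $row   # intentional word splitting into $1 $2 $3
  printf '%-6s vs N2: %s%%  vs N2D: %s%%\n' "$1" \
    "$(price_perf_gain "$2" "$N2_COST" "$T2A_COST")" \
    "$(price_perf_gain "$3" "$N2D_COST" "$T2A_COST")"
done
```

Running it prints 76.6%/36.1% (Live), 82.9%/58.0% (Upload) and 69.0%/26.2% (VOD) against N2/N2D, compared with the reported 76%/36%, 83%/58% and 70%/26%.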

Benchmarking Configuration
VM | N2 standard 8 vCPU | N2D standard 8 vCPU | T2A standard 8 vCPU
--- | --- | --- | ---
Number of vCPUs | 8 | 8 | 8
Hourly cost (USD) | $0.388472 | $0.337968 | $0.308
Operating system | Ubuntu 22.04.1 LTS | Ubuntu 22.04.1 LTS | Ubuntu 22.04.1 LTS
Kernel version | 5.15.0-1025-gcp | 5.15.0-1025-gcp | 5.15.0-1025-gcp
VP9 version | v1.12.0-231-g1450ec46e | v1.12.0-231-g1450ec46e | v1.12.0-231-g1450ec46e
Memory | 32 GB | 32 GB | 32 GB
Disk | 10 GB NVMe | 10 GB NVMe | 10 GB NVMe
Clang version | Ubuntu clang version 15.0.6 | Ubuntu clang version 15.0.6 | Ubuntu clang version 15.0.6

We used the following input files from the UGC Dataset, which were uploaded to YouTube and are distributed under the Creative Commons license:

  • https://storage.googleapis.com/ugc-dataset/original_videos/Sports/480P/Sports_480P-0623.mkv
  • https://storage.googleapis.com/ugc-dataset/original_videos/Sports/720P/Sports_720P-00a1.mkv
  • https://storage.googleapis.com/ugc-dataset/original_videos/Sports/1080P/Sports_1080P-0063.mkv
  • https://storage.googleapis.com/ugc-dataset/original_videos/Sports/2160P/Sports_2160P-0455.mkv

We built VP9 with clang 15 because it gave better VP9 performance than gcc builds on all VMs. Clang was installed on each VM using the instructions at https://apt.llvm.org:

echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-15 main" | sudo tee -a /etc/apt/sources.list
echo "deb-src http://apt.llvm.org/jammy/ llvm-toolchain-jammy-15 main" | sudo tee -a /etc/apt/sources.list
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt update
sudo apt-get install -y clang-15 lldb-15 lld-15

We downloaded the latest VP9 (libvpx) source here.

VP9 was configured to use the following optimization flags for Ampere Altra processors:

./configure --extra-cflags="-march=armv8.2-a+crypto+fp16+rcpc+dotprod" --extra-cxxflags="-march=armv8.2-a+crypto+fp16+rcpc+dotprod"

For the legacy x86 VMs, we used the default optimization flags, -m64 -O3, which are also used for the Ampere Altra build in addition to the extra flags listed above.
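A minimal sketch of the build step follows, assuming a libvpx source checkout in the current directory and the clang-15 packages installed as above. The brief does not say how clang was selected; exporting CC and CXX before running libvpx's configure script is one common way and is our assumption, as are the function names. The helper adds the Altra -march flags only when the machine reports an aarch64 architecture, so x86 builds keep the default flags.

```shell
#!/bin/sh
# Hypothetical helper: extra -march flags for the build, chosen by machine architecture.
vp9_march_flags() {
  case "$1" in
    aarch64|arm64) printf '%s\n' "-march=armv8.2-a+crypto+fp16+rcpc+dotprod" ;;
    *) : ;;  # x86_64 and others: no extra flags, keep libvpx's defaults
  esac
}

# Hypothetical build function; run from the top of a libvpx checkout.
build_vp9() {
  flags=$(vp9_march_flags "$(uname -m)")
  # Assumption: select clang 15 via CC/CXX (exported, so it stays set in this shell).
  export CC=clang-15 CXX=clang++-15
  if [ -n "$flags" ]; then
    ./configure --extra-cflags="$flags" --extra-cxxflags="$flags" || return 1
  else
    ./configure || return 1
  fi
  make -j"$(nproc)"
}
```

Call build_vp9 from the top of the libvpx checkout on each VM; only vp9_march_flags differs between the Altra and x86 builds.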

For each vCPU in the VM, we ran one single-threaded (--threads=1) instance of the following commands to evaluate VP9 performance for the Live, Upload and VOD configurations. The --debug option is used to make the output log file deterministic.

Live configuration (single-pass encoding):
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --rt --cpu-used=6 --threads=1 --debug >& $LOG

Upload configuration (single-pass encoding):
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --best --passes=1 --threads=1 --debug >& $LOG

VOD configuration (2-pass encoding):
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --good --cpu-used=4 --threads=1 --passes=2 --pass=1 --fpf=pass1_stat --debug
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --good --cpu-used=1 --threads=1 --passes=2 --pass=2 --fpf=pass1_stat --debug
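The per-vCPU runs can be scripted along the following lines. This is a hedged sketch rather than the authors' harness: it assumes the copies run concurrently (one single-threaded encoder per vCPU), the function names are hypothetical, and each copy gets its own output, log and first-pass statistics file (pass1_stat_<n>) so that parallel 2-pass VOD runs cannot overwrite each other's statistics. VPXENC defaults to vpxenc; H, W, INPUT_FPS, NFRAMES, BITRATE and INPUT are the same variables as in the commands above.

```shell
#!/bin/sh
VPXENC=${VPXENC:-vpxenc}

# Options shared by every configuration; extra arguments are appended.
vp9_common() {
  "$VPXENC" --codec=vp9 --profile=0 --height="$H" --width="$W" --fps="$INPUT_FPS" \
    --limit="$NFRAMES" --target-bitrate="$BITRATE" --threads=1 --debug "$@"
}

# One Live-configuration encode; $1 is the copy index.
live_job() {
  vp9_common --rt --cpu-used=6 -o "live_$1.webm" "$INPUT" >"live_$1.log" 2>&1
}

# One Upload-configuration encode.
upload_job() {
  vp9_common --best --passes=1 -o "upload_$1.webm" "$INPUT" >"upload_$1.log" 2>&1
}

# One 2-pass VOD encode with a per-copy first-pass statistics file.
vod_job() {
  vp9_common --good --cpu-used=4 --passes=2 --pass=1 --fpf="pass1_stat_$1" \
    -o "vod_$1.webm" "$INPUT" >"vod_$1_pass1.log" 2>&1 &&
  vp9_common --good --cpu-used=1 --passes=2 --pass=2 --fpf="pass1_stat_$1" \
    -o "vod_$1.webm" "$INPUT" >"vod_$1_pass2.log" 2>&1
}

# Start one copy of a job per vCPU in the background, then wait for all of them.
run_per_vcpu() {
  job="$1"
  n="${2:-$(nproc)}"   # default: one copy per available vCPU
  i=0
  while [ "$i" -lt "$n" ]; do
    "$job" "$i" &
    i=$((i + 1))
  done
  wait
}
```

For example, run_per_vcpu vod_job starts eight 2-pass encodes on an 8-vCPU VM; under bash, prefixing the call with time reports the wall-clock duration of the whole batch.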

Key Findings and Conclusions

Video encoding is a key and popular cloud workload. VP9 is an important video format because it compresses video more efficiently than H.264 encoding with x264. Advanced codecs such as VP9 are computationally intensive and are increasingly used to provide greater video compression, improving video quality while reducing storage and network costs.

In our tests, the Google Cloud Tau T2A VMs powered by the Ampere Altra Cloud Native processors delivered better performance and price-performance compared to legacy x86 VMs - up to 45% higher performance and 83% higher price-performance.

For more information about the Google Cloud Tau T2A virtual machines with Ampere Altra processors, please visit the Google Cloud blog.

Created At: January 25th 2023, 9:58:43 pm
Last Updated At : May 31st 2023, 10:19:07 pm

Ampere Computing LLC

4655 Great America Parkway Suite 601

Santa Clara, CA 95054

© 2023 Ampere Computing LLC. All rights reserved. Ampere, Altra and the A and Ampere logos are registered trademarks or trademarks of Ampere Computing.