VP9 Video Encoding on Google Cloud Workload Brief
Tau T2A Machines Powered by Ampere Altra Processors
Ampere® Altra® processors are designed from the ground up to deliver exceptional performance for Cloud Native applications such as video encoding. With an innovative architecture that delivers high performance, linear scalability, and amazing energy efficiency, Ampere Altra allows workloads to run in a predictable manner with minimal variance under increasing loads. This enables industry-leading performance per watt and a smaller carbon footprint for real-world workloads like video encoding.
Google Cloud offers the cost-optimized Tau T2A VMs powered by Ampere Altra processors for scale-out Cloud Native workloads in multiple predetermined VM shapes: up to 48 vCPUs per VM, 4 GB of memory per vCPU, up to 32 Gbps networking bandwidth, and a wide range of network-attached storage options. These VMs are suitable for scale-out workloads like web servers, containerized microservices, data-logging processing, media transcoding, and Java applications.
This workload brief focuses on VP9, the open and royalty-free video coding format developed by Google. Advanced video codecs such as VP9 provide greater video compression than x264-based (H.264) encoding, at the expense of more computing resources and power.
The Google Cloud Tau T2A VMs powered by Ampere Altra processors offer great performance in a variety of video encoding workloads, including VP9. Our previous studies highlight the performance of running x264 and x265 on Google Cloud Tau T2A VMs powered by Ampere Altra processors.
We evaluated the performance of the VP9 codec using input videos from the UGC Dataset, which were uploaded to YouTube and distributed under the Creative Commons license, at resolutions of 360p, 480p, 720p, 1080p and 2160p. We used the Live, Upload and Video on Demand (VOD) configurations based on “vbench: a benchmark for video transcoding in the cloud, a benchmark for the emerging video-as-a-service workload.” The Live configuration is a single-pass, low-latency encode. The Upload configuration represents the initial upload to a video sharing service and uses a single-pass encode that does not degrade the input video quality. The VOD configuration is a 2-pass encode for visually lossless output.
We ran on 8 vCPU VMs, comparing Google Cloud Tau T2A, which uses Ampere Altra processors, with the legacy x86-based Google Cloud N2 (Intel Ice Lake) and N2D (AMD Milan) VMs. For VP9 video encoding, we set the target bitrate of the output video to 1 bit/pixel/second for videos with a frame rate of 30 FPS or less and 1.5 bits/pixel/second for videos with a frame rate above 30 FPS. For example, the Sports_1080P-0063.mkv input is 1920x1080 at 25 FPS, so its target bitrate is 1920 × 1080 × 1 = 2,073,600 bits per second.
For the Live configuration, we used all frames of the input videos. To reduce runtime for the VOD configuration, we encoded 200 frames of each input video. For the Upload configuration, we encoded 300, 100, 50, 30 and 10 frames for the 360p, 480p, 720p, 1080p and 2160p inputs, respectively, which gave a runtime of roughly 400-500 seconds for every input.
The command line options we used to run VP9 and additional details are listed in the section Benchmarking Configuration below. First, we present our performance results for the Live, Upload and VOD configurations, followed by our price-performance results.
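The target-bitrate rule described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `target_bitrate` is hypothetical and not part of the original benchmark scripts, and the result is in bits per second as in the text. Note that vpxenc documents `--target-bitrate` in kilobits per second, and the brief does not say whether a unit conversion was applied before the value was passed as `$BITRATE`.

```bash
# Hypothetical helper (not from the original scripts): target bitrate in
# bits per second for a given width, height and frame rate.
# Rule from the text: 1 bit/pixel/second at 30 FPS or less,
# 1.5 bits/pixel/second above 30 FPS.
target_bitrate() {
  local width=$1 height=$2 fps=$3
  awk -v w="$width" -v h="$height" -v fps="$fps" 'BEGIN {
    bits_per_pixel = (fps > 30) ? 1.5 : 1.0
    printf "%d\n", w * h * bits_per_pixel
  }'
}

# The Sports_1080P-0063.mkv case from the text (1920x1080 at 25 FPS).
target_bitrate 1920 1080 25   # prints 2073600
```

The same call with a frame rate above 30, for example `target_bitrate 1920 1080 60`, applies the 1.5 bits/pixel/second branch instead.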
Ampere Altra-based Google Cloud Tau T2A VMs outperform the x86 VMs on raw performance for the VP9 Live configuration. We plot relative performance averaged over 5 inputs. The T2A VM has 40% better performance than the N2 VM and 24% better compared to the N2D VM.
Ampere Altra-based Google Cloud Tau T2A VMs outperform the x86 VMs on raw performance for the VP9 Upload configuration. We plot relative performance averaged over 5 inputs. The T2A VM has 45% better performance than the N2 VM and 44% better compared to the N2D VM.
Ampere Altra-based Google Cloud Tau T2A VMs outperform the x86 VMs on raw performance for the VP9 VOD configuration. We plot relative performance averaged over 5 inputs. The T2A VM has 34% better performance than the N2 VM and 15% better compared to the N2D VM.
Comparing price-performance, the T2A VMs outperform the legacy x86 VMs even further. For the VP9 Live configuration, the Ampere Altra-based T2A VM has 76% better price-performance than the N2 VM and 36% better than the N2D VM.
For the VP9 Upload configuration, the Ampere Altra-based T2A VM has 83% better price-performance than the N2 VM and 58% better than the N2D VM.
For the VP9 VOD configuration, the Ampere Altra-based T2A VM has 70% better price-performance than the N2 VM and 26% better than the N2D VM.
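As a rough cross-check, the price-performance figures can be approximated from the raw performance gains and the hourly costs listed in the configuration table. The sketch below assumes price-performance is performance divided by hourly cost; the brief does not spell out the formula, and the function name `price_perf_gain` is hypothetical.

```bash
# Hypothetical helper: price-performance gain (in percent) of T2A over a
# baseline VM, assuming price-performance = performance / hourly cost.
# Usage: price_perf_gain PERF_GAIN_PCT BASELINE_HOURLY_COST T2A_HOURLY_COST
price_perf_gain() {
  awk -v gain="$1" -v base="$2" -v t2a="$3" 'BEGIN {
    ratio = (1 + gain / 100) * (base / t2a)
    printf "%.1f\n", (ratio - 1) * 100
  }'
}

# Live configuration, T2A vs N2: 40% raw gain, $0.388472 vs $0.308 per hour.
price_perf_gain 40 0.388472 0.308   # prints 76.6
```

Because the inputs here are the rounded percentages from the text, the results land within about a point of the published figures rather than matching them exactly.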
Benchmarking Configuration
Parameter | N2 standard 8 vCPU | N2D standard 8 vCPU | T2A standard 8 vCPU |
---|---|---|---|
Number of vCPUs | 8 | 8 | 8 |
Hourly Cost | $0.388472 | $0.337968 | $0.308 |
Operating System | Ubuntu 22.04.1 LTS | Ubuntu 22.04.1 LTS | Ubuntu 22.04.1 LTS |
Kernel version | 5.15.0-1025-gcp | 5.15.0-1025-gcp | 5.15.0-1025-gcp |
VP9 Version | v1.12.0-231-g1450ec46e | v1.12.0-231-g1450ec46e | v1.12.0-231-g1450ec46e |
Memory | 32 GB | 32 GB | 32 GB |
Disk | 10 GB NVMe | 10 GB NVMe | 10 GB NVMe |
Clang Version | Ubuntu clang version 15.0.6 | Ubuntu clang version 15.0.6 | Ubuntu clang version 15.0.6 |
We used the following input files from the UGC Dataset, which were uploaded to YouTube and distributed under the Creative Commons license:
We built VP9 using clang version 15 because it improved VP9 performance on all VMs compared to building with gcc. Clang was installed on each VM using the instructions at https://apt.llvm.org:
echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-15 main" | sudo tee -a /etc/apt/sources.list
echo "deb-src http://apt.llvm.org/jammy/ llvm-toolchain-jammy-15 main" | sudo tee -a /etc/apt/sources.list
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
sudo apt update
sudo apt-get install -y clang-15 lldb-15 lld-15
We downloaded the latest VP9 source available at the time of testing; the exact version is listed in the table above.
VP9 was configured to use the following optimization flags for Ampere Altra processors:
./configure --extra-cflags="-march=armv8.2-a+crypto+fp16+rcpc+dotprod" --extra-cxxflags="-march=armv8.2-a+crypto+fp16+rcpc+dotprod"
For the legacy x86 VMs, we used the default optimization flags, -m64 -O3, which are also used for the Ampere Altra build in addition to the extra flags listed above.
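The build step described above could be scripted along the following lines. This is a sketch only: the helper names `vp9_extra_flags` and `build_vp9` are hypothetical, and selecting clang through the `CC` and `CXX` environment variables is an assumption, since the brief does not show how the compiler was chosen.

```bash
# Hypothetical helper: extra compiler flags for a given machine type.
# The Arm flags are the ones quoted in the text; x86 builds get none and
# keep the default flags.
vp9_extra_flags() {
  case "${1:-$(uname -m)}" in
    aarch64|arm64) echo "-march=armv8.2-a+crypto+fp16+rcpc+dotprod" ;;
    *)             echo "" ;;
  esac
}

# Hypothetical build wrapper, meant to be run from the VP9 source tree.
# Assumption: clang 15 is selected via the CC/CXX environment variables.
build_vp9() {
  local flags
  local args=()
  flags=$(vp9_extra_flags)
  if [ -n "$flags" ]; then
    args+=(--extra-cflags="$flags" --extra-cxxflags="$flags")
  fi
  CC=clang-15 CXX=clang++-15 ./configure "${args[@]}"
  make -j"$(nproc)"
}
```

Keeping the flag selection in its own function means the same script can be used unchanged on the T2A, N2 and N2D VMs.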
For each vCPU available in the VM, we ran the following command to evaluate VP9 performance for the Live, Upload and VOD configurations. Note that the --debug option is used to make the output logfile deterministic.
Live configuration (single pass encoding):
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --rt --cpu-used=6 --threads=1 --debug >& $LOG
Upload configuration (single pass encoding):
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --best --passes=1 --threads=1 --debug >& $LOG
VOD configuration (2-pass encoding):
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --good --cpu-used=4 --threads=1 --passes=2 --pass=1 --fpf=pass1_stat --debug
vpxenc --codec=vp9 --profile=0 --height=$H --width=$W --fps=$INPUT_FPS --limit=$NFRAMES -o $OUTPUT $INPUT --target-bitrate=$BITRATE --good --cpu-used=1 --threads=1 --passes=2 --pass=2 --fpf=pass1_stat --debug
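The brief states that one encoder instance was run per vCPU but does not show the launcher. A minimal sketch of such a driver follows. The names `run_per_vcpu`, `NCPU` and `INSTANCE` are hypothetical, and the original harness may have worked differently.

```bash
# Hypothetical driver: start one copy of the given command per vCPU and
# wait for all of them. Each copy sees its index in $INSTANCE so it can
# pick distinct output and log file names.
# NCPU overrides the detected vCPU count (defaults to nproc).
run_per_vcpu() {
  local ncpu=${NCPU:-$(nproc)}
  local i pid status=0
  local pids=()
  for ((i = 0; i < ncpu; i++)); do
    INSTANCE=$i "$@" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1   # remember any failure, keep waiting
  done
  return "$status"
}

# Example (commented out because it needs vpxenc and the input files):
# run_per_vcpu sh -c 'vpxenc --codec=vp9 --profile=0 --height=$H --width=$W \
#   --fps=$INPUT_FPS --limit=$NFRAMES -o out_$INSTANCE.webm $INPUT \
#   --target-bitrate=$BITRATE --rt --cpu-used=6 --threads=1 --debug \
#   > log_$INSTANCE.txt 2>&1'
```

Since each instance uses `--threads=1`, launching one process per vCPU keeps all 8 vCPUs of the VM busy. For the 2-pass VOD configuration, each instance would also need its own `--fpf` statistics file.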
Video encoding is a key and popular workload in the cloud. VP9 is an important video format due to improved video compression compared to x264. Advanced codecs such as VP9 are computationally intensive and are increasingly being used to provide greater video compression to improve video quality while reducing storage and network costs.
In our tests, the Google Cloud Tau T2A VMs powered by the Ampere Altra Cloud Native processors delivered better performance and price-performance than the legacy x86 VMs: up to 45% higher performance and up to 83% higher price-performance.
For more information about the Google Cloud Tau T2A virtual machines with Ampere Altra processors, please visit the Google Cloud blog.