Dpsv5 Virtual Machines Powered by Ampere Altra Processors
Ampere® Altra® processors are designed to deliver exceptional performance for cloud native applications like NGINX. They do so by using an innovative architectural design, operating at consistent frequencies, and using single-threaded cores that make applications more resistant to noisy-neighbor issues. This allows workloads to run in a predictable manner with minimal variance under increasing loads. The processors are also designed to deliver exceptional energy efficiency, which translates to industry-leading performance per watt and a lower carbon footprint.
Microsoft offers a comprehensive line of Azure Virtual Machines featuring the Ampere Altra Cloud Native processor that can run a diverse and broad set of scale-out workloads such as web servers, open-source databases, in-memory applications, big data analytics, gaming, media, and more. The Dpsv5 VMs are general-purpose VMs that provide 4 GB of memory per vCPU and a combination of vCPUs, memory, and local storage to cost-effectively run workloads that do not require larger amounts of RAM per vCPU. The Epsv5 VMs are memory-optimized VMs that provide 8 GB of memory per vCPU, which can benefit memory-intensive workloads including open-source databases, in-memory caching applications, gaming, and data analytics engines.
NGINX is an open-source, high-performance HTTP server that can also be used as a reverse proxy, load balancer, mail proxy, and HTTP cache. It uses a sophisticated event-driven architecture that allows it to scale to hundreds of thousands of concurrent connections on modern hardware. According to W3Techs, NGINX was the most popular web server among high-traffic websites as of 2021, with a 33.8% market share.
In our tests, NGINX was hosted on 16 vCPU VMs. We compared the Ampere® Altra® based Microsoft Azure D16psv5 to the Intel Xeon Ice Lake based D16sv5 and the AMD Milan based D16asv5 VMs. Like most open-source software, NGINX is natively supported on aarch64. wrk, running on a 16 vCPU client VM, was used as the load generator. Performance of NGINX is measured as requests per second under a Service Level Agreement (SLA), defined here as a 99th percentile (p99) latency of no more than 3 ms. We configured the thread and concurrency levels to achieve maximum throughput while keeping the p99 latency within that limit. The client and server VMs were configured to be part of the same subnet to achieve the best network throughput.
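The selection rule described above (maximum requests per second among runs whose p99 latency stays within 3 ms) can be sketched as follows. This is a minimal illustration, not the harness used for the tests: the `Run` type, the `best_under_sla` helper, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

P99_SLA_MS = 3.0  # SLA from the text: p99 latency of no more than 3 ms


@dataclass(frozen=True)
class Run:
    """One load-generator run (hypothetical record type for this sketch)."""
    threads: int
    connections: int
    requests_per_sec: float
    p99_ms: float


def best_under_sla(runs: Iterable[Run], sla_ms: float = P99_SLA_MS) -> Optional[Run]:
    """Return the highest-throughput run that meets the SLA, or None."""
    eligible = [r for r in runs if r.p99_ms <= sla_ms]
    return max(eligible, key=lambda r: r.requests_per_sec, default=None)


# Made-up numbers for illustration only; these are not measured results.
sample_runs = [
    Run(threads=8, connections=64, requests_per_sec=100_000.0, p99_ms=1.2),
    Run(threads=16, connections=128, requests_per_sec=150_000.0, p99_ms=2.7),
    Run(threads=16, connections=256, requests_per_sec=170_000.0, p99_ms=4.1),
]

best = best_under_sla(sample_runs)
print(best)
```

In this made-up sweep the 256-connection run has the highest raw throughput but misses the SLA, so the 128-connection run is the one that would be reported.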
Here is our system configuration used for these tests:
Configuration | Standard D16s_v5 | Standard D16as_v5 | Standard D16ps_v5 |
---|---|---|---|
Number of vCPUs | 16 | 16 | 16 |
Monthly cost | $654 | $589 | $523 |
OS | Red Hat Enterprise Linux 9.1 (Plow) | Red Hat Enterprise Linux 9.1 (Plow) | Red Hat Enterprise Linux 9.1 (Plow) |
Kernel | 5.14.0-162.6.1.el9_1 | 5.14.0-162.6.1.el9_1 | 5.14.0-162.6.1.el9_1 |
Memory | 64GB | 64GB | 64GB |
NGINX version | 1.15.4 | 1.15.4 | 1.15.4 |
WRK version | 4.2.0 | 4.2.0 | 4.2.0 |
For these tests, we use NGINX with Lua and brotli. NGINX is widely deployed with Lua, a scripting language that can be embedded in NGINX and makes it easy to extend NGINX with new features. Compression is very important for web services because most edge devices rely heavily on it to send data faster. We use brotli at compression level 5 in our benchmarking; brotli is commonly used with NGINX and Lua.
Below is the NGINX configuration file that was used for this benchmarking:
NGINX Configuration
user root;
worker_processes auto;
error_log logs/error.log;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    access_log off;
    # Plain HTTP server on port 80
    server {
        listen 80;
        server_name localhost;
        location / {
            root /var/www/html;
            index index.html index.htm;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
        access_log off;
        error_log off;
        brotli on;
        brotli_comp_level 5;
        brotli_types text/plain text/css application/x-javascript application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
        # Map /content/N to a static file, then post-process the response body in Lua
        location /content {
            rewrite_by_lua_block {
                local uri = ngx.var.uri
                if (uri == "/content/1") then
                    ngx.req.set_uri("/1.html")
                elseif (uri == "/content/2") then
                    ngx.req.set_uri("/2.css")
                elseif (uri == "/content/3") then
                    ngx.req.set_uri("/3.js")
                end
            }
            header_filter_by_lua_block { ngx.header.content_length = nil }
            body_filter_by_lua_block {
                ngx.arg[1] = "<!-- What up " .. os.date() .. " -->" .. ngx.arg[1] .. "<!-- " .. os.clock() .. "-->"
                ngx.arg[1] = string.gsub(ngx.arg[1], "\n", "")
                ngx.arg[1] = string.gsub(ngx.arg[1], ">%s+<", "")
            }
        }
    }
    # HTTPS (HTTP/2) server on port 443, the endpoint targeted by the wrk client
    server {
        root /var/www/html;
        listen 443 ssl http2 default_server;
        server_name _;
        access_log off;
        error_log off;
        brotli on;
        brotli_comp_level 5;
        brotli_types text/plain text/css application/x-javascript application/json application/javascript text/xml application/xml application/xml+rss text/javascript;
        ssl_certificate "/etc/pki/tls/certs/NGINX_TEST_SSL.crt";
        ssl_certificate_key "/etc/pki/tls/private/NGINX_TEST_SSL.key";
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers ALL:!aNULL:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
        ssl_prefer_server_ciphers on;
        location /content {
            rewrite_by_lua_block {
                local uri = ngx.var.uri
                if (uri == "/content/1") then
                    ngx.req.set_uri("/1.html")
                elseif (uri == "/content/2") then
                    ngx.req.set_uri("/2.css")
                elseif (uri == "/content/3") then
                    ngx.req.set_uri("/3.js")
                end
            }
            header_filter_by_lua_block { ngx.header.content_length = nil }
            body_filter_by_lua_block {
                ngx.arg[1] = "<!-- What up " .. os.date() .. " -->" .. ngx.arg[1] .. "<!-- " .. os.clock() .. "-->"
                ngx.arg[1] = string.gsub(ngx.arg[1], "\n", "")
                ngx.arg[1] = string.gsub(ngx.arg[1], ">%s+<", "")
            }
        }
    }
}
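To make the Lua logic in the configuration easier to follow, here is a rough Python approximation of what the rewrite and body-filter blocks do to a request URI and to a single response chunk. It is a sketch only: `rewrite_uri`, `filter_chunk`, and the `stamp`/`clock` placeholders (standing in for `os.date()` and `os.clock()`) are hypothetical names, and Python's `\s` is only an approximation of Lua's `%s` character class.

```python
import re

# Mirrors the if/elseif chain in rewrite_by_lua_block.
REWRITES = {
    "/content/1": "/1.html",
    "/content/2": "/2.css",
    "/content/3": "/3.js",
}


def rewrite_uri(uri: str) -> str:
    """Return the rewritten URI; URIs with no match pass through unchanged."""
    return REWRITES.get(uri, uri)


def filter_chunk(chunk: str, stamp: str = "DATE", clock: str = "CLOCK") -> str:
    """Approximate body_filter_by_lua_block for one response chunk."""
    # Wrap the chunk in a leading and a trailing HTML comment.
    chunk = "<!-- What up " + stamp + " -->" + chunk + "<!-- " + clock + "-->"
    # Equivalent of string.gsub(s, "\n", ""): drop every newline.
    chunk = chunk.replace("\n", "")
    # Equivalent of string.gsub(s, ">%s+<", ""): the whole match is removed,
    # including the enclosing '>' and '<', not just the whitespace.
    return re.sub(r">\s+<", "", chunk)


print(rewrite_uri("/content/1"))
print(filter_chunk("<p>a</p>\n  <p>b</p>"))
```

The point of these filters in the benchmark is to add per-request Lua work in the response path (and, by clearing `Content-Length`, to let the modified body be sent with a different size); as the sketch shows, the last substitution is not a markup-preserving minifier.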
We used the following compile-time options to build NGINX with LuaJIT and brotli:
./configure --with-ld-opt="-Wl,-rpath,$PATH_TO_LUAJIT_LIB" --with-http_ssl_module --with-http_stub_status_module --with-openssl=../openssl --with-http_v2_module --add-module=../ngx_devel_kit --add-module=../lua-nginx-module-0.10.13 --add-module=../ngx_brotli --prefix=/tmp/nginx
make -j16
sudo make install
On the client, the wrk command line used is shown below. We varied $thread and $connection to arrive at the best throughput under the 3 ms p99 latency SLA.
./wrk-4.2.0/wrk -t${thread} -c${connection} --duration=${duration} -H 'Accept-Encoding:br' https://${NGINX_HOST}:443/content/1
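One possible way to extract the two numbers the SLA depends on from a wrk run is sketched below. It assumes wrk is invoked with its `--latency` flag, which prints a latency distribution including the 99% line; that flag is not part of the command shown above, and the source does not say how p99 was actually collected. The `parse_wrk` helper and the sample output values are hypothetical.

```python
import re

# wrk prints latencies with one of these unit suffixes; convert all to ms.
_UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0, "m": 60_000.0, "h": 3_600_000.0}


def parse_wrk(output: str) -> dict:
    """Extract p99 latency (ms) and requests/sec from `wrk --latency` output."""
    p99 = re.search(r"^\s*99%\s+([\d.]+)(us|ms|s|m|h)\s*$", output, re.MULTILINE)
    rps = re.search(r"^Requests/sec:\s+([\d.]+)", output, re.MULTILINE)
    if p99 is None or rps is None:
        raise ValueError("expected output from wrk run with --latency")
    return {
        "p99_ms": float(p99.group(1)) * _UNIT_TO_MS[p99.group(2)],
        "requests_per_sec": float(rps.group(1)),
    }


# Made-up sample in wrk's output layout; the numbers are not measured results.
SAMPLE = """\
Running 30s test @ https://nginx-host:443/content/1
  16 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.10ms  400.00us  10.00ms   90.00%
    Req/Sec     7.70k   500.00     9.00k    70.00%
  Latency Distribution
     50%    1.00ms
     75%    1.30ms
     90%    1.80ms
     99%    2.50ms
  3703703 requests in 30.00s, 1.00GB read
Requests/sec: 123456.78
Transfer/sec:     34.13MB
"""

result = parse_wrk(SAMPLE)
print(result)
```

A sweep script could run wrk once per (`$thread`, `$connection`) pair, pass each captured output through a parser like this, and keep the highest-throughput run whose `p99_ms` is at most 3.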
The Microsoft Azure Dpsv5 VMs powered by Ampere Altra processors deliver compelling performance in a variety of NGINX web server configurations.
Ampere® Altra® based Microsoft Azure Dpsv5 VMs outperform their x86 VM counterparts. The performance results are normalized to the Intel Xeon Ice Lake based D16sv5 VM. The Ampere® Altra® D16psv5 VM delivers 37% better performance than the Intel Xeon D16sv5 and 50% better performance than the AMD Milan D16asv5.
Price-performance is an important consideration for a large-scale deployment in the cloud. Ampere® Altra® based Dpsv5 VMs offer roughly 70% better price-performance than the corresponding x86 VMs used in the testing. In summary, Ampere® Altra® based VMs in Microsoft Azure offer better performance than the legacy x86 VMs at a lower price for a typical NGINX configuration.
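The price-performance figure can be cross-checked from the numbers already given in this document: the monthly costs in the configuration table and the relative performance stated above. The short calculation below is a sketch under two assumptions: that price-performance means throughput per dollar of monthly cost, and that "50% better than AMD" means the D16psv5 throughput is 1.5 times that of the D16asv5. The function names are illustrative.

```python
# Monthly cost in USD, from the system configuration table.
monthly_cost = {"D16s_v5": 654.0, "D16as_v5": 589.0, "D16ps_v5": 523.0}

# Relative throughput, normalized to the Intel-based D16s_v5 (= 1.0).
rel_perf = {"D16s_v5": 1.0, "D16ps_v5": 1.37}
# Assumption: "50% better than AMD" means D16ps_v5 = 1.5 x D16as_v5.
rel_perf["D16as_v5"] = rel_perf["D16ps_v5"] / 1.50


def perf_per_dollar(vm: str) -> float:
    """Relative throughput per dollar of monthly cost."""
    return rel_perf[vm] / monthly_cost[vm]


def advantage_pct(vm: str, baseline: str) -> float:
    """Price-performance advantage of `vm` over `baseline`, in percent."""
    return (perf_per_dollar(vm) / perf_per_dollar(baseline) - 1.0) * 100.0


vs_intel = advantage_pct("D16ps_v5", "D16s_v5")
vs_amd = advantage_pct("D16ps_v5", "D16as_v5")
print(f"vs D16s_v5:  {vs_intel:.1f}%")
print(f"vs D16as_v5: {vs_amd:.1f}%")
```

Under these assumptions the advantage works out to about 71% over the D16sv5 and about 69% over the D16asv5, which is consistent with the roughly 70% figure quoted above.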
The Microsoft Azure Dpsv5 VMs powered by Ampere Altra processors are an excellent choice for cloud native workloads such as NGINX due to the compelling performance and price-performance value that they offer. For cloud application developers, transitioning NGINX applications from legacy x86 VMs to Ampere Altra VMs is seamless today due to the maturity of the aarch64 software ecosystem.
For more information about Azure virtual machines with Ampere Altra Arm-based processors, see the Azure-Ampere VMs general availability announcement.
All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.
System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.
Price performance was calculated using Microsoft's Virtual Machines Pricing, in September of 2022. Refer to individual tests for more information.
©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com