Amar Dhamdhere, Senior Director of Product Management
The goal of all computing environments is to deliver the maximum possible performance within the constraints of cost, power, and physical space. To ensure that the cost and physical space of a server are optimally utilized, it often makes sense to use the entire provisioned power of a single server to deliver maximum performance. One way to do this is to increase the core frequency until the maximum power of the CPU, its TDP, is reached. This operating mode is typically referred to as Turbo.
In a multi-tenant cloud computing environment, each of the cores may be owned by an entirely different user, container, or function. Also, many of the running services are latency-sensitive and demand consistent performance, so any attempt to maximize instantaneous performance should be balanced against its impact on performance variability. The ideal Turbo mode would provide predictable performance to all cores by using power headroom that consistently exists due to shifting power demand inside the CPU itself between cores, memory, and IO activity. The operator could choose to run at consistently lower frequency and power at times when high performance isn’t required, but that choice would be made by the operator and their users.
Unfortunately, the way that Turbo has been utilized in the past may have achieved other objectives for traditional enterprise environments, but it does not deliver the type of predictable performance demanded in a modern cloud. For one, the frequency typically varies depending on how many cores are being utilized, with higher frequency when fewer cores are running and lower frequency when more cores are running. Since a highly efficient cloud aims to utilize all cores as often as possible, this means that as new users or containers are deployed on the server to bring it to max performance, the frequency decreases. This creates variability amongst users and processes and is the opposite of what is desired in a high-utilization multi-tenant environment. Also, the maximum turbo frequencies cannot be maintained across a wide variety of workloads, which means that performance varies depending on what applications you, or the strangers sharing the same server, are running. This type of noisy neighbor effect is also incompatible with the optimal cloud.
Since a highly efficient cloud computing environment is fundamentally different from the traditional enterprise setting, it demands a new type of turbo, one built for the cloud and for cloud native applications. Reducing frequency when core utilization increases is not desired, and frequency must be consistent across workloads and users. This will result in optimal performance and latency.
Ampere® Altra™, the world’s first cloud native processor, delivers this new type of Turbo performance to ensure predictable performance, high scalability, and power efficiency.
As with all other processors, Ampere Altra must stay within its power, thermal and current specification limits, but it delivers its maximum Turbo frequency in a predictable way while all cores are running and across the vast spectrum of cloud workloads.
Each Ampere® Altra™ CPU has a nominal frequency and a maximum frequency. These frequencies are the same for all cores in the CPU. For example, Q80-30 has a nominal frequency of 2.8 GHz and a maximum frequency of 3.0 GHz, which is 1.071 times the nominal. These frequencies are the same whether you have one active core or 80 active cores. Figure 1 shows the consistent normalized frequency deltas between base and turbo for all 80 cores. The 7.1% delta between the maximum frequency and the nominal frequency remains consistent whether one core is active or all 80 are active. This design enables Ampere Altra to run workloads with consistent and predictable performance while achieving ideal performance scaling, as shown in Figure 3.
Consistent frequencies are only part of the picture. So that cloud workloads can run at the maximum frequency 100% of the time, Ampere tests and characterizes Ampere Altra processors with various cloud-representative workloads such as database, data analytics, AI, media streaming, web serving, search, and others to ensure that these workloads run within the CPU's thermal, power, and current specification limits at the maximum frequency. Of course, Ampere Altra provides the flexibility to run its cores at various frequencies and ramp down the frequency when performance isn’t critical, but this is a choice given to the service owner rather than one hidden from them. Customers can expect cloud workloads running on the Ampere Altra processor to achieve consistent performance executing at the maximum frequency, wholly isolated from the noisy neighbor impact of other workloads running on the same processor.
Figure 1: Ampere® Altra™ Q80-30 Frequency deltas. Figure 2: Intel Xeon Scalable 8280 Frequency deltas.
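The Q80-30 arithmetic quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the flat frequency model simply restates the claim that the maximum frequency does not depend on the active core count, and the sample point of 40 cores is an arbitrary choice that does not come from the article.

```python
# Frequencies quoted in the text for the Ampere Altra Q80-30.
NOMINAL_GHZ = 2.8
MAX_GHZ = 3.0
TOTAL_CORES = 80


def altra_max_frequency_ghz(active_cores: int) -> float:
    """Hypothetical flat model: the maximum frequency is the same for
    any number of active cores, as the text describes."""
    if not 1 <= active_cores <= TOTAL_CORES:
        raise ValueError(f"active_cores must be between 1 and {TOTAL_CORES}")
    return MAX_GHZ


def uplift_over_nominal(freq_ghz: float) -> float:
    """Ratio of a frequency to the nominal frequency."""
    return freq_ghz / NOMINAL_GHZ


# 1 and 80 are the end points named in the text; 40 is an arbitrary midpoint.
uplifts = {n: uplift_over_nominal(altra_max_frequency_ghz(n)) for n in (1, 40, 80)}

for cores, ratio in uplifts.items():
    print(f"{cores:2d} active cores: {ratio:.3f}x nominal "
          f"({(ratio - 1) * 100:.1f}% uplift)")
```

Each line prints the same 1.071x ratio and 7.1% uplift, which is the flat profile described for Figure 1.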
In comparison, on the x86 platform, the maximum turbo frequency depends on the number of active cores. For example, the Intel Xeon Platinum 8280 processor has a base frequency of 2.7 GHz and a maximum turbo frequency of 4.0 GHz, which is achievable only with 1 or 2 cores active. This represents 1.3 GHz of variability. Even with all cores active, the all-core turbo frequency is 3.3 GHz, which is still 600 MHz of variability. Figure 2 shows the large normalized frequency deltas between turbo and base as core utilization increases. These deltas lead to inconsistent or unpredictable performance, because a workload can run at any frequency in that range depending on the number of active workloads and the power profile of the workloads running on the cores. Combined with multi-threaded cores, they also lead to non-ideal performance scaling when more threads are active. With all cores active, the turbo uplift over base is only about 46% of the uplift available with 1 or 2 cores active (0.6 GHz versus 1.3 GHz). Figure 3 shows the non-ideal performance scaling for the EPYC 7742 and Intel Xeon 8160, and we expect similar behavior on other x86 SKUs as well.
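The Xeon Platinum 8280 figures can be related to one another with a short calculation. The sketch below uses only the three frequencies quoted in the text (base, maximum turbo with 1 or 2 cores active, and all-core turbo). The variable names are illustrative, and no intermediate turbo steps are modeled because the article does not give them.

```python
# Frequencies quoted in the text for the Intel Xeon Platinum 8280, in MHz.
# Integers are used so the differences are exact.
BASE_MHZ = 2700
MAX_TURBO_MHZ = 4000       # achievable with 1 or 2 cores active
ALL_CORE_TURBO_MHZ = 3300  # all cores active

# Turbo uplift over base at each end of the range.
peak_delta_mhz = MAX_TURBO_MHZ - BASE_MHZ           # 1300 MHz
all_core_delta_mhz = ALL_CORE_TURBO_MHZ - BASE_MHZ  # 600 MHz

# Share of the peak uplift that remains once every core is busy.
retained_uplift = all_core_delta_mhz / peak_delta_mhz

# Drop in absolute turbo frequency between the two cases.
turbo_drop = (MAX_TURBO_MHZ - ALL_CORE_TURBO_MHZ) / MAX_TURBO_MHZ

print(f"Peak uplift over base:     {peak_delta_mhz} MHz "
      f"({MAX_TURBO_MHZ / BASE_MHZ:.3f}x base)")
print(f"All-core uplift over base: {all_core_delta_mhz} MHz "
      f"({ALL_CORE_TURBO_MHZ / BASE_MHZ:.3f}x base)")
print(f"Uplift retained with all cores active: {retained_uplift:.0%}")
print(f"Drop in absolute turbo frequency:      {turbo_drop:.1%}")
```

Under these numbers the all-core uplift is about 46% of the peak uplift, while the absolute turbo frequency falls by 17.5%, from 4.0 GHz to 3.3 GHz.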
With its focus on the cloud and edge markets, Ampere has been able to launch a cloud-native processor that is architected and designed to deliver predictable performance, workload isolation that eliminates noisy neighbor issues, and maximum performance for cloud workloads.
When it comes to cloud workloads, predictable and maximum performance is the only way to go.
In my next blog, I will discuss how users can control the performance of every workload that’s running on the Ampere® Altra™ processor and details around the on-die power management processor that enables state-of-the-art power management features from the world’s first cloud-native Ampere® Altra™ processor.
Also stay tuned for a future blog discussing how eliminating simultaneous multithreading, also known as hyper-threading, further reduces performance variability for the cloud.
3 Referring to Intel’s Turbo Boost Technology and AMD’s Turbo Core technology.