Thursday, April 11, 2013

Analyzing Virtualization CPU Performance


There are always many questions around CPU performance. Most people I speak with give their VMs 2 or more vCPUs mainly because they feel more is better. Having an extra vCPU cannot hurt performance, right? Well, that is not always the case, but let's not limit our discussion to this one topic. What can affect CPU performance? There are primarily 4 areas of concern:
  1. Idle VMs – Timer interrupts on idle VMs can cause more overhead than when the VM is in use.
  2. CPU Affinity – You may see a positive impact on performance for the virtual CPU that is pinned to a physical CPU using affinity; however, this can cause a performance issue on the rest of the system because the scheduler must work around the affinity in order to balance the load. It is strongly advised not to use CPU affinity.
  3. SMP VMs – When using multiple virtual processors there is a co-scheduling overhead.
  4. Low CPU resources – When there is contention, the CPU scheduler forces lower-priority vCPUs to queue their requests, and VMs on the host run slower as a result.

Now that you know the 4 items that can affect performance, we need to understand the key metrics to monitor. They are:
  1. Host CPU used – the amount of time that the host's physical CPU was used during that sampling period.
  2. Virtual machine CPU used – the amount of time that the virtual CPU was actively using the physical CPU during that sampling period. Virtual machines that are utilizing virtual SMP can have their information displayed as an aggregate of all virtual CPUs within the virtual machine or per virtual CPU.
  3. Virtual machine CPU ready time – the amount of time the virtual CPU was ready but could not get scheduled to run on the physical CPU. Virtual machines that are utilizing virtual SMP can have their information displayed as an aggregate of all virtual CPUs within the virtual machine or per virtual CPU. 

We can monitor these metrics with many tools, both free and paid. The free tools provided by VMware are the vSphere Client and esxtop/resxtop. Here are a couple of screenshots:

vSphere Client Chart Legend - CPU Ready Time in Milliseconds

resxtop Output - CPU Usage per VM

It is important to understand the differences between the information gathered through the vSphere Client and esxtop. The vSphere Client gathers its information at 20-second intervals, whereas esxtop gathers it at 5-second intervals by default. Always normalize the values before comparing the output from these two monitoring tools.

You can use the vSphere Client to monitor CPU performance for hosts, clusters, resource pools, virtual machines and vApps. There are two important items to note regarding the vSphere Client performance charts:
  1. Only two counters at a time can be displayed in the chart.
  2. The virtual machine object shows aggregate data for all the CPUs within the virtual SMP virtual machine. The numbered objects track the performance of the individual virtual CPUs.

If you see a short spike in CPU used or CPU ready, that normally indicates you are making good use of your host resources. If both values are constantly high, the host is probably overcommitted. The values to watch for as signs of negative performance are a CPU used value above 90% and a CPU ready value above 20%.
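
As a rough illustration of the rule of thumb above, the check can be written as a small function. This is only a sketch: the function name, the labels it returns, and the idea of feeding it sustained (not momentary) percentages are assumptions made for the example. The 90% and 20% limits are the ones quoted in this post.

```python
def classify_cpu_health(used_pct, ready_pct, used_limit=90.0, ready_limit=20.0):
    """Label a sustained CPU used / CPU ready pair using the post's thresholds.

    used_pct and ready_pct are percentages (0-100) observed over a sustained
    period; short spikes should be ignored before calling this (assumption).
    """
    high_used = used_pct > used_limit
    high_ready = ready_pct > ready_limit
    if high_used and high_ready:
        # Both constantly high: the host is probably overcommitted.
        return "likely overcommitted"
    if high_ready:
        return "contention: high ready time"
    if high_used:
        # High usage alone is not necessarily a problem.
        return "high utilization"
    return "ok"


if __name__ == "__main__":
    print(classify_cpu_health(95.0, 25.0))
    print(classify_cpu_health(95.0, 2.0))
```

High usage on its own is deliberately reported separately from contention, since a busy host with low ready time is usually just well utilized.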

Do not forget that the sampling intervals differ between the tools used to monitor the CPU values. If you are looking at a raw time value (for example, ready time in milliseconds), you will need to know the sampling interval in order to work out the percentage involved.
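
The conversion from a raw ready time to a percentage is simple arithmetic: divide the ready time by the length of the sampling interval. Here is a minimal sketch, assuming the 20-second vSphere Client interval and the 5-second esxtop interval mentioned earlier. The function name and the optional `vcpus` divisor (for averaging an aggregate value across the vCPUs of an SMP VM) are assumptions made for the example.

```python
def ready_ms_to_percent(ready_ms, interval_s=20.0, vcpus=1):
    """Convert a ready time in milliseconds to a percentage of the interval.

    interval_s: sampling interval in seconds (20 for the vSphere Client
                chart, 5 for esxtop at its default delay).
    vcpus:      set this to the vCPU count when ready_ms is an aggregate
                over all vCPUs and a per-vCPU average is wanted (assumption).
    """
    interval_ms = interval_s * 1000.0
    return ready_ms * 100.0 / (interval_ms * vcpus)


if __name__ == "__main__":
    # 2000 ms of ready time within one 20-second sample
    print(ready_ms_to_percent(2000, interval_s=20))
    # 1000 ms of ready time within one 5-second sample
    print(ready_ms_to_percent(1000, interval_s=5))
```

Under these assumptions, 2000 ms of ready time in a 20-second sample works out to 10%, and the same 10% would show up as 500 ms in a 5-second sample.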

The #1 warning sign is ready time and, as expected, this latency can affect the performance of the guest operating system. A ready time of 2000 ms or higher indicates a contention issue. The picture below shows an example of a VM whose usage is inverted relative to its ready time.

Ready Time and Usage on a single VM

The #2 warning sign is a high used value, which can be seen in either host CPU used or VM CPU used. This is not always an indication of poor CPU performance. You may wonder why! Keep in mind that a vCPU is not a physical CPU, so what the VM is using does not always mean that the host is maxed out, nor does it always mean you need more vCPUs. It may just be telling you there is high utilization, which is a good thing!

My personal favorite tool for analyzing performance is esxtop/resxtop. You can run esxtop directly on the host and resxtop on the vSphere Management Assistant Appliance. Below are the definitions you need to know in order to analyze what is happening:
  • PCPU USED(%) – CPU utilization per physical CPU
  • Per-group statistics
    • %USED – Utilization (includes %SYS)
    • %SYS – VMkernel system activity
    • %RDY – Ready time
    • %WAIT – Wait and idling time
    • %CSTP – Pending co-schedule
    • %MLMTD – Time not running because of a CPU limit
    • NWLD – number of worlds associated with a given group

The following commands are the most common when monitoring CPU performance:
  • Press spacebar – immediately updates the current screen
  • Press s – prompts you for the delay between updates, in seconds. The default value is 5 seconds and the minimum value is 2 seconds.
  • Press V – that is a capital “V”. Displays virtual machine instances only. Do not confuse it with the lowercase “v”, which shows the storage resource utilization of virtual machines. If you need to switch back to CPU utilization, press c.
  • Press e – toggles whether CPU statistics are displayed expanded or unexpanded. The expanded display includes CPU resource utilization statistics broken down by the individual worlds belonging to a virtual machine. The percentages shown for individual worlds are percentages of a single physical CPU.
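
Because each world's percentage is relative to a single physical CPU, the figures for a multi-vCPU VM can add up to more than 100%. The sketch below shows one way to summarize values copied by hand from the expanded display. It assumes the group figure is the sum of its worlds and that vCPU worlds can be recognized by "vcpu" in their name. The world names and numbers are purely illustrative, not real esxtop output.

```python
def summarize_ready(world_ready):
    """Summarize per-world %RDY values taken from the expanded display.

    world_ready maps a world name to its %RDY, where each value is a
    percentage of one physical CPU.
    """
    # Assumption: vCPU worlds contain "vcpu" in their name.
    vcpu_values = [pct for name, pct in world_ready.items() if "vcpu" in name]
    return {
        "group_total": sum(world_ready.values()),
        "worst_vcpu": max(vcpu_values) if vcpu_values else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical worlds for a 2-vCPU VM; values are made up for the example.
    sample = {"vmx-vcpu-0": 4.0, "vmx-vcpu-1": 6.0, "vmx": 0.5}
    print(summarize_ready(sample))
```

Looking at the worst single vCPU alongside the group total helps avoid misreading a large aggregate on a wide SMP VM as severe contention on every vCPU.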

It is important to know what to look for and which tools can help you get that information. We will take a look at more CPU issues in later blog posts. Be sure to post any questions on http://www.vmtraining.net/forum/

