Now that I have vRealize Operations Manager (vROps) working again, I have discovered that of the 572 active alerts, 359 are for “Virtual machine has continuous high CPU usage causing stress”.
Taking a closer look at these, I noticed that one was for a virtual machine that had been provisioned with SQL Server but wasn’t being used and had no user databases on it.
By default, vROps considers a virtual machine to be CPU stressed if its CPU demand goes over 70% of its configured CPU capacity. This virtual machine has 2 vCPUs and runs on a host with 2.6 GHz processors, so vROps calculates its CPU capacity as 2 × 2.6 GHz = 5.2 GHz. Therefore, if the CPU demand goes over 0.7 × 5.2 GHz = 3.64 GHz, the virtual machine is considered stressed.
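The same arithmetic as a small Python sketch. The function name `cpu_stress_threshold` is hypothetical, and it assumes the simple model described above (capacity = vCPUs × host core speed, stress threshold = 70% of capacity); it is not vROps code.

```python
def cpu_stress_threshold(vcpus: int, core_ghz: float, stress_pct: float = 70.0):
    """Hypothetical helper: return (capacity_ghz, threshold_ghz) for a VM.

    Assumes capacity is simply vCPUs x host core speed, as described above.
    """
    capacity = vcpus * core_ghz
    threshold = capacity * stress_pct / 100.0
    return capacity, threshold


# The VM from this post: 2 vCPUs on a host with 2.6 GHz processors.
capacity, threshold = cpu_stress_threshold(2, 2.6)
print(f"capacity = {capacity:.2f} GHz, stress threshold = {threshold:.2f} GHz")
```

Running it prints a capacity of 5.20 GHz and a stress threshold of 3.64 GHz, matching the figures above.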
With the standard vROps policy, this alert looks at the worst one-hour period over the last 30 days. Looking at this virtual machine’s CPU demand over the previous week, I could see that between 6pm and 8pm it was using 100% of its CPU most of the time. In fact, every Sunday this virtual machine uses very high CPU for about 2 hours: the previous Sunday it was from 10am until 12pm, and the week before from 1pm to 3pm. Further investigation showed that all of the virtual machines had high CPU at some point on a Sunday due to virus scanning. I don’t really care that the virtual machines had high CPU on a Sunday, so I changed the vROps policy to analyse virtual machine data only from Monday to Saturday.
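To illustrate why dropping Sundays makes the alert go away, here is a simplified Python model of a “worst 60-minute window over 30 days” check. This is a sketch under assumptions, not how vROps computes it internally: it assumes evenly spaced 5-minute demand samples, the helper `worst_window_pct` is hypothetical, and the data is synthetic (10% baseline, 100% from 6pm to 8pm every Sunday standing in for the virus scan). Weekdays use Python’s `datetime.weekday()` numbering, where Monday is 0 and Sunday is 6.

```python
from datetime import datetime, timedelta

# Assumptions for this sketch (not vROps internals): evenly spaced 5-minute
# demand samples, scored over full 60-minute windows.
SAMPLE_INTERVAL = timedelta(minutes=5)
WINDOW = timedelta(minutes=60)


def worst_window_pct(samples, exclude_weekdays=frozenset()):
    """Hypothetical helper: highest average CPU demand (%) over any full 60-minute window.

    samples: time-sorted list of (datetime, demand_pct) tuples.
    exclude_weekdays: datetime.weekday() values to drop (Monday=0 ... Sunday=6).
    """
    kept = [(t, v) for t, v in samples if t.weekday() not in exclude_weekdays]
    per_window = WINDOW // SAMPLE_INTERVAL  # 12 samples per hour
    worst = 0.0
    for i in range(len(kept) - per_window + 1):
        window = kept[i:i + per_window]
        # Skip windows that bridge the gap left by an excluded day.
        if window[-1][0] - window[0][0] != WINDOW - SAMPLE_INTERVAL:
            continue
        worst = max(worst, sum(v for _, v in window) / per_window)
    return worst


# Synthetic 30 days: 10% baseline, 100% from 6pm to 8pm every Sunday.
start = datetime(2024, 1, 1)  # a Monday
samples = []
t = start
while t < start + timedelta(days=30):
    busy = t.weekday() == 6 and 18 <= t.hour < 20
    samples.append((t, 100.0 if busy else 10.0))
    t += SAMPLE_INTERVAL

print(worst_window_pct(samples))                        # 100.0
print(worst_window_pct(samples, exclude_weekdays={6}))  # 10.0
```

With every day included, the Sunday scan gives a worst hour of 100%, well over the 70% threshold; excluding Sunday leaves only the 10% baseline.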
This has considerably reduced the number of CPU stress alerts, but I am still getting some because certain virtual machines used more than 70% of their CPU for a one-hour period at some point over the previous 30 days. To reduce the number of CPU stress alerts further, I have configured the policy to consider the entire 30-day range instead of a sliding 60-minute window.
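A rough Python sketch of why switching from a sliding 60-minute window to the entire 30-day range suppresses short spikes. It assumes 5-minute samples and treats the whole-range figure as a plain average of all samples; that averaging is an assumption made for illustration, and vROps’s own stress calculation may differ. The helper names and the data (20% baseline with a single 90% hour on a weekday) are hypothetical.

```python
STRESS_PCT = 70.0
SAMPLES_PER_HOUR = 12  # assumes 5-minute samples


def worst_hour_pct(values):
    """Hypothetical: highest average over any sliding 60-minute window."""
    return max(
        sum(values[i:i + SAMPLES_PER_HOUR]) / SAMPLES_PER_HOUR
        for i in range(len(values) - SAMPLES_PER_HOUR + 1)
    )


def whole_range_pct(values):
    """Hypothetical: plain average over the entire range (an assumption)."""
    return sum(values) / len(values)


# Synthetic 30 days: 20% baseline with one 90% hour on a weekday.
values = [20.0] * (30 * 24 * SAMPLES_PER_HOUR)
values[1000:1000 + SAMPLES_PER_HOUR] = [90.0] * SAMPLES_PER_HOUR

for name, pct in [("sliding 60-minute window", worst_hour_pct(values)),
                  ("entire 30-day range", whole_range_pct(values))]:
    print(f"{name}: {pct:.1f}% -> {'stressed' if pct > STRESS_PCT else 'ok'}")
```

Under these assumptions the single busy hour is flagged by the sliding window (90.0%) but disappears into the 30-day average (20.1%), which is the trade-off of the broader setting: only sustained load raises the alert.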