[cpu][windows] cpu.Times(true) should not return percent values #611

Merged
merged 4 commits into from
Dec 9, 2018

Conversation

marcospedreiro
Contributor

Details

  • Updates perCPUTimesWithContext(ctx context.Context) in cpu/cpu_windows.go to return CPU times rather than percentage values, so that cpu.Times(true) is consistent with cpu.Times(false) on Windows and with the other platforms.
  • Adds tests to TestCpu_times() in cpu/cpu_test.go that validate that the sums of the per-CPU user, system, and idle times are within a set margin of the corresponding totals.

The class we currently query (win32_perfformatteddata_counters_processorinformation) returns percentage values. I used the WMI Code generator tool to see which classes and queries were available and found win32_perfrawdata_counters_processorinformation. According to the linked documentation, the two classes return identical values, except that the latter has a few extra fields.

However, querying my Windows computer for win32_perfrawdata_counters_processorinformation returned different values that were not percentages. I experimented locally with the new class and the following code snippet:

package main

import (
	"fmt"

	"github.com/marcospedreiro/gopsutil/cpu"
)

func main() {
	fmt.Println("[windows] cpu.Times(false)")
	t, err := cpu.Times(false)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, ct := range t {
		fmt.Println(ct)
	}

	fmt.Println("[windows] cpu.Times(true)")
	t, err = cpu.Times(true)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, ctf := range t {
		fmt.Println(ctf)
	}
}

and got the output:

[windows] cpu.Times(false)
{"cpu":"cpu-total","user":69030.2,"system":13576.2,"idle":155644.0,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
[windows] cpu.Times(true)
{"cpu":"0,5","user":111033750000.0,"system":19630000000.0,"idle":266419843750.0,"nice":0.0,"iowait":0.0,"irq":196250000.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,4","user":112372968750.0,"system":21042968750.0,"idle":263667656250.0,"nice":0.0,"iowait":0.0,"irq":167031250.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,3","user":111817812500.0,"system":21075781250.0,"idle":264189843750.0,"nice":0.0,"iowait":0.0,"irq":182343750.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,2","user":121851093750.0,"system":16515312500.0,"idle":258717187500.0,"nice":0.0,"iowait":0.0,"irq":147968750.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,1","user":125105781250.0,"system":22633437500.0,"idle":249344375000.0,"nice":0.0,"iowait":0.0,"irq":199062500.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,0","user":108120156250.0,"system":34864062500.0,"idle":254101875000.0,"nice":0.0,"iowait":0.0,"irq":4299218750.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}

It appears that win32_perfrawdata_counters_processorinformation may be giving us CPU time data rather than percentages, but the documentation I have found so far has not been helpful, and the per-core values are several orders of magnitude larger than the cpu-total values.

If we sum the user time values for each core:

111033750000.0 + 112372968750.0 + 111817812500.0 + 121851093750.0 + 125105781250.0 + 108120156250.0

we get

690301562500, which matches the cpu-total value of 69030.2 except that it is larger by a factor of 10,000,000, the number of 100 ns clock ticks per second on Windows.
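
The arithmetic above can be checked with a short Go sketch. It assumes the raw counters are in 100 ns units (1e7 per second); the helper name sumTicksToSeconds is made up for illustration and is not part of gopsutil.

```go
package main

import "fmt"

// ticksPerSecond assumes the raw WMI counters are expressed in
// 100 ns units, i.e. 1e7 ticks per second.
const ticksPerSecond = 1e7

// sumTicksToSeconds (hypothetical helper) adds raw per-core counter
// values and converts the total from ticks to seconds.
func sumTicksToSeconds(raw []float64) float64 {
	var total float64
	for _, v := range raw {
		total += v
	}
	return total / ticksPerSecond
}

func main() {
	// Per-core "user" values copied from the cpu.Times(true) output above.
	user := []float64{
		111033750000, 112372968750, 111817812500,
		121851093750, 125105781250, 108120156250,
	}
	fmt.Printf("%.1f\n", sumTicksToSeconds(user)) // prints 69030.2
}
```

The printed value agrees with the "user" field reported by cpu.Times(false) in the same run.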

Testing

  • Output of make build_test
# Supported operating systems
GOOS=linux GOARCH=amd64 go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=linux GOARCH=386 go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=linux GOARCH=arm go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=linux GOARCH=arm64 go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=freebsd go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
CGO_ENABLED=0 GOOS=darwin go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=windows go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
# Operating systems supported for building only (not implemented error if used)
GOOS=solaris go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=dragonfly go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
GOOS=netbsd go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
# cross build to OpenBSD not worked since process has "C"
CGO_ENABLED=1 GOOS=darwin go test ./... | grep -v "exec format error" | grep "build failed" && exit 1 || exit 0
Successfully built on all known operating systems

Tested on Linux (CentOS 7), macOS 10.14.1, and Windows 10 with the following code:

package main

import (
	"fmt"

	"github.com/marcospedreiro/gopsutil/cpu"
)

func main() {
	fmt.Println("cpu time total")
	totalCPU, err := cpu.Times(false)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, ct := range totalCPU {
		fmt.Println(ct)
	}

	fmt.Println("cpu time per core")
	perCPU, err := cpu.Times(true)
	if err != nil {
		fmt.Println(err)
		return
	}

	var perCPUUserTimeSum float64
	var perCPUSystemTimeSum float64
	var perCPUIdleTimeSum float64
	for _, ct := range perCPU {
		fmt.Println(ct)
		perCPUUserTimeSum += ct.User
		perCPUSystemTimeSum += ct.System
		perCPUIdleTimeSum += ct.Idle
	}

	fmt.Printf("Total CPU User Time: %f   | Per CPU User Time Sum:   %f\n", totalCPU[0].User, perCPUUserTimeSum)
	fmt.Printf("Total CPU System Time: %f | Per CPU System Time Sum: %f\n", totalCPU[0].System, perCPUSystemTimeSum)
	fmt.Printf("Total CPU Idle Time: %f   | Per CPU Idle Time Sum:   %f\n", totalCPU[0].Idle, perCPUIdleTimeSum)
}

Example Windows 10 Output:

cpu time total
{"cpu":"cpu-total","user":17401.0,"system":3459.9,"idle":97673.6,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
cpu time per core
{"cpu":"0,5","user":2710.3,"system":503.7,"idle":16541.7,"nice":0.0,"iowait":0.0,"irq":8.9,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,4","user":2790.9,"system":525.0,"idle":16439.8,"nice":0.0,"iowait":0.0,"irq":9.8,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,3","user":2512.3,"system":481.2,"idle":16762.2,"nice":0.0,"iowait":0.0,"irq":10.4,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,2","user":4059.6,"system":511.1,"idle":15185.0,"nice":0.0,"iowait":0.0,"irq":10.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,1","user":3385.0,"system":452.2,"idle":15918.5,"nice":0.0,"iowait":0.0,"irq":13.3,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"0,0","user":1942.9,"system":986.7,"idle":16826.4,"nice":0.0,"iowait":0.0,"irq":288.3,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
Total CPU User Time: 17401.046875   | Per CPU User Time Sum:   17401.062500
Total CPU System Time: 3459.859375 | Per CPU System Time Sum: 3459.875000
Total CPU Idle Time: 97673.562500   | Per CPU Idle Time Sum:   97673.625000

Example macOS Output:

cpu time total
{"cpu":"cpu-total","user":2345.1,"system":1558.5,"idle":8707.3,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
cpu time per core
{"cpu":"cpu0","user":659.3,"system":563.9,"idle":1930.0,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"cpu1","user":518.0,"system":243.1,"idle":2391.5,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"cpu2","user":675.7,"system":527.9,"idle":1949.0,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"cpu3","user":492.1,"system":223.7,"idle":2436.8,"nice":0.0,"iowait":0.0,"irq":0.0,"softirq":0.0,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
Total CPU User Time: 2345.078125   | Per CPU User Time Sum:   2345.078125
Total CPU System Time: 1558.500000 | Per CPU System Time Sum: 1558.500000
Total CPU Idle Time: 8707.257812   | Per CPU Idle Time Sum:   8707.257812

Example CentOS 7 Output:

cpu time total
{"cpu":"cpu-total","user":17.7,"system":33.2,"idle":3811.5,"nice":0.0,"iowait":23.4,"irq":0.0,"softirq":3.1,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
cpu time per core
{"cpu":"cpu0","user":9.6,"system":21.7,"idle":1900.8,"nice":0.0,"iowait":7.3,"irq":0.0,"softirq":1.8,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
{"cpu":"cpu1","user":8.1,"system":11.5,"idle":1910.7,"nice":0.0,"iowait":16.1,"irq":0.0,"softirq":1.4,"steal":0.0,"guest":0.0,"guestNice":0.0,"stolen":0.0}
Total CPU User Time: 17.670000   | Per CPU User Time Sum:   17.670000
Total CPU System Time: 33.190000 | Per CPU System Time Sum: 33.190000
Total CPU Idle Time: 3811.480000   | Per CPU Idle Time Sum:   3811.480000

Relevant Issues

@marcospedreiro marcospedreiro changed the title [Windows] cpu.Times(true) should not return percent values [cpu][windows] cpu.Times(true) should not return percent values Nov 20, 2018
cpu/cpu_test.go Outdated
perCPUIdleTimeSum += pc.Idle
}
margin := 2.0
if !isWithinMargin(perCPUUserTimeSum, cpuTotal[0].User, margin) {

Use assert.InEpsilon (or assert.InDelta) instead.
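
For readers unfamiliar with the two testify helpers: assert.InEpsilon compares by relative error and assert.InDelta by absolute difference. The following is a standard-library-only sketch of those two kinds of comparison; withinEpsilon and withinDelta are hypothetical names, not testify's implementation, and the zero-expected handling is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// withinEpsilon (hypothetical) reports whether the relative error
// between expected and actual is at most epsilon.
func withinEpsilon(expected, actual, epsilon float64) bool {
	if expected == 0 {
		// Relative error is undefined for a zero reference value;
		// this sketch only accepts an exact match in that case.
		return actual == 0
	}
	return math.Abs(expected-actual)/math.Abs(expected) <= epsilon
}

// withinDelta (hypothetical) reports whether the absolute difference
// between expected and actual is at most delta.
func withinDelta(expected, actual, delta float64) bool {
	return math.Abs(expected-actual) <= delta
}

func main() {
	// User-time totals from the Windows 10 output in the PR description.
	total, perCPUSum := 17401.046875, 17401.062500
	fmt.Println(withinEpsilon(total, perCPUSum, 0.01)) // relative check
	fmt.Println(withinDelta(total, perCPUSum, 2.0))    // absolute check
}
```

A relative check scales with uptime, whereas a fixed absolute margin such as 2.0 means something different on a machine that has been up for minutes than on one that has been up for weeks.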

@Lomanic
Collaborator

Lomanic commented Nov 20, 2018

Thanks for this great PR and your explanations, I will test it in the coming days. 👍

@marcospedreiro
Contributor Author

marcospedreiro commented Nov 20, 2018

Updated the test suite based on your comment (good callout, I forgot about that!)

Thanks! Let me know what you find. I'm not yet 100% sure why dividing by 10,000,000 gives the desired values, because I haven't been able to identify the units of win32_perfrawdata_counters_processorinformation.

Looking at win32_perfrawdata_counters_processorinformation percentusertime_properties, I see that the counter type is CounterType | 542180608. Searching for that number shows that 542180608 corresponds to PERF_100NSEC_TIMER, which has documentation here:

  • Description: This counter type shows the active time of a component as a percentage of the total elapsed time of the sample interval. It measures time in units of 100ns. Counters of this type are designed to measure the activity of one component at a time.
  • Generic type: Percentage
  • Formula: (N1 - N0) / (D1 - D0) x 100 where the denominator (D) represents the total elapsed time of the sample interval and the numerator (N) represents the portions of the sample interval during which the monitored components were active.
  • Average: (Nx - N0) / (Dx - D0) x 100
  • Example: Processor\ % User Time
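
As an illustration of the quoted formula, here is a minimal Go sketch that derives a percentage from two raw samples. The sample type, its field names, and the numbers are invented for the example; they are not WMI or gopsutil identifiers.

```go
package main

import "fmt"

// sample is a hypothetical pair of raw readings taken at one instant.
// Active corresponds to N and Elapsed to D in the formula above,
// both assumed to be in 100 ns units.
type sample struct {
	Active  uint64
	Elapsed uint64
}

// percentActive applies (N1 - N0) / (D1 - D0) x 100 to two samples.
// It returns 0 when the interval is empty or the counters went backwards.
func percentActive(s0, s1 sample) float64 {
	if s1.Elapsed <= s0.Elapsed || s1.Active < s0.Active {
		return 0
	}
	return float64(s1.Active-s0.Active) / float64(s1.Elapsed-s0.Elapsed) * 100
}

func main() {
	// Made-up readings one second apart (1e7 ticks of elapsed time).
	s0 := sample{Active: 1000000, Elapsed: 50000000}
	s1 := sample{Active: 3500000, Elapsed: 60000000}
	fmt.Printf("%.1f%%\n", percentActive(s0, s1)) // prints 25.0%
}
```

The raw class exposes only the numerator-style counter values, which is consistent with the per-core output looking like accumulated time rather than a percentage.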

From the Microsoft documentation stating that there are 1e7 ticks/second, I get the following:

1e7 ticks/second * 1 second / 1e9 nanoseconds = 1 Tick / 100 Nanoseconds

@Lomanic
Collaborator

Lomanic commented Nov 21, 2018

Which Windows versions were these modifications tested on?

@marcospedreiro
Contributor Author

  • Windows 10 Version 1803 (OS Build 17134.407)
  • Windows Server 2008 R2 Standard

@marcospedreiro
Contributor Author

@Lomanic Just checking in, were you able to test these changes? Let me know if there are any modifications you want me to make.

@Lomanic
Collaborator

Lomanic commented Nov 27, 2018

Sorry, I didn't have time to do that over the weekend. Probably this coming one.

@Lomanic Lomanic merged commit eead265 into shirou:master Dec 9, 2018
@Lomanic
Collaborator

Lomanic commented Dec 9, 2018

Thanks a lot for this great PR @marcospedreiro, appreciated!

notnoop pushed a commit to hashicorp/nomad that referenced this pull request Jan 27, 2020
Latest gopsutil includes two backward-incompatible changes:

First, it removed the unused Stolen field in
shirou/gopsutil@cae8efc#diff-d9747e2da342bdb995f6389533ad1a3d.

Second, it updated the Windows CPU stats calculation to be in line with
other platforms, where it returns absolute stats rather than
percentages.  See shirou/gopsutil#611.
notnoop pushed a commit to hashicorp/nomad that referenced this pull request Feb 13, 2020
greut pushed a commit to greut/nomad that referenced this pull request Mar 15, 2020