Implement process collector for Windows #596

Merged 3 commits on Jun 14, 2019

Member

@carlpett carlpett commented Jun 6, 2019

This PR implements a process collector for Windows, resolves #376.

Some discussion points:

  • process_cpu_seconds_total uses an underlying data source that, from my understanding, is incremented with a resolution of ~16 ms. There might be better APIs to use? The ones I've found work with cycles, though, which aren't easy to convert to seconds.
  • There aren't really "file descriptors" on Windows. I interpreted them as handles instead (which cover lots of different things in addition to files). There is a hard-coded max of 16M handles per process.
  • I'm not fully sure I mapped the Linux memory concepts to Windows correctly. Would love a sanity check there.
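
To illustrate the CPU time point: a minimal sketch, assuming the data source is something like the Win32 GetProcessTimes call, which reports kernel and user time as FILETIME values (counts of 100-nanosecond intervals). The `filetime` type and the `cpuSeconds` helper are hypothetical names for this example, not the code in this PR.

```go
package main

import "fmt"

// filetime mirrors the layout of the Windows FILETIME struct: a 64-bit
// count of 100-nanosecond intervals split into two 32-bit halves.
// (Hypothetical local type so the sketch compiles on any platform.)
type filetime struct {
	LowDateTime  uint32
	HighDateTime uint32
}

// seconds converts the 100 ns tick count to seconds.
func (ft filetime) seconds() float64 {
	ticks := uint64(ft.HighDateTime)<<32 | uint64(ft.LowDateTime)
	return float64(ticks) / 1e7 // 10,000,000 ticks per second
}

// cpuSeconds sums kernel and user time, which is what a
// process_cpu_seconds_total style counter would report.
func cpuSeconds(kernel, user filetime) float64 {
	return kernel.seconds() + user.seconds()
}

func main() {
	kernel := filetime{LowDateTime: 5000000} // 0.5 s of kernel time
	user := filetime{LowDateTime: 10000000}  // 1.0 s of user time
	fmt.Println(cpuSeconds(kernel, user))
}
```

The unit conversion itself is exact; the ~16 ms granularity mentioned above comes from how often the underlying counters are updated, not from the FILETIME format.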

Signed-off-by: Calle Pettersson <carlpett@users.noreply.github.com>
Member Author

carlpett commented Jun 7, 2019

ping @beorn7, what do you think? (Apart from my apparently having missed something with go modules, I'll fix that)

Member

@beorn7 beorn7 left a comment

Looks very cool at a first glance. I'll have a closer look ASAP.

Also, it would make sense to have somebody with some Windows foo looking at this. (Anybody?)

Member

beorn7 commented Jun 11, 2019

The test failures are because you haven't updated the go.mod file. (Let me know if you need help with the Go modules part.)

}
ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(mem.WorkingSetSize))
ch <- MustNewConstMetric(c.maxVsize, GaugeValue, float64(mem.PeakWorkingSetSize))
ch <- MustNewConstMetric(c.rss, GaugeValue, float64(mem.PrivateUsage))
Contributor

I think RSS is WorkingSetSize

Member

Yeah, the Windows terminology is kind of weird. The working set size (WSS) is normally something other than the resident set size (RSS), but apparently, when Windows says "WSS", it means "RSS".

Member Author

Yes, that seems correct.

The "working set" of a process is the set of memory pages currently visible to the process in physical RAM memory. These pages are resident and available for an application to use without triggering a page fault

Member

@beorn7 beorn7 left a comment

Left a few comments.

This looks good in general; we just need to figure out the memory nomenclature of MS Windows.

prometheus/process_collector_other.go (outdated, resolved)
prometheus/process_collector_windows.go (resolved)
return
}
ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(mem.WorkingSetSize))
ch <- MustNewConstMetric(c.maxVsize, GaugeValue, float64(mem.PeakWorkingSetSize))
Member

maxVsize is the maximum possible virtual memory size, not the observed peak. Not sure if there is a way to get this from somewhere on MS Windows. Perhaps it has to be hardcoded depending on whether it's 64bit or 32bit.

Member Author

It seems it might depend on OS version as well. I found this source https://blogs.technet.microsoft.com/markrussinovich/2008/11/17/pushing-the-limits-of-windows-virtual-memory/ where they claim 8TB on 64 bit. They have a reference table which goes to Windows 2012. On my Windows lab host, I could reserve 128TB, though.

I'll see if this is available through some API, but otherwise I don't know. I'm not keen on maintaining a lookup table when it's hard to know when it needs to be updated.

Member Author

Alternatively, on Linux it seems we represent unlimited with -1. Might make sense here too?

Contributor

We could just not export it. I doubt there are many 32-bit systems out there that would care (and it's a hardcoded value if they need it).

Member Author

That's also an option, of course. But the 32-bit value isn't actually something we can hardcode, if I understand it correctly. It depends on whether the OS is 32-bit, and if so, whether it was booted with a 2/2 GB or 3/1 GB split between OS and application. On a 64-bit OS, it depends on whether the IMAGE_FILE_LARGE_ADDRESS_AWARE flag is set on the binary by the Go compiler, which seems to be the case: https://github.com/golang/go/blob/e883d000f4ce0c47711c3a7c59df8bb2f0ec557f/src/cmd/link/internal/ld/pe.go#L785-L788
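
A rough sketch of why a hardcoded value gets complicated. The `processInfo` struct and `maxUserAddressSpace` function are hypothetical, and the limits are the commonly documented Windows user address space sizes as I understand them, not something this PR implements. The 64-bit case is deliberately left unresolved, since the limit depends on the OS version (the thread mentions both 8 TB and 128 TB).

```go
package main

import "fmt"

const gib = 1 << 30

// processInfo collects the inputs the limit depends on.
// All fields are hypothetical inputs for this illustration.
type processInfo struct {
	Process64Bit      bool // binary is a 64-bit image
	OS64Bit           bool // running on a 64-bit Windows
	LargeAddressAware bool // IMAGE_FILE_LARGE_ADDRESS_AWARE set on the binary
	Boot3GB           bool // 32-bit OS booted with the 3/1 GB split
}

// maxUserAddressSpace returns the user-mode address space limit in bytes.
// The second return value is false when the limit cannot be determined
// without also knowing the Windows version.
func maxUserAddressSpace(p processInfo) (uint64, bool) {
	if !p.LargeAddressAware {
		return 2 * gib, true // 2 GB regardless of bitness
	}
	if p.Process64Bit {
		return 0, false // version dependent, e.g. 8 TB or 128 TB
	}
	if p.OS64Bit {
		return 4 * gib, true // 32-bit process on 64-bit Windows
	}
	if p.Boot3GB {
		return 3 * gib, true // 32-bit OS with the 3/1 GB split
	}
	return 2 * gib, true
}

func main() {
	limit, ok := maxUserAddressSpace(processInfo{LargeAddressAware: true, OS64Bit: true})
	fmt.Println(limit, ok)
}
```

Even this simplified version needs four inputs plus an OS version lookup, which is the maintenance burden discussed above.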

Member Author

But again, it is not clear if those hoops are worth jumping through.

Member

When in doubt, let's just not export this metric, as Brian suggested.

Member Author

Sounds good to me. In that case, should we drop the "max fds" too? That value is pretty uninteresting (although "correct"), and I added it to try to keep parity.

Member

No, let's keep it, as it is the technically correct value and doesn't imply any maintenance overhead to keep it correct.

c.reportError(ch, nil, err)
return
}
ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(mem.WorkingSetSize))
Member

Concluding from the comments below, the WorkingSetSize is more like the RSS. I have no clue how to get something like the vsize on Windows.

prometheus/process_collector_windows.go (outdated, resolved)
Member

beorn7 commented Jun 11, 2019

Wild guess: For vsize, we have to check PagefileUsage. If it is zero, we have to use PrivateUsage.

But it would be really good if somebody with a good understanding of Windows's memory management could help out.

Signed-off-by: Calle Pettersson <carlpett@users.noreply.github.com>
Member Author

@beorn7 From my understanding of PROCESS_MEMORY_COUNTERS_EX, PrivateUsage and PagefileUsage are the same, but the latter is deprecated and always zero (as some sort of backwards compatibility with the non-EX PROCESS_MEMORY_COUNTERS, I suppose)?

Member

beorn7 commented Jun 11, 2019

It seemed to me that PagefileUsage could be non-zero on older Windows versions (which then would not have PrivateUsage?). The MS documentation is not really clear to me. Feels like a blast from the past...

Member Author

Yeah, it is weirdly formulated. My interpretation is that PagefileUsage=PrivateUsage, except on older versions where PagefileUsage=0 (but PrivateUsage is still set). But I may be wrong.
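
The mapping under discussion could be sketched roughly as follows. The struct mirrors the field order of PROCESS_MEMORY_COUNTERS_EX; the Go type name and the `virtualSize` and `residentSize` helpers are hypothetical, and the PagefileUsage fallback is only the guess from this thread, not necessarily what was merged.

```go
package main

import "fmt"

// processMemoryCountersEx mirrors the Win32 PROCESS_MEMORY_COUNTERS_EX
// struct (DWORD -> uint32, SIZE_T -> uintptr). Hypothetical local type
// so the sketch compiles on any platform.
type processMemoryCountersEx struct {
	CB                         uint32
	PageFaultCount             uint32
	PeakWorkingSetSize         uintptr
	WorkingSetSize             uintptr
	QuotaPeakPagedPoolUsage    uintptr
	QuotaPagedPoolUsage        uintptr
	QuotaPeakNonPagedPoolUsage uintptr
	QuotaNonPagedPoolUsage     uintptr
	PagefileUsage              uintptr
	PeakPagefileUsage          uintptr
	PrivateUsage               uintptr
}

// virtualSize follows the guess above: prefer PagefileUsage and fall
// back to PrivateUsage when PagefileUsage is reported as zero.
func virtualSize(mem processMemoryCountersEx) uintptr {
	if mem.PagefileUsage != 0 {
		return mem.PagefileUsage
	}
	return mem.PrivateUsage
}

// residentSize treats the working set as the closest analogue of RSS,
// as concluded in the review comments.
func residentSize(mem processMemoryCountersEx) uintptr {
	return mem.WorkingSetSize
}

func main() {
	mem := processMemoryCountersEx{WorkingSetSize: 4096, PrivateUsage: 8192}
	fmt.Println(virtualSize(mem), residentSize(mem))
}
```

If PagefileUsage and PrivateUsage really are always equal when both are set, the fallback is harmless and simply covers the case where one of them is zero.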

Member

beorn7 commented Jun 13, 2019

Lacking better information, let's do it as discussed. I think the only remaining change is to remove the c.maxVsize metric. Then we can merge this.

@carlpett carlpett changed the title from "WIP: Implement process collector for Windows" to "Implement process collector for Windows" on Jun 14, 2019
Signed-off-by: Calle Pettersson <calle@cape.nu>
Member Author

Done!

@beorn7 beorn7 merged commit c5f4190 into prometheus:master Jun 14, 2019
Member

beorn7 commented Jun 14, 2019

Thanks 1M!

Linked issue: Make process_… metrics work on MS Windows