Add the officially recommended metrics #25

SimenB · 2016-08-31T15:34:53Z

Hi there!

All Prometheus clients are recommended to export some basic metrics https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors.

Descriptions stolen from the official Go client, https://github.com/prometheus/client_golang/blob/master/prometheus/process_collector.go.

This PR adds all of them, although most will only work on Linux.

No tests added in case you're not interested in the addition.

Example file for testing:

var prom = require('./');

prom.defaultMetrics([], 500);

setInterval(function () {
    console.log(prom.register.metrics());
}, 500);

Example Dockerfile for testing

FROM node:6

COPY . .

CMD ["node", "testfile.js"]

Example output of running app:

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total gauge
process_cpu_seconds_total 33.73
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1472657378
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 30420000
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1174712000
# HELP process_heap_bytes Process heap size in bytes.
# TYPE process_heap_bytes gauge
process_heap_bytes 1131760000
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 198652

Could probably add some metrics regarding stuff like GC etc.

EDIT: In hindsight, I can see that I should just check process.platform for linux instead of try-catching... I'll make that change tomorrow along with GC metrics!

EDIT2: Here is the table with recommended metrics:

Metric name	Help string	Unit
`process_cpu_seconds_total`	Total user and system CPU time spent in seconds.	seconds
`process_open_fds`	Number of open file descriptors.	file descriptors
`process_max_fds`	Maximum number of open file descriptors.	file descriptors
`process_virtual_memory_bytes`	Virtual memory size in bytes.	bytes
`process_resident_memory_bytes`	Resident memory size in bytes.	bytes
`process_heap_bytes`	Process heap size in bytes.	bytes
`process_start_time_seconds`	Start time of the process since unix epoch in seconds.	seconds

siimon · 2016-08-31T23:38:29Z

Interesting, thanks!

I'm away for this week and half of next so won't have a chance to dive into the code. But all of your suggestions seems reasonable and should be implemented!

A few other thoughts,

Why have a defaultMetrics? Shouldn't we include them in the metrics function, with a setting if they should be included or not?
Is it possible to include GC metrics without adding a 3rd party dependency? If not I'm hesitant to add it
Maybe we should add a option object where you can disable any of the defaultMetrics?
Adding some documentation to the readme would be nice as well!

Did my thoughts make any sense?

SimenB · 2016-09-01T05:47:23Z

That'd mean there'd be no way to disabling actually getting the metrics, which is potentially a somewhat expensive operation (as it's I/O).
IDK. I remember there being added GC metrics to Node back in 2013 or something, but it seems to have been removed in later iterations... I'll dig around!
That I can add
Yup, I dropped this, along with tests, in case you weren't interested in this change 😄

One last things; at work we have our own wrapper around this library adding a few more metrics (event loop lag, OS load, process handles and process requests), that are all something node exposes. Would you be interested in adding those as well?

siimon · 2016-09-01T07:59:47Z

actually, when thinking about this point 1 and 3 are pretty much the same thing. Would it make any sense to add a configuration object into the metrics function where you can disable/enable what metrics you want? Then it's just a matter of selecting sensible defaults.

I like your ideas, we do the same thing. I don't see any downside to this, as long as we have a good api where it's easy to enable and disable any of them. Just keep in mind that I'm trying to keep the number of dependencies as low as possible!

SimenB · 2016-09-01T08:05:54Z

There is no "start measuring" right now, so disabling them can't be done until you first query for them, and that'd mean keeping some state around. So I still prefer having a "start polling default metrics", and this function can take options disabling them or not.

All of those extras are dependency free, just using node's own APIs 😄

I'm back at work now, so I'll spend some time on it.

siimon · 2016-09-01T08:13:11Z

True, sounds great! :)

SimenB · 2016-09-01T08:56:36Z

I opened a PR with Prometheus to document this a bit better btw. If you have some comments, that'd be appreciated!

prometheus/docs#529

SimenB · 2016-09-01T11:41:38Z

@siimon This is ready to go now 😄

SimenB · 2016-09-01T11:53:52Z

It's a bit hard to get the tests to pass when it depends on both node version and os. Ideas?

SimenB · 2016-09-01T14:22:19Z

@siimon Latest commit makes it pass, but I'm not happy with it... Ideas?

SimenB · 2016-09-01T15:23:33Z

I dug into GC, and it got pulled from Node here: nodejs/node#174

And open issue about adding same sort of data back is here: nodejs/node#4496

And this userland module seems to fit the bill. I can create a project exposing this along with registering it with prom-client, which can be linked in the readme here?

siimon · 2016-09-02T01:54:39Z

A couple of ideas,

Update the travis ci with more node versions, maybe one test per major version or something?
Not sure if node allows it, but the platform variable is probably mutable, so you could change it in your test, then put it back when your test is done. Not that pretty either but it works.

SimenB · 2016-09-02T06:51:57Z

Added separate PR for it, which I'll rebase upon (Test on all supported majors of node, and OSX #28)
Good idea! I've done something similar with process.version in one of my own projects

SimenB · 2016-09-04T08:13:54Z

I'll try to wrap up the tests at work tomorrow, but I came over prometheus/docs#506, and it touches upon the fact that default metrics should be registered by default, but able to be turned off.

Thoughts on that? IDK if it'd be considered a breaking change, but it's just a number anyways 😄

siimon · 2016-09-04T12:19:04Z

Yeah, the best thing would be if users of this lib doesn't have to do anything, and as you said I think we should do a major bump so we don't introduce any weird duplicate metrics for those who are doing this by themselves.

jsha · 2016-09-04T14:04:23Z

Thanks for working on this! I am planning to make use of it as soon as it lands.

I agree these metrics should be enabled by default, and agree that that's consistent with the Prometheus docs.

SimenB · 2016-09-05T12:14:05Z

@siimon Should be good to go now.

I suggest removing the lower case exports (client.counter etc.) in this major as well 😄

SimenB · 2016-09-05T13:48:36Z

test/defaultMetrics.js

+			value: 'my-bogus-platform'
+		});
+
+		if(cpuUsage) {


Have to support both node@<6.1 and node@>=6.1

SimenB · 2016-09-05T19:49:29Z

I'll create a separate PR for some other useful metrics, such as event loop lag and process handles/requests. Would be nice to include that as well.

@jsha @siimon Any ideas if memoryUsage could be used better? WRT what prometheus expects. I updated OP with the table of data they suggest.

jsha · 2016-09-05T23:44:44Z

I think process_heap_bytes is intended to be the total size of the heap, which I believe better matches process.memoryUsage().heapTotal. I think it would also be useful to report .heapUsed, which I don't think corresponds to any of the standard Prometheus metrics.

SimenB · 2016-09-06T07:32:31Z

I did them as they are because of the description in the original word document: https://docs.google.com/document/d/1Q0MXWdwp1mdXCzNRak6bW5LLVylVRXhdi7_21Sg15xQ/edit

jsha · 2016-09-06T17:45:55Z

The word document says:

Heap size: process_heap_bytes. ??? DATA as reported from ps, this comes from VmData in /proc/self/status ???

But in man ps I don't see anything about a DATA section. The closest thing I can see in procps-ng version 3.3.10 is:

drs         DRS       data resident set size, the amount of physical memory devoted to other than executable code.

If I run a node REPL, and run ps ao args,drs | grep node I get:

node                        917122

Or ~9MB. From the REPL itself I get:

> process.memoryUsage().heapTotal
10619424

Not sure why Node thinks it has more heap than ps does. It could be a distinction between resident size and non-resident. At any rate, I don't have strong opinions here. I'm happy to proceed with what you've got implemented for process_heap_bytes.

SimenB · 2016-09-06T18:01:01Z

Ok, thanks for the analysis 😄 Should be GTG then, shouldn't be any problem to iterate on this later if we figure out we've done it wrong

siimon · 2016-09-06T19:36:46Z

Great :)

One thing we talked about earlier was enabling this by default, if I understood you correctly, the prom docs mentioned they should be enabled by default?

If that is the case, my suggestion is to enable them right away, and then instead have an exported function that stops collecting if you want to do that in your service.

I won't have any problems having it like that if we make a major release which I think we should

SimenB · 2016-09-06T19:38:39Z

It does collect automatically as soon as you require index.js. And yup, it's recommended in the docs to collect automatically.

And I agree it should be major bump.

siimon · 2016-09-06T19:42:49Z

Bah, thanks, i'll do a test spin and stop writing dump comments then :)

siimon · 2016-09-06T20:30:09Z

I've no further comments, @jsha do you have any input before merging?

jsha · 2016-09-06T20:51:28Z

LGTM

siimon mentioned this pull request Sep 4, 2016

Default stats for memory and CPU usage #31

Closed

Set to Number param instead of float

e19ac0a

SimenB reviewed Sep 5, 2016
View reviewed changes

test/defaultMetrics.js

value: 'my-bogus-platform'

});

if(cpuUsage) {

Copy link

Collaborator Author

SimenB Sep 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to support both node@<6.1 and node@>=6.1

Add the officially recommended metrics

aeeef08

SimenB mentioned this pull request Sep 5, 2016

Extra metrics #34

Merged

siimon merged commit a74e5b3 into siimon:master Sep 7, 2016

SimenB deleted the standard-metrics branch September 7, 2016 09:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add the officially recommended metrics #25

Add the officially recommended metrics #25

SimenB commented Aug 31, 2016 •

edited

Loading

siimon commented Aug 31, 2016

SimenB commented Sep 1, 2016

siimon commented Sep 1, 2016 •

edited

Loading

SimenB commented Sep 1, 2016

siimon commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

siimon commented Sep 2, 2016

SimenB commented Sep 2, 2016

SimenB commented Sep 4, 2016

siimon commented Sep 4, 2016

jsha commented Sep 4, 2016

SimenB commented Sep 5, 2016 •

edited

Loading

SimenB Sep 5, 2016

SimenB commented Sep 5, 2016

jsha commented Sep 5, 2016

SimenB commented Sep 6, 2016

jsha commented Sep 6, 2016

SimenB commented Sep 6, 2016

siimon commented Sep 6, 2016

SimenB commented Sep 6, 2016 •

edited

Loading

siimon commented Sep 6, 2016

siimon commented Sep 6, 2016

jsha commented Sep 6, 2016

Add the officially recommended metrics #25

Add the officially recommended metrics #25

Conversation

SimenB commented Aug 31, 2016 • edited Loading

siimon commented Aug 31, 2016

SimenB commented Sep 1, 2016

siimon commented Sep 1, 2016 • edited Loading

SimenB commented Sep 1, 2016

siimon commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

SimenB commented Sep 1, 2016

siimon commented Sep 2, 2016

SimenB commented Sep 2, 2016

SimenB commented Sep 4, 2016

siimon commented Sep 4, 2016

jsha commented Sep 4, 2016

SimenB commented Sep 5, 2016 • edited Loading

SimenB Sep 5, 2016

Choose a reason for hiding this comment

SimenB commented Sep 5, 2016

jsha commented Sep 5, 2016

SimenB commented Sep 6, 2016

jsha commented Sep 6, 2016

SimenB commented Sep 6, 2016

siimon commented Sep 6, 2016

SimenB commented Sep 6, 2016 • edited Loading

siimon commented Sep 6, 2016

siimon commented Sep 6, 2016

jsha commented Sep 6, 2016

SimenB commented Aug 31, 2016 •

edited

Loading

siimon commented Sep 1, 2016 •

edited

Loading

SimenB commented Sep 5, 2016 •

edited

Loading

SimenB commented Sep 6, 2016 •

edited

Loading