Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add sys/class/net parsing from procfs and expose its metrics #851

Merged
merged 10 commits into from
Jul 16, 2018

Conversation

klatys
Copy link
Contributor

@klatys klatys commented Mar 8, 2018

related to #300
Hi @SuperQ , I just implemented the mentioned funcionality from procfs into node_exporter, but have several questions:

  • I'm not sure where did we land on the reflect/noreflect discussion with @grobie . Here it's used again, but I can write it into static calls, if you wish so
  • Since the class/net is a linux feature, how to handle other collector files? I came up with several solutions, but none of them seems quite right:
    • check arch in common and call the function then (though I think the compiler would see through that)
    • add empty function in other platforms's files (e.g. netdev_bsd.go)
    • collect the data in getNetDevStats
    • introduce separate collector (this one seems like the best choice, but not sure whether it is overkill?)
  • What naming should be used for metrics? Atm filenames are used, but then again the naming is based on sys/class/net structure, which might be different for other platforms (if there were any).
    • Also speed is in bits (but per second), so not sure whether any suffix is correct here
  • I had to update vendor for procfs, mostly it's just licence headers, but since it's WIP, I'm not sure whether this is a correct way.

@SuperQ
Copy link
Member

SuperQ commented Apr 12, 2018

Ping, this needs some cleanup.

@klatys
Copy link
Contributor Author

klatys commented Apr 13, 2018

@SuperQ Absolutely, though I'd be glad for pointing me in the right direction in this point mostly

Since the class/net is a linux feature, how to handle other collector files? I came up with several solutions, but none of them seems quite right

@discordianfish
Copy link
Member

  • yes, better avoid reflect

  • I think a separate collector, maybe netclass, would make sense

  • Naming: Good question.. In general we either expose raw values by just iterating over lines/directories so we don't need to maintain a mapping which might change between kernel versions etc. Or, if we need to liste all names anyway, we would make sure they are following our best practices. I think the later one would apply here more. So make sure to follow https://prometheus.io/docs/practices/naming/
    Not really sure about the speed name.. I think speed_bits would work, since it somewhat implies that it's bits per second (since everything is suppose to in seconds). But maybe being explicit and call it speed_bits_per_second is better. @SuperQ preferences?

@brian-brazil
Copy link
Contributor

bytes_per_second would be better I think, as that's what the output of rate() on the bytes sent will be.

@klatys
Copy link
Contributor Author

klatys commented Apr 13, 2018

thanks for the answers guys! I'll get on it soon.

@SuperQ
Copy link
Member

SuperQ commented Apr 13, 2018

Yes to:

  • netclass collector
  • Either speed_bytes or bytes_per_second. We do want bytes so it's the same unit as node_network_receive_bytes_total.
  • Avoid reflect, at the cost of a little more maintenance / copy-paste.

@discordianfish
Copy link
Member

Makes sense, let's go with bytes

@poblahblahblah
Copy link

I'm interested in getting this into node_exporter. Is there anything I can do to move this along?

@klatys
Copy link
Contributor Author

klatys commented May 23, 2018

Yes, currently this is waiting for prometheus/procfs#87 can we move on with this guys?

@@ -73,5 +73,6 @@ func (c *netDevCollector) Update(ch chan<- prometheus.Metric) error {
ch <- prometheus.MustNewConstMetric(desc, prometheus.CounterValue, v, dev)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated.

@@ -0,0 +1,169 @@
// Copyright 2015 The Prometheus Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a new file, let's make this 2018.

@SuperQ
Copy link
Member

SuperQ commented May 30, 2018

Let's add some fixtures to this, and add netclass to the end-to-end test.

@SuperQ
Copy link
Member

SuperQ commented May 30, 2018

The pushMetric() function needs to have the option to push the type. Maybe add a param for prometheus.ValueType and pass prometheus.GaugeValue or prometheus.CounterValue as appropriate.

Also, it looks like this PR needs to be rebased against master.

@klatys
Copy link
Contributor Author

klatys commented May 30, 2018

@SuperQ I guess your point is to the
node_network_carrier_changes_total non-counter metrics should not have "_total" suffix
fail in tests which I think is rather bad metric name (although the file name is literally carrier_changes_total).

Every metric is changed with each restart of the machine, so counter type is imo wrong here (that's why gauge is used).

So I suppose using sth like node_network_carrier_changes , ..carrier_up_changes , carrier_down_changes should be enough?

@SuperQ
Copy link
Member

SuperQ commented May 30, 2018

Counters being reset to 0 on machine restart is completely fine and normal from a Prometheus perspective. The system is designed to handle this automatically. So if these only go up for the lifetime of a device, and only reset to 0, they are counters.

@SuperQ
Copy link
Member

SuperQ commented May 30, 2018

Also, it looks like you need to reword your early commit with the DCO signoff.

klatys added 6 commits May 30, 2018 13:38
Signed-off-by: Jan Klat <jenik@klatys.cz>
…eparate collector, change metric naming

Signed-off-by: Jan Klat <jenik@klatys.cz>
Signed-off-by: Jan Klat <jenik@klatys.cz>
Signed-off-by: Jan Klat <jenik@klatys.cz>
Signed-off-by: Jan Klat <jenik@klatys.cz>
Signed-off-by: Jan Klat <jenik@klatys.cz>
@klatys
Copy link
Contributor Author

klatys commented May 30, 2018

Oh ok, I honestly got a different view from docs, especially

Counters should not be used to expose current counts of items whose number can also go down

Signed off, but the test is failing on changed values for network interfaces... Is there a way to omit this somehow?

-node_network_iface_id{interface="eth0"} 605
+node_network_iface_id{interface="eth0"} 1337
-node_network_iface_link{interface="eth0"} 606
+node_network_iface_link{interface="eth0"} 1338
-node_network_up{address="02:42:ac:13:00:03",broadcast="ff:ff:ff:ff:ff:ff",duplex="full",ifalias="",interface="eth0",operstate="up"} 1
+node_network_up{address="02:42:ac:18:00:03",broadcast="ff:ff:ff:ff:ff:ff",duplex="full",ifalias="",interface="eth0",operstate="up"} 1

@SuperQ
Copy link
Member

SuperQ commented May 30, 2018

I think the test diff is due to the fact that the fixtures do not include the fixture data, and you're getting leaked info from the docker container.

Signed-off-by: Jan Klat <jenik@klatys.cz>
@SuperQ
Copy link
Member

SuperQ commented May 30, 2018

It looks like the fixtures need an iface_id and iface_link file.

@klatys
Copy link
Contributor Author

klatys commented May 31, 2018

The fixtures are passed as a option(--path.sysfs=) thats what I was missing... While testing that I also discovered bug in procfs which is referenced above. Hopefuly after that fixtures will work correctly

@discordianfish
Copy link
Member

Tests are still failing though:

>> running end-to-end tests
./end-to-end-test.sh
--- collector/fixtures/e2e-output.txt	2018-05-30 13:45:44.663039377 +0000
+++ /tmp/node_exporter_e2e_test.cnTQB1/e2e-output.txt	2018-05-30 13:47:11.255262772 +0000
@@ -1690,11 +1690,11 @@
 node_network_flags{interface="lo"} 9
 # HELP node_network_iface_id iface_id value of /sys/class/net/<iface>.
 # TYPE node_network_iface_id gauge
-node_network_iface_id{interface="eth0"} 605
+node_network_iface_id{interface="eth0"} 9393
 node_network_iface_id{interface="lo"} 1
 # HELP node_network_iface_link iface_link value of /sys/class/net/<iface>.
 # TYPE node_network_iface_link gauge
-node_network_iface_link{interface="eth0"} 606
+node_network_iface_link{interface="eth0"} 9394
 node_network_iface_link{interface="lo"} 1
 # HELP node_network_iface_link_mode iface_link_mode value of /sys/class/net/<iface>.
 # TYPE node_network_iface_link_mode gauge
LOG =====================

Signed-off-by: Jan Klat <jenik@klatys.cz>
)

var (
netclassIgnoredDevices = kingpin.Flag("collector.netclass.ignored-devices", "Regexp of net devices to ignore for netclass collector.").Default("^$").String()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include some interfaces in this default list? Maybe lo and/or docker.*?

Signed-off-by: Jan Klat <jenik@klatys.cz>
@klatys
Copy link
Contributor Author

klatys commented May 31, 2018

Can someone please let me know what is failing in buildkite? I don't have access to that org. Thanks!

@SuperQ
Copy link
Member

SuperQ commented May 31, 2018

You probably need to update the collector/fixtures/e2e-64k-page-output.txt to match the other fixture update.

Signed-off-by: Jan Klat <jenik@klatys.cz>
@klatys
Copy link
Contributor Author

klatys commented May 31, 2018

Thanks! Should I set the default interface list as well or do you want approve from other Members?

Copy link
Member

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@klatys
Copy link
Contributor Author

klatys commented Jul 11, 2018

ping @discordianfish I'd be happy to have these in node exporter, the sooner the better :) thanks a lot!

Copy link
Member

@discordianfish discordianfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@discordianfish
Copy link
Member

Sorry, this slipped through. Merging now!

@discordianfish discordianfish merged commit c4102f1 into prometheus:master Jul 16, 2018
@klatys
Copy link
Contributor Author

klatys commented Jul 16, 2018

Thanks guys for all the help. Appreciate it!

# HELP node_network_carrier carrier value of /sys/class/net/<iface>.
# TYPE node_network_carrier gauge
node_network_carrier{interface="eth0"} 1
# HELP node_network_carrier_changes_total carrier_changes_total value of /sys/class/net/<iface>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carrier_changes_total value of -> carrier_changes value of please fix it. And others in this PR

oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
…eus#851)

* add sys/class/net parsing from procfs and expose its metrics

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change code to use int pointers per procfs change, move netclass to separate collector, change metric naming

Signed-off-by: Jan Klat <jenik@klatys.cz>

* bump year in licence, remove redundant newline, correct fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix style

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change carrier changes to counter type

Signed-off-by: Jan Klat <jenik@klatys.cz>

* fix e2e output

Signed-off-by: Jan Klat <jenik@klatys.cz>

* add fixtures

Signed-off-by: Jan Klat <jenik@klatys.cz>

* update vendor, use fixtures correctly

Signed-off-by: Jan Klat <jenik@klatys.cz>

* change fixtures (device in /sys/class/net should be symlinked)

Signed-off-by: Jan Klat <jenik@klatys.cz>

* correct fixtures for 64k page, updated readme

Signed-off-by: Jan Klat <jenik@klatys.cz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants