Added conntrack statistics metrics #1155

Merged
merged 8 commits into from
Jun 23, 2021

kozl
Contributor

@kozl kozl commented Nov 14, 2018

I've added conntrack statistics in netfilter collector. It collects data from /proc/net/stat/nf_conntrack, so it doesn't require any special permissions unlike #1093
We are using this patched version in production and it works fine.

@kozl kozl force-pushed the master branch 3 times, most recently from 7340068 to 0b84a1b Compare November 14, 2018 14:26
Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru>
@ti-mo

ti-mo commented Nov 14, 2018

Drive-by comment: see torvalds/linux@8e8118f for more info on searched, new, delete and delete_list. These are removed in v4.9 onwards, and they don't provide much value to the user. They're 32-bit counters and will wrap often. Might not be very useful to collect, but it's up to you of course. :)
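
To illustrate the wrap-around concern, here is a small sketch of computing the increase between two samples of a 32-bit counter. It relies only on Go's unsigned arithmetic, which wraps modulo 2^32; `delta32` is a hypothetical helper, not something from this PR.

```go
package main

import "fmt"

// delta32 returns how far a 32-bit counter advanced between two samples.
// Unsigned subtraction wraps modulo 2^32, so a single wrap between samples
// still gives the right answer; more than one wrap cannot be detected.
func delta32(prev, cur uint32) uint32 {
	return cur - prev
}

func main() {
	fmt.Println(delta32(100, 250))      // no wrap: prints 150
	fmt.Println(delta32(4294967290, 4)) // one wrap: prints 10
}
```

A counter that wraps more than once between scrapes silently loses increments, which is why a fast-moving 32-bit counter is of limited use.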

@carlpett
Member

Nice if this can be done without extra permissions! Interesting though, when I was looking into this, I got the impression that this file wasn't in use any more (based on comments in the conntrack utilities source). Do you know what is required for it to exist?
I managed to find one of the systems I use that has it, but the vast majority do not.

@discordianfish
Member

Great! But the parsing should be added to https://github.com/prometheus/procfs and then used here.

@ti-mo

ti-mo commented Nov 15, 2018

@carlpett Good point, seems netdata went through the same exercise: netdata/netdata#161 is worth reading through. (netdata/netdata#161 (comment) in particular)

TL;DR: /proc/net/nf_conntrack should be avoided as it's the equivalent of conntrack -L (and requires elevated permissions to access), but /proc/net/stat/nf_conntrack is fine and can be accessed by a normal user. Both files are still present on my 4.18 machine. Are you sure the Conntrack kernel module is loaded on the machine(s) where the file is missing?

@carlpett
Member

@ti-mo Yes, conntrack is loaded, and conntrack -S works well. They are ubuntu-based and running a 4.15 kernel.
Checking the kernel config, CONFIG_NF_CONNTRACK_PROCFS is unset, so it appears to default to disabled.
The machine where it works (my laptop) has a 4.17 Fedora kernel, where CONFIG_NF_CONNTRACK_PROCFS is explicitly enabled.

@kozl
Contributor Author

kozl commented Nov 15, 2018

Nice if this can be done without extra permissions! Interesting though, when I was looking into this, I got the impression that this file wasn't in use any more (based on comments in the conntrack utilities source). Do you know what is required for it to exist?
I managed to find one of the systems I use that has it, but the vast majority do not.

Yes, conntrack itself uses netlink, but there is a fallback to procfs if netlink isn't available.

I will add procfs parsing in https://github.com/prometheus/procfs and remove unused counters as described in torvalds/linux@8e8118f.

@ti-mo

ti-mo commented Nov 15, 2018

@carlpett That's bad news, that means this won't work on newer Ubuntu out of the box. :/

@carlpett
Member

carlpett commented Nov 15, 2018

That's bad news, that means this won't work on newer Ubuntu out of the box. :/

Yes, given that this is what most cloud-providers have as their default image, such as managed Kubernetes services. Still, better than not having these metrics at all, or having to run as root.

@ti-mo

ti-mo commented Nov 15, 2018

Then should we maybe emulate ctnetlink's behaviour in trying netlink (in case it's started as root or has NET_ADMIN), then procfs and ultimately throwing a warning if neither are available? I see the current behaviour is documented as:

does nothing if no /proc/sys/net/netfilter/ present

@discordianfish WDYT?

@discordianfish
Member

I think only using procfs should be sufficient, right? But not feeling strong either way. @SuperQ wdyt?

@ti-mo

ti-mo commented Nov 19, 2018

@discordianfish That would be sufficient if procfs support wasn't compiled out by default in a major distro's kernel. (Ubuntu)

@kozl
Contributor Author

kozl commented Dec 11, 2018

It seems it really would be better to add netlink support for the case where conntrack procfs is missing. I'll get to that soon.

@discordianfish
Member

@kozl I'd be okay with having functionality that requires elevated permissions as long as it isn't full root, but I need to check with @SuperQ first, so that you don't spend time on this only for us to be unable to merge it anyway.

@carlpett
Member

It should be possible to just grab the code from #1093 (possibly with some adaptations). CAP_NET_ADMIN is enough, as per the discussion there.

@discordianfish
Member

@SuperQ Ping. As I suggested in #1093, I would relax the no-capabilities rule for the node-exporter:

  • Still require non-root
  • Allow requiring capability in collectors not enabled by default if:
    1.) It closely fits the node-exporter use case, e.g. it's kernel stats
    2.) There is no better way to retrieve them (a cronjob running as root is not a better way)

If you're okay, I'd just send a PR to update the contribution docs.

@brian-brazil
Contributor

There is no better way to retrieve them (a cronjob running as root is not a better way)

What does "better" mean in this case?

I'd consider any capability like NET_ADMIN to be functionally equivalent to root, as for example you could use it to tap localhost communication.

@SuperQ
Member

SuperQ commented Jan 2, 2019

I agree with @brian-brazil, CAP_NET_ADMIN is somewhat dangerous. Maybe not root, but still pretty strong.

The list of granted permissions is slightly concerning:

Perform various network-related operations:

  • interface configuration;
  • administration of IP firewall, masquerading, and accounting;
  • modify routing tables;
  • bind to any address for transparent proxying;
  • set type-of-service (TOS)
  • clear driver statistics;
  • set promiscuous mode;
  • enabling multicasting;
  • use setsockopt(2) to set the following socket options:
    SO_DEBUG, SO_MARK, SO_PRIORITY (for a priority outside the
    range 0 to 6), SO_RCVBUFFORCE, and SO_SNDBUFFORCE.

@discordianfish However, I think your proposal to change the contribution docs is reasonable. If we're going to allow capabilities for some collectors, we should explicitly list which capabilities we want to allow.

If we really can't avoid privileged netlink access, we should add CAP_NET_ADMIN to the "OK" list.

@SuperQ
Member

SuperQ commented Jan 2, 2019

One more thought. If we're going to get into the privileges game, we should make sure it's possible to start up, get the privileged access we need, and if possible drop back to an unprivileged level.

@brian-brazil
Contributor

we should make sure it's possible to start up, get the privileged access we need, and if possible drop back to an unprivileged level.

That in practice likely means starting as root, then switching to another user. That's something which is very hard to get right.

@SuperQ
Member

SuperQ commented Jan 3, 2019

@brian-brazil Yes, I did a bit of research, apparently privilege dropping is not really supported by Go.

@matthiasr
Contributor

matthiasr commented Jan 3, 2019 via email

@discordianfish
Member

It certainly is less than ideal, but what options do we have? And since the node-exporter is not shelling out or doing unsafe parsing, the attack surface is limited.

I would also leave adding the capabilities to the user. It should be possible with setcap or newer capsh versions, and on Docker it should be just docker run --cap-add .. (needs to be confirmed).

@ti-mo

ti-mo commented Jan 7, 2019

Personally, I don't see a problem with giving the user a choice to explicitly enable this collector when it requires privileges. There is a clear overview in the README which collectors are enabled and disabled by default, and the exporter simply sends data, it doesn't receive or otherwise parse any input, as @discordianfish mentioned.

Then again, if the maintainers (understandably) don't want it, creating a conntrack-exporter project that serves up these statistics would be trivial. It's just a shame that node-exporter users won't get this out of the box.

Initially, I wasn't aware that node-exporter doesn't require root. I just looked into how Debian distributes node-exporter, and they do indeed run it as a non-privileged user. I would wager that many users that grab the binary from the release page and slap it onto their boxes don't take that precaution. 🙂

@discordianfish --cap-add net_admin is indeed the way to do this for a Dockerized exporter.
Edit: for a systemd unit: AmbientCapabilities=CAP_NET_ADMIN can be added to a unit file. capsh requires the partition the binary resides on to be mounted without nosuid, which can sometimes not be the case (homedir partitions, etc.).

Ultimately, this is up to the maintainers to decide. I think most of the arguments for and against have been given in this thread.

@kozl
Contributor Author

kozl commented May 27, 2019

Any chance of this PR being merged?

Member

@discordianfish discordianfish left a comment

So I think we should just merge this now. Just a final minor change.

collector/conntrack_linux.go
@liuxu623

liuxu623 commented Jun 9, 2020

@kozl You should add node_nf_conntrack_stat_* to collector/fixtures/e2e-output.txt.

Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru>
@kozl
Contributor Author

kozl commented Jun 9, 2020

Thanks! Finally, fixed tests on CI

@kozl
Contributor Author

kozl commented Jun 11, 2020

Can we merge this now?)

@delulu

delulu commented Jul 21, 2020

I'm really looking forward to this feature. When will it be merged and released?

@vykulakov

I'm waiting for this feature as well, and as far as I can see all the work is done and someone just needs to approve this PR. @kozl could you please fix the failed tests?

@SuperQ
Member

SuperQ commented Jul 21, 2020

I think we could merge this without the codespell fix, it's already fixed in master.

One thing I just realized is that there is overlap with this and the proposed lnstat collector. See prometheus/procfs#316 and #1771.

@kozl
Contributor Author

kozl commented Dec 2, 2020

Guys, I don't quite understand what exactly I have to do now to make this PR accepted. It has been hanging here open for more than two years without much success. I don't think that's ok. I'm tired of making more and more changes. If you don't want to accept it, just decline it and we'll finish with it.

@SuperQ
Member

SuperQ commented Dec 7, 2020

Sorry for making this review take so long. I will try and take one pass over it and merge it. If we have to refactor, I'll take care of it.

Member

@discordianfish discordianfish left a comment

LGTM. And sorry again for all the back and forth.. @SuperQ wanna rebase this?

@DiCanio

DiCanio commented May 20, 2021

Just wanted to ping on this so it doesn't get overlooked.

@SuperQ, @discordianfish is there any information on how this gets addressed in the near future?

@discordianfish
Member

@SuperQ From my POV this is/was ready to get merged. @SuperQ wanted to rebase/fix conflicts. But looks like he didn't get to it. If that's something you are willing to do, go for it!

@SuperQ
Member

SuperQ commented Jun 23, 2021

OK, I fixed the merge conflict, but the GitHub web editor erased the end-of-file newline. I'm just going to ignore this and fix it in master.

Member

@SuperQ SuperQ left a comment

LGTM

@SuperQ SuperQ merged commit 02ee897 into prometheus:master Jun 23, 2021
SuperQ added a commit that referenced this pull request Jul 12, 2021
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors #2012
* [CHANGE] Make node_exporter print usage to STDOUT #2039
* [FEATURE] Add conntrack statistics metrics #1155
* [FEATURE] Add ethtool stats collector #1832
* [FEATURE] Add flag to ignore network speed if it is unknown #1989
* [FEATURE] Add tapestats collector for Linux #2044
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887
* [ENHANCEMENT] Add time zone offset metric #2060
* [BUGFIX] Add ErrorLog plumbing to promhttp #1887
* [BUGFIX] Handle errors from disabled PSI subsystem #1983
* [BUGFIX] Fix panic when using backwards compatible flags #2000
* [BUGFIX] Only initiate collectors once #2048
* [BUGFIX] Handle small backwards jumps in CPU idle #2067

Signed-off-by: Ben Kochie <superq@gmail.com>
@SuperQ SuperQ mentioned this pull request Jul 12, 2021
SuperQ added a commit that referenced this pull request Jul 15, 2021
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors #2012
* [CHANGE] Make node_exporter print usage to STDOUT #2039
* [FEATURE] Add conntrack statistics metrics #1155
* [FEATURE] Add ethtool stats collector #1832
* [FEATURE] Add flag to ignore network speed if it is unknown #1989
* [FEATURE] Add tapestats collector for Linux #2044
* [FEATURE] Add nvme collector #2062
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887
* [ENHANCEMENT] Add more Infiniband counters #2019
* [ENHANCEMENT] netclass: retrieve interface names and filter before parsing #2033
* [ENHANCEMENT] Add time zone offset metric #2060
* [BUGFIX] Handle errors from disabled PSI subsystem #1983
* [BUGFIX] Fix panic when using backwards compatible flags #2000
* [BUGFIX] Fix wrong value for OpenBSD memory buffer cache #2015
* [BUGFIX] Only initiate collectors once #2048
* [BUGFIX] Handle small backwards jumps in CPU idle #2067

Signed-off-by: Ben Kochie <superq@gmail.com>
paulfantom added a commit to paulfantom/node_exporter that referenced this pull request Jul 23, 2021
v1.2.0

* tag 'v1.2.0': (50 commits)
  Release 1.2.0
  Fix conntrack collector log noise
  Add tapestats to collect tape devices statistics
  Update common Prometheus files
  Handle small backwards jumps in CPU idle
  Add more IB counters
  mod: update procfs dependency to v0.7.0
  Add nvme collector
  Use new client_golang collectors package.
  Update go-kstat location
  Update Go modules
  Add time zone offset metric
  netclass: retrieve interface names and filter before parsing
  Fix Eof newline in collector/conntrack_linux.go
  Added conntrack statistics metrics (prometheus#1155)
  Fix build
  Update collector/ethtool_linux.go
  Update logic
  Only iniate collectors once
  Add ErrorLog plumbing to promhttp
  ...
return err
}

ch <- prometheus.MustNewConstMetric(

These are all in fact counters, not gauges.

oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
* Added conntrack statistics metrics

Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru>
Co-authored-by: Aleksandr Kozlov <avlkozlov@avito.ru>
Co-authored-by: Ben Kochie <superq@gmail.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this pull request Apr 9, 2024
NOTE: Ignoring invalid network speed will be the default in 2.x
NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x.

* [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012
* [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039
* [FEATURE] Add conntrack statistics metrics prometheus#1155
* [FEATURE] Add ethtool stats collector prometheus#1832
* [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989
* [FEATURE] Add tapestats collector for Linux prometheus#2044
* [FEATURE] Add nvme collector prometheus#2062
* [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887
* [ENHANCEMENT] Add more Infiniband counters prometheus#2019
* [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033
* [ENHANCEMENT] Add time zone offset metric prometheus#2060
* [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983
* [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000
* [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015
* [BUGFIX] Only initiate collectors once prometheus#2048
* [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067

Signed-off-by: Ben Kochie <superq@gmail.com>