-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added conntrack statistics metrics #1155
Conversation
7340068
to
0b84a1b
Compare
Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru>
Drive-by comment: see torvalds/linux@8e8118f for more info on searched, new, delete and delete_list. These are removed in v4.9 onwards, and they don't provide much value to the user. They're 32-bit counters and will wrap often. Might not be very useful to collect, but it's up to you of course. :) |
Nice if this can be done without extra permissions! Interesting though, when I was looking into this, I got the impression that this file wasn't in use any more (based on comments in the conntrack utilities source). Do you know what is required for it to exist? |
Great! But the parsing should be added to https://github.com/prometheus/procfs and then used here. |
@carlpett Good point, seems netdata went through the same exercise: netdata/netdata#161 is worth reading through. (netdata/netdata#161 (comment) in particular) TL;DR: |
@ti-mo Yes, conntrack is loaded, and |
Yes, I will add procfs parsing in https://github.com/prometheus/procfs and remove unused counters as described in torvalds/linux@8e8118f. |
@carlpett That's bad news, that means this won't work on newer Ubuntu out of the box. :/ |
Yes, given that this is what most cloud-providers have as their default image, such as managed Kubernetes services. Still, better than not having these metrics at all, or having to run as root. |
Then should we maybe emulate ctnetlink's behaviour in trying netlink (in case it's started as root or has
@discordianfish WDYT? |
I think only using procfs should be sufficient, right? But not feeling strong either way. @SuperQ wdyt? |
@discordianfish That would be sufficient if procfs support wasn't compiled out by default in a major distro's kernel. (Ubuntu) |
Seems really would be better to add netlink support, in case of missing conntrack procfs. I'll get to that soon. |
Should be possible to just grab the stuff from #1093 (possibly with some adaptations). It is enough with CAP_NET_ADMIN, as per the discussion there. |
@SuperQ Ping. As I suggested in #1093, I would relax the no-capacilites rule for the node-exporter:
If you're okay, I'd just send a PR to update the contribution docs. |
What does "better" mean in this case? I'd consider any capability like NET_ADMIN to be functionally equivalent to root, as for example you could use it to tap localhost communication. |
I agree with @brian-brazil, The list of granted permissions is slightly concerning:
@discordianfish However, I think your proposal to change the contribution docs is reasonable. If we're going to allow capabilities for some collectors, we should explicitly list which capabilities we want to allow. If we really can't avoid privileged netlink access, we should add |
One more thought. If we're going to get into the privileges game, we should make sure it's possible to startup, get the privileged access we need, and if possible drop back to un-privileged level. |
That in practice likely means starting as root, then switching to another user. That's something which is very hard to get right. |
@brian-brazil Yes, I did a bit of research, apparently privilege dropping is not really supported by Go. |
I am in favor of giving users a choice. Aside from documenting what we
accept for contributors, how can we make the tradeoffs clear for node
administrators?
…On Thu, Jan 3, 2019, 09:45 Ben Kochie ***@***.***> wrote:
@brian-brazil <https://github.com/brian-brazil> Yes, I did a bit of
research, apparently privilege dropping is not really supported by Go.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1155 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAICBtmAvYIwW-qwGMZSNZePpGDJOxq-ks5u_cM5gaJpZM4Yd05j>
.
|
It certainly is less than ideal but what options do we have? And since the node-exporter is not shelling out or doing unsafe parsing, the attach surface is limited. I also would leave the adding capabilities to the user. It should be possible with setcap, newer |
Personally, I don't see a problem with giving the user a choice to explicitly enable this collector when it requires privileges. There is a clear overview in the README which collectors are enabled and disabled by default, and the exporter simply sends data, it doesn't receive or otherwise parse any input, as @discordianfish mentioned. Then again, if the maintainers (understandably) don't want it, creating a Initially, I wasn't aware that node-exporter doesn't require root. I just looked into how Debian distributes node-exporter, and they do indeed run it as a non-privileged user. I would wager that many users that grab the binary from the release page and slap it onto their boxes don't take that precaution. 🙂 @discordianfish Ultimately, this is up to the maintainers to decide. I think most of the arguments before and against have been given in this thread. |
Any chance this PR being merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think we should just merge this now. Just a final minor change.
@kozl You should add |
Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru>
Thanks! Finally, fixed tests on CI |
Can we merge this now?) |
I'm really looking forward to this feature, when will it be merged and release? |
I'm waiting for this feature as well and as I can see all work was done and someone should just approve this PR. @kozl could you please fix failed tests? |
I think we could merge this without the codespell fix, it's already fixed in master. One thing I just realized is that there is overlap with this and the proposed |
Guys, I don't quite understand what exactly I have to do now to make this PR accepted. It has been hanging here open for more than two years without much success. I don't think that's ok. I'm tired of making more and more changes. If you don't want to accept it, just decline it and we'll finish with it. |
Sorry for making this review take so long. I will try and take one pass over it and merge it. If we have to refactor, I'll take care of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. And sorry again for all the back and forth.. @SuperQ wanna rebase this?
Just wanted to ping on this so it doesn't get overlooked. @SuperQ, @discordianfish is there any information on how this gets addressed in the near future? |
Ok, I fixed the merge conflict, but the github web editor erased the end of file newline. I'm just going to ignore this and fix it in master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors #2012 * [CHANGE] Make node_exporter print usage to STDOUT #2039 * [FEATURE] Add conntrack statistics metrics #1155 * [FEATURE] Add ethtool stats collector #1832 * [FEATURE] Add flag to ignore network speed if it is unknown #1989 * [FEATURE] Add tapestats collector for Linux #2044 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887 * [ENHANCEMENT] Add time zone offset metric #2060 * [BUGFIX] Add ErrorLog plumbing to promhttp #1887 * [BUGFIX] Handle errors from disabled PSI subsystem #1983 * [BUGFIX] Fix panic when using backwards compatible flags #2000 * [BUGFIX] Only initiate collectors once #2048 * [BUGFIX] Handle small backwards jumps in CPU idle #2067 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors #2012 * [CHANGE] Make node_exporter print usage to STDOUT #2039 * [FEATURE] Add conntrack statistics metrics #1155 * [FEATURE] Add ethtool stats collector #1832 * [FEATURE] Add flag to ignore network speed if it is unknown #1989 * [FEATURE] Add tapestats collector for Linux #2044 * [FEATURE] Add nvme collector #2062 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp #1887 * [ENHANCEMENT] Add more Infiniband counters #2019 * [ENHANCEMENT] netclass: retrieve interface names and filter before parsing #2033 * [ENHANCEMENT] Add time zone offset metric #2060 * [BUGFIX] Handle errors from disabled PSI subsystem #1983 * [BUGFIX] Fix panic when using backwards compatible flags #2000 * [BUGFIX] Fix wrong value for OpenBSD memory buffer cache #2015 * [BUGFIX] Only initiate collectors once #2048 * [BUGFIX] Handle small backwards jumps in CPU idle #2067 Signed-off-by: Ben Kochie <superq@gmail.com>
v1.2.0 * tag 'v1.2.0': (50 commits) Release 1.2.0 Fix conntrack collector log noise Add tapestats to collect tape devices statistics Update common Prometheus files Handle small backwards jumps in CPU idle Add more IB counters mod: update procfs dependency to v0.7.0 Add nvme collector Use new client_golang collectors package. Update go-kstat location Update Go modules Add time zone offset metric netclass: retrieve interface names and filter before parsing Fix Eof newline in collector/conntrack_linux.go Added conntrack statistics metrics (prometheus#1155) Fix build Update collector/ethtool_linux.go Update logic Only iniate collectors once Add ErrorLog plumbing to promhttp ...
return err | ||
} | ||
|
||
ch <- prometheus.MustNewConstMetric( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all in fact counters, not gauges.
* Added conntrack statistics metrics Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru> Co-authored-by: Aleksandr Kozlov <avlkozlov@avito.ru> Co-authored-by: Ben Kochie <superq@gmail.com>
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012 * [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039 * [FEATURE] Add conntrack statistics metrics prometheus#1155 * [FEATURE] Add ethtool stats collector prometheus#1832 * [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989 * [FEATURE] Add tapestats collector for Linux prometheus#2044 * [FEATURE] Add nvme collector prometheus#2062 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887 * [ENHANCEMENT] Add more Infiniband counters prometheus#2019 * [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033 * [ENHANCEMENT] Add time zone offset metric prometheus#2060 * [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983 * [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000 * [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015 * [BUGFIX] Only initiate collectors once prometheus#2048 * [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067 Signed-off-by: Ben Kochie <superq@gmail.com>
* Added conntrack statistics metrics Signed-off-by: Aleksandr Kozlov <avlkozlov@avito.ru> Co-authored-by: Aleksandr Kozlov <avlkozlov@avito.ru> Co-authored-by: Ben Kochie <superq@gmail.com>
NOTE: Ignoring invalid network speed will be the default in 2.x NOTE: Filesystem collector flags have been renamed. `--collector.filesystem.ignored-mount-points` is now `--collector.filesystem.mount-points-exclude` and `--collector.filesystem.ignored-fs-types` is now `--collector.filesystem.fs-types-exclude`. The old flags will be removed in 2.x. * [CHANGE] Rename filesystem collector flags to match other collectors prometheus#2012 * [CHANGE] Make node_exporter print usage to STDOUT prometheus#2039 * [FEATURE] Add conntrack statistics metrics prometheus#1155 * [FEATURE] Add ethtool stats collector prometheus#1832 * [FEATURE] Add flag to ignore network speed if it is unknown prometheus#1989 * [FEATURE] Add tapestats collector for Linux prometheus#2044 * [FEATURE] Add nvme collector prometheus#2062 * [ENHANCEMENT] Add ErrorLog plumbing to promhttp prometheus#1887 * [ENHANCEMENT] Add more Infiniband counters prometheus#2019 * [ENHANCEMENT] netclass: retrieve interface names and filter before parsing prometheus#2033 * [ENHANCEMENT] Add time zone offset metric prometheus#2060 * [BUGFIX] Handle errors from disabled PSI subsystem prometheus#1983 * [BUGFIX] Fix panic when using backwards compatible flags prometheus#2000 * [BUGFIX] Fix wrong value for OpenBSD memory buffer cache prometheus#2015 * [BUGFIX] Only initiate collectors once prometheus#2048 * [BUGFIX] Handle small backwards jumps in CPU idle prometheus#2067 Signed-off-by: Ben Kochie <superq@gmail.com>
I've added conntrack statistics in netfilter collector. It collects data from /proc/net/stat/nf_conntrack, so it doesn't require any special permissions unlike #1093
We are using this patched version in production and it works fine.