Invalid node_network_speed_bytes value -125000 for disconnected or virtualized NICs #1967
Comments
Interesting. Do you happen to know if `-1` is documented in the kernel as a special value? Are there other special values?
Good question. I haven't had much luck with the kernel source or documentation; the only thing I found was `Documentation/ABI/testing/sysfs-class-net`, which reads:
It does not state what the attribute could read for "invalid" interfaces, though. However, the ethtool source contains:

    if (tb[ETHTOOL_A_LINKMODES_SPEED]) {
        uint32_t val = mnl_attr_get_u32(tb[ETHTOOL_A_LINKMODES_SPEED]);

        print_banner(nlctx);
        if (val == 0 || val == (uint16_t)(-1) || val == (uint32_t)(-1))
            printf("\tSpeed: Unknown!\n");
        else
            printf("\tSpeed: %uMb/s\n", val);
    }

Since ethtool is closely related to the kernel's network subsystem (and thus should know what could be returned) and treats anything else as a valid speed value, I guess it would make sense to just copy its behavior:
For reference, here are the outputs of ethtool and node-exporter:

laptop without cable plugged in

virtio-based VM interface, connected and working
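The ethtool check quoted above could be mirrored roughly as follows. This is only a sketch in Python, not node_exporter code: the helper name `parse_link_speed` is hypothetical, and it assumes the raw text comes from a speed attribute such as `/sys/class/net/<nic>/speed`, where the issue reports `-1` for interfaces without a speed.

```python
from typing import Optional

# Values that the quoted ethtool snippet prints as "Unknown!":
# 0, (uint16_t)(-1) and (uint32_t)(-1).
UNKNOWN_SPEEDS = {0, 0xFFFF, 0xFFFFFFFF}


def parse_link_speed(raw: str) -> Optional[int]:
    """Return the link speed in Mbit/s, or None when it is unknown.

    Hypothetical helper; `raw` is the text read from the speed file.
    """
    value = int(raw.strip())
    # The file holds a signed -1, which is the same bit pattern as
    # (uint32_t)(-1); masking maps it onto 0xFFFFFFFF for the lookup.
    if (value & 0xFFFFFFFF) in UNKNOWN_SPEEDS:
        return None
    # Like ethtool, treat anything else as a valid speed.
    return value
```

With this, `parse_link_speed("-1\n")` yields `None` and `parse_link_speed("1000\n")` yields `1000`, so the caller can decide separately how to report the unknown case.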
Thanks for the report, I'm just catching up on some of these issues. For "unknown" like this, we tend to not expose data, rather than report negative numbers.
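To illustrate what "not expose data" means for a scrape, here is a small Python sketch under stated assumptions: the function `speed_samples` and the device names are made up for the example, and the sample line format (a `device` label on `node_network_speed_bytes`) is assumed rather than taken from the actual collector.

```python
from typing import Dict, List


def speed_samples(speeds_mbit: Dict[str, int]) -> List[str]:
    """Render one sample line per device whose speed is known.

    Devices with a negative raw speed (e.g. -1) are skipped, so the
    scrape contains no sample for them instead of a negative number.
    """
    lines = []
    for device, mbit in sorted(speeds_mbit.items()):
        if mbit < 0:
            continue  # unknown speed: expose nothing for this device
        # Mbit/s -> bytes/s, multiplying first to keep precision.
        bytes_per_second = mbit * 1000 * 1000 // 8
        lines.append(
            f'node_network_speed_bytes{{device="{device}"}} {bytes_per_second}'
        )
    return lines
```

For the input `{"eth0": 1000, "ens3": -1}` this returns a single line for `eth0` with the value `125000000`; `ens3` is simply absent.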
Some devices (ex virtual) don't have a speed and report `-1` as the speed value. Skip reporting speed for these devices. Fixes: #1967 Signed-off-by: Ben Kochie <superq@gmail.com>
Some devices (ex virtual) don't have a speed and report `-1` as the speed value. Add a flag to allow ignoring speed on these devices. Fixes: #1967 Signed-off-by: Ben Kochie <superq@gmail.com>
Some devices (ex virtual) don't have a speed and report `-1` as the speed value. Add a flag to allow ignoring speed on these devices. Fixes: prometheus#1967 Signed-off-by: Ben Kochie <superq@gmail.com>
After upgrading to version `1.1.0` from `0.18.x` or older in an oVirt / KVM virtual environment, we noticed that the metric `node_network_speed_bytes` is reported as `-125000` when the file `/sys/class/net/<nic>/speed` contains the value `-1`, while previously the metric returned `0`. The same behavior can be observed on physical wired connections when there is no cable plugged in, as I've just verified with my Arch Linux laptop.

I'd understand if it returned `-1`, since that's what's in the file to indicate that this interface "has no speed" (essentially a sentinel value), either because it is a complicated implementation of `memcpy` (VM) or because it has no connection (physical interface w/o cable). In both cases it indicates the absence of information about the speed of the interface.

I've bisected this to commit 3ddc82c in PR #1580, included from version `1.0.0-rc.0` onward. I'm not fluent in golang, but judging from the commit message, it seems that the early division caused it to essentially become `0` due to loss of precision, and thus it stayed at `0` when the multiplications were applied. After the PR was applied, the `-1` is brought to a high negative number before the division and so stays distinctly different from the raw input value or the value of `0` returned in previous versions.

IMHO, the value `0` before #1580 was merged was not entirely correct, but nevertheless adequate to indicate "no speed information for whatever reason". The new value of `-125000` most certainly is not correct, as it is neither a well-known sentinel value like `-1` nor an accurate indication of the speed of any network interface.

Thus, I propose special-casing the value of `-1` to return either `-1` (absence of information) or `0` to reinstate the previous behavior.

Host operating system: output of `uname -a`
Linux hostname 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Mon Jan 18 17:05:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
CentOS 8.3 running in oVirt (KVM), also observed on Debian Buster on the same hypervisor
node_exporter version: output of `node_exporter --version`
node_exporter command line flags
None
Are you running node_exporter in Docker?
No
What did you do that produced an error?
Update to `1.1.0` from `0.18.3`
What did you expect to see?
Either the value returned by `0.18.3`:

Or ideally the raw value to indicate the absence of information, which would be:
What did you see instead?
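The `-125000` figure can be reproduced from the unit conversion alone. The Python sketch below assumes, as the report guesses from the commit message, that the old code divided by 8 before multiplying and the new code multiplies first. Both function names are hypothetical, and `trunc_div` emulates integer division that truncates toward zero, as Go does, because Python's `//` rounds toward negative infinity.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as in Go and C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def divide_first(mbit: int) -> int:
    # Assumed pre-#1580 order: divide by 8, then scale to bytes/s.
    # The early division drops the fractional part, so -1 becomes 0.
    return trunc_div(mbit, 8) * 1000 * 1000


def multiply_first(mbit: int) -> int:
    # Assumed post-#1580 order: scale to bits/s, then divide by 8.
    # The sentinel -1 is scaled like a real speed and stays negative.
    return trunc_div(mbit * 1000 * 1000, 8)
```

Under these assumptions `divide_first(-1)` is `0` and `multiply_first(-1)` is `-125000`, matching the old and new metric values described in the report, while a real speed such as `1000` gives `125000000` either way.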