hwmon collector is confused on Linux 4.19 #1122
The data for this collector comes from /sys/class/hwmon, and the error message comes from client_golang, which is just verifying that the help messages are in the right format. Each sensor value is contained in a file named in the format <type><index>_<item>, for example "temp1_input" for the current reading of temperature sensor 1. It looks like there is a temp0 sensor, but the collector is getting a blank value for part of the filename. I wrote a quick hack which I think will fix the error message, in case you want to try it. Unfortunately, I'm not able to reproduce this locally, so I'm not sure exactly what's causing the problem for you.
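To illustrate the naming scheme described above, here is a minimal Go sketch that splits a sensor file name into its parts. The regular expression and the `parseSensorFile` helper are hypothetical and written only for this illustration; they are not the collector's actual code. Note how a bare name such as "temp" does not match the usual pattern, which is the kind of input that can leave part of the name blank.

```go
package main

import (
	"fmt"
	"regexp"
)

// sensorFileRe is a hypothetical pattern for the "<type><index>_<item>"
// naming scheme, e.g. "temp1_input". It is an assumption for this sketch,
// not the expression used by node_exporter.
var sensorFileRe = regexp.MustCompile(`^([a-z]+)([0-9]+)_(.+)$`)

// parseSensorFile splits a file name into sensor type, index and item.
// ok is false when the name does not follow the usual scheme.
func parseSensorFile(name string) (sensorType, index, item string, ok bool) {
	m := sensorFileRe.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	for _, name := range []string{"temp1_input", "fan2_min", "temp"} {
		sensorType, index, item, ok := parseSensorFile(name)
		fmt.Printf("%-12s type=%q index=%q item=%q ok=%v\n",
			name, sensorType, index, item, ok)
	}
}
```

A stricter pattern like this one rejects "temp" outright; a more lenient one that makes the index and item optional would accept it but return empty fields.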
Thanks for looking into this!
Well, that's the thing - I can't find any entry named temp0, I already looked for it and that's what I found
Well, what do you know! Applied the patch and:
No other side effects, everything else is humming along just like before.
Interesting, it looks like |
Obviously 27.8 is the same value as in |
@hhoffstaette Glad to hear the patch fixed the log errors! Can you also list the contents of |
Sure:
I think the problem is the file "temp" in that directory. Does it contain a temperature value?
Good point!
And indeed this entry doesn't exist on my other machine (also running 4.19) where everything has been fine.
Yep, looks like a temperature in millidegrees. That value should show up in your metrics as |
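For reference, hwmon temperature files hold the reading in millidegrees Celsius as plain text, so a conversion is needed before the value is exposed as degrees. The following Go sketch shows one way to do that; the `millidegreesToCelsius` helper and the sample value "27800" are assumptions for illustration, not taken from the collector or from this machine's files.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// millidegreesToCelsius converts the raw text content of an hwmon
// temperature file (millidegrees Celsius) to degrees Celsius.
// Hypothetical helper for this sketch.
func millidegreesToCelsius(raw string) (float64, error) {
	v, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
	if err != nil {
		return 0, fmt.Errorf("parsing sensor value %q: %w", raw, err)
	}
	return v / 1000.0, nil
}

func main() {
	// "27800\n" is an assumed sample file content, including the
	// trailing newline that sysfs files usually end with.
	celsius, err := millidegreesToCelsius("27800\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f degrees Celsius\n", celsius)
}
```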
Indeed I do! (SO sorry for the misleading previous comment). Output with patch:
…ave item suffix

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>" as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes prometheus#1122
Signed-off-by: Paul Gier <pgier@redhat.com>
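The approach in this commit message can be sketched roughly as follows. This is not the actual patch: the `normalizeSensorFile` helper and both regular expressions are hypothetical, and mapping the bare name to index "0" is an assumption made here only because the error in this issue mentioned temp0.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// fullNameRe matches the usual "<type><index>_<item>" scheme.
	// Hypothetical pattern for this sketch.
	fullNameRe = regexp.MustCompile(`^([a-z]+)([0-9]+)_(.+)$`)
	// bareNameRe matches a name with no index and no item, e.g. "temp".
	bareNameRe = regexp.MustCompile(`^[a-z]+$`)
)

// normalizeSensorFile returns the sensor type, index and item for a file
// name. A bare name such as "temp" is treated as an "_input" file, i.e.
// the current reading. Index "0" for that case is an assumption.
func normalizeSensorFile(name string) (sensorType, index, item string, ok bool) {
	if m := fullNameRe.FindStringSubmatch(name); m != nil {
		return m[1], m[2], m[3], true
	}
	if bareNameRe.MatchString(name) {
		return name, "0", "input", true
	}
	return "", "", "", false
}

func main() {
	for _, name := range []string{"temp1_input", "temp", "temp1_"} {
		sensorType, index, item, ok := normalizeSensorFile(name)
		fmt.Printf("%-12s type=%q index=%q item=%q ok=%v\n",
			name, sensorType, index, item, ok)
	}
}
```

With a fallback like this, the help text always has a non-empty item to describe, which avoids the kind of malformed description that client_golang rejected.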
I wonder if that's a dupe of |
Updated to 4.19 final & immediately started getting errors from the hwmon collector, which
has been working flawlessly with 4.18.x. Running with --log.level=debug yielded no meaningful
insights except the original error:
Using the prebuilt 0.17.0-rc0 binary on plain 4.19. No docker etc.
As far as the error goes I'm stumped. I poked around in /sys/class/thermal and have no idea
what it tries to read, where or what "temp0" is supposed to be, or why the help text is out of whack.
However, even with the error I still see updated & completely correct values for CPU temp/fan rpm in
Grafana, so apparently it's only the help text that is confused. For now I'll keep running
with loglevel reduced to fatal to not spam my logfile.
On another system with different HW sensors everything is working fine without hiccup,
so it might be specific to this machine's nct6776 chip (using the nct6775 module).
Regular "sensors" output is fine, as before.
Other than this I have no clue what is tripping up the hwmon collector since the error reporting
is unfortunately next to useless. :(