hwmon collector is confused on Linux 4.19 #1122

Closed
hhoffstaette opened this issue Oct 22, 2018 · 12 comments · Fixed by #1123

@hhoffstaette
Contributor

hhoffstaette commented Oct 22, 2018

Updated to 4.19 final & immediately started getting errors from the hwmon collector, which
has been working flawlessly with 4.18.x. Running with --log.level=debug yielded no meaningful
insights except the original error:

ERRO[0056] error gathering metrics: [from Gatherer #2] collected metric node_hwmon_temp_celsius label:<name:"chip" value:"thermal_thermal_zone0" > label:<name:"sensor" value:"temp0" > gauge:<value:27.8 >  has help "Hardware monitor for temperature ()" but should have "Hardware monitor for temperature (input)"  source="log.go:172"

Using the prebuilt 0.17.0-rc0 binary on plain 4.19. No docker etc.

As far as the error goes I'm stumped. I poked around in /sys/class/thermal and have no idea
what it tries to read, where, or what "temp0" is supposed to be, or why the help text is out of whack.
However, even with the error I still see updated & completely correct values for CPU temp/fan rpm in
Grafana, so apparently it's only the help text that is confused. For now I'll keep running
with loglevel reduced to fatal to not spam my logfile.

On another system with different HW sensors everything is working fine without hiccup,
so it might be specific to this machine's nct6776 chip (using the nct6775 module).
Regular "sensors" output is fine, as before.

Other than this I have no clue what is tripping up the hwmon collector since the error reporting
is unfortunately next to useless. :(

@pgier
Contributor

pgier commented Oct 22, 2018

The data for this collector comes from /sys/class/hwmon, and the error message comes from client_golang, which is just verifying that the help messages are consistent. Each sensor value is contained in a file with the format <type><index>_<item>, for example "temp1_input" for the current reading of temperature sensor 1. It looks like there is a temp0 sensor, but it's getting a blank value for part of the filename.

I wrote a quick hack which I think will fix the error message, in case you want to try it. Unfortunately, I'm not able to reproduce this locally, so I'm not sure exactly what's causing the problem for you.
pgier@27c522a
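A simplified sketch of the kind of consistency check described above. It assumes (this is not client_golang's actual code) that the registry remembers one help string per metric name and rejects any later sample whose help differs. The `helpRegistry` type and `tempHelp` helper are hypothetical; `tempHelp` only reproduces the help-text format visible in the error message.

```go
package main

import "fmt"

// helpRegistry is a hypothetical, simplified stand-in for the consistency
// check performed while gathering metrics: all samples that share a metric
// name must carry the same help string.
type helpRegistry map[string]string

// check remembers the first help string seen for name and returns an error
// if a later sample of the same metric arrives with a different one.
func (r helpRegistry) check(name, help string) error {
	if want, seen := r[name]; seen && want != help {
		return fmt.Errorf("collected metric %s has help %q but should have %q", name, help, want)
	}
	r[name] = help
	return nil
}

// tempHelp reproduces the help-text format visible in the error message.
func tempHelp(item string) string {
	return fmt.Sprintf("Hardware monitor for temperature (%s)", item)
}

func main() {
	r := helpRegistry{}
	// A normal "temp1_input" file yields the item "input".
	fmt.Println(r.check("node_hwmon_temp_celsius", tempHelp("input")))
	// A bare "temp" file has no item suffix, so the help text ends in "()".
	fmt.Println(r.check("node_hwmon_temp_celsius", tempHelp("")))
}
```

The second call returns an error because both samples share the name `node_hwmon_temp_celsius` but carry different help strings, which mirrors the shape of the logged error.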

@hhoffstaette
Contributor Author

hhoffstaette commented Oct 23, 2018

Thanks for looking into this!

> It looks like there is a temp0 sensor but it's getting a blank value for part of the filename.

Well, that's the thing: I can't find any entry named temp0. I already looked for it, and that's what I found
so puzzling to begin with! Here's a dump of my /sys/class/hwmon/; maybe you can see something
new & unexpected, because I certainly don't. :/
In any case thanks for the patch, I'll give that a try.

ll /sys/class/hwmon/hwmon*/
/sys/class/hwmon/hwmon0/:
total 0
lrwxrwxrwx 1 root root    0 Oct 22 17:35 device -> ../../thermal_zone0/
-r--r--r-- 1 root root 4.0K Oct 22 19:12 name
drwxr-xr-x 2 root root    0 Oct 22 17:35 power/
lrwxrwxrwx 1 root root    0 Oct 22 19:35 subsystem -> ../../../../../class/hwmon/
-r--r--r-- 1 root root 4.0K Oct 22 19:12 temp1_crit
-r--r--r-- 1 root root 4.0K Oct 22 19:12 temp1_input
-r--r--r-- 1 root root 4.0K Oct 22 19:12 temp2_crit
-r--r--r-- 1 root root 4.0K Oct 22 19:12 temp2_input
-rw-r--r-- 1 root root 4.0K Oct 22 19:12 uevent

/sys/class/hwmon/hwmon1/:
total 0
lrwxrwxrwx 1 root root    0 Oct 22 17:35 device -> ../../../coretemp.0/
-r--r--r-- 1 root root 4.0K Oct 22 18:05 name
drwxr-xr-x 2 root root    0 Oct 22 17:35 power/
lrwxrwxrwx 1 root root    0 Oct 22 17:35 subsystem -> ../../../../../class/hwmon/
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp1_crit
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp1_crit_alarm
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp1_input
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp1_label
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp1_max
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp2_crit
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp2_crit_alarm
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp2_input
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp2_label
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp2_max
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp3_crit
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp3_crit_alarm
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp3_input
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp3_label
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp3_max
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp4_crit
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp4_crit_alarm
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp4_input
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp4_label
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp4_max
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp5_crit
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp5_crit_alarm
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp5_input
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp5_label
-r--r--r-- 1 root root 4.0K Oct 22 18:05 temp5_max
-rw-r--r-- 1 root root 4.0K Oct 22 18:05 uevent

/sys/class/hwmon/hwmon2/:
total 0
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 beep_enable
lrwxrwxrwx 1 root root    0 Oct 22 17:35 device -> ../../../nct6775.656/
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan1_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan1_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan1_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan1_min
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan1_pulses
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan1_target
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan1_tolerance
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan2_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan2_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan2_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan2_min
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan2_pulses
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan2_target
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan2_tolerance
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan3_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan3_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan3_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan3_min
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan3_pulses
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan3_target
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan3_tolerance
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan4_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan4_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan4_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan4_min
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan4_pulses
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan5_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan5_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 fan5_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan5_min
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 fan5_pulses
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in0_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in0_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in0_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in0_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in0_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in1_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in1_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in1_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in1_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in1_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in2_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in2_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in2_input
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in2_max
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in2_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in3_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in3_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in3_input
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in3_max
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in3_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in4_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in4_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in4_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in4_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in4_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in5_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in5_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in5_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in5_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in5_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in6_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in6_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in6_input
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in6_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in6_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in7_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in7_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in7_input
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in7_max
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in7_min
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in8_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 in8_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 in8_input
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in8_max
-rw-r--r-- 1 root root 4.0K Oct 22 18:29 in8_min
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 intrusion0_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 intrusion0_beep
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 intrusion1_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 intrusion1_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 name
drwxr-xr-x 2 root root    0 Oct 22 17:35 power/
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point1_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point1_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point2_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point2_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point3_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point3_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point4_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point4_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point5_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_auto_point5_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_crit_temp_tolerance
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_enable
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_floor
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_mode
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_start
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_step_down_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_step_up_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_stop_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_target_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_temp_sel
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_temp_tolerance
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_weight_duty_base
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_weight_duty_step
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_weight_temp_sel
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_weight_temp_step
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_weight_temp_step_base
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm1_weight_temp_step_tol
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point1_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point1_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point2_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point2_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point3_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point3_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point4_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point4_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point5_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_auto_point5_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_crit_temp_tolerance
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_enable
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_floor
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_mode
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_start
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_step_down_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_step_up_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_stop_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_target_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_temp_sel
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_temp_tolerance
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_weight_duty_base
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_weight_duty_step
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_weight_temp_sel
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_weight_temp_step
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_weight_temp_step_base
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm2_weight_temp_step_tol
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point1_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point1_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point2_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point2_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point3_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point3_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point4_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point4_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point5_pwm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_auto_point5_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_crit_temp_tolerance
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_enable
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_floor
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_mode
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_start
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_step_down_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_step_up_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_stop_time
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_target_temp
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_temp_sel
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_temp_tolerance
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_weight_duty_base
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_weight_duty_step
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_weight_temp_sel
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_weight_temp_step
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_weight_temp_step_base
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 pwm3_weight_temp_step_tol
lrwxrwxrwx 1 root root    0 Oct 22 17:35 subsystem -> ../../../../../class/hwmon/
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp10_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp10_label
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp1_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp1_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp1_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp1_label
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp1_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp1_max_hyst
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp1_offset
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp1_type
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp2_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp2_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp2_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp2_label
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp2_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp2_max_hyst
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp2_offset
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp2_type
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp3_alarm
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp3_beep
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp3_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp3_label
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp3_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp3_max_hyst
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp3_offset
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp3_type
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp7_beep
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp7_crit
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp7_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp7_label
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp7_max
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 temp7_max_hyst
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp8_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp8_label
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp9_input
-r--r--r-- 1 root root 4.0K Oct 23 03:01 temp9_label
-rw-r--r-- 1 root root 4.0K Oct 23 03:01 uevent

@hhoffstaette
Contributor Author

Well, what do you know! Applied the patch and:

DEBU[0087] OK: hwmon collector succeeded after 0.007514s.  source="collector.go:135"

No other side effects, everything else is humming along just like before.
Thanks!

@SuperQ
Member

SuperQ commented Oct 23, 2018

Interesting, it looks like thermal_zone0 doesn't have a temp0. @hhoffstaette, with the patch do you see a temp0 metric?

@hhoffstaette
Contributor Author

hhoffstaette commented Oct 23, 2018

> Interesting, it looks like thermal_zone0 doesn't have a temp0. @hhoffstaette, with the patch
> do you see a temp0 metric?

No, only temp1..temp5.
I'm an idiot, I've been polling the wrong machine. Sorry.
To answer your question: yes, I see a temp0:

$wget -q -O - http://tux:9100/metrics | grep temp0  
node_hwmon_temp_celsius{chip="thermal_thermal_zone0",sensor="temp0"} 27.8

Obviously 27.8 is the same value as in /sys/class/hwmon/hwmon0/device/temp.

hhoffstaette added a commit to hhoffstaette/portage that referenced this issue Oct 23, 2018
@pgier
Contributor

pgier commented Oct 23, 2018

@hhoffstaette Glad to hear the patch fixed the log errors! Can you also list the contents of /sys/class/hwmon/hwmon0/device?

@hhoffstaette
Contributor Author

> @hhoffstaette Glad to hear the patch fixed the log errors! Can you also list the contents of /sys/class/hwmon/hwmon0/device?

Sure:

ll /sys/class/hwmon/hwmon0/device/
total 0
-r--r--r-- 1 root root 4.0K Oct 22 18:11 available_policies
lrwxrwxrwx 1 root root    0 Oct 22 17:35 cdev0 -> ../cooling_device4/
-r--r--r-- 1 root root 4.0K Oct 22 18:11 cdev0_trip_point
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 cdev0_weight
lrwxrwxrwx 1 root root    0 Oct 22 17:35 cdev1 -> ../cooling_device3/
-r--r--r-- 1 root root 4.0K Oct 22 18:11 cdev1_trip_point
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 cdev1_weight
lrwxrwxrwx 1 root root    0 Oct 22 17:35 cdev2 -> ../cooling_device2/
-r--r--r-- 1 root root 4.0K Oct 22 18:11 cdev2_trip_point
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 cdev2_weight
lrwxrwxrwx 1 root root    0 Oct 22 17:35 cdev3 -> ../cooling_device1/
-r--r--r-- 1 root root 4.0K Oct 22 18:11 cdev3_trip_point
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 cdev3_weight
lrwxrwxrwx 1 root root    0 Oct 22 17:35 cdev4 -> ../cooling_device0/
-r--r--r-- 1 root root 4.0K Oct 22 18:11 cdev4_trip_point
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 cdev4_weight
lrwxrwxrwx 1 root root    0 Oct 22 17:35 device -> ../../../LNXSYSTM:00/LNXSYBUS:01/LNXTHERM:00/
drwxr-xr-x 3 root root    0 Oct 22 19:35 hwmon0/
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 integral_cutoff
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 k_d
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 k_i
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 k_po
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 k_pu
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 mode
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 offset
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 passive
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 policy
drwxr-xr-x 2 root root    0 Oct 22 17:35 power/
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 slope
lrwxrwxrwx 1 root root    0 Oct 22 19:35 subsystem -> ../../../../class/thermal/
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 sustainable_power
-r--r--r-- 1 root root 4.0K Oct 22 18:11 temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_0_temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_0_type
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_1_temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_1_type
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_2_temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_2_type
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_3_temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_3_type
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_4_temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_4_type
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_5_temp
-r--r--r-- 1 root root 4.0K Oct 22 18:11 trip_point_5_type
-r--r--r-- 1 root root 4.0K Oct 22 18:11 type
-rw-r--r-- 1 root root 4.0K Oct 22 18:11 uevent

@pgier
Contributor

pgier commented Oct 23, 2018

I think the problem is the file "temp" in that directory. Does it contain a temperature value?
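A minimal sketch of why a bare "temp" file trips things up, assuming a filename pattern of <type><index>_<item> in which both the index and the item are optional. The regular expression and `parseSensorFile` are hypothetical (not node_exporter's code) and only cover sensor types that appear in the directory listings in this thread. For "temp", both the index and the item come back empty.

```go
package main

import (
	"fmt"
	"regexp"
)

// sensorFileRe is a hypothetical pattern for hwmon attribute names of the form
// <type><index>_<item>. Index and item are optional, so a bare "temp" matches.
var sensorFileRe = regexp.MustCompile(`^(temp|fan|intrusion|in|pwm)([0-9]*)(?:_(.+))?$`)

// parseSensorFile splits a sysfs file name into type, index and item.
// ok is false for names that are not sensor attributes (e.g. "uevent").
func parseSensorFile(name string) (typ, index, item string, ok bool) {
	m := sensorFileRe.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	// Unmatched optional groups are returned as empty strings.
	return m[1], m[2], m[3], true
}

func main() {
	for _, name := range []string{"temp1_input", "temp10_label", "fan2_min", "temp", "uevent"} {
		typ, index, item, ok := parseSensorFile(name)
		fmt.Printf("%-12s type=%q index=%q item=%q ok=%v\n", name, typ, index, item, ok)
	}
}
```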

@hhoffstaette
Contributor Author

hhoffstaette commented Oct 23, 2018

> I think the problem is the file "temp" in that directory. Does it contain a temperature value?

Good point!

$cat /sys/class/hwmon/hwmon0/device/temp 
27800

And indeed this entry doesn't exist on my other machine (also running 4.19) where everything has been fine.

@pgier
Contributor

pgier commented Oct 23, 2018

Yep, looks like a temperature in millidegrees. That value should show up in your metrics as
node_hwmon_temp_celsius{chip="thermal_thermal_zone0",sensor="temp0"}, do you see it on the node_exporter metrics page?
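The millidegree reading maps to the exported Celsius value by dividing by 1000. A minimal sketch, assuming the file holds a single number followed by a newline, as in the `cat` output above; `milliToCelsius` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// milliToCelsius parses a sysfs temperature reading, reported in millidegrees
// Celsius, and converts it to degrees Celsius.
func milliToCelsius(raw string) (float64, error) {
	v, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
	if err != nil {
		return 0, err
	}
	return v / 1000, nil
}

func main() {
	// Contents of /sys/class/hwmon/hwmon0/device/temp as reported in this thread.
	c, err := milliToCelsius("27800\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(c) // 27.8
}
```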

@hhoffstaette
Contributor Author

hhoffstaette commented Oct 23, 2018

Indeed I do! (SO sorry for the misleading previous comment). Output with patch:

$wget -q -O - http://tux:9100/metrics | grep thermal
node_hwmon_chip_names{chip="thermal_thermal_zone0",chip_name="acpitz"} 1
node_hwmon_temp_celsius{chip="thermal_thermal_zone0",sensor="temp0"} 27.8
node_hwmon_temp_celsius{chip="thermal_thermal_zone0",sensor="temp1"} 27.8
node_hwmon_temp_celsius{chip="thermal_thermal_zone0",sensor="temp2"} 29.8
node_hwmon_temp_crit_celsius{chip="thermal_thermal_zone0",sensor="temp1"} 87
node_hwmon_temp_crit_celsius{chip="thermal_thermal_zone0",sensor="temp2"} 87

pgier added a commit to pgier/node_exporter that referenced this issue Oct 23, 2018
…ave item suffix

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes prometheus#1122

Signed-off-by: Paul Gier <pgier@redhat.com>
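A minimal sketch of the rule in this commit message, restricted to temperature files. It assumes a missing index maps to 0, which is consistent with the sensor="temp0" label reported earlier in this thread. The regular expression, `sensorKey`, and `normalize` are hypothetical and not taken from the actual patch.

```go
package main

import (
	"fmt"
	"regexp"
)

// tempFileRe is a hypothetical pattern for temperature attributes:
// "temp", optionally followed by an index and then by "_<item>".
var tempFileRe = regexp.MustCompile(`^temp([0-9]*)(?:_(.+))?$`)

// sensorKey identifies one temperature attribute after normalization.
type sensorKey struct {
	Sensor string // e.g. "temp0"
	Item   string // e.g. "input"
}

// normalize treats a file without an item suffix as an _input reading and,
// as an assumption, a missing index as index 0.
func normalize(name string) (sensorKey, bool) {
	m := tempFileRe.FindStringSubmatch(name)
	if m == nil {
		return sensorKey{}, false
	}
	index, item := m[1], m[2]
	if index == "" {
		index = "0"
	}
	if item == "" {
		item = "input"
	}
	return sensorKey{Sensor: "temp" + index, Item: item}, true
}

func main() {
	for _, name := range []string{"temp", "temp1_input", "temp2_crit", "fan1_input"} {
		k, ok := normalize(name)
		fmt.Printf("%-12s -> %+v ok=%v\n", name, k, ok)
	}
}
```

With the item defaulted to "input", the bare "temp" file produces the same help text as every other temperature input, so the help-consistency check no longer fails.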
@SuperQ
Member

SuperQ commented Oct 24, 2018

I wonder if that's a dupe of temp1 from a driver perspective.

SuperQ pushed a commit that referenced this issue Oct 30, 2018
…ave item suffix (#1123)

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
iori-yja pushed a commit to iori-yja/node_exporter that referenced this issue Nov 2, 2018
…ave item suffix (prometheus#1123)

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes prometheus#1122

Signed-off-by: Paul Gier <pgier@redhat.com>
Signed-off-by: iori-yja <fivio.11235813@gmail.com>
SuperQ pushed a commit that referenced this issue Nov 30, 2018
…ave item suffix (#1123)

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes #1122

Signed-off-by: Paul Gier <pgier@redhat.com>
Signed-off-by: Ben Kochie <superq@gmail.com>
oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this issue Apr 9, 2024
…ave item suffix (prometheus#1123)

In some cases the file might be called "temp" instead of the usual format "temp<index>_<item>"
as described in the kernel docs: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
In this case, treat this as an _input file containing the current temperature reading.

Fixes prometheus#1122

Signed-off-by: Paul Gier <pgier@redhat.com>

4 participants