Cannot read hwmon*/temp1_input file #21

Closed
mregni opened this issue Jun 9, 2023 · 27 comments

@mregni

mregni commented Jun 9, 2023

Hi,

I'm trying to get the smfc tool working on my homelab (X10DRi) but got stuck with the hwmon paths.
It seems that the CPU zone is initialized correctly, but the HD zone can't find the temp1_input files anywhere.

I did some digging and indeed, none of my /sys/class/scsi_disk/* folders have a hwmon directory in them.
Any idea what I'm doing wrong here?

DEBUG output

root@mars:/opt/smfc# ./smfc.py -o 0 -l 4
CONFIG: Logging module was initialized with:
CONFIG:    log_level = 4
CONFIG:    log_output = 0
CONFIG: Command line arguments:
CONFIG:    original arguments: ./smfc.py -o 0 -l 4
CONFIG:    parsed config file = smfc.conf
CONFIG:    parsed log level = 4
CONFIG:    parsed log output = 0
DEBUG: Configuration file (smfc.conf) loaded
CONFIG: Ipmi module was initialized with:
CONFIG:    command = /usr/bin/ipmitool
CONFIG:    fan_mode_delay = 10
CONFIG:    fan_level_delay = 2
CONFIG:    swapped_zones = True
DEBUG: Old IPMI fan mode = FULL_MODE
DEBUG: CPU zone fan controller enabled
CONFIG: CPU zone fan controller was initialized with:
CONFIG:    ipmi zone = 0
CONFIG:    count = 2
CONFIG:    temp_calc = 1
CONFIG:    steps = 6
CONFIG:    sensitivity = 3.0
CONFIG:    polling = 2.0
CONFIG:    min_temp = 30.0
CONFIG:    max_temp = 60.0
CONFIG:    min_level = 35
CONFIG:    max_level = 100
CONFIG:    hwmon_path = ['/sys/devices/platform/coretemp.0/hwmon/hwmon4/temp1_input', '/sys/devices/platform/coretemp.1/hwmon/hwmon5/temp1_input']
CONFIG:    Temperature to level mapping:
CONFIG:    0. [T:30.0C - L:35%]
CONFIG:    1. [T:35.0C - L:45%]
CONFIG:    2. [T:40.0C - L:56%]
CONFIG:    3. [T:45.0C - L:67%]
CONFIG:    4. [T:50.0C - L:78%]
CONFIG:    5. [T:55.0C - L:89%]
CONFIG:    6. [T:60.0C - L:100%]
DEBUG: HD zone fan controller enabled
Traceback (most recent call last):
  File "/opt/smfc/./smfc.py", line 975, in <module>
    main()
  File "/opt/smfc/./smfc.py", line 951, in main
    my_hd_zone = HdZone(my_log, my_ipmi, my_config)
  File "/opt/smfc/./smfc.py", line 703, in __init__
    super().__init__(
  File "/opt/smfc/./smfc.py", line 390, in __init__
    self.build_hwmon_path(hwmon_path)
  File "/opt/smfc/./smfc.py", line 786, in build_hwmon_path
    raise ValueError(self.ERROR_MSG_FILE_IO.format(path))
ValueError: Cannot read file (/sys/class/scsi_disk/0:0:13:0/device/hwmon/hwmon*/temp1_input).
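The temperature-to-level table in the CONFIG log above is consistent with splitting both ranges into `steps` equal intervals and truncating the fan level to an integer. A minimal Python sketch that reproduces the logged CPU zone values, assuming linear spacing and `int()` truncation (inferred from the printed numbers, not taken from smfc.py; `build_level_mapping` is a hypothetical name):

```python
def build_level_mapping(steps, min_temp, max_temp, min_level, max_level):
    """Return (temperature, fan level) pairs for steps+1 equally spaced points.

    Hypothetical helper: assumes linear spacing and int() truncation of the
    level, which matches the values printed in the smfc debug log.
    """
    mapping = []
    for i in range(steps + 1):
        temp = min_temp + i * (max_temp - min_temp) / steps
        level = int(min_level + i * (max_level - min_level) / steps)
        mapping.append((temp, level))
    return mapping


# CPU zone settings from the log: steps=6, 30-60 C, 35-100 %.
for i, (t, l) in enumerate(build_level_mapping(6, 30.0, 60.0, 35, 100)):
    print(f"{i}. [T:{t:.1f}C - L:{l}%]")
```

With these settings the loop prints the same seven lines as the "Temperature to level mapping" section of the log.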

Config file

[Ipmi]
# Path for ipmitool (str, default=/usr/bin/ipmitool)
command=/usr/bin/ipmitool
# Delay time after changing IPMI fan mode (int, seconds, default=10)
fan_mode_delay=10
# Delay time after changing IPMI fan level (int, seconds, default=2)
fan_level_delay=2
# CPU and HD zones are swapped (bool, default=0).
swapped_zones=1

[CPU zone]
# Fan controller enabled (bool, default=0)
enabled=1
# Number of CPUs (int, default=1)
count=2
# Calculation method for CPU temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
# Discrete steps in mapping of temperatures to fan level (int, default=6)
steps=6
# Threshold in temperature change before the fan controller reacts (float, C, default=3.0)
sensitivity=3.0
# Polling time interval for reading temperature (int, sec, default=2)
polling=2
# Minimum CPU temperature (float, C, default=30.0)
min_temp=30.0
# Maximum CPU temperature (float, C, default=60.0)
max_temp=60.0
# Minimum CPU fan level (int, %, default=35)
min_level=35
# Maximum CPU fan level (int, %, default=100)
max_level=100
# Optional parameter, it will be generated automatically (can be used for testing and in special cases).
# Path for CPU sys/hwmon/coretemp file(s) (str multi-line list, default=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input)
# hwmon_path=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input
#            /sys/devices/platform/coretemp.1/hwmon/hwmon*/temp1_input


[HD zone]
# Fan controller enabled (bool, default=0)
enabled=1
# Number of HDs (int, default=1)
count=23
# Calculation of HD temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
# Discrete steps in mapping of temperatures to fan level (int, default=4)
steps=4
# Threshold in temperature change before the fan controller reacts (float, C, default=2.0)
sensitivity=2.0
# Polling interval for reading temperature (int, sec, default=10)
polling=10
# Minimum HD temperature (float, C, default=32.0)
min_temp=32.0
# Maximum HD temperature (float, C, default=46.0)
max_temp=46.0
# Minimum HD fan level (int, %, default=35)
min_level=35
# Maximum HD fan level (int, %, default=100)
max_level=100
# Names of the HDs (str multi-line list, default=)
# These names MUST BE specified in '/dev/disk/by-id/...' form!
hd_names=/dev/disk/by-id/scsi-SATA_Samsung_SSD_870_S6PUNX0T715310D
         /dev/disk/by-id/scsi-SATA_SATA_SSD_67F407531F2400139578
         /dev/disk/by-id/scsi-SATA_SATA_SSD_96D70754012400149905
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ0XPDH0000C915756F
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1CRWT0000C9206GHS
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1D3FM0000C9206HJW
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1F6HL0000C920JKGJ
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1F6MV0000C920N6EE
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1KB860000C850L5S0
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ1LZG30000C9247N7F
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ245PF0000C843F5T0
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ24AAS0000C920N7L6
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2DCZ60000C925CLFV
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2E8CC0000C922FXGV
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2G2330000C9201HJX
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2JDXL0000C93432Y5
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2JSFW0000G84101EM
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2KASF0000C9355QHB
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2KZVZ0000C9342YAG
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2LPKD0000C9362E9U
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2M09R0000C9362FJX
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ443GZ0000C006EFT5
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ4CGJ50000C008J0BA
# Optional parameter, it will be generated automatically (can be used for testing and in special cases).
# Path for HD sys/hwmon/drivetemp file(s) (str multi-line list, default=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input)
# hwmon_path=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input
#            /sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon*/temp1_input
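The `hd_names` value above is a multi-line list whose length should match `count` (23 names for `count=23` here). A sketch of reading such a list and cross-checking it, assuming the file is parsed with Python's standard `configparser` (`load_hd_names` is a hypothetical helper, and the sample is trimmed to two disks):

```python
import configparser

# Trimmed copy of the [HD zone] section above (two of the 23 disks kept).
SAMPLE = """
[HD zone]
count=2
hd_names=/dev/disk/by-id/scsi-SATA_Samsung_SSD_870_S6PUNX0T715310D
         /dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ0XPDH0000C915756F
# hwmon_path=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input
"""


def load_hd_names(text):
    """Parse the multi-line hd_names list and check its length against count."""
    config = configparser.ConfigParser()
    config.read_string(text)
    section = config["HD zone"]
    # Indented continuation lines are joined with newlines, so split on whitespace.
    names = section.get("hd_names", fallback="").split()
    count = section.getint("count", fallback=1)
    if len(names) != count:
        raise ValueError(f"count={count}, but {len(names)} names in hd_names")
    return names


print(load_hd_names(SAMPLE))
```

Full-line `#` comments between options are skipped by `configparser`, so the commented-out `hwmon_path` lines do not end up in the list.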

/sys/class/scsi_disk content

root@mars:/sys/class/scsi_disk# ls
0:0:0:0  0:0:10:0  0:0:12:0  0:0:14:0  0:0:16:0  0:0:18:0  0:0:2:0  0:0:4:0  0:0:6:0  0:0:8:0  10:0:0:0  9:0:0:0
0:0:1:0  0:0:11:0  0:0:13:0  0:0:15:0  0:0:17:0  0:0:19:0  0:0:3:0  0:0:5:0  0:0:7:0  0:0:9:0  5:0:0:0
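The failing path in the traceback still contains the `hwmon*` wildcard, which suggests it is expanded at runtime and nothing matched. A sketch for checking which `/sys/class/scsi_disk` entries lack a drivetemp hwmon file, assuming wildcard expansion with `glob` (both function names are hypothetical; the error text only mirrors the traceback):

```python
import glob
import os


def resolve_hwmon_paths(patterns):
    """Expand each hwmon*/temp1_input wildcard; fail if one matches nothing."""
    resolved = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            # No hwmon directory was created for this disk, so there is no file to read.
            raise ValueError(f"Cannot read file ({pattern}).")
        resolved.append(matches[0])
    return resolved


def disks_without_hwmon(scsi_disk_root="/sys/class/scsi_disk"):
    """List SCSI addresses (host:channel:target:lun) that have no temp1_input file."""
    missing = []
    for address in sorted(os.listdir(scsi_disk_root)):
        pattern = os.path.join(scsi_disk_root, address, "device", "hwmon", "hwmon*", "temp1_input")
        if not glob.glob(pattern):
            missing.append(address)
    return missing
```

On the machine above, `disks_without_hwmon()` would be expected to list the SAS disks (such as `0:0:13:0`), while disks that drivetemp handles should not appear.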
@petersulyok
Owner

petersulyok commented Jun 9, 2023

Hi, you might be the first user with real SCSI disks. I assume that the drivetemp kernel module cannot handle your disks, and this is why the files are missing from hwmon. You can also check whether lm-sensors shows the temperature of the disks.
In my case I can see such an output for my SATA disks:

root@home:~# sensors
drivetemp-scsi-7-0
Adapter: SCSI adapter
temp1:        +27.0°C  (low  =  +0.0°C, high = +65.0°C)
                       (crit low = -40.0°C, crit = +70.0°C)
                       (lowest = +22.0°C, highest = +38.0°C)

drivetemp-scsi-2-0
Adapter: SCSI adapter
temp1:        +25.0°C  (low  =  +0.0°C, high = +65.0°C)
                       (crit low = -40.0°C, crit = +70.0°C)
                       (lowest = +20.0°C, highest = +36.0°C)

drivetemp-scsi-5-0
Adapter: SCSI adapter
temp1:        +27.0°C  (low  =  +0.0°C, high = +65.0°C)
                       (crit low = -40.0°C, crit = +70.0°C)
                       (lowest = +23.0°C, highest = +41.0°C)

drivetemp-scsi-0-0
Adapter: SCSI adapter
temp1:        +28.0°C  (low  =  +0.0°C, high = +65.0°C)
                       (crit low = -40.0°C, crit = +70.0°C)
                       (lowest = +22.0°C, highest = +40.0°C)

drivetemp-scsi-4-0
Adapter: SCSI adapter
temp1:        +26.0°C  (low  =  +0.0°C, high = +65.0°C)
                       (crit low = -40.0°C, crit = +70.0°C)
                       (lowest = +21.0°C, highest = +41.0°C)
...
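lm-sensors reads these values from the hwmon sysfs files that drivetemp creates, and the same `temp1_input` files can be read directly. In the kernel hwmon sysfs interface, temperatures are reported in millidegrees Celsius. A minimal sketch (`read_hwmon_temp` is a hypothetical helper):

```python
def read_hwmon_temp(path):
    """Return the temperature stored in a hwmon temp*_input file, in degrees Celsius."""
    with open(path, encoding="utf-8") as f:
        # The file holds an integer in millidegrees Celsius, e.g. "27000" for 27.0 C.
        return int(f.read().strip()) / 1000.0
```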

@mregni
Author

mregni commented Jun 9, 2023

Hi,

Thanks for getting back to me!

Here is the full output of the sensors command. It seems I can't see the temperatures of the 20 disks.
The main disks are ST14000NM0288, but I also have 2 NVMe disks and 3 SATA SSDs, and it looks like those can be found by lm-sensors:

root@mars:/home/reggi# sensors
drivetemp-scsi-10-0
Adapter: SCSI adapter
temp1:        +30.0°C

drivetemp-scsi-5-0
Adapter: SCSI adapter
temp1:        +44.0°C  (low  =  +0.0°C, high = +70.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)
                       (lowest = +32.0°C, highest = +43.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +75.0°C, crit = +85.0°C)
Core 0:        +45.0°C  (high = +75.0°C, crit = +85.0°C)
Core 1:        +45.0°C  (high = +75.0°C, crit = +85.0°C)
Core 2:        +44.0°C  (high = +75.0°C, crit = +85.0°C)
Core 3:        +47.0°C  (high = +75.0°C, crit = +85.0°C)
Core 4:        +44.0°C  (high = +75.0°C, crit = +85.0°C)
Core 5:        +44.0°C  (high = +75.0°C, crit = +85.0°C)

i350bb-pci-0400
Adapter: PCI adapter
loc1:         +52.0°C  (high = +120.0°C, crit = +110.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +38.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +41.9°C  (low  = -273.1°C, high = +65261.8°C)

drivetemp-scsi-9-0
Adapter: SCSI adapter
temp1:        +30.0°C

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +51.0°C  (high = +75.0°C, crit = +85.0°C)
Core 0:        +45.0°C  (high = +75.0°C, crit = +85.0°C)
Core 1:        +45.0°C  (high = +75.0°C, crit = +85.0°C)
Core 2:        +51.0°C  (high = +75.0°C, crit = +85.0°C)
Core 3:        +47.0°C  (high = +75.0°C, crit = +85.0°C)
Core 4:        +45.0°C  (high = +75.0°C, crit = +85.0°C)
Core 5:        +44.0°C  (high = +75.0°C, crit = +85.0°C)

power_meter-acpi-0
Adapter: ACPI interface
power1:        0.00 W  (interval =   1.00 s)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +38.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +38.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +42.9°C  (low  = -273.1°C, high = +65261.8°C)

It looks like smartctl can read the temperatures, so at least the sensors themselves are working on the disks:

root@mars:/sys/class/hwmon/hwmon4# smartctl -a /dev/sdg | grep Temperature
Temperature Warning:  Enabled
Current Drive Temperature:     40 C
Drive Trip Temperature:        65 C

@secabeen

secabeen commented Jun 9, 2023

You'll need to ensure the coretemp and drivetemp modules are loaded by adding them to /etc/modules or manually loading them with modprobe coretemp drivetemp. Note that drivetemp is only available in kernel 5.6 and later; I had to upgrade from Ubuntu 20.04 to 22.04 to get it.
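Whether the modules are currently loaded can be checked in `/proc/modules`, where the first field of each line is the module name. A small sketch (helper names are hypothetical); note that a driver built directly into the kernel rather than as a module will not appear in this list:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    # Each line starts with the module name, e.g. "drivetemp 16384 0 - Live 0x0000000000000000".
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required=("coretemp", "drivetemp"), path="/proc/modules"):
    """Return the required modules that are not currently loaded."""
    with open(path, encoding="utf-8") as f:
        present = loaded_modules(f.read())
    return [name for name in required if name not in present]
```

If `missing_modules()` returns a non-empty list, `modprobe coretemp drivetemp` loads them for the current boot, and listing them in /etc/modules makes that persistent, as described above.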

@mregni
Author

mregni commented Jun 13, 2023

I'm running 22.04 and drivetemp seems to be working.
Can't really find a workaround to this issue at the moment, if I find any I'll post an update for sure.

@petersulyok
Owner

This is a compatibility issue of the drivetemp module. It was designed for ATA/SATA drives and works with some SCSI drives as well. See more details here and here.

You may contact the author of the module on this compatibility issue: Guenter Roeck (linus@roeck-us.net), he might have some idea.

@thiete
Copy link

thiete commented Jun 14, 2023

I also have a bunch of SAS drives connected to a HBA which do not work with drivetemp.
hddtemp can display the temperatures; it gets them from SMART data. hddtemp has a daemon mode which can be used to poll the data, and it does some caching etc. It seems like a decent candidate for a secondary source of temperature info in these cases. If I ever have a bit of spare time, I'll look into implementing it.

@petersulyok
Owner

Yes indeed, hddtemp and smartctl are potential additional options for reading HDD temperature. I originally chose drivetemp because of its speed.

@petersulyok
Owner

petersulyok commented Jun 15, 2023

I'm planning to add a new configuration parameter:

[HD zone]
# Source of the HD temperature (int, 0-drivetemp, 1-hddtemp, 2-smartctl, default=0)
temp_source=0

where the user can configure the source of the temperature reading. Please give me some time to find the best way of doing this.
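At this point in the thread `temp_source` is only a proposal. A sketch of how such an integer parameter could be read and validated, assuming Python's `configparser` (the `TEMP_SOURCES` table and `read_temp_source` are hypothetical names):

```python
import configparser

# Values proposed for temp_source: 0-drivetemp, 1-hddtemp, 2-smartctl.
TEMP_SOURCES = {0: "drivetemp", 1: "hddtemp", 2: "smartctl"}


def read_temp_source(config):
    """Return the configured temperature source name (default 0 = drivetemp)."""
    # fallback is also used when the [HD zone] section itself is missing.
    value = config.getint("HD zone", "temp_source", fallback=0)
    if value not in TEMP_SOURCES:
        raise ValueError(f"Invalid temp_source={value}, expected one of {sorted(TEMP_SOURCES)}")
    return TEMP_SOURCES[value]


config = configparser.ConfigParser()
config.read_string("[HD zone]\ntemp_source=0\n")
print(read_temp_source(config))
```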

Update: I checked the current state of hddtemp and it seems to be a pretty dead project. The latest (beta) release was in 2007 and it was removed from the latest stable Debian release (bookworm). So I would skip this way of reading HD temperature.

Update 2: I've got several design questions:

  1. Should SMFC support multiple temperature sources for a diverse set of disks (NVMe, SATA, and SCSI drives together)?
  2. Usually SMART attribute 194 contains the temperature, but some vendors use a different one (e.g. 190). Should SMFC make it configurable?
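Mixing sources (question 1) does not necessarily change the zone logic: once each disk yields a temperature, the readings could be reduced to one value with the existing `temp_calc` setting (0-minimum, 1-average, 2-maximum). A sketch of one possible approach, with a hypothetical helper name:

```python
from statistics import mean


def combine_temperatures(temps, temp_calc=1):
    """Reduce per-disk temperatures to one value: 0-minimum, 1-average, 2-maximum."""
    if not temps:
        raise ValueError("No temperatures to combine")
    if temp_calc == 0:
        return min(temps)
    if temp_calc == 1:
        return mean(temps)
    if temp_calc == 2:
        return max(temps)
    raise ValueError(f"Invalid temp_calc={temp_calc}")


# Illustrative readings from three different sources (drivetemp, NVMe hwmon, hddtemp).
print(combine_temperatures([44.0, 38.9, 37.0], temp_calc=2))
```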

@petersulyok
Copy link
Owner

Hi @mregni, could you please share the output of this command from your machine?

smartctl -a /dev/sdg

@mregni
Author

mregni commented Jun 18, 2023

@petersulyok, sorry, I didn't see your message yesterday.
Here is the output

root@mars:/home/reggi# smartctl -a /dev/sdg
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-73-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM-ESXS
Product:              ST14000NM0288 E
Revision:             ECH8
Compliance:           SPC-5
User Capacity:        13,902,809,137,152 bytes [13.9 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500adad4e17
Serial number:        ZHZ2LPKD0000C9362E9U
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Jun 18 19:44:01 2023 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification = 0
Total blocks reassigned during format = 0
Total new blocks reassigned = 0
Power on minutes since format = 78982
Current Drive Temperature:     37 C
Drive Trip Temperature:        65 C

Accumulated power on time, hours:minutes 27796:02
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0     549780.899           0
write:         0        0         0         0          0     461192.606           0
verify:        0        0         0         0          0      24936.628           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Aborted (by user command)   -       8                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

@petersulyok
Copy link
Owner

It's strange that smartctl did not show SMART attributes in the report. Maybe because of the self-test?
I'm looking for something like this:

root@home:~# smartctl -A /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-9-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   127   127   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   162   162   024    Pre-fail  Always       -       406 (Average 407)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       2190
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   140   140   020    Old_age   Offline      -       15
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23802
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       268
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   097   097   000    Old_age   Always       -       4008
193 Load_Cycle_Count        0x0012   097   097   000    Old_age   Always       -       4008
194 Temperature_Celsius     0x0002   232   232   000    Old_age   Always       -       28 (Min/Max 17/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1

Could you please send this as well? We could parse the temperature value from this attribute list.
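Parsing the attribute table could look roughly like this. The sketch scans `smartctl -A` rows for the first attribute ID in a configurable list, which would also cover vendors that report temperature in attribute 190 instead of 194 (design question 2 above). The regular expression assumes the column layout shown in the output above, and the helper name is hypothetical:

```python
import re

# ID, name, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then the leading RAW_VALUE number.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def temperature_from_attributes(smartctl_output, attr_ids=(194, 190)):
    """Return the raw value of the first listed attribute ID found, or None."""
    raw_by_id = {}
    for line in smartctl_output.splitlines():
        match = ROW.match(line)
        if match:
            raw_by_id[int(match.group(1))] = int(match.group(3))
    for attr_id in attr_ids:
        if attr_id in raw_by_id:
            return raw_by_id[attr_id]
    return None
```

For the output above this would return 28 (only the leading number of `28 (Min/Max 17/45)` is kept). Output without an attribute table, such as a SAS disk's report, yields None.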

@mregni
Author

mregni commented Jun 19, 2023

Got some bad news: it seems that SCSI disks don't have the extended SMART attribute list that SATA disks do. I found a good explanation on the TrueNAS forum here, from 5 years ago.

Someone else suggested sdparm to read the SAS mode pages but I don't see any useful info for temperature there:

root@mars:/home/reggi# sdparm -a /dev/sdg
    /dev/sdg: IBM-ESXS  ST14000NM0288 E   ECH8
Read write error recovery mode page:
  AWRE          1  [cha: y, def:  1, sav:  1]
  ARRE          1  [cha: y, def:  1, sav:  1]
  TB            0  [cha: y, def:  0, sav:  0]
  RC            0  [cha: n, def:  0, sav:  0]
  EER           0  [cha: y, def:  0, sav:  0]
  PER           0  [cha: y, def:  0, sav:  0]
  DTE           0  [cha: y, def:  0, sav:  0]
  DCR           0  [cha: y, def:  0, sav:  0]
  RRC           20  [cha: y, def: 20, sav: 20]
  COR_S         0  [cha: n, def:  0, sav:  0]
  HOC           0  [cha: n, def:  0, sav:  0]
  DSOC          0  [cha: n, def:  0, sav:  0]
  LBPERE        0  [cha: n, def:  0, sav:  0]
  MWR           0  [cha: n, def:  0, sav:  0]
  WRC           5  [cha: y, def:  5, sav:  5]
  RTL           10000  [cha: y, def:10000, sav:10000]
Disconnect-reconnect (SPC + transports) mode page:
  BFR           0  [cha: n, def:  0, sav:  0]
  BER           0  [cha: n, def:  0, sav:  0]
  BIL           0  [cha: y, def:  0, sav:  0]
  DTL           0  [cha: n, def:  0, sav:  0]
  CTL           0  [cha: y, def:  0, sav:  0]
  MBS           164  [cha: y, def:164, sav:164]
  EMDP          0  [cha: n, def:  0, sav:  0]
  FA            0  [cha: n, def:  0, sav:  0]
  DIMM          0  [cha: y, def:  0, sav:  0]
  DTDC          0  [cha: n, def:  0, sav:  0]
  FBS           0  [cha: n, def:  0, sav:  0]
Format (SBC) mode page:
  TPZ           1  [cha: n, def:  1, sav:  1]
  ASPZ          0  [cha: n, def:  0, sav:  0]
  ATPZ          0  [cha: n, def:  0, sav:  0]
  ATPLU         2  [cha: n, def:  2, sav:  2]
  SPT           3  [cha: n, def:  3, sav:  3]
  DBPPS         4096  [cha: n, def:4096, sav:4096]
  INTLV         1  [cha: n, def:  1, sav:  1]
  TSF           0  [cha: n, def:  0, sav:  0]
  CSF           0  [cha: n, def:  0, sav:  0]
  SSEC          0  [cha: n, def:  0, sav:  0]
  HSEC          1  [cha: n, def:  1, sav:  1]
  RMB           0  [cha: n, def:  0, sav:  0]
  SURF          0  [cha: n, def:  0, sav:  0]
Rigid disk (SBC) mode page:
  NOC           499914  [cha: n, def:499914, sav:499914]
  NOH           16  [cha: n, def: 16, sav: 16]
  SCWP          0  [cha: n, def:  0, sav:  0]
  SCRWC         0  [cha: n, def:  0, sav:  0]
  DSR           0  [cha: n, def:  0, sav:  0]
  LZC           0  [cha: n, def:  0, sav:  0]
  RPL           0  [cha: n, def:  0, sav:  0]
  ROTO          0  [cha: n, def:  0, sav:  0]
  MRR           7200  [cha: n, def:7200, sav:7200]
Verify error recovery (SBC) mode page:
  V_EER         0  [cha: y, def:  0, sav:  0]
  V_PER         0  [cha: y, def:  0, sav:  0]
  V_DTE         0  [cha: y, def:  0, sav:  0]
  V_DCR         0  [cha: y, def:  0, sav:  0]
  V_RC          20  [cha: y, def: 20, sav: 20]
  V_COR_S       0  [cha: n, def:  0, sav:  0]
  V_RTL         10000  [cha: y, def:10000, sav:10000]
Caching (SBC) mode page:
  IC            0  [cha: y, def:  0, sav:  0]
  ABPF          0  [cha: n, def:  0, sav:  0]
  CAP           0  [cha: y, def:  0, sav:  0]
  DISC          1  [cha: y, def:  1, sav:  1]
  SIZE          0  [cha: n, def:  0, sav:  0]
  WCE           0  [cha: y, def:  0, sav:  0]
  MF            0  [cha: n, def:  0, sav:  0]
  RCD           0  [cha: y, def:  0, sav:  0]
  DRRP          0  [cha: n, def:  0, sav:  0]
  WRP           0  [cha: n, def:  0, sav:  0]
  DPTL          -1  [cha: n, def: -1, sav: -1]
  MIPF          0  [cha: y, def:  0, sav:  0]
  MAPF          -1  [cha: y, def: -1, sav: -1]
  MAPFC         -1  [cha: n, def: -1, sav: -1]
  FSW           0  [cha: y, def:  0, sav:  0]
  LBCSS         0  [cha: n, def:  0, sav:  0]
  DRA           0  [cha: y, def:  0, sav:  0]
  SYNC_PROG     0  [cha: n, def:  0, sav:  0]
  NV_DIS        0  [cha: n, def:  0, sav:  0]
  NCS           16  [cha: y, def: 16, sav: 16]
  CSS           0  [cha: n, def:  0, sav:  0]
Control mode page:
  TST           1  [cha: y, def:  1, sav:  1]
  TMF_ONLY      1  [cha: y, def:  1, sav:  1]
  DPICZ         1  [cha: y, def:  0, sav:  1]
  D_SENSE       0  [cha: y, def:  1, sav:  0]
  GLTSD         0  [cha: y, def:  0, sav:  0]
  RLEC          0  [cha: y, def:  0, sav:  0]
  QAM           1  [cha: y, def:  1, sav:  1]
  NUAR          0  [cha: n, def:  0, sav:  0]
  QERR          0  [cha: y, def:  0, sav:  0]
  RAC           0  [cha: n, def:  0, sav:  0]
  UA_INTLCK     0  [cha: n, def:  0, sav:  0]
  SWP           0  [cha: n, def:  0, sav:  0]
  ATO           1  [cha: n, def:  1, sav:  1]
  TAS           0  [cha: n, def:  0, sav:  0]
  ATMPE         0  [cha: n, def:  0, sav:  0]
  RWWP          0  [cha: y, def:  0, sav:  0]
  SBLP          0  [cha: n, def:  0, sav:  0]
  AUTOLOAD      0  [cha: n, def:  0, sav:  0]
  BTP           0  [cha: n, def:  0, sav:  0]
  ESTCT         -1  [cha: n, def: -1, sav: -1]
Control extension mode page:
  DLC           0  [cha: n, def:  0, sav:  0]
  TCMOS         0  [cha: y, def:  0, sav:  0]
  SCSIP         1  [cha: y, def:  1, sav:  1]
  IALUAE        0  [cha: n, def:  0, sav:  0]
  INIT_PR       0  [cha: n, def:  0, sav:  0]
  MSDL          0  [cha: y, def:  0, sav:  0]
Protocol specific logical unit mode page:
  LUPID         6  [cha: n, def:  6, sav:  6]
Protocol specific port mode page:
  PPID          6  [cha: n, def:  6, sav:  6]
Power condition mode page:
  PM_BG         0  [cha: n, def:  0, sav:  0]
  STANDBY_Y     0  [cha: y, def:  0, sav:  0]
  IDLE_C        0  [cha: y, def:  0, sav:  0]
  IDLE_B        0  [cha: y, def:  0, sav:  0]
  IDLE          0  [cha: y, def:  0, sav:  0]
  STANDBY       0  [cha: y, def:  0, sav:  0]
  ICT           1  [cha: y, def:  1, sav:  1]
  SCT           9000  [cha: y, def:9000, sav:9000]
  IBCT          1200  [cha: y, def:1200, sav:1200]
  ICCT          6000  [cha: y, def:6000, sav:6000]
  SYCT          6000  [cha: y, def:6000, sav:6000]
  CCF_IDLE      1  [cha: y, def:  1, sav:  1]
  CCF_STAND     1  [cha: y, def:  1, sav:  1]
  CCF_STOPP     2  [cha: y, def:  2, sav:  2]
Power consumption mode page:
  ACT_LEV       0  [cha: n, def:  0, sav:  0]
  PC_ID         0  [cha: y, def:  0, sav:  0]
Informational exceptions control mode page:
  PERF          1  [cha: y, def:  1, sav:  1]
  EBF           0  [cha: n, def:  0, sav:  0]
  EWASC         1  [cha: y, def:  1, sav:  1]
  DEXCPT        0  [cha: y, def:  0, sav:  0]
  TEST          0  [cha: y, def:  0, sav:  0]
  EBACKERR      0  [cha: n, def:  0, sav:  0]
  LOGERR        0  [cha: y, def:  0, sav:  0]
  MRIE          6  [cha: y, def:  4, sav:  6]
  INTT          36000  [cha: y, def:36000, sav:36000]
  REPC          0  [cha: y, def:  0, sav:  0]
Background control (SBC) mode page:
  S_L_FULL      0  [cha: n, def:  0, sav:  0]
  LOWIR         0  [cha: n, def:  0, sav:  0]
  EN_BMS        0  [cha: y, def:  0, sav:  0]
  EN_PS         0  [cha: n, def:  0, sav:  0]
  BMS_I         144  [cha: y, def:144, sav:144]
  BPS_TL        24  [cha: y, def: 24, sav: 24]
  MIN_IDLE      500  [cha: y, def:500, sav:500]
  MAX_SUSP      0  [cha: y, def:  0, sav:  0]

So the only temperature I can get from the SCSI disks are in the smartctl -a /dev/sdg output I gave you in my previous comment:

Current Drive Temperature:     37 C
Drive Trip Temperature:        65 C

Maybe you can parse it from there when no SMART attributes are found in the output?
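A fallback along those lines is straightforward, because SAS disks report the value as a plain text line rather than an attribute row. A sketch that runs `smartctl -a` and parses `Current Drive Temperature:` (function names are hypothetical; the default path matches the `smartctl_path` used elsewhere in this thread, and `check=False` is used because smartctl's exit status is a bitmask that can be non-zero even when valid data is printed):

```python
import re
import subprocess

SAS_TEMP = re.compile(r"^Current Drive Temperature:\s+(\d+)\s+C\b", re.MULTILINE)


def parse_sas_temperature(smartctl_output):
    """Return the 'Current Drive Temperature' value in C, or None if absent."""
    match = SAS_TEMP.search(smartctl_output)
    return int(match.group(1)) if match else None


def read_sas_temperature(device, smartctl_path="/usr/sbin/smartctl"):
    """Run smartctl -a on a device (requires root) and parse its temperature."""
    result = subprocess.run([smartctl_path, "-a", device],
                            capture_output=True, text=True, check=False)
    return parse_sas_temperature(result.stdout)
```

For the report above, `parse_sas_temperature` would return 37 and ignore the `Drive Trip Temperature` line.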

@petersulyok
Owner

Could you please also check if hddtemp can display the temperature properly?
If the package is not available on your system, the manual installation instructions can be found here.

I'm wondering if the program is compatible with your SCSI disks.

@mregni
Author

mregni commented Jun 21, 2023

I just installed hddtemp manually, and it looks like it can read the temperature as well:

root@mars:/usr/share/misc# hddtemp /dev/sdg
/dev/sdg: IBM-ESXSST14000NM0288 E: 38°C

Gives the same result as smartctl -a /dev/sdg

...
Current Drive Temperature:     38 C
...
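hddtemp prints one line per device in the form `device: model: temperature`, so the value can be taken from the end of the line. A sketch with a hypothetical helper; the degree sign is treated as optional in case the output encoding differs:

```python
import re

# Trailing ": <number>°C" (degree sign optional) at the end of an hddtemp line.
HDDTEMP_TEMP = re.compile(r":\s*(-?\d+)\s*°?\s*C\s*$")


def parse_hddtemp(line):
    """Extract the temperature from a line such as '/dev/sdg: MODEL: 38°C'."""
    match = HDDTEMP_TEMP.search(line)
    return int(match.group(1)) if match else None


print(parse_hddtemp("/dev/sdg: IBM-ESXSST14000NM0288 E: 38°C"))
```

Lines that do not end in a numeric temperature return None, so the caller can decide how to handle a disk without a reading.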

@petersulyok
Copy link
Owner

petersulyok commented Jun 27, 2023

Hi @mregni and @thiete, I created the first version of the hddtemp based temperature reading for SAS/SCSI drives.
You can find it on the temp_source branch (only smfc.py and smfc.conf changed here).

Please test it and report back your experiences.

Changes:

  • the path for hddtemp can be specified in the configuration file
  • hddtemp will be used automatically if there is a scsi- tag in the name of the hard disk
  • NVME SSDs are also supported with automatic name recognition
  • the different SATA, NVME, SAS/SCSI drives can be mixed in hd_names=
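The automatic selection described in these changes can be pictured as a simple check on the `/dev/disk/by-id/` name. A sketch that follows the rule exactly as stated above (the function and the returned labels are hypothetical; the real smfc code may apply additional checks, for example for `scsi-SATA_` names of SATA disks behind a SAS adapter):

```python
def temperature_source(hd_name):
    """Choose a temperature source for a /dev/disk/by-id/ name."""
    base = hd_name.rsplit("/", 1)[-1]
    if "nvme-" in base:
        return "nvme"       # NVMe SSDs, recognized by name
    if "scsi-" in base:
        return "hddtemp"    # SAS/SCSI disks, read via hddtemp
    return "drivetemp"      # everything else, e.g. ata- names


names = [
    "/dev/disk/by-id/ata-EXAMPLE_DISK",  # hypothetical name for illustration
    "/dev/disk/by-id/nvme-Samsung_SSD_990_PRO_with_Heatsink_1TB_S73JNJ0W402020P",
    "/dev/disk/by-id/scsi-SIBM-ESXS_ST14000NM0288_E_ZHZ2LPKD0000C9362E9U",
]
for name in names:
    print(temperature_source(name), name)
```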

@thiete
Copy link

thiete commented Jun 27, 2023

Working well over here! Thanks a lot for implementing this!

(For reference, I have a mixture of SAS and SATA drives, connected either to motherboard SATA ports or through Dell PERC H310 and H200 cards.)

@mregni
Copy link
Author

mregni commented Jun 28, 2023

This is awesome, it works on my system too (I also have mixed SCSI and SATA disks).
Thanks for the quick fix!

@petersulyok
Copy link
Owner

Thanks for the testing and the feedback.
I will release version 3.0 soon and this feature will be there.

@petersulyok
Copy link
Owner

Hi @Rhyolite1, could you please copy your config here?
It would help me understand your problem better.

@Rhyolite1
Copy link

Rhyolite1 commented Aug 3, 2023

smfc.conf
smfc service configuration parameters

[Ipmi]
Path for ipmitool (str, default=/usr/bin/ipmitool)
command=/usr/bin/ipmitool
Delay time after changing IPMI fan mode (int, seconds, default=10)
fan_mode_delay=10
Delay time after changing IPMI fan level (int, seconds, default=2)
fan_level_delay=2
CPU and HD zones are swapped (bool, default=0).
swapped_zones=0

[CPU zone]
Fan controller enabled (bool, default=0)
enabled=1
Number of CPUs (int, default=1)
count=1
Calculation method for CPU temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
Discrete steps in mapping of temperatures to fan level (int, default=6)
steps=6
Threshold in temperature change before the fan controller reacts (float, C, default=3.0)
sensitivity=3.0
Polling time interval for reading temperature (int, sec, default=2)
polling=2
Minimum CPU temperature (float, C, default=30.0)
min_temp=30.0
Maximum CPU temperature (float, C, default=60.0)
max_temp=60.0
Minimum CPU fan level (int, %, default=35)
min_level=35
Maximum CPU fan level (int, %, default=100)
max_level=100
Optional parameter, it will be generated automatically (can be used for testing and in special cases).
Path for CPU sys/hwmon/coretemp file(s) (str multi-line list, default=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input)
hwmon_path=/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input
/sys/devices/platform/coretemp.1/hwmon/hwmon*/temp1_input

[HD zone]
Fan controller enabled (bool, default=0)
enabled=1
Number of HDs (int, default=1)
count=6
Calculation of HD temperatures (int, [0-minimum, 1-average, 2-maximum], default=1)
temp_calc=1
Discrete steps in mapping of temperatures to fan level (int, default=4)
steps=4
Threshold in temperature change before the fan controller reacts (float, C, default=2.0)
sensitivity=2.0
Polling interval for reading temperature (int, sec, default=10)
polling=10
Minimum HD temperature (float, C, default=32.0)
min_temp=32.0
Maximum HD temperature (float, C, default=46.0)
max_temp=46.0
Minimum HD fan level (int, %, default=35)
min_level=35
Maximum HD fan level (int, %, default=100)
max_level=100
Names of the HDs (str multi-line list, default=)
These names MUST BE specified in '/dev/disk/by-id/...' form!
hd_names=/dev/disk/by-id/scsi-SATA_Samsung_SSD_870_S625NJ0R147671M
/dev/disk/by-id/scsi-SATA_ST10000DM0004-2G_ZJV6ARVY
/dev/disk/by-id/scsi-SATA_WDC_WD100EMAZ-00_1EG416EZ
/dev/disk/by-id/scsi-SATA_WDC_WD100EMAZ-00_JEK4R3NN
/dev/disk/by-id/scsi-SATA_WDC_WD140EDGZ-11_9MH5Z3RK
/dev/disk/by-id/nvme-Samsung_SSD_990_PRO_with_Heatsink_1TB_S73JNJ0W402020P
Optional parameter, it will be generated automatically based on the disk names.
List of files in /sys/hwmon file system or 'hddtemp' (str multi-line list, default=)
hwmon_path=/sys/class/scsi_disk/0:0:0:0/device/hwmon/hwmon*/temp1_input
/sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon*/temp1_input
hddtemp
Standby guard feature for RAID arrays (bool, default=0)
standby_guard_enabled=0
Number of HDs already in STANDBY state before the full RAID array will be forced to it (int, default=1)
standby_hd_limit=1
Path for 'smartctl' command (str, default=/usr/sbin/smartctl).
Required for 'standby guard' feature only
smartctl_path=/usr/sbin/smartctl
Path for 'hddtemp' command (str, default=/usr/sbin/hddtemp).
Required for reading of the temperature in case of SAS/SCSI disks.
hddtemp_path=/usr/sbin/hddtemp

@petersulyok
Copy link
Owner

Thanks. Without seeing the # characters, I'm not sure which parts of the configuration are live and which are commented out.
It seems that you use SATA disks on a SCSI or SAS adapter (not a SATA controller).

In this thread we discussed a problem where drivetemp kernel module was not compatible with SCSI disks.
I assume you loaded this kernel module, right? You can check with the sensors command (from the lm-sensors package) whether the kernel can see the temperature of these drives. In my case, I see an entry in the output of the sensors command for each hard disk:

root@home:~# sensors
drivetemp-scsi-7-0
Adapter: SCSI adapter
temp1:        +28.0°C  (low  =  +0.0°C, high = +65.0°C)
                       (crit low = -40.0°C, crit = +70.0°C)
                       (lowest = +22.0°C, highest = +40.0°C)

If the disk temperatures are visible in the output of the sensors command, then the NVMe drive caused the problem for you: it is certainly not compatible with drivetemp.
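The same check can be done without lm-sensors by reading the `name` attribute of each hwmon device: the sensors chip names in this thread (`drivetemp-scsi-7-0`, `nvme-pci-0200`, `coretemp-isa-0000`) start with that driver name. A sketch with hypothetical helper names:

```python
import glob
import os


def hwmon_devices_by_name(root="/sys/class/hwmon"):
    """Map each hwmonN directory to the driver name in its 'name' file."""
    devices = {}
    for name_file in sorted(glob.glob(os.path.join(root, "hwmon*", "name"))):
        with open(name_file, encoding="utf-8") as f:
            devices[os.path.basename(os.path.dirname(name_file))] = f.read().strip()
    return devices


def count_driver(devices, driver="drivetemp"):
    """Count hwmon devices registered by one driver (drivetemp adds one per disk)."""
    return sum(1 for name in devices.values() if name == driver)
```

If `count_driver(hwmon_devices_by_name())` is lower than the number of SATA disks in `hd_names`, some of them are not covered by drivetemp.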

@Rhyolite1
Copy link

Rhyolite1 commented Aug 3, 2023

@petersulyok You are correct, yeah. I'm sorry to have wasted your time; I cannot believe I missed that. Yes, I have a SAS adapter.

@petersulyok
Copy link
Owner

@Rhyolite1, not a problem at all 😉. We are learning together here, since I don't know exactly how smfc is running on your HW config. The outcome is important for me as well.

I would appreciate your feedback with the output of the sensors command, to understand whether your hard disks are supported by the drivetemp module or not. Thanks.

@Rhyolite1
Copy link

@petersulyok I would be happy to post it if you want to see it. However, after correcting the NVMe entry in the list and going over my work, I noticed that I hadn't added one module to /etc/modules. After making these changes, everything worked as it should. Sorry to have wasted some of your time. This is really cool, thank you for making it.

@petersulyok
Copy link
Owner

@Rhyolite1, it is nice to hear that smfc is finally working for you. The conclusion is also interesting that SATA drives are working with drivetemp module through a SAS/SCSI adapter.

@Rhyolite1
Copy link

@petersulyok If there's any information that you would like from my setup to help future posters, please let me know!

@petersulyok
Copy link
Owner

petersulyok commented Aug 17, 2023

Support of SAS/SCSI disks is part of the smfc v3.0.0 release.
I'm closing this issue, and I plan to add a question and answer about it to the Q&A section.
