Thermalzone not working #816

Closed
Ramshield opened this issue Jul 5, 2021 · 35 comments

@Ramshield

Ramshield commented Jul 5, 2021

Hi,

My exporter does not expose any metrics regarding the thermalzone.
It is enabled however:

windows_exporter_collector_duration_seconds{collector="thermalzone"} 0.0060052
windows_exporter_collector_success{collector="thermalzone"} 1
windows_exporter_collector_timeout{collector="thermalzone"} 0

[System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      19042  0

Exporter version: Starting windows_exporter (version=0.16.0, branch=master, revision=f316d81d50738eb0410b0748c5dcdc6874afe95a)

I run windows exporter with the following arguments:
"C:\Program Files\windows_exporter\windows_exporter.exe" --log.format logger:eventlog?name=windows_exporter --telemetry.addr :9182 --collectors.enabled cpu,cs,logical_disk,logon,memory,net,os,process,service,system,tcp,time,thermalzone,textfile

I'm a Linux engineer, so I have no clue how to troubleshoot something like this. Please advise, thank you!
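
For anyone checking for the same symptom on their own host, the comparison can be scripted. This is a minimal Python sketch, not part of windows_exporter: the function name is hypothetical, and the `windows_thermalzone_` prefix is an assumption based on the exporter's usual naming pattern rather than something confirmed in this issue. The sample text reuses the three lines quoted above.

```python
def thermalzone_status(metrics_text, prefix="windows_thermalzone_"):
    """Summarise what a /metrics scrape says about the thermalzone collector.

    `prefix` is an assumed naming convention, not confirmed by this thread.
    """
    success = None
    series = 0
    marker = 'windows_exporter_collector_success{collector="thermalzone"}'
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if line.startswith(marker):
            # The sample value is the last whitespace-separated field.
            success = float(line.rsplit(" ", 1)[1])
        elif line.startswith(prefix):
            series += 1
    return {"collector_success": success, "thermalzone_series": series}


sample = """\
windows_exporter_collector_duration_seconds{collector="thermalzone"} 0.0060052
windows_exporter_collector_success{collector="thermalzone"} 1
windows_exporter_collector_timeout{collector="thermalzone"} 0
"""
print(thermalzone_status(sample))
```

A success value of 1 together with zero thermalzone series is the situation reported here: the collector ran without error but found nothing to export.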

@breed808
Contributor

Are there any relevant logs in the Event Viewer? windows_exporter will log to Windows Logs -> Application.

@luigi311

I'm seeing something similar, with no thermal data. In the Event Viewer, filtered for windows_exporter, every entry is informational except two warnings:
No filters specified for process collector. This will generate a very large number of metrics!

All of my entries appear twice, which is why there are two warnings with the same message. Everything else seems to be working; only thermalzone is affected.

@crockk

crockk commented Jul 22, 2021

I am also having this issue. I suspect my hardware does not support the thermalzone collector, but I do not know how to validate this.

@breed808
Contributor

breed808 commented Aug 8, 2021

Checking if the thermalzone perflib metrics are present would be a good first step:

# List Counter Sets (confirm if "Thermal Zone Information" CounterSet is present)
Get-Counter -ListSet * | Sort-Object -Property CounterSetName | Select CounterSetName
# List counters for set
Get-Counter -ListSet 'Thermal Zone Information' 
# Get a counter from the set
Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'

I get an error ("Get-Counter: Internal performance counter API call failed. Error: 800007d1.") when running the last command, but that may be due to my VM not having access to any hardware temperature sensors.

@Ramshield
Author

@breed808 thank you for your reply. I ran those commands in PowerShell as Administrator.

PS C:\WINDOWS\system32> Get-Counter -ListSet * | Sort-Object -Property CounterSetName | Select CounterSetName

CounterSetName
--------------
.NET CLR Data
.NET CLR Exceptions
.NET CLR Interop
.NET CLR Jit
.NET CLR Loading
.NET CLR LocksAndThreads
.NET CLR Memory
.NET CLR Networking
.NET CLR Networking 4.0.0.0
.NET CLR Remoting
.NET CLR Security
.NET Data Provider for Oracle
.NET Data Provider for SqlServer
.NET Memory Cache 4.0
{115b92b4-7191-491a-a9b5-93c8e9fb641b}
{7d937e49-cfd5-438f-af4f-b3047d90a5c3}
{f3e82f6e-9df4-425d-a5d5-3a9832005b16}
AppV Client Streamed Data Percentage
Authorization Manager Applications
BitLocker
BITS Net Utilization
Bluetooth Device
Bluetooth Radio
BranchCache
Browser
Cache
Client Side Caching
Database
Database ==> Databases
Database ==> Instances
Database ==> TableClasses
Distributed Routing Table
Distributed Transaction Coordinator
DNS64 Global
Energy Meter
Event Log
Event Tracing for Windows
Event Tracing for Windows Session
Fax Service
FileSystem Disk Activity
Generic IKEv1, AuthIP, and IKEv2
GPU Adapter Memory
GPU Engine
GPU Local Adapter Memory
GPU Non Local Adapter Memory
GPU Process Memory
HTTP Service
HTTP Service Request Queues
HTTP Service Url Groups
Hyper-V Dynamic Memory Integration Service
Hyper-V Hypervisor
Hyper-V Hypervisor Logical Processor
Hyper-V Hypervisor Root Partition
Hyper-V Hypervisor Root Virtual Processor
Hyper-V Virtual Machine Bus Pipes
Hyper-V VM Vid Partition
ICMP
ICMPv6
IPHTTPS Global
IPHTTPS Session
IPsec AuthIP IPv4
IPsec AuthIP IPv6
IPsec Connections
IPsec Driver
IPsec IKEv1 IPv4
IPsec IKEv1 IPv6
IPsec IKEv2 IPv4
IPsec IKEv2 IPv6
IPv4
IPv6
Job Object Details
LogicalDisk
Memory
Microsoft Winsock BSP
MSDTC Bridge 3.0.0.0
MSDTC Bridge 4.0.0.0
NBT Connection
Netlogon
Network Adapter
Network Interface
Network QoS Policy
NUMA Node Memory
Objects
Offline Files
Pacer Flow
Pacer Pipe
PacketDirect EC Utilization
PacketDirect Queue Depth
PacketDirect Receive Counters
PacketDirect Receive Filters
PacketDirect Transmit Counters
Paging File
Peer Name Resolution Protocol
Per Processor Network Activity Cycles
Per Processor Network Interface Card Activity
Physical Network Interface Card Activity
PhysicalDisk
Power Meter
PowerShell Workflow
Print Queue
Process
Processor
Processor Information
RAS
RAS Port
RAS Total
RDMA Activity
ReadyBoost Cache
Redirector
ReFS
RemoteFX Graphics
RemoteFX Network
Search Gatherer
Search Gatherer Projects
Search Indexer
Security Per-Process Statistics
Security System-Wide Statistics
Server
Server Work Queues
ServiceModelEndpoint 3.0.0.0
ServiceModelEndpoint 4.0.0.0
ServiceModelOperation 3.0.0.0
ServiceModelOperation 4.0.0.0
ServiceModelService 3.0.0.0
ServiceModelService 4.0.0.0
SMB Client Shares
SMB Direct Connection
SMB Server
SMB Server Sessions
SMB Server Shares
SMSvcHost 3.0.0.0
SMSvcHost 4.0.0.0
Storage Management WSP Spaces Runtime
Storage Spaces Drt
Storage Spaces Tier
Storage Spaces Virtual Disk
Storage Spaces Write Cache
Synchronization
SynchronizationNuma
System
TCPIP Performance Diagnostics
TCPIP Performance Diagnostics (Per-CPU)
TCPv4
TCPv6
Telephony
Teredo Client
Teredo Relay
Teredo Server
Terminal Services
Terminal Services Session
Thermal Zone Information
Thread
UDPv4
UDPv6
USB
User Input Delay per Process
User Input Delay per Session
WF (System.Workflow) 4.0.0.0
WFP
WFP Classify
WFP Reauthorization
WFPv4
WFPv6
Windows Media Player Metadata
Windows Time Service
Windows Workflow Foundation
WinNAT
WinNAT ICMP
WinNAT Instance
WinNAT TCP
WinNAT UDP
WMI Objects
WorkflowServiceHost 4.0.0.0
WSMan Quota Statistics
XHCI CommonBuffer
XHCI Interrupter
XHCI TransferRing


PS C:\WINDOWS\system32> Get-Counter -ListSet 'Thermal Zone Information'


CounterSetName     : Thermal Zone Information
MachineName        : .
CounterSetType     : SingleInstance
Description        : The Thermal Zone Information performance counter set consists of counters that measure aspects of each thermal zone in the system.
Paths              : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
PathsWithInstances : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
Counter            : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}



PS C:\WINDOWS\system32> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
Get-Counter : The specified instance is not present.
At line:1 char:1
+ Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidResult: (:) [Get-Counter], Exception
    + FullyQualifiedErrorId : CounterApiError,Microsoft.PowerShell.Commands.GetCounterCommand

This is on a physical PC.

Please let me know what other information I can provide.

@breed808
Contributor

breed808 commented Aug 8, 2021

Strange, some searching indicates that this error is returned when not running the query as Administrator 😕

Are you able to query any of the other counters, such as High Precision Temperature? It'd also be worth checking the Performance Monitor to see if any Thermal Zone Information metrics are exposed there.

@Ramshield
Author

Ramshield commented Aug 8, 2021

Hi @breed808. Thanks for the quick reply, appreciate it!

I am unable to get any of the other counters in PowerShell, ran as Administrator.

I am unable to get them in Performance Monitor either. So it seems it's a Windows problem.
So I checked Event Viewer and found 5 Warnings from the source PerfProc:

Unable to open the job object \BaseNamedObjects\WmiProviderSubSystemHostJob for query access. The calling process may not have permission to open this job. The first four bytes (DWORD) of the Data section contains the status code.

I ran Performance monitor again as administrator, hoping it would help, but it didn't.
Any suggestions?

EDIT:

I found this article: https://www.tenforums.com/general-support/136109-error-event-1020-perflib-win-10-1903-a.html
It says to run Lodctr /R from C:\WINDOWS\SysWOW64, which I did twice, as the first attempt resulted in an error.
A new event was logged however:

The Open procedure for service ".NETFramework" in DLL "C:\WINDOWS\system32\mscoree.dll" failed with error code The system cannot find the file specified.. Performance data for this service will not be available.

I tried to install https://dotnet.microsoft.com/download/dotnet-framework/net48 as suggested by Google, but it already says that it's installed. So not sure what to install for that specific .dll file, but I think it's related...

@breed808
Contributor

breed808 commented Aug 8, 2021

I've done some more searching and there's mention of repairing the .NET Framework installation to install the missing mscoree.dll file.
Microsoft host a .NET Framework repair tool here: https://www.microsoft.com/en-gb/download/details.aspx?id=30135. I'm not sure how helpful it will be though.

@Ramshield
Author

I ran the tool and then ran the .NET Framework installer again, as the tool suggested. Unfortunately it didn't fix the Performance Monitor, not even after a reboot.

Stupidly enough, I never checked whether mscoree.dll was there before, but it is there now. Unfortunately, still no luck.

@crockk

crockk commented Aug 8, 2021

> Checking if the thermalzone perflib metrics are present would be a good first step:
>
> # List Counter Sets (confirm if "Thermal Zone Information" CounterSet is present)
> Get-Counter -ListSet * | Sort-Object -Property CounterSetName | Select CounterSetName
> # List counters for set
> Get-Counter -ListSet 'Thermal Zone Information'
> # Get a counter from the set
> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
>
> I get an error ("Get-Counter: Internal performance counter API call failed. Error: 800007d1.") when running the last command, but that may be due to my VM not having access to any hardware temperature sensors.

I also get the same error on the final command - but I am not running the commands from a VM:

PS C:\Windows\system32> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
Get-Counter : Internal performance counter API call failed. Error: 800007d1.
At line:1 char:1
+ Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidResult: (:) [Get-Counter], Exception
    + FullyQualifiedErrorId : CounterApiError,Microsoft.PowerShell.Commands.GetCounterCommand

The first two commands run without error.

@rlabrecque

I'm seeing similar results when running PowerShell as Administrator:

PS C:\> Get-Counter -ListSet 'Thermal Zone Information'


CounterSetName     : Thermal Zone Information
MachineName        : .
CounterSetType     : SingleInstance
Description        : The Thermal Zone Information performance counter set consists of counters that measure aspects of
                     each thermal zone in the system.
Paths              : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle
                     Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
PathsWithInstances : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle
                     Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}
Counter            : {\Thermal Zone Information(*)\High Precision Temperature, \Thermal Zone Information(*)\Throttle
                     Reasons, \Thermal Zone Information(*)\% Passive Limit, \Thermal Zone Information(*)\Temperature}



PS C:\> Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
Get-Counter : The specified instance is not present.
At line:1 char:1
+ Get-Counter -Counter '\Thermal Zone Information(*)\Temperature'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidResult: (:) [Get-Counter], Exception
    + FullyQualifiedErrorId : CounterApiError,Microsoft.PowerShell.Commands.GetCounterCommand
PS C:\> [System.Environment]::OSVersion.Version

Platform ServicePack Version      VersionString
-------- ----------- -------      -------------
 Win32NT             10.0.19042.0 Microsoft Windows NT 10.0.19042.0

This is running on Windows Server 20H2, on bare metal with almost nothing else installed or configured. Using windows_exporter v0.16.0.

CPU: AMD Threadripper 3960X

@rlabrecque

I think a separate yet related issue here is that windows_exporter_collector_success{collector="thermalzone"} 1 should be 0.

@breed808
Contributor

Apologies all, I've checked the thermalzone collector to fix the windows_exporter_collector_success metric, and noted the collector is actually using WMI as the metric source. So the Get-Counter commands may have been a waste of time 😞

Could you run the following and see if any output is returned?

Get-CimInstance -Classname Win32_PerfRawData_Counters_ThermalZoneInformation

I've run this on my testing VM but have received no output or error.
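
If the class does return rows on a given machine, the raw values still need converting before they are readable. The following is a minimal Python sketch under a stated assumption: that HighPrecisionTemperature is reported in tenths of a kelvin. That unit is my understanding of the counter, not something established in this thread, and the sample row (zone name and value) is made up for illustration.

```python
def high_precision_to_celsius(raw_tenths_kelvin):
    """Convert a raw reading to degrees Celsius, rounded to 2 decimal places.

    Assumes the raw value is in tenths of a kelvin (not confirmed in this thread).
    """
    return round(raw_tenths_kelvin / 10.0 - 273.15, 2)


def zone_temperatures(rows):
    """Map each thermal zone name to Celsius; empty input gives an empty dict."""
    return {
        row["Name"]: high_precision_to_celsius(row["HighPrecisionTemperature"])
        for row in rows
    }


# Hypothetical row, shaped like one instance returned by the query above.
rows = [{"Name": r"\_TZ.TZ00", "HighPrecisionTemperature": 3010}]
print(zone_temperatures(rows))

# An empty result set, as reported in this thread, simply yields no zones.
print(zone_temperatures([]))
```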

@Ramshield
Author

Same here @breed808

PS C:\Windows\system32> Get-CimInstance -Classname Win32_PerfRawData_Counters_ThermalZoneInformation
>>
PS C:\Windows\system32> Get-CimInstance -Classname Win32_PerfRawData_Counters_ThermalZoneInformation
>>
PS C:\Windows\system32>

It looks almost like it expects something extra.

@Ramshield
Author

@breed808 Any update/suggestions on this? I don't mind joining an IRC or something so we can troubleshoot this faster if you'd like.

@breed808
Contributor

@Ramshield I don't mind supporting over IRC, but I'm not sure if I can be of much more help here.
I think we need someone with more ThermalZone experience, as there seems to be some prerequisite missing here.

@Ramshield
Author

@breed808 Anyone we can mention who might be able to help? :)

@carlpett
Collaborator

It's been a few years since I last looked at this, but from what I recall, the ThermalZone data was very finicky and requires some driver support. We never managed to pin down exactly which component was supposed to provide it...
The root issue seemed to be that there's actually no unified API for this, so the conclusion at the time was that it'd be a lot of work to implement this in any other way.
If there are suggestions for how to achieve this though, I think we'd be very happy to replace the current implementation!

@Ramshield
Author

Could we at least take a look at, for example, Open Hardware Monitor for inspiration?
Perhaps they are open to discussing this and offering advice!

@carlpett
Collaborator

There was some work on reusing OHM in #727, but it stalled on a mix of licensing issues and whether it was a good integration pattern.

@namxam

namxam commented Oct 17, 2021

I am running it on a German system, and it seems it cannot collect data: to get the relevant data I have to run Get-Counter -ListSet 'Thermozoneninformationen'. Any ideas on how to deal with non-English systems?

@ottobaer

> There was some work on reusing OHM in #727, but it stalled on a mix of licensing issues and whether it was a good integration pattern.

Maybe Open Hardware Monitor is a solution. It exposes its readings to WMI and it's under the MPL 2.0 license.

http://openhardwaremonitor.org/wordpress/wp-content/uploads/2011/04/OpenHardwareMonitor-WMI.pdf

It seems that it can also be interfaced with through its DLL.

https://stackoverflow.com/questions/3262603/accessing-cpu-temperature-in-python

@samuelinho

samuelinho commented Feb 10, 2022

> I am running it on a German system, and it seems it cannot collect data: to get the relevant data I have to run Get-Counter -ListSet 'Thermozoneninformationen'. Any ideas on how to deal with non-English systems?

I'm facing the same issue. In my case the system is in Spanish, and it seems the exporter can't get the temperature values to pass them on. The counter path here is:
'\Información sobre la zona térmica(*)\Temperatura'

@breed808
Contributor

The translated ListSet names don't match the English name in the collector.

From the previous reports I've seen on this issue, not all ListSets have translation problems (or are not translated). It's something we should address at some stage, else we're excluding entire localizations from running the exporter.
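
One possible direction, sketched here only as an illustration and not as how the collector works today, is to resolve counter set names by index instead of by English name. To my understanding, Windows keeps counter names as alternating index/name strings in the Perflib registry key, with the 009 subkey holding English names and the CurrentLanguage subkey holding localized ones. The Python sketch below operates on such lists directly and does not read the registry; the function names and the index 1234 are hypothetical.

```python
def parse_counter_pairs(multi_sz):
    """Turn ['2', 'System', '4', 'Memory', ...] into {2: 'System', 4: 'Memory'}."""
    pairs = {}
    for i in range(0, len(multi_sz) - 1, 2):
        index, name = multi_sz[i], multi_sz[i + 1]
        if index.isdigit():  # ignore malformed entries
            pairs[int(index)] = name
    return pairs


def localized_name(english_name, english_list, local_list):
    """Find the localized counter name that shares an index with english_name."""
    local = parse_counter_pairs(local_list)
    for index, name in parse_counter_pairs(english_list).items():
        if name == english_name:
            return local.get(index)
    return None  # the English name is not registered on this system


# Made-up sample data; real lists contain thousands of entries.
english = ["2", "System", "1234", "Thermal Zone Information"]
german = ["2", "System", "1234", "Thermozoneninformationen"]
print(localized_name("Thermal Zone Information", english, german))
```

Note that this would only address the naming problem on non-English systems; it does nothing for hosts where the counter set has no instances at all.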

@rlabrecque

I was likely on American English when I tried originally and it wasn't working for me.

@breed808
Contributor

Yes, there are two issues with the collector that have been raised in this thread:

  1. Unknown dependency preventing thermalzone collector and Perflib commands from returning metrics
  2. Translated name of Thermalzone ListSet preventing collector from working correctly on non-English locales.

Users in this thread are largely experiencing 1), but 2) is also a problem.

@stefangweichinger

Adding my "me too" here as well: German installation of MS Windows Server 2019.

@DominikRoB

Same here (German, empty results set), I think we have a clear pattern

@Nilas1994

Nilas1994 commented Apr 22, 2024

Thermalzone is not working for me either. I checked windows_exporter_collector_success{collector="thermalzone"}, which is 0. It is possible that these are vendor-specific classes that aren't available on all systems. We should therefore enumerate the classes whose names contain something like "Thermal" or "Temp".

If we do this in PowerShell, we get to see some more:
Get-CimClass -Namespace root/cimv2 | Where-Object {$_.CimClassName -like "*Temp*" -or $_.CimClassName -like "*Thermal*" -or $_.CimClassName -like "*Cooling*"}

image

I also found that the values are all zero:
image

Even when I try to query this directly, it returns nothing:
image

So it is quite possible that this information sits behind vendor-specific classes.

I did some more research on this, and it depends on the hardware. Some hardware isn't supported but ships with vendor monitoring tools that can be used to read CPU temperatures, so my recommendation is to build this as a custom metric. For example, on Dell you can use Dell Command | Monitor and schedule a task that writes the metrics to a textfile as a workaround.
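
The scheduled-task idea could look roughly like the Python sketch below. It only covers formatting readings in the Prometheus text format and writing the file so that a scrape never sees a half-written result. How the readings are obtained from a vendor tool is left out, and the metric name, label name, and function names are all hypothetical.

```python
import os
import tempfile


def render_prom(readings, metric="custom_hardware_temperature_celsius"):
    """Render {sensor_name: celsius} as Prometheus text exposition format.

    The metric and label names are placeholders chosen for this example.
    """
    lines = [
        f"# HELP {metric} Temperature reported by a vendor tool.",
        f"# TYPE {metric} gauge",
    ]
    for sensor, celsius in sorted(readings.items()):
        # Escape backslash, double quote and newline inside the label value.
        label = (
            sensor.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'{metric}{{sensor="{label}"}} {celsius}')
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temporary file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
    os.replace(tmp_path, path)


# Made-up readings; a real task would collect these from the vendor tool.
print(render_prom({"cpu_package": 45.5}), end="")
```

A scheduled task would call write_atomically with a .prom file inside whatever directory the textfile collector is configured to read, and the values would then appear alongside the other windows_exporter metrics.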

@jkroepke
Member

We also plan a collector that can scrape arbitrary perfdata-based counters.

@jkroepke
Member

jkroepke commented Sep 8, 2024

In summary: Thermalzone is a generic Windows interface that does not seem to be implemented by every driver vendor. That's something that can't be fixed by windows_exporter.

Open Hardware Monitor looks good-ish, but it seems to be getting outdated. There have been no releases for 3 years, which sounds suspicious. https://github.com/openhardwaremonitor/openhardwaremonitor

@apeshand

> Open Hardware Monitor looks good-ish, but it seems to be getting outdated. There have been no releases for 3 years, which sounds suspicious. https://github.com/openhardwaremonitor/openhardwaremonitor

LibreHardwareMonitor is an updated fork of OHM.
I am already using this approach to collect thermal data from the Windows host hardware: a Python script run via Telegraf parses the data from LHM, and I combine the windows_exporter output with the script output.

@jkroepke
Member

The current workaround is https://github.com/nickbabcock/OhmGraphite. It is a first-class Prometheus exporter for hardware sensors.

@apeshand

> The current workaround is https://github.com/nickbabcock/OhmGraphite. It is a first-class Prometheus exporter for hardware sensors.

Wow! I have not seen this solution before. I will definitely try it today. Thanks for the tip ;-)

@jkroepke
Member

I will close this issue.

The thermalzone collector works, but hardware vendors don't implement the interfaces.

The workaround is to use https://github.com/nickbabcock/OhmGraphite
