Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipmi_sensors should split multiple sensor readings with the same name #3416

Closed
envy opened this issue Nov 1, 2017 · 7 comments
Closed

ipmi_sensors should split multiple sensor readings with the same name #3416

envy opened this issue Nov 1, 2017 · 7 comments

Comments

@envy
Copy link

envy commented Nov 1, 2017

Feature Request

I'm monitoring a Dell R630 with telegraf and ipmi_sensors. The server has two CPU sockets and therefore reports temperatures for both CPUs. It would be nice, if I could access those two readings seperately as they are currently combined into one reading.

Proposal:

If multiple instances of a sensor are detected, they should be split. Example the Temp sensor:

$ ipmitool -I lanplus -H 10.1.100.103 -U root sdr
Password:
SEL              | Not Readable      | ns
Intrusion        | 0x00              | ok
Fan1A            | 5160 RPM          | ok
Fan2A            | 5160 RPM          | ok
Fan3A            | 5040 RPM          | ok
Fan4A            | 4920 RPM          | ok
Fan5A            | 5160 RPM          | ok
Fan6A            | 5040 RPM          | ok
Inlet Temp       | 18 degrees C      | ok
CPU Usage        | 8 percent         | ok
IO Usage         | 0 percent         | ok
MEM Usage        | 0 percent         | ok
SYS Usage        | 9 percent         | ok
Exhaust Temp     | 29 degrees C      | ok
Temp             | 39 degrees C      | ok
Temp             | disabled          | ns
... 

Current behavior:

The output comes from a server, which has only one CPU but it still applies:

$ telegraf --config-directory /etc/telegraf/telegraf.d/ --input-filter ipmi_sensor --test
* Plugin: inputs.ipmi_sensor, Collection 1
* Internal: 30s
...
> ipmi_sensor,unit=degrees_c,host=monitoring,name=temp,server=esxi0-mgmt.example.org status=1i,value=42 1509562189000000000
> ipmi_sensor,host=monitoring,name=temp,server=esxi0-mgmt.example.org value=0,status=0i 1509562189000000000
...

Two values are inserted for temp: 42 and 0. This makes it either difficult (only on CPU) or impossible (two or more CPUs) to find out the temperature of CPU0/CPU1/.../CPUn.

Desired behavior:

Instead of just one temp reading, a second one should be created to hold the second value.
I propose the following naming schemes for further discussion:

  1. temp_0, temp_1, ..., temp_n
    Could be not that easy to implement, as you would need to look first for multiple instances before recording data.
  2. temp, temp_1, ..., temp_n
    First measurement is always called by its name. Further instances are suffixed with a number

For backwards compatibility reasons I would suggest the second option. This is also easier to implement, as you just need to create a new one if one with the same name already exists for the current parsing run.

Use case: [Why is this important (helps with prioritizing requests)]

Currently it is not possible to distinguish both temperature readings after the data has left telegraf, as only one measurement exists. Monitoring a multi CPU system's temperatures is therefore not possible. For servers with only one CPU but two sockets it is possbile but difficult to get the temperature as you need to employ e.g. the max function of InfluxDB to get only the non-zero value out of the series.

@danielnelson
Copy link
Contributor

I like the second suggestion, though in the specific case above it seems like we should skip values like disabled that do not have a value.

@envy
Copy link
Author

envy commented Nov 1, 2017

Yes, that would also be nice. I try to get some ipmitool output of some other Dell servers tomorrow that actually have two CPUs to see what the difference in output is (if there is any).

@envy
Copy link
Author

envy commented Nov 2, 2017

So I gathered some output from two servers at work, one with IDRAC 6 and one with IDRAC 7: https://gist.github.com/envy/91cb29370cbf8414c58402014ff4f8df

The IDRAC 7 output matches the IDRAC 8 output of my R630 shown above.
The IDRAC 6 output lists Temp even more often, however, no actual value is given. I don't know why yet, maybe it's a permission problem as I'm not the administrator of that machine.

Regarding skipping values, maybe it would be good to combine this with #3207 to filter out readings that have a status of ns/na ?

@danielnelson
Copy link
Contributor

Oh right, I had forgotten about this other issue. So we'll just do the temp, temp_1 suggestion to complete this issue.

@jhg03a
Copy link
Contributor

jhg03a commented Jan 19, 2018

The information to properly disambiguate the information is present, but it changes the output to the parser.

> sudo ipmitool sdr
SEL              | Not Readable      | ns
Intrusion        | 0x00              | ok
Fan1             | 5040 RPM          | ok
Fan2             | 5160 RPM          | ok
...snip...
Riser 2 Presence | 0x00              | ok
Riser 3 Presence | 0x00              | ok
Temp             | 58 degrees C      | ok
Temp             | 71 degrees C      | ok
> sudo ipmitool sdr elist
SEL              | 72h | ns  |  7.1 | No Reading
Intrusion        | 73h | ok  |  7.1 |
Fan1             | 30h | ok  |  7.1 | 5040 RPM
Fan2             | 31h | ok  |  7.1 | 5160 RPM
...snip...
Riser 2 Presence | 4Eh | ok  |  7.1 | Connected
Riser 3 Presence | 4Fh | ok  |  7.1 | Connected
Temp             | 0Eh | ok  |  3.1 | 58 degrees C
Temp             | 0Fh | ok  |  3.2 | 69 degrees C

So in the case where you have a duplicate sensor name, you can probably disabiguate it using the entity id, for example Temp and 3.1 versus 3.2 in this case.

@danielnelson
Copy link
Contributor

One option is to add a switch for the new format, which would free up breaking changes and we could add the entity_id as a tag.

[[inputs.ipmi_sensor]]
  schema_version = 2

@jhg03a
Copy link
Contributor

jhg03a commented Feb 5, 2018

I've got a complete PR for this. Working through process to get it submitted upstream.

@jhg03a jhg03a mentioned this issue Jul 22, 2018
3 tasks
@reimda reimda changed the title impi_sensors should split multiple sensor readings with the same name ipmi_sensors should split multiple sensor readings with the same name Apr 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants