Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ipmi.v2.schema #4450

Merged
merged 4 commits into from
Jul 31, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions etc/telegraf.conf
Original file line number Diff line number Diff line change
Expand Up @@ -1976,6 +1976,9 @@
# ## Timeout for the ipmitool command to complete
# timeout = "20s"

# ## Schema Version: (Optional, defaults to version 1)
# schemaVersion = 2


# # Gather packets and bytes counters from Linux ipsets
# [[inputs.ipset]]
Expand Down
63 changes: 48 additions & 15 deletions plugins/inputs/ipmi_sensor/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ If no servers are specified, the plugin will query the local machine sensor stat
```
ipmitool sdr
```
or with the version 2 schema:
```
ipmitool sdr elist
```

When one or more servers are specified, the plugin will use the following command to collect remote host sensor stats:

Expand Down Expand Up @@ -41,19 +45,36 @@ ipmitool -I lan -H SERVER -U USERID -P PASSW0RD sdr

## Timeout for the ipmitool command to complete. Default is 20 seconds.
timeout = "20s"

## Schema Version: (Optional, defaults to version 1)
metric_version = 2
```

### Measurements

Version 1 schema:
- ipmi_sensor:
- tags:
- name
- unit
- host
- server (only when retrieving stats from remote servers)
- fields:
- status (int)
- status (int, 1=ok status_code/0=anything else)
- value (float)

Version 2 schema:
- ipmi_sensor:
- tags:
- name
- entity_id (can help uniquify duplicate names)
- status_code (two letter code from IPMI documentation)
- status_desc (extended status description field)
- unit (only on analog values)
- host
- server (only when retrieving stats from remote)
- fields:
- value (float)

#### Permissions

Expand All @@ -68,24 +89,36 @@ KERNEL=="ipmi*", MODE="660", GROUP="telegraf"

### Example Output

#### Version 1 Schema
When retrieving stats from a remote server:
```
ipmi_sensor,server=10.20.2.203,unit=degrees_c,name=ambient_temp status=1i,value=20 1458488465012559455
ipmi_sensor,server=10.20.2.203,unit=feet,name=altitude status=1i,value=80 1458488465012688613
ipmi_sensor,server=10.20.2.203,unit=watts,name=avg_power status=1i,value=220 1458488465012776511
ipmi_sensor,server=10.20.2.203,unit=volts,name=planar_3.3v status=1i,value=3.28 1458488465012861875
ipmi_sensor,server=10.20.2.203,unit=volts,name=planar_vbat status=1i,value=3.04 1458488465013072508
ipmi_sensor,server=10.20.2.203,unit=rpm,name=fan_1a_tach status=1i,value=2610 1458488465013137932
ipmi_sensor,server=10.20.2.203,unit=rpm,name=fan_1b_tach status=1i,value=1775 1458488465013279896
ipmi_sensor,server=10.20.2.203,name=uid_light value=0,status=1i 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=sys._health_led status=1i,value=0 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=power_supply_1,unit=watts status=1i,value=110 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=power_supply_2,unit=watts status=1i,value=120 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=power_supplies value=0,status=1i 1517125513000000000
ipmi_sensor,server=10.20.2.203,name=fan_1,unit=percent status=1i,value=43.12 1517125513000000000
```


When retrieving stats from the local machine (no server specified):
```
ipmi_sensor,name=uid_light value=0,status=1i 1517125513000000000
ipmi_sensor,name=sys._health_led status=1i,value=0 1517125513000000000
ipmi_sensor,name=power_supply_1,unit=watts status=1i,value=110 1517125513000000000
ipmi_sensor,name=power_supply_2,unit=watts status=1i,value=120 1517125513000000000
ipmi_sensor,name=power_supplies value=0,status=1i 1517125513000000000
ipmi_sensor,name=fan_1,unit=percent status=1i,value=43.12 1517125513000000000
```

#### Version 2 Schema

When retrieving stats from the local machine (no server specified):
```
ipmi_sensor,unit=degrees_c,name=ambient_temp status=1i,value=20 1458488465012559455
ipmi_sensor,unit=feet,name=altitude status=1i,value=80 1458488465012688613
ipmi_sensor,unit=watts,name=avg_power status=1i,value=220 1458488465012776511
ipmi_sensor,unit=volts,name=planar_3.3v status=1i,value=3.28 1458488465012861875
ipmi_sensor,unit=volts,name=planar_vbat status=1i,value=3.04 1458488465013072508
ipmi_sensor,unit=rpm,name=fan_1a_tach status=1i,value=2610 1458488465013137932
ipmi_sensor,unit=rpm,name=fan_1b_tach status=1i,value=1775 1458488465013279896
ipmi_sensor,name=uid_light,entity_id=23.1,status_code=ok,status_desc=ok value=0 1517125474000000000
ipmi_sensor,name=sys._health_led,entity_id=23.2,status_code=ok,status_desc=ok value=0 1517125474000000000
ipmi_sensor,entity_id=10.1,name=power_supply_1,status_code=ok,status_desc=presence_detected,unit=watts value=110 1517125474000000000
ipmi_sensor,name=power_supply_2,entity_id=10.2,status_code=ok,unit=watts,status_desc=presence_detected value=125 1517125474000000000
ipmi_sensor,name=power_supplies,entity_id=10.3,status_code=ok,status_desc=fully_redundant value=0 1517125474000000000
ipmi_sensor,entity_id=7.1,name=fan_1,status_code=ok,status_desc=transition_to_running,unit=percent value=43.12 1517125474000000000
```
137 changes: 115 additions & 22 deletions plugins/inputs/ipmi_sensor/ipmi.go
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
package ipmi_sensor

import (
"bufio"
"bytes"
"fmt"
"os/exec"
"regexp"
"strconv"
"strings"
"sync"
Expand All @@ -14,14 +17,20 @@ import (
)

var (
execCommand = exec.Command // execCommand is used to mock commands in tests.
execCommand = exec.Command // execCommand is used to mock commands in tests.
re_v1_parse_line = regexp.MustCompile(`^(?P<name>[^|]*)\|(?P<description>[^|]*)\|(?P<status_code>.*)`)
re_v2_parse_line = regexp.MustCompile(`^(?P<name>[^|]*)\|[^|]+\|(?P<status_code>[^|]*)\|(?P<entity_id>[^|]*)\|(?:(?P<description>[^|]+))?`)
re_v2_parse_description = regexp.MustCompile(`^(?P<analogValue>[0-9.]+)\s(?P<analogUnit>.*)|(?P<status>.+)|^$`)
re_v2_parse_unit = regexp.MustCompile(`^(?P<realAnalogUnit>[^,]+)(?:,\s*(?P<statusDesc>.*))?`)
)

// Ipmi stores the configuration values for the ipmi_sensor input plugin
type Ipmi struct {
Path string
Privilege string
Servers []string
Timeout internal.Duration
Path string
Privilege string
Servers []string
Timeout internal.Duration
MetricVersion int
}

var sampleConfig = `
Expand All @@ -46,16 +55,22 @@ var sampleConfig = `

## Timeout for the ipmitool command to complete
timeout = "20s"

## Schema Version: (Optional, defaults to version 1)
metric_version = 2
`

// SampleConfig returns the documentation about the sample configuration
func (m *Ipmi) SampleConfig() string {
return sampleConfig
}

// Description returns a basic description for the plugin functions
func (m *Ipmi) Description() string {
return "Read metrics from the bare metal servers via IPMI"
}

// Gather is the main execution function for the plugin
func (m *Ipmi) Gather(acc telegraf.Accumulator) error {
if len(m.Path) == 0 {
return fmt.Errorf("ipmitool not found: verify that ipmitool is installed and that ipmitool is in your PATH")
Expand Down Expand Up @@ -93,23 +108,33 @@ func (m *Ipmi) parse(acc telegraf.Accumulator, server string) error {
opts = conn.options()
}
opts = append(opts, "sdr")
if m.MetricVersion == 2 {
opts = append(opts, "elist")
}
cmd := execCommand(m.Path, opts...)
out, err := internal.CombinedOutputTimeout(cmd, m.Timeout.Duration)
timestamp := time.Now()
if err != nil {
return fmt.Errorf("failed to run command %s: %s - %s", strings.Join(cmd.Args, " "), err, string(out))
}
if m.MetricVersion == 2 {
return parseV2(acc, hostname, out, timestamp)
}
return parseV1(acc, hostname, out, timestamp)
}

func parseV1(acc telegraf.Accumulator, hostname string, cmdOut []byte, measured_at time.Time) error {
// each line will look something like
// Planar VBAT | 3.05 Volts | ok
lines := strings.Split(string(out), "\n")
for i := 0; i < len(lines); i++ {
vals := strings.Split(lines[i], "|")
if len(vals) != 3 {
scanner := bufio.NewScanner(bytes.NewReader(cmdOut))
for scanner.Scan() {
ipmiFields := extractFieldsFromRegex(re_v1_parse_line, scanner.Text())
if len(ipmiFields) != 3 {
continue
}

tags := map[string]string{
"name": transform(vals[0]),
"name": transform(ipmiFields["name"]),
}

// tag the server is we have one
Expand All @@ -118,38 +143,106 @@ func (m *Ipmi) parse(acc telegraf.Accumulator, server string) error {
}

fields := make(map[string]interface{})
if strings.EqualFold("ok", trim(vals[2])) {
if strings.EqualFold("ok", trim(ipmiFields["status_code"])) {
fields["status"] = 1
} else {
fields["status"] = 0
}

val1 := trim(vals[1])

if strings.Index(val1, " ") > 0 {
if strings.Index(ipmiFields["description"], " ") > 0 {
// split middle column into value and unit
valunit := strings.SplitN(val1, " ", 2)
fields["value"] = Atofloat(valunit[0])
valunit := strings.SplitN(ipmiFields["description"], " ", 2)
var err error
fields["value"], err = aToFloat(valunit[0])
if err != nil {
continue
}
if len(valunit) > 1 {
tags["unit"] = transform(valunit[1])
}
} else {
fields["value"] = 0.0
}

acc.AddFields("ipmi_sensor", fields, tags, time.Now())
acc.AddFields("ipmi_sensor", fields, tags, measured_at)
}

return nil
return scanner.Err()
}

func Atofloat(val string) float64 {
func parseV2(acc telegraf.Accumulator, hostname string, cmdOut []byte, measured_at time.Time) error {
// each line will look something like
// CMOS Battery | 65h | ok | 7.1 |
// Temp | 0Eh | ok | 3.1 | 55 degrees C
// Drive 0 | A0h | ok | 7.1 | Drive Present
scanner := bufio.NewScanner(bytes.NewReader(cmdOut))
for scanner.Scan() {
ipmiFields := extractFieldsFromRegex(re_v2_parse_line, scanner.Text())
if len(ipmiFields) < 3 || len(ipmiFields) > 4 {
continue
}

tags := map[string]string{
"name": transform(ipmiFields["name"]),
}

// tag the server is we have one
if hostname != "" {
tags["server"] = hostname
}
tags["entity_id"] = transform(ipmiFields["entity_id"])
tags["status_code"] = trim(ipmiFields["status_code"])
fields := make(map[string]interface{})
descriptionResults := extractFieldsFromRegex(re_v2_parse_description, trim(ipmiFields["description"]))
// This is an analog value with a unit
if descriptionResults["analogValue"] != "" && len(descriptionResults["analogUnit"]) >= 1 {
var err error
fields["value"], err = aToFloat(descriptionResults["analogValue"])
if err != nil {
continue
}
// Some implementations add an extra status to their analog units
unitResults := extractFieldsFromRegex(re_v2_parse_unit, descriptionResults["analogUnit"])
tags["unit"] = transform(unitResults["realAnalogUnit"])
if unitResults["statusDesc"] != "" {
tags["status_desc"] = transform(unitResults["statusDesc"])
}
} else {
// This is a status value
fields["value"] = 0.0
// Extended status descriptions aren't required, in which case for consistency re-use the status code
if descriptionResults["status"] != "" {
tags["status_desc"] = transform(descriptionResults["status"])
} else {
tags["status_desc"] = transform(ipmiFields["status_code"])
}
}

acc.AddFields("ipmi_sensor", fields, tags, measured_at)
}

return scanner.Err()
}

// extractFieldsFromRegex consumes a regex with named capture groups and returns a kvp map of strings with the results
func extractFieldsFromRegex(re *regexp.Regexp, input string) map[string]string {
submatches := re.FindStringSubmatch(input)
results := make(map[string]string)
for i, name := range re.SubexpNames() {
if name != input && name != "" && input != "" {
results[name] = trim(submatches[i])
}
}
return results
}

// aToFloat converts string representations of numbers to float64 values
func aToFloat(val string) (float64, error) {
f, err := strconv.ParseFloat(val, 64)
if err != nil {
return 0.0
} else {
return f
return 0.0, err
}
return f, nil
}

func trim(s string) string {
Expand Down
Loading