Graphite S.M.A.R.T. Exporter

Monitors drive S.M.A.R.T. attributes and periodically sends them to a Graphite-compatible server as tagged metrics.

This script was conceived as an add-on to TrueNAS's own Graphite-based metrics system. The goal was for it to run out-of-the-box on a TrueNAS SCALE system and provide detailed S.M.A.R.T. metrics for all S.M.A.R.T. capable devices in the system.

It sends plain-text tagged Graphite metrics, which are compatible with the Prometheus Graphite Exporter out-of-the-box. No configuration is necessary in Graphite Exporter, since it supports tagged metrics. To get all of TrueNAS' metrics into Prometheus, Supporterino's Graphite Mapping File for Prometheus Graphite Exporter is very helpful.
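For orientation, a tagged metric in Graphite's plaintext protocol is a single line of the form `name;tag=value;... value timestamp`. The following is a minimal sketch of building such a line in bash. The function name `format_tagged_metric` and the sample values are illustrative and not taken from the script itself.

```bash
#!/usr/bin/env bash

# Build one line of Graphite's plaintext protocol with tags:
#   <name>;<tag>=<value>;... <value> <unix timestamp>
# Usage: format_tagged_metric NAME VALUE TIMESTAMP [TAG=VALUE ...]
# (hypothetical helper, not part of graphite_smart_exporter.sh)
format_tagged_metric() {
    local name="$1" value="$2" timestamp="$3"
    shift 3
    local line="$name" tag
    for tag in "$@"; do
        line+=";$tag"
    done
    printf '%s %s %s\n' "$line" "$value" "$timestamp"
}

# Example: a temperature reading identified by the drive's serial number.
format_tagged_metric smart_device_temperature 34 "$(date +%s)" "serial_number=1EHXXXXX"

# Sending is a matter of piping such lines to the receiver, e.g. with netcat.
# Host and port are placeholders, and flags differ between netcat variants:
#   format_tagged_metric ... | nc -w 5 graphite.mydomain.com 2003
```

Because each line is self-describing, a receiver that understands tagged metrics needs no per-metric mapping rules.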

If a TrueNAS system is already monitored, a Graphite compatible receiver endpoint is available as well, so this script plugs right into that endpoint.

Other solutions to get S.M.A.R.T. metrics from a system exist, such as smartctl_exporter or prometheus-smartctl, but these would either need to be run in a privileged container using TrueNAS SCALE's app system (smartctl_exporter) or need additional Python modules not available on a TrueNAS installation (prometheus-smartctl). With TrueNAS being an appliance (and moving to a read-only file system with the Dragonfish release), installing Python modules is not a viable option.

Requirements:

bash
smartctl 7.0 or newer
jq with oniguruma library (Developed and tested with v1.6, works with v1.5 as well)
netcat

This script uses smartctl's json output mode, which was introduced with version 7.0, hence the requirement.
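A start-up check for this requirement could look like the sketch below. It assumes that the first line of `smartctl --version` starts with `smartctl <major>.<minor>`. The function name `smartctl_supports_json` is made up for this example and is not part of the script.

```bash
#!/usr/bin/env bash

# Succeeds if the version found in a `smartctl --version` banner line is >= 7.0.
# Assumption: the banner has the form "smartctl <major>.<minor> ...".
# (hypothetical helper, not part of graphite_smart_exporter.sh)
smartctl_supports_json() {
    local banner="$1" version major
    version="$(awk '{ print $2 }' <<<"$banner")"
    major="${version%%.*}"
    # Reject anything that is not a plain number before comparing.
    [[ "$major" =~ ^[0-9]+$ ]] && (( major >= 7 ))
}

# Usage on a real system:
#   smartctl_supports_json "$(smartctl --version | head -n 1)" \
#       || echo "smartctl is too old for JSON output" >&2
```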

Even though this script was developed with TrueNAS SCALE in mind, and this documentation focuses on use cases for that scenario in particular, it should run on any machine that provides the above prerequisites.

Tested on:

TrueNAS SCALE 23.10.1
Debian 12.4 Bookworm
Synology DSM 7.1 (with manually installed smartctl v7.4 and netcat via Entware)

Might/should work on TrueNAS CORE, test results are welcome!

Key Features

Run at startup and send metrics with customizable frequency
Tags to make metrics resilient against device name changes after reboots
Continue to send last known metrics for devices in STANDBY
Special metrics for device's power state
ZFS Support: Provides pool name in info metric
Manually specify devices to monitor
Manually specify device types
Logging

Disclaimer

This script was heavily inspired by ngandrass's TrueNAS Spindown Timer. In fact, it was developed to be used in conjunction with the truenas-spindown-timer script, since periodic S.M.A.R.T. queries will prevent a device from spinning down on its own. With truenas-spindown-timer, drives will still spin down and not be disturbed by this script.

One of the main motivations for the development of this script was to gather statistics on how often drives are getting spun down and woken up with a specific usage profile to determine whether disk spindown is worth the potential shortening of disk life vs. energy consumption savings.

Usage

Graphite S.M.A.R.T. exporter version 1.1.4
Usage:
  ./graphite_smart_exporter.sh [-h] -d <DESTINATION> [-p <PORT>] -n <HOSTNAME> [-f <FREQUENCY>] [-c] [-m <DEVICE>] [-t <DEVICE=TYPE>] [-o] [-v] [-q] [-l <LOG_FILE>] [-s <SMART_TEMP_FILE_NAME>]

Gathers S.M.A.R.T. data about all S.M.A.R.T. capable drives in the system
and sends them as metrics to a Graphite server.

Options:
  -d DESTINATION            : The destination IP address or host name under which the Graphite
                              server is reachable.
  -p PORT                   : The port the Graphite server is listening on for the plaintext protocol.
  -n HOSTNAME               : The host name to set for the metrics' 'instance' tag.
  -f FREQUENCY              : Interval in seconds at which metrics are gathered and sent to Graphite
                              (default: 300)
  -l LOG_FILE               : Name of the log file to log into. File logging is only enabled if a file name is provided. (default: empty)
  -c                        : Continue sending last known/stale data if a drive is in standby/spun down. If a drive is spun down, S.M.A.R.T. attributes 
                              cannot be read without waking it up. If this option is set, the script continues to send the last known S.M.A.R.T. 
                              metrics for a drive that is spun down to prevent gaps in data.
                              Otherwise no metrics are sent until the drive is awake again.
  -m DEVICE                 : List devices to monitor using this argument, once per drive to monitor, e.g. -m /dev/sda -m /dev/sdc
  -t DEVICE=TYPE            : Manually specify the device type for a device. Use this if smartctl device type autodetection does not work for your case. Does NOT disable device discovery. Example: -t /dev/sda=nvme
  -s SMART_TEMP_FILE_NAME   : Name of the temp file the S.M.A.R.T. output is written to during each cycle the script is running.
                              Explicitly set if you plan on running multiple instances of this script to prevent collisions. (default: smart_output.json)
  -o                        : Omit device name tag from info metric. If you're dealing with a system that changes device names frequently, set this flag to avoid multiple time series after the device changed name.
  -q                        : Quiet mode. All output is suppressed. Cannot be set if -v is set.
  -v                        : Verbose mode. Prints additional information during execution. File logging is only enabled in verbose mode. Cannot be set if -q is set.
  -h                        : Print this help message.
Example usage:
./graphite_smart_exporter.sh -d graphite.mydomain.com -n myhost
./graphite_smart_exporter.sh -d graphite.mydomain.com -p 9198 -n myhost -f 600
./graphite_smart_exporter.sh -d graphite.mydomain.com -n myhost -f 600 -o -m /dev/sda -m /dev/sdc -t /dev/sdc=sat

Set up script to run at TrueNAS startup with custom frequency

To automatically run the script after startup, use TrueNAS' Init/Shutdown Scripts feature.
Download the script onto your machine. In your TrueNAS UI, navigate to System Settings -> Advanced -> Init/Shutdown Scripts and create a new script with the following settings:

Description: Graphite SMART Exporter
Type: Command
Command: sudo /path/to/script/graphite_smart_exporter.sh -d graphite.mydomain.com -n myhost -f 60
When: Post Init

Since smartctl is only available for root, the script must be executed with sudo. The command above will send S.M.A.R.T. metrics to your Graphite instance every 60 seconds.

Exported Metrics and Tags

At the moment, the script supports metrics for devices of type sat and nvme.
(If you need additional device types supported, [open an issue] and append an example output of smartctl --json=c -a.)
See graphite_export.md for a sample of this script's exported metrics.

Metrics

The script will export the following metrics:

Metric Name	Device Types	Description
`smart_disk_info`	all	Info metric with the sole purpose to provide tags to be joined to actual metrics using the serial number; has a static value of `1`
`smart_attribute`	`sat`	Standard ATA S.M.A.R.T. attribute
`smart_nvme_attribute`	`nvme`	NVMe S.M.A.R.T. Health Information
`smart_device_temperature`	all	Device Temperature in °C
`smart_power_cycle_count`	all	Number of full power on/off cycles
`smart_power_on_time_hours`	all	Number of hours the device has been powered
`smart_power_status`	all	Indicator whether the device is active or in standby/spun down
`smart_status_passed`	all	Indicating whether the latest S.M.A.R.T. test has passed

Disk Info Tags

The following tags/labels are added to the smart_disk_info info metric:

Tag Name	Description
`model_name`	The device's model name (if present), e.g. `HGST_HUH721010ALE600`
`model_family`	The device's model family (if present), e.g. `HGST_Ultrastar_He10`
`serial_number`	The device's serial number, e.g. `1EHXXXXX`; can be used to make sure a certain metric always refers to the same physical device if logical device names change during reboots
`firmware_version`	The firmware version reported by the device, e.g. `LHGNT384`
`user_capacity_bytes`	The drive's capacity in bytes, e.g. `10000831348736`
`device_name`	The shortened logical device name, e.g. `sda`; might change during reboots; not present if the `-o` argument is set
`device_type`	The device type, e.g. `sat`
`instance`	The host name passed to the script using `-n`
`zfs_pool`	If the disk is part of a ZFS pool, this tag contains the name of that pool

Common Tags

All other metrics have the following common tags:

Tag Name	Description
`serial_number`	The device's serial number, e.g. `1EHXXXXX`; can be used to join to `smart_disk_info` metric for additional tags

NVME Specific Tags

smart_nvme_attribute metrics have these tags in addition to the common tags:

Tag Name	Description
`value_type`	Fixed to `raw`
`attribute_name`	Name of the reported NVME Health Information, e.g. `available_spare`

SAT Specific Tags

smart_attribute metrics have these tags in addition to the common tags:

Tag Name	Description
`value_type`	One of `value`, `worst`, `thresh` or `raw`
`attribute_name`	Name of the reported ATA S.M.A.R.T. attribute, e.g. `Start_Stop_Count`
`attribute_id`	Unique ID of the S.M.A.R.T. attribute, e.g. `4`
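To illustrate how these tags relate to smartctl's JSON output, the sketch below turns the `ata_smart_attributes.table` array into one `smart_attribute` line per value type. The sample JSON is a heavily trimmed, made-up excerpt. The function name and the tag order are assumptions, and the script's actual jq program may well differ.

```bash
#!/usr/bin/env bash

# Trimmed, illustrative excerpt of `smartctl --json=c -a <device>` output.
sample_json='{
  "serial_number": "1EHXXXXX",
  "ata_smart_attributes": {
    "table": [
      { "id": 4, "name": "Start_Stop_Count", "value": 100, "worst": 100, "thresh": 0, "raw": { "value": 52 } }
    ]
  }
}'

# Reads smartctl JSON on stdin and prints four tagged metric lines per attribute
# (value, worst, thresh, raw). Hypothetical helper, not the script's own code.
sat_attribute_lines() {
    local timestamp="$1"
    jq -r --arg ts "$timestamp" '
        .serial_number as $serial
        | .ata_smart_attributes.table[]
        | .id as $id
        | .name as $name
        | (["value", .value], ["worst", .worst], ["thresh", .thresh], ["raw", .raw.value])
        | "smart_attribute;serial_number=\($serial);attribute_id=\($id);attribute_name=\($name);value_type=\(.[0]) \(.[1]) \($ts)"
    '
}

# Try it with the sample:
#   printf '%s\n' "$sample_json" | sat_attribute_lines "$(date +%s)"
# Or on a real system:
#   smartctl --json=c -a /dev/sda | sat_attribute_lines "$(date +%s)"
```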

Limitations for tag values

The Graphite documentation on tagged metrics states the following about tag values:

Tag values must also have a length >= 1, they may contain any ascii characters except ; and the first character must not be ~.

For values that might not always exist (such as model_name or model_family), this means that the whole tag must be left out if the value is empty.
Even though the definition of allowed tag value characters implies that whitespace is allowed, whitespace in tag values seems to break some Graphite servers such as the Prometheus Graphite Exporter. This is why this script replaces blanks in tag values with underscores (_).
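The rules above can be sketched in a few lines of bash. Only the replacement of blanks with underscores and the omission of empty tags are described in this README. Dropping `;` and a leading `~` is an extra precaution added for this example, and both function names are hypothetical.

```bash
#!/usr/bin/env bash

# Make a string usable as a Graphite tag value.
# (hypothetical helper, not part of graphite_smart_exporter.sh)
sanitize_tag_value() {
    local value="$1"
    value="${value// /_}"   # blanks -> underscores (as described above)
    value="${value//;/}"    # ';' separates tags, so drop it (assumption)
    value="${value#\~}"     # first character must not be '~' (assumption)
    printf '%s' "$value"
}

# Print ";name=value" for a tag, or nothing at all if the value is empty,
# because Graphite requires tag values to have a length >= 1.
tag_fragment() {
    local name="$1" value
    value="$(sanitize_tag_value "$2")"
    if [ -n "$value" ]; then
        printf ';%s=%s' "$name" "$value"
    fi
}

# Example: model_family is empty here, so only model_name ends up in the line.
printf 'smart_disk_info%s%s 1 %s\n' \
    "$(tag_fragment model_name "HGST HUH721010ALE600")" \
    "$(tag_fragment model_family "")" \
    "$(date +%s)"
```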

Omit Device Name from Info Metric

If the device names change between reboots or if devices are added to the system, this will cause multiple time series to be created due to the now differing device_name tag. To avoid this, start the script specifying the -o flag, which will cause the device_name tag to be omitted from the smart_disk_info metric. If device_name is volatile, it is of limited value anyway.
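The effect of the flag can be pictured with the following sketch, which builds a reduced `smart_disk_info` line with and without the `device_name` tag. The function name, its parameters and the reduced tag set are assumptions made for illustration.

```bash
#!/usr/bin/env bash

# Build a reduced smart_disk_info line. device_name is left out when
# omit_device_name is 1, which mirrors the idea behind -o.
# (hypothetical helper, not part of graphite_smart_exporter.sh)
build_disk_info_metric() {
    local serial="$1" device_name="$2" instance="$3" omit_device_name="$4" timestamp="$5"
    local line="smart_disk_info;serial_number=$serial"
    if [ "$omit_device_name" -ne 1 ]; then
        line+=";device_name=$device_name"
    fi
    line+=";instance=$instance"
    printf '%s 1 %s\n' "$line" "$timestamp"
}

now="$(date +%s)"
# Without -o: the same drive shows up as a new series if sda becomes sdb.
build_disk_info_metric 1EHXXXXX sda myhost 0 "$now"
# With -o: the tag set no longer depends on the device name.
build_disk_info_metric 1EHXXXXX sda myhost 1 "$now"
```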

ZFS Support

Since the script was originally designed to be run on a TrueNAS based system, information about ZFS pool assignment for each disk is a logical supplement for the disk's info metrics. If the script detects ZFS on the system, it will try to find the ZFS pool each monitored disk is assigned to and add a zfs_pool tag to the smart_disk_info metric.
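One possible way to look up the pool for a disk is to parse the output of `zpool status -L -P`, which prints resolved, full device paths. The sketch below shows that approach. It is not necessarily how the script does it, the function name is hypothetical, and the partition suffix handling only covers `sdX1` and `nvmeXnYpZ` style names.

```bash
#!/usr/bin/env bash

# Reads `zpool status -L -P` output on stdin and prints the name of the pool
# that contains the given whole-disk device (e.g. /dev/sda). Prints nothing
# if the disk is not part of any pool.
# (hypothetical helper, not part of graphite_smart_exporter.sh)
zfs_pool_for_disk() {
    local disk="$1"
    awk -v disk="$disk" '
        $1 == "pool:" { pool = $2; next }
        index($1, "/dev/") == 1 {
            dev = $1
            if (dev ~ /nvme/) sub(/p[0-9]+$/, "", dev)   # nvme0n1p1 -> nvme0n1
            else              sub(/[0-9]+$/, "", dev)    # sda1 -> sda
            if (dev == disk) { print pool; exit }
        }
    '
}

# Usage on a system with ZFS:
#   zpool status -L -P | zfs_pool_for_disk /dev/sda
```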

Continue to send last known metrics for devices in STANDBY

If a device is in standby / spun down, querying S.M.A.R.T. attributes would wake it up, which this script will not do. On the other hand this means there are no metrics available for disks that are in standby.

The script offers the argument -c. If that argument is set, it will continue to send the last queried metrics for a device in standby to prevent gaps in the metric time series. The only metric that will be updated is smart_power_status (which will reflect the standby state by being 0).
⚠️ Be aware that this has the side effect that continuous metrics such as smart_device_temperature or smart_power_on_time_hours show sudden jumps in graphs when the updated values are sent after the device has woken up again.
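The replay logic can be sketched as follows, assuming the last metric lines of each device are cached in a file and the standby state has already been determined (for example with `smartctl -n standby`, which declines to query a device in standby). The function name, its parameters and the cache file layout are assumptions for this example.

```bash
#!/usr/bin/env bash

# Decide what to send for one device in one cycle.
#   $1 serial number          $2 standby flag (1 = device is in standby)
#   $3 continue flag (1 = -c) $4 cache file for this device
#   $5 fresh metric lines without timestamp (ignored in standby)
#   $6 timestamp to append to every line
# (hypothetical helper, not part of graphite_smart_exporter.sh)
metrics_for_cycle() {
    local serial="$1" standby="$2" continue_stale="$3" cache_file="$4" fresh="$5" timestamp="$6"
    {
        if [ "$standby" -eq 1 ]; then
            # Never wake the drive; optionally replay the last known lines.
            if [ "$continue_stale" -eq 1 ] && [ -f "$cache_file" ]; then
                cat "$cache_file"
            fi
            printf 'smart_power_status;serial_number=%s 0\n' "$serial"
        else
            # Drive is awake: send fresh lines and remember them for later.
            printf '%s\n' "$fresh" | tee "$cache_file"
            printf 'smart_power_status;serial_number=%s 1\n' "$serial"
        fi
    } | sed "s/\$/ $timestamp/"
}
```

Storing the lines without a timestamp means replayed values always carry the time of the current cycle, which is what keeps the time series free of gaps.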

Manually specify devices to monitor

By default, the script will scan for all S.M.A.R.T. capable devices in the system at startup and send metrics for all these devices. If only a specific subset of devices should be monitored, these devices may be passed to the script with the argument -m, specifying the argument once per device to monitor.
The following example will only monitor devices /dev/sda and /dev/sdc and will not scan for other devices:

./graphite_smart_exporter.sh -d graphite.mydomain.com -n myhost -m /dev/sda -m /dev/sdc

Manually specify device types

smartctl will try to guess the correct device type when querying a specific device. However, it might not get it right for all devices, which might result in wrong/missing output.
To manually force the script to use a specific device type for a certain device, specify it using the argument -t in the form <device_name>=<type>, once per device.
The following example will only monitor devices /dev/sda and /dev/sdc, but force /dev/sdc to be treated as sat type device:

./graphite_smart_exporter.sh -d graphite.mydomain.com -n myhost -m /dev/sda -m /dev/sdc -t /dev/sdc=sat

Note that specifying -t alone without -m will not disable scanning for devices, but will honor the device type for the specified devices.
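How repeated -m and -t arguments might be collected and applied is sketched below using `getopts` and a bash associative array (bash 4 or newer). The option string mirrors the usage text above. The variable and function names are invented for this example and the script's internals may differ.

```bash
#!/usr/bin/env bash

declare -a MONITORED_DEVICES=()   # filled by -m, one entry per device
declare -A DEVICE_TYPES=()        # filled by -t, maps device -> type

# Collect -m and -t; all other known options are parsed but ignored here.
# (hypothetical helper, not part of graphite_smart_exporter.sh)
parse_device_args() {
    local opt OPTIND=1
    while getopts ":hd:p:n:f:cm:t:l:s:ovq" opt; do
        case "$opt" in
            m) MONITORED_DEVICES+=("$OPTARG") ;;
            t) DEVICE_TYPES["${OPTARG%%=*}"]="${OPTARG#*=}" ;;   # DEVICE=TYPE
            *) ;;
        esac
    done
}

# Print the smartctl call for a device, adding -d TYPE only if one was given.
smartctl_command_for() {
    local device="$1"
    local cmd=(smartctl --json=c -a)
    if [ -n "${DEVICE_TYPES[$device]:-}" ]; then
        cmd+=(-d "${DEVICE_TYPES[$device]}")
    fi
    cmd+=("$device")
    printf '%s\n' "${cmd[*]}"
}

parse_device_args -d graphite.mydomain.com -n myhost -m /dev/sda -m /dev/sdc -t /dev/sdc=sat
for device in "${MONITORED_DEVICES[@]}"; do
    smartctl_command_for "$device"
done
```

In this sketch an empty `MONITORED_DEVICES` array would be the signal to fall back to device discovery, while entries in `DEVICE_TYPES` apply either way.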

Logging

The script uses logfmt as a logging format and supports normal logging to console, logging to a file and no log output at all:

pass no additional arguments to only use normal log output on console
pass -v to enable debug log output
pass -l graphite_smart_exporter.log to enable logging to the graphite_smart_exporter.log file using the specified verbosity level
pass -q to disable all logging

Note that -v and -q cannot be set at the same time.
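For readers unfamiliar with logfmt, the sketch below shows a small logger that honours the three modes. The field names (`time`, `level`, `msg`), the variable names and the function name are assumptions and may not match the script's actual log lines.

```bash
#!/usr/bin/env bash

LOG_LEVEL="info"   # "debug" would correspond to -v, "quiet" to -q (assumed names)
LOG_FILE=""        # set to a file name to mirror -l

# Emit one logfmt line: key=value pairs, with quoted and escaped message text.
# (hypothetical helper, not part of graphite_smart_exporter.sh)
log() {
    local level="$1" msg="$2" line
    [ "$LOG_LEVEL" = "quiet" ] && return 0
    [ "$level" = "debug" ] && [ "$LOG_LEVEL" != "debug" ] && return 0
    msg=${msg//\\/\\\\}   # escape backslashes
    msg=${msg//\"/\\\"}   # escape double quotes
    line="time=$(date -u +%Y-%m-%dT%H:%M:%SZ) level=$level msg=\"$msg\""
    printf '%s\n' "$line"
    if [ -n "$LOG_FILE" ]; then
        printf '%s\n' "$line" >>"$LOG_FILE"
    fi
}

log info "starting exporter"
log debug "this line only appears when LOG_LEVEL is debug"
```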

export PATH=/opt/sbin:/opt/bin:$PATH
BASEDIR=$(dirname "$0")
bash $BASEDIR/graphite_smart_exporter.sh -v -d graphite.local.salvoxia.de -p 9109 -n samvault.local.salvoxia.de -s smart_output2.json -f 60 -l graphite_smart_exporter.log -t /dev/sda=sat -t /dev/sdb=sat

Installation on Synology DSM

Synology DSM 7.1 uses an older version of smartctl that does not support json formatted output. In order to use Graphite S.M.A.R.T. Exporter, a newer version of smartctl must be installed.
This can be done using Entware.

Guide

Follow the installation instructions for Entware
Install smartctl using Entware
```
sudo opkg install smartctl
```
Copy graphite_smart_exporter.sh and synology_wrapper.sh to a suitable folder on your Synology NAS
Make graphite_smart_exporter.sh and synology_wrapper.sh executable
```
chmod +x graphite_smart_exporter.sh synology_wrapper.sh
```
Log in to the DSM web interface and open Control Panel > Task Scheduler
- Create > Triggered Task > User Defined Script
  - General
    - Task: Graphite S.M.A.R.T. Exporter
    - User: root
    - Event: Boot-up
    - Pre-task: Entware
  - Task Settings
    - Run Command: (see below)
```
/bin/bash /path/to/exporter/synology_wrapper.sh <exporterArgs>
```
