
Expose per dataset ZFS metrics #1602

Closed
pmb311 opened this issue Feb 11, 2020 · 17 comments

Comments

@pmb311 commented Feb 11, 2020

Host operating system: output of uname -a

Linux foo1.example.com 4.19.67 #1 SMP Thu Aug 22 16:06:16 EDT 2019 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.18.1 (branch: release-0.18, revision: 0037b4808adeaa041eb5dd699f7427714ca8b8c0)
build user: foo
build date: 20191030-22:24:38
go version: go1.13.3

node_exporter command line flags

/usr/sbin/node_exporter --collector.zfs

Are you running node_exporter in Docker?

No

What did you do that produced an error?

N/A

What did you expect to see?

We'd like to collect the per-dataset contents of /proc/spl/kstat/zfs/ZPOOL NAME/objset-*
E.g.:
root@foo:~$ cat /proc/spl/kstat/zfs/ZPOOL NAME/objset-0x4c3
49 1 0x01 7 1904 882869614188 7661358045725488
name type data
dataset_name 7 ZPOOL NAME/DATASET NAME
writes 4 162659962
nwritten 4 169357302418427
reads 4 19860562
nread 4 20787773826774
nunlinks 4 5326
nunlinked 4 5326

What did you see instead?

The contents of this file are not collected by node_exporter.
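
For illustration, a minimal sketch of how such an objset file could be parsed (not node_exporter's actual collector code; the helper name parseObjset and the field handling are assumptions based on the sample output above):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseObjset reads a /proc/spl/kstat/zfs/<pool>/objset-* file and returns the
// dataset name plus its numeric counters (writes, nwritten, reads, nread, ...).
func parseObjset(path string) (string, map[string]uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", nil, err
	}
	defer f.Close()

	dataset := ""
	counters := map[string]uint64{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 || fields[0] == "name" {
			continue // skip the kstat header lines
		}
		if fields[0] == "dataset_name" {
			dataset = fields[2] // e.g. "ZPOOL NAME/DATASET NAME"
			continue
		}
		// Counter lines look like: "nread 4 20787773826774".
		if v, perr := strconv.ParseUint(fields[2], 10, 64); perr == nil {
			counters[fields[0]] = v
		}
	}
	return dataset, counters, scanner.Err()
}

func main() {
	dataset, counters, err := parseObjset(os.Args[1])
	if err != nil {
		panic(err)
	}
	fmt.Println(dataset, counters["nread"], counters["nwritten"])
}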

@discordianfish (Member)

Is that available to an unprivileged user, or does it require root permissions to read?

@brian-brazil (Contributor)

What version of ZFS is that? I'm not seeing it on my 0.7.5.

@Sudhar287 (Contributor) commented Feb 25, 2020

I tried reading it as a non-root user and it worked.
My ZFS version is 0.8.3-1~bpo10+1.

EDIT: Also, can I work on this issue?

@Sudhar287 (Contributor)

I have some ideas about the metric and label names for this task.

The existing structure for querying a metric from the io file is:
node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1"}

My proposals are very similar. I have two alternatives; please indicate which is better:

  1. node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1",dataset="datasetName"}

Pros:

  • Adheres to the existing convention; it just adds one extra label: dataset.

Cons:

  • A bit confusing, because some metric names are shared between the ZFS per-pool metrics and the per-pool, per-dataset metrics. E.g. nread appears in both.

    • Or maybe this is actually an advantage: when the dataset label is left blank, users know they are querying per-pool metrics; when they specify the dataset, they know they are querying per-pool, per-dataset metrics.
  • Not all metric names in the io file are present in the objset file. For example, wlentime is in the io file but not in the objset file; conversely, nunlinks is in the objset file but not in the io file.

  2. node_zfs_zpool_dataset_nread{instance="localhost:9100",job="node",zpool="myZpool1",dataset="datasetName"}

Pros:

  • An explicit dataset string in the metric name leaves no room for confusion.
  • It is easier to query the Prometheus metric names that include dataset.

Cons:

  • The dataset string appears in both the metric name and a label. Is it necessary to be so explicit about it?
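
To make the two alternatives concrete, here is a rough sketch of how each could be declared with the Go client library (illustrative only; the help strings are assumptions, and neither is necessarily the descriptor node_exporter actually uses):

package zfssketch

import "github.com/prometheus/client_golang/prometheus"

// Option 1: reuse the existing pool-level metric name and add a "dataset" label.
var nreadOption1 = prometheus.NewDesc(
	"node_zfs_zpool_nread",
	"ZFS pool bytes read, optionally broken down per dataset.",
	[]string{"zpool", "dataset"}, nil,
)

// Option 2: a distinct metric name that makes the per-dataset scope explicit.
var nreadOption2 = prometheus.NewDesc(
	"node_zfs_zpool_dataset_nread",
	"ZFS per-dataset bytes read.",
	[]string{"zpool", "dataset"}, nil,
)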

@thoro commented Mar 4, 2020

I like option 2 better; option 1 might break existing alerts, notifications, etc.

But I would like to support this issue, since I was just looking for exactly that.

The metrics are available starting with ZFS 0.8

@Sudhar287 (Contributor)

Thanks for the response, @thoro.
I'd also like to let you know that I've been working on the solution for quite some time. My teammate raised the issue, and we will be using this feature even if it's not ultimately included in this repo. :)

@Sudhar287 (Contributor)

Hello moderators Brian and Johannes!
Please give your feedback on the two proposed alternatives when possible. :)

I would also like to let you know that the second alternative didn't work as planned. I'm guessing it's because of this. The number of elements appearing in the drop-down menu in the Prometheus UI was significantly smaller than expected; I think some of the data was being overwritten. I tried adding a unique job tag as suggested here, but that didn't work either. FYI, I'm using a code structure similar to this.

However, the query structure below worked like a charm: node_zfs_zpool_poolName_datasetName_nread{instance="localhost:9100",job="node",zpool="myZpool1",dataset="datasetName"}
The above results in a plethora of metric names and doesn't really seem practical. How should I proceed?

@Sudhar287 (Contributor)

FYI, I figured out what the problem was. As the docstring says, the combination of metric name, label names, and help string I was using was being overwritten and behaving erratically.
Credit goes to @mknapphrt for helping me debug this.
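
For anyone hitting the same problem: the client library expects a given metric name to always be registered with the same help string and label names, so the usual pattern is a single descriptor per metric, with the dataset carried as a label value. A minimal sketch under that assumption (not the code that was eventually merged):

package zfssketch

import "github.com/prometheus/client_golang/prometheus"

// A single shared descriptor; each dataset becomes a distinct sample via its label values.
var datasetNread = prometheus.NewDesc(
	"node_zfs_zpool_dataset_nread",
	"ZFS per-dataset bytes read.",
	[]string{"zpool", "dataset"}, nil,
)

// emitDatasetNread sends one per-dataset sample on the collector's channel.
func emitDatasetNread(ch chan<- prometheus.Metric, zpool, dataset string, nread float64) {
	ch <- prometheus.MustNewConstMetric(datasetNread, prometheus.CounterValue, nread, zpool, dataset)
}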

@discordianfish (Member)

Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?
In that case we wouldn't need both, so we could either drop node_zfs_zpool_nread or make node_zfs_zpool_nread cover only the per-dataset metrics rather than the aggregate.

Does that make sense?

@gerardba

Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?

Actually, they don't match, because node_zfs_zpool_dataset_nread counts every read that happens on the dataset, including reads served from the ARC cache. node_zfs_zpool_nread, on the other hand, only counts reads that hit disk, including zpool scrubs, which don't show up in any dataset.
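
In query terms (illustrative only, assuming the proposed option-2 naming), something like

sum by (instance, job, zpool) (node_zfs_zpool_dataset_nread)

will therefore generally not equal node_zfs_zpool_nread for the same pool.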

@discordianfish (Member)

Makes sense; then go with option 2.

@thulle commented May 26, 2020

I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset label?

Edit: just realised it's probably "dataset", since it's dataset statistics.

@SuperQ (Member) commented May 26, 2020

While we're at it, it would be nice to make the new metric names follow Prometheus conventions.
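
For example (purely illustrative renamings, not necessarily what was merged), the objset counters could be exposed with base units and the _total suffix for counters:

writes    -> node_zfs_zpool_dataset_writes_total
nwritten  -> node_zfs_zpool_dataset_written_bytes_total
reads     -> node_zfs_zpool_dataset_reads_total
nread     -> node_zfs_zpool_dataset_read_bytes_total
nunlinks  -> node_zfs_zpool_dataset_unlinks_total
nunlinked -> node_zfs_zpool_dataset_unlinked_total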

@pmb311 (Author) commented May 27, 2020

I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset label?

Edit: just realised it's probably "dataset", since it's dataset statistics.

Using the example in my initial comment, it's 'ZPOOL NAME/DATASET NAME'.

@thulle commented May 28, 2020

@pmb311 Yeah, I just wondered whether the dataset name would end up in the metric name, since the dataset name could be something like "Remote systems/backups/webservers/web0001/customerdata/customer0200/staticdata", for example.

@aqw commented Sep 3, 2020

This functionality was added in #1632. Perhaps this issue can be closed?

@Sudhar287 (Contributor)

Yes, this feature has been added! @aqw
