
Expose per dataset ZFS metrics #1602

Closed
pmb311 opened this issue Feb 11, 2020 · 17 comments

Comments

@pmb311 commented Feb 11, 2020

Host operating system: output of uname -a

Linux foo1.example.com 4.19.67 #1 SMP Thu Aug 22 16:06:16 EDT 2019 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.18.1 (branch: release-0.18, revision: 0037b4808adeaa041eb5dd699f7427714ca8b8c0)
build user: foo
build date: 20191030-22:24:38
go version: go1.13.3

node_exporter command line flags

/usr/sbin/node_exporter --collector.zfs

Are you running node_exporter in Docker?

No

What did you do that produced an error?

N/A

What did you expect to see?

We'd like to collect the per-dataset contents of /proc/spl/kstat/zfs/ZPOOL NAME/objset-*
E.g.:
root@foo:~$ cat /proc/spl/kstat/zfs/ZPOOL NAME/objset-0x4c3
49 1 0x01 7 1904 882869614188 7661358045725488
name type data
dataset_name 7 ZPOOL NAME/DATASET NAME
writes 4 162659962
nwritten 4 169357302418427
reads 4 19860562
nread 4 20787773826774
nunlinks 4 5326
nunlinked 4 5326

What did you see instead?

The contents of this file are not collected by node_exporter.
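
For illustration, a minimal sketch of how such an objset file could be parsed (not node_exporter's actual collector code; the helper name parseObjset and the field handling are assumptions based on the sample output above):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseObjset reads a /proc/spl/kstat/zfs/<pool>/objset-* file and returns the
// dataset name plus its numeric counters (writes, nwritten, reads, nread, ...).
func parseObjset(path string) (string, map[string]uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", nil, err
	}
	defer f.Close()

	dataset := ""
	counters := map[string]uint64{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 || fields[0] == "name" {
			continue // skip the kstat header lines
		}
		if fields[0] == "dataset_name" {
			dataset = fields[2] // e.g. "ZPOOL NAME/DATASET NAME"
			continue
		}
		// Counter lines look like: "nread 4 20787773826774".
		if v, perr := strconv.ParseUint(fields[2], 10, 64); perr == nil {
			counters[fields[0]] = v
		}
	}
	return dataset, counters, scanner.Err()
}

func main() {
	dataset, counters, err := parseObjset(os.Args[1])
	if err != nil {
		panic(err)
	}
	fmt.Println(dataset, counters["nread"], counters["nwritten"])
}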

@discordianfish (Member)

Is that available to an unprivileged user, or does it require root permissions to read?

@brian-brazil (Contributor)

What version of ZFS is that? I'm not seeing it on my 0.7.5.

@Sudhar287 (Contributor) commented Feb 25, 2020

I tried reading it as a non-root user and it worked.
My ZFS version is 0.8.3-1~bpo10+1.

EDIT: Also, can I work on this issue?

@Sudhar287 (Contributor)

I have some ideas about the metric and label names for this task.

The existing structure for querying a metric from the io file is:
node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1"}

My proposals are very similar. I have two alternatives; please indicate which is better:

  1. node_zfs_zpool_nread{instance="localhost:9100",job="node",zpool="myZpool1",dataset="datasetName"}

Pros:

  • Adheres to the existing convention; it just adds one extra label: dataset.

Cons:

  • A bit confusing, because some metric names are shared between the ZFS per-pool metrics and the per-pool, per-dataset metrics. E.g. nread appears in both.

    • Or maybe this is actually an advantage: when the dataset label is left blank, users know they are querying per-pool metrics; when they specify the dataset, they know they are querying per-pool, per-dataset metrics.
  • Not all metric names in the io file are present in the objset file. For example, wlentime is in the io file but not in the objset file; conversely, nunlinks is in the objset file but not in the io file.

  2. node_zfs_zpool_dataset_nread{instance="localhost:9100",job="node",zpool="myZpool1",dataset="datasetName"}

Pros:

  • An explicit dataset string in the metric name leaves no room for confusion.
  • It is easier to query the Prometheus metric names that include dataset.

Cons:

  • The dataset string appears in both the metric name and a label. Is it necessary to be so explicit about it?
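
To make the two alternatives concrete, here is a rough sketch of how each could be declared with the Go client library (illustrative only; the help strings are assumptions, and neither is necessarily the descriptor node_exporter actually uses):

package zfssketch

import "github.com/prometheus/client_golang/prometheus"

// Option 1: reuse the existing pool-level metric name and add a "dataset" label.
var nreadOption1 = prometheus.NewDesc(
	"node_zfs_zpool_nread",
	"ZFS pool bytes read, optionally broken down per dataset.",
	[]string{"zpool", "dataset"}, nil,
)

// Option 2: a distinct metric name that makes the per-dataset scope explicit.
var nreadOption2 = prometheus.NewDesc(
	"node_zfs_zpool_dataset_nread",
	"ZFS per-dataset bytes read.",
	[]string{"zpool", "dataset"}, nil,
)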

@thoro commented Mar 4, 2020

I like option 2 better; option 1 might break existing alerts, notifications, etc.

But I would like to support this issue, since I was just looking for exactly that.

The metrics are available starting with ZFS 0.8

@Sudhar287 (Contributor)

Thanks for the response, @thoro.
I'd also like to let you know that I've been working on the solution for quite some time. My teammate raised the issue, and we will be using this feature even if it's not ultimately included in this repo. :)

@Sudhar287 (Contributor)

Hello moderators Brian and Johannes!
Please give your feedback on the two proposed alternatives when possible. :)

I would also like to let you know that the second alternative didn't work as planned. I'm guessing it's because of this. The number of elements appearing in the drop-down menu in the Prometheus UI was significantly smaller than expected; I think some of the data was being overwritten. I tried adding a unique job tag as suggested here, but that didn't work either. FYI, I'm using a code structure similar to this.

However, the query structure below worked like a charm: node_zfs_zpool_poolName_datasetName_nread{instance="localhost:9100",job="node",zpool="myZpool1",dataset="datasetName"}
The above results in a plethora of metric names and doesn't really seem practical. How should I proceed?

@Sudhar287 (Contributor)

FYI, I figured out what the problem was. As the docstring says, the combination of metric name, label names, and help string I was using was being overwritten and behaving erratically.
Credit goes to @mknapphrt for helping me debug this.
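
For anyone hitting the same problem: the client library expects a given metric name to always be registered with the same help string and label names, so the usual pattern is a single descriptor per metric, with the dataset carried as a label value. A minimal sketch under that assumption (not the code that was eventually merged):

package zfssketch

import "github.com/prometheus/client_golang/prometheus"

// A single shared descriptor; each dataset becomes a distinct sample via its label values.
var datasetNread = prometheus.NewDesc(
	"node_zfs_zpool_dataset_nread",
	"ZFS per-dataset bytes read.",
	[]string{"zpool", "dataset"}, nil,
)

// emitDatasetNread sends one per-dataset sample on the collector's channel.
func emitDatasetNread(ch chan<- prometheus.Metric, zpool, dataset string, nread float64) {
	ch <- prometheus.MustNewConstMetric(datasetNread, prometheus.CounterValue, nread, zpool, dataset)
}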

@discordianfish (Member)

Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?
In that case we wouldn't need both, so we could either drop node_zfs_zpool_nread or make node_zfs_zpool_nread cover only the per-dataset metrics rather than the aggregate.

Does that make sense?

@gerardba

Would the sum of node_zfs_zpool_dataset_nread be equal to node_zfs_zpool_nread?

Actually, they don't match, because node_zfs_zpool_dataset_nread counts every read that happens on the dataset, including reads served from the ARC cache. node_zfs_zpool_nread, on the other hand, only counts reads that hit disk, including zpool scrubs, which don't show up in any dataset.
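
In query terms (illustrative only, assuming the proposed option-2 naming), something like

sum by (instance, job, zpool) (node_zfs_zpool_dataset_nread)

will therefore generally not equal node_zfs_zpool_nread for the same pool.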

@discordianfish (Member)

Makes sense; then go with option 2.

@thulle commented May 26, 2020

I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset label?

Edit: just realised it's probably "dataset", since it's dataset statistics.

@SuperQ (Member) commented May 26, 2020

While we're at it, it would be nice to make the new metric names follow Prometheus conventions.
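
For example (purely illustrative renamings, not necessarily what was merged), the objset counters could be exposed with base units and the _total suffix for counters:

writes    -> node_zfs_zpool_dataset_writes_total
nwritten  -> node_zfs_zpool_dataset_written_bytes_total
reads     -> node_zfs_zpool_dataset_reads_total
nread     -> node_zfs_zpool_dataset_read_bytes_total
nunlinks  -> node_zfs_zpool_dataset_unlinks_total
nunlinked -> node_zfs_zpool_dataset_unlinked_total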

@pmb311 (Author) commented May 27, 2020

I'm a bit curious about the node_zfs_zpool_dataset_nread naming. Would dataset there be the whole path to the dataset, which in my case can be up to 70 characters long, or would it just be the "basename", with the full path distinguishable by the dataset label?

Edit: just realised it's probably "dataset", since it's dataset statistics.

Using the example in my initial comment, it's 'ZPOOL NAME/DATASET NAME'.

@thulle commented May 28, 2020

@pmb311 Yeah, I just wondered whether the dataset name would end up in the metric name, since the dataset name could be something like "Remote systems/backups/webservers/web0001/customerdata/customer0200/staticdata", for example.

@aqw commented Sep 3, 2020

This functionality was added in #1632. Perhaps this issue can be closed?

@Sudhar287 (Contributor)

Yes, this feature has been added! @aqw
