Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SBD watchdog and msgwait metrics #174

Merged
merged 7 commits into from
Sep 3, 2020

Conversation

MalloZup
Copy link
Contributor

@MalloZup MalloZup commented Sep 1, 2020

todo:

  • add documentation for metrics

Note:

I have implemented the metric like

ha_cluster_sbd_devices{device="/dev/vdd",status="healthy"} 1
ha_cluster_sbd_devices_watchdog_timeout{device="/dev/vdc"} 60
ha_cluster_sbd_devices_msgwait_timeout{device="/dev/vdd"} 120

I didn't found any useful doc about the loop and allocate metric so I even wondered if we should expose them. (In fact I haven't but if they are meant to be useful, we should definitely document this in the suse-doc)
See:
https://documentation.suse.com/sle-ha/15-SP2/html/SLE-HA-all/cha-ha-storage-protect.html#sec-ha-storage-protect-watchdog-timings

@diegoakechi cc
@nick-wang cc

I personally think that if we are exposing only 2 metrics we can create for each one a new metric without label.

If we want to expose 4 then I will refactor accordingly with a type label. I am ok with both directions

@diegoakechi
Copy link
Member

@MalloZup How are these kind of metric being formatted on other cases or maybe on other exporters? I think the usage of label more flexible and elegant, but at the same time it might make harder for the consumer. Imagine that we use label and implement only 2 metrics. When a third label is introduced we might break dashboards that might not have consistent filters, while a new metric would be ignored.

With that my vote is for separated metrics.

Copy link
Member

@stefanotorresi stefanotorresi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MalloZup as I mentioned in the JIRA issue, the way I would go about this is to add a single metric with both a device and a type label, and add all the timeouts.
There is no real downside to expose also the other two types that you have left out.

The final result should be something like this:

# TYPE ha_cluster_sbd_device_timeout gauge
ha_cluster_sbd_device_timeout{type="watchdog",device="/dev/vdc"} 5
ha_cluster_sbd_device_timeout{type="allocate",device="/dev/vdc"} 2
ha_cluster_sbd_device_timeout{type="loop",device="/dev/vdc"} 1
ha_cluster_sbd_device_timeout{type="msgwait",device="/dev/vdc"} 10

@stefanotorresi
Copy link
Member

@diegoakechi for these kind of use cases, the best practice is exactly the opposite of using separate metrics. See https://prometheus.io/docs/practices/instrumentation/#use-labels

collector/sbd/sbd.go Outdated Show resolved Hide resolved
@diegoakechi
Copy link
Member

@diegoakechi for these kind of use cases, the best practice is exactly the opposite of using separate metrics. See https://prometheus.io/docs/practices/instrumentation/#use-labels

Sure, in that case lets go this way. Also, make sure to include all the metrics, so we avoid any filtering problem in the future. Thanks @stefanotorresi

@MalloZup
Copy link
Contributor Author

MalloZup commented Sep 2, 2020

@stefanotorresi I'm ok to use the type even for 2 metric.

ha_cluster_sbd_device_timeout{type="watchdog",device="/dev/vdc" 1
ha_cluster_sbd_device_timeout{type="msgwait",device="/dev/vdc"} 1

as follow-up with @yan-gao discussion we don't need to expose the other 2 metrics since they are more confusing then helping

thx for reviews. I will rework this during the day.

@stefanotorresi
Copy link
Member

as follow-up with @yan-gao discussion we don't need to expose the other 2 metrics since they are more confusing then helping

okay, sounds good to me!

@MalloZup MalloZup changed the title [WIP] Implement SBD watchdog and msgwait metrics Implement SBD watchdog and msgwait metrics Sep 2, 2020
Copy link
Member

@stefanotorresi stefanotorresi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I suggest just a few minor tweaks to the labels and names.

collector/sbd/sbd.go Outdated Show resolved Hide resolved
collector/sbd/sbd.go Outdated Show resolved Hide resolved
collector/sbd/sbd.go Outdated Show resolved Hide resolved
doc/metrics.md Outdated Show resolved Hide resolved
test/sbd.metrics Outdated Show resolved Hide resolved
test/sbd.metrics Outdated Show resolved Hide resolved
Copy link
Contributor Author

@MalloZup MalloZup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx should be ok now

@stefanotorresi stefanotorresi merged commit 8912739 into ClusterLabs:master Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants