
Add Linux HA ClusterLabs exporter to prometheus doc #1540

Merged
1 commit merged into prometheus:master on Feb 5, 2020

Conversation

@MalloZup commented Feb 3, 2020

Add Linux HA ClusterLabs exporterr to prometheus doc

@brian-brazil

Thanks for your PR. What exactly is the name of the piece of software that this exporter is for? It seems a bit odd to mention "HA" or a company name in here, unless it's actually required to disambiguate the name.

This also looks like it's actually several exporters in one, which is not a pattern we encourage. There's possibly some overlap with the node exporter, on DRBD at least, too.

Looking through the code, I'm not sure that the split of counters and gauges is correct; all counters (and only counters) should end in _total. Similarly, fail_count sounds like a gauge rather than part of a summary/histogram, so it shouldn't end in _count.
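To illustrate that naming rule, here is a minimal sketch with hypothetical metric names, assuming a client_golang-based exporter (not the project's actual code):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	// Counter: monotonically increasing, so the name ends in _total.
	failures = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "ha_cluster_resource_failures_total",
		Help: "Total number of resource failures observed on this node.",
	})

	// Gauge: a value that can go up and down, so no _total (and no _count) suffix.
	nodesOnline = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ha_cluster_nodes_online",
		Help: "Number of cluster nodes currently seen as online.",
	})
)

func main() {
	prometheus.MustRegister(failures, nodesOnline)
}
```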

@stefanotorresi commented Feb 4, 2020

@brian-brazil thanks for taking the time to vet this project! I'm co-maintainer of it together with @MalloZup.

The word "HA" refers to what historically was the Linux HA project (yep, not a great name), all the components of which are now maintained under the ClusterLabs org.

The ha_cluster_exporter project aims to export metrics for the whole stack, the main components being Pacemaker, Corosync, and SBD. While I understand that grouping multiple exporters in one project may not be ideal, in this case it was deemed appropriate to avoid the maintenance overhead of many small packages that would be deployed together in 99% of the use cases.

The DRBD collector is actually mutually exclusive with the one provided by node_exporter, because that only supports the DRBD version bundled with the Linux kernel (8.x), while ours only supports the version provided within SUSE (9.x).

In hindsight, I realise that we could have named it something different, but you know how it goes about naming things in software... 😉

That said, we'll look into renaming the counters and gauges according to your guidance, thank you very much! We looked at https://prometheus.io/docs/practices/naming/ but it wasn't immediately clear that _total is for counters and _count is for summaries/histograms.

@MalloZup commented Feb 4, 2020

Thanks @brian-brazil and @stefanotorresi for the answers.

JFYI, ClusterLabs is not a company but an open-source group/organisation in which different people (from organisations and open-source communities) participate.

https://clusterlabs.org/.

IMHO it would disambiguate the name, because HA alone is too ambiguous.

@brian-brazil commented Feb 5, 2020

The question here on the naming is what would an existing user of this software look for if they were on this page? If a piece of software has gone through several names and is still commonly known by some of them, we may end up including a few of them.

At first glance, HA doesn't seem useful here unless Linux is in front of it.

and while I understand that grouping multiple exporters in one project may not be ideal, in this case it was deemed appropriate to avoid maintenance overhead of many different little packages which would be deployed all together in 99% of the use cases.

That sounds like there may be an acceptable exception in this case, which I think would be the first time that would have happened. Would every machine run all of them? Do they expose per-machine data, cluster data, or a mix?

I note that DRBD is not in that list, however, and those are machine metrics which would almost certainly not fall into that.

The DRBD collector actually is mutually exclusive with the one provided by node_exporter because that only supports the DRBD version bundled with the linux kernel (8.x), while ours only support the version provided within SUSE (9.x).

In such cases it'd be better to look at adding support to the node exporter, rather than creating a brand new exporter every time there's a new protocol version.

@stefanotorresi

The question here on the naming is what would an existing user of this software look for if they were on this page?

That's a very fair point. I guess that would be "pacemaker" or "clusterlabs", which is why we added these words as the text of the link. Would something like "Linux HA ClusterLabs exporter" be more satisfactory to you?
By the way, I see where you're coming from, I really do: I don't like the name ha_cluster_exporter either, to be honest, not one bit. If I could go back, I would've probably made your very same points.
Now, I can't really dispute your reasoning, so... if you are really adamant about this, then I guess we'll have to see if we can change the repo name to something like clusterlabs_exporter, or give up the idea of being listed in the Prometheus docs altogether (which would suck, tbh), but please understand that renaming the project would involve a considerable amount of work on our part, as this package is already shipped within some openSUSE repositories, and it's soon to be part of SUSE's SLES product line. So, do you really want us to do this?

Would every machine run all of them? Do they expose per-machine data, cluster data, or a mix?

Yes, a mix. 😉
The ClusterLabs stack is something like 25 years old; the components are tightly coupled and heavily rely on local IPC. Having multiple small exporters doesn't make much sense when the system we're trying to instrument is quite monolithic itself.

I note that DRDB is not in that list however, and those are machine metrics which would almost certainly not fall into that.

You're right that DRBD is not part of the ClusterLabs stack, but it's very commonly associated with it; in fact, the official docs say "Using DRBD in conjunction with the Pacemaker cluster stack is arguably DRBD’s most frequently found use case".
Furthermore, SUSE and LINBIT have an established relationship, with people contributing both ways, so it was just natural and much easier for us to introduce support in our package rather than getting familiar with an existing third-party one and going through an external contribution and release process.
Also, one could argue that the node exporter itself instruments many different pieces of software, which somewhat goes against the very idea behind your suggestion to split our package apart.
Yes, there is a slight overlap, and we might explore the possibility of contributing the DRBD part to the node exporter in the future, or maybe even moving it from the node exporter to a separate package, if we reach an understanding of where it makes better sense. Would that be a way forward? Splitting the package right now is probably not something we can commit to at the moment, as we really see DRBD as a common component of Linux HA clusters; is this non-negotiable? 😬

I hope you understand that we're doing everything in our power to find a compromise, but there is more to it than you might think!

@brian-brazil

So, do you really want us to do this?

I'm concerned about what we write on this page so users can find your exporter within it.

What your git repo is called is not for me to determine. I'm here to provide a curated list of exporters while improving their quality in passing, not to make busywork.

Yes, a mix. 😉

That's a potential issue: semantically, cluster-level views should not be mixed with machine/process-level views (though each process providing its own view of what the cluster state currently is, is okay); they should instead be separate exporters - or separate endpoints on the same exporter. Put another way, don't mix metrics that represent the current official quorum with metrics that are not from quorum.

For example for Kubernetes you have the /metrics for the kubelet process, the cadvisor metrics exposed by the kubelet, the /metrics for the API process, and then kube-state-metrics for the user-facing data kubernetes stores.
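As a rough sketch of the "separate endpoints on the same exporter" option (hypothetical endpoint names and port, assuming client_golang; not how ha_cluster_exporter is actually structured):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// One registry per view: machine-local collectors on one endpoint, and the
	// node's view of cluster state on another, so they can be scraped (and
	// relabelled) as separate targets if desired.
	machineReg := prometheus.NewRegistry()
	clusterReg := prometheus.NewRegistry()

	// ... register machine-level collectors with machineReg and
	// cluster-state collectors with clusterReg here ...

	http.Handle("/metrics", promhttp.HandlerFor(machineReg, promhttp.HandlerOpts{}))
	http.Handle("/cluster-metrics", promhttp.HandlerFor(clusterReg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```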

This is a bit of uncharted territory. It's not at all required that exporters listed here perfectly meet the guidelines; however, we do require that they generally make sense in the first place (as opposed to, e.g., something that's simply not going to work out cardinality-wise outside of very niche circumstances).

Also, one could argue that the node exporter itself is instrumenting many different pieces of software, so it goes against the very idea that you're suggesting to split our package apart.

Only systemd, really (and there's talk of moving that out); the rest is all Unix kernels.

is this non-negotiable? 😬

This case is kinda weird. Tradeoffs are a fundamental part of designing an exporter; however, "uberexporters" are a particularly bad smell when it comes to exporters. Overlapping with an existing exporter is also something I try to avoid. What I'd hope is that there'd at least be a good-faith attempt to move the DRBD 9 support into the node exporter, or otherwise clean this situation up at some undefined point in the future.

@MalloZup changed the title from "Add ha_cluster_exporter to prometheus doc" to "Add Linux HA ClusterLabs exporter to prometheus doc" on Feb 5, 2020
@stefanotorresi commented Feb 5, 2020

I'm concerned about what we write on this page so users can find your exporter within it.

Apologies, I understood that you didn't approve of the actual name of the project (and you would be right in doing so 😀).

Well, then, we have settled on "Linux HA ClusterLabs exporter" as the link text. That should work.

each process providing its own view of what the cluster state currently is, is okay

That's exactly what's happening: the Pacemaker state is always cluster-wide.

What I'd hope is that there'd at least be a good-faith attempt to move the DRBD 9 support into the node exporter, or otherwise clean this situation up at some undefined point in the future.

Absolutely. We certainly don't want the project to grow out of proportion.

@brian-brazil

That's exactly what's happening: the Pacemaker state is always cluster-wide.

That's not quite right then, as all the non-Pacemaker state is not cluster-wide.

What happens if one of the machines lags behind and doesn't have the latest state? Are you always talking to a leader of some form, or getting each machine's current view?

@stefanotorresi

That's not quite right then, as all the non-Pacemaker state is not cluster-wide.

Well, the cluster state is cluster-wide; all the rest is only relevant to the individual nodes.

What happens if one of the machines lags behind and doesn't have the latest state? Are you always talking to a leader or some form, or getting each machine's current view?

There is no concept of a "leader" in this stack.
All nodes participate in a quorum pool handled by the dedicated Corosync messaging component, which also runs (and is instrumented) on all the nodes and implements the Totem Single Ring Ordering and Membership Protocol. This protocol provides reliable messaging via something called "virtual synchrony", which I guess is a fancy way of saying "reliable multicast" in mission-critical applications.
In the event of a network partition, consistency is protected by a fencing mechanism (usually SBD) that isolates and shuts down the minority of nodes affected by the anomaly.
The exporter provides data from all these components, and this data resides equally on all the nodes.
Does this answer your question?

@stefanotorresi

By the way, we have also given some thought to the fact that we export the same metrics from multiple nodes; the RabbitMQ folks have faced the same issue, and we are evaluating the same approach.

@brian-brazil

Does this answer your question?

Not quite. Are you saying it's impossible for a node to have stale data, even briefly or via configuration/operator error? That seems a bit implausible to me.

We'd usually have each process expose what it currently knows, as it's interesting if it differs across the processes. Leaderness (if any) would be a boolean gauge that you could join with in PromQL.
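A minimal sketch of what such a gauge could look like (hypothetical metric name, assuming client_golang; not an actual metric of this exporter), which other series can then be filtered or joined against in PromQL:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical 0/1 gauge marking whether this node is currently the
// Pacemaker Designated Coordinator.
var isDC = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "ha_cluster_pacemaker_designated_coordinator",
	Help: "1 if this node is currently the Designated Coordinator, 0 otherwise.",
})

// updateDC would be called on each collect cycle with the node's current
// view of whether it is the Designated Coordinator.
func updateDC(thisNodeIsDC bool) {
	if thisNodeIsDC {
		isDC.Set(1)
	} else {
		isDC.Set(0)
	}
}

func main() {
	prometheus.MustRegister(isDC)
	updateDC(false) // placeholder: real code would query the cluster status here
}
```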

the fact that we export the same metrics from multiple nodes; the RabbitMQ folks have faced the same issue, and we are evaluating the same approach.

That approach would not be good practice. Firstly, the right way to do that would be to have a singleton target to scrape for this information: self="1" is basically a target label indicating this should be a separate target, rather than being mixed into another target - though you'd only do that for data where you don't care which process it came from (e.g. running a SQL query across any of a replica pool), which doesn't appear to be the case here.
Secondly, as above, if the values are actually different, that's something you want to know inside Prometheus.
Thirdly, Prometheus is pretty efficient, so the cost probably isn't as high as you'd think.

@diegoakechi

Hi @brian-brazil and @stefanotorresi, I will try to help here...

Does this answer your question?

Not quite. Are you saying it's impossible for a node to have stale data, even briefly or via configuration/operator error? That seems a bit implausible to me.

Not really. Even though the cluster state should be the same on all the nodes, following the principles @stefanotorresi mentioned, Pacemaker achieves that by replicating the metadata between all the cluster nodes, and this can run into problems for many reasons: network issues, misconfiguration, etc.

We'd usually have each process expose what it currently knows, as if it differs across the processes that's interesting. Leaderness (if any) would be a boolean gauge, that you could join with in PromQL.

This is exactly what the exporter is currently doing. It exposes what the node currently knows about the cluster status, plus local metrics for the other components that compose the cluster (Corosync as the communication layer, SBD as the fencing mechanism, etc.). If the cluster metrics differ between nodes, that is exactly the kind of situation that indicates a problem has occurred.

In that sense, even if it is called a cluster metric, from my POV it is a local node metric, since it is the current cluster status as present in the local node's metadata replica.

For example, there is the so-called cluster partition situation, where some nodes lose membership of the cluster and form a new one. The way to identify that is by comparing the cluster membership of each node and checking how the cluster-wide information differs between them.

So, in that sense, I don't know how we could tackle such situations without comparing the cluster status reported by each node.

Also, just for completeness, it is not true to say that there is no leader in a Pacemaker cluster. What we don't have is a controller/master node. Instead, Pacemaker has the concept of a Designated Coordinator that is elected by the cluster automatically.

Hope that makes it a little bit clearer...

@brian-brazil

In that sense, even if it is called a cluster metric, from my POV it is a local node metric, since it is the current cluster status as present in the local node's metadata replica.

That all sounds fine then. These distinctions can be a bit subtle.

@brian-brazil merged commit 632c6b1 into prometheus:master on Feb 5, 2020
@diegoakechi

@brian-brazil I am glad I was able to clarify that. Thanks for your support on this.

@MalloZup commented Feb 6, 2020

Thanks for merging, and for the discussion.
