Alerts - OCS Cluster and Cluster Nodes health #1

umangachapagain · 2018-11-29T06:52:32Z

We need the following status alerts:

Node status (Up/Down)
Container status (Up/Down)
Gluster peer in cluster status (Connected/Disconnected)
Glusterd2 service status (Up/Down)
Cluster status

umangachapagain · 2019-01-14T12:12:27Z

@shtripat How do we get these metrics?
@JohnStrunk I was wondering whether these metrics should come from an exporter in Anthill, since it would have the actual, up-to-date Cluster and Node health that it maintains for reconciliation.

JohnStrunk · 2019-01-14T19:19:48Z

> @JohnStrunk I was wondering whether these metrics should come from an exporter in Anthill, since it would have the actual, up-to-date Cluster and Node health that it maintains for reconciliation.

I'm hesitant to get these items from Anthill. It will have its own view of each, but we then get a dependency... If the operator is down or malfunctioning, the alerts are potentially wrong.

I would expect many of these to come via data from gluster-prometheus or health checks on labeled pods. The benefit of using g-p is that as long as 1 gd2 pod is ready, the exporter should be available through the gd2 client service.
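The pod-based part of this can be sketched roughly as follows. This is only an illustration, not how Anthill or gluster-prometheus actually work: `PodHealth`, the `app=glusterd2` label, and the pod names are hypothetical, and in practice the readiness data would come from the Kubernetes API rather than a hand-built list.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class PodHealth:
    """Hypothetical stand-in for what a health check on one pod reports."""
    name: str
    ready: bool
    labels: Dict[str, str] = field(default_factory=dict)


def select(pods: Iterable[PodHealth], selector: Dict[str, str]) -> List[PodHealth]:
    """Return the pods whose labels contain every key/value pair in selector."""
    return [p for p in pods
            if all(p.labels.get(k) == v for k, v in selector.items())]


def container_statuses(pods: Iterable[PodHealth],
                       selector: Dict[str, str]) -> Dict[str, str]:
    """Map each selected pod name to "Up" or "Down" based on readiness."""
    return {p.name: ("Up" if p.ready else "Down") for p in select(pods, selector)}


def exporter_available(pods: Iterable[PodHealth],
                       selector: Dict[str, str]) -> bool:
    """True as long as at least one selected pod is ready."""
    return any(p.ready for p in select(pods, selector))


# Assumed label and pod names, for illustration only.
GD2_SELECTOR = {"app": "glusterd2"}
pods = [
    PodHealth("gd2-0", ready=False, labels={"app": "glusterd2"}),
    PodHealth("gd2-1", ready=True, labels={"app": "glusterd2"}),
    PodHealth("anthill-0", ready=False, labels={"app": "anthill"}),
]
print(container_statuses(pods, GD2_SELECTOR))
print(exporter_available(pods, GD2_SELECTOR))
```

Note that the operator pod being down does not change either result here, which is the point of not routing these alerts through Anthill.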

cloudbehl · 2019-01-14T21:23:14Z

> Node status (Up/Down)

It can come from K8s (node exporter). We can add a recording rule and set it under a gluster namespace.

> Container status (Up/Down)

It can come from K8s, but I don't know how useful this will be.

> Gluster peer in cluster status (Connected/Disconnected)

It can be provided by the glusterd2 API.

> Glusterd2 service status (Up/Down)

It can be provided by the glusterd2 API /ping endpoint.

> Cluster status

It can be provided by v1/cluster/{cluster_id}/status.

@umangachapagain @JohnStrunk
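For the Glusterd2 service status item, a probe of the /ping endpoint could look roughly like the sketch below. Several things here are assumptions rather than facts from this thread: the base URL is a placeholder, and treating any 2xx response as "Up" and everything else (including an unreachable service) as "Down" is just one plausible mapping. The `get_status` parameter exists only so the mapping can be exercised without a running glusterd2.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical address of the glusterd2 client service; adjust for a real cluster.
GD2_BASE_URL = "http://glusterd2-client.example:24007"


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET on url, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and similar.
        return 0


def glusterd2_status(base_url: str = GD2_BASE_URL,
                     get_status: Callable[[str], int] = http_status) -> str:
    """Map a probe of the /ping endpoint to "Up" or "Down".

    Assumption: any 2xx response means the service is up.
    """
    code = get_status(base_url.rstrip("/") + "/ping")
    return "Up" if 200 <= code < 300 else "Down"
```

A call such as `glusterd2_status(get_status=lambda url: 503)` returns "Down" without touching the network, which makes the Up/Down mapping easy to check before wiring it to a real endpoint.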

cloudbehl added the help wanted label Jan 14, 2019
