receive: Hash-ring metrics #1363

Merged
merged 4 commits into from
Aug 5, 2019
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -15,6 +15,8 @@ We use *breaking* word for marking changes that are not backward compatible (rel

- [#1358](https://github.com/thanos-io/thanos/pull/1358) Added `part_size` configuration option for HTTP multipart requests minimum part size for S3 storage type

- [#1363](https://github.com/thanos-io/thanos/pull/1363) Thanos Receive now exposes `thanos_receive_hashring_nodes` and `thanos_receive_hashring_tenants` metrics to monitor status of hash-rings

### Changed

- [#1338](https://github.com/thanos-io/thanos/pull/1338) Querier still warns on store API duplicate, but allows a single one from duplicated set. This is gracefully warn about the problematic logic and not disrupt immediately.
@@ -65,7 +67,7 @@ The other `type` you can use is `JAEGER` now. The `config` keys and values are J

### Changed

- [#1284](https://github.com/thanos-io/thanos/pull/1284) Add support for multiple label-sets in Info gRPC service.
This deprecates the single `Labels` slice of the `InfoResponse`, in a future release backward compatible handling for the single set of Labels will be removed. Upgrading to v0.6.0 or higher is advised.
*breaking* If you run have duplicate queries in your Querier configuration with hierarchical federation of multiple Queries this PR makes Thanos Querier to detect this case and block all duplicates. Refer to 0.6.1 which at least allows for single replica to work.

27 changes: 24 additions & 3 deletions pkg/receive/config.go
@@ -33,9 +33,11 @@ type ConfigWatcher struct {
logger log.Logger
watcher *fsnotify.Watcher

changesCounter prometheus.Counter
errorCounter prometheus.Counter
refreshCounter prometheus.Counter
hashringNodesGauge *prometheus.GaugeVec
hashringTenantsGauge *prometheus.GaugeVec

// last is the last known configuration.
last []HashringConfig
@@ -75,13 +77,27 @@ func NewConfigWatcher(logger log.Logger, r prometheus.Registerer, path string, i
Name: "thanos_receive_hashrings_file_refreshes_total",
Help: "The number of refreshes of the hashrings configuration file.",
}),
hashringNodesGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "thanos_receive_hashring_nodes",
Member

nodes or endpoints? (: Don't have strong opinion.

Help: "The number of nodes per hashring.",
},
[]string{"name"}),
hashringTenantsGauge: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "thanos_receive_hashring_tenants",
Help: "The number of tenants per hashring.",
},
[]string{"name"}),
Member

Just curious about the cardinality of this: does it change on restart, or is it fairly consistent across restarts?

Member Author

This is consistent across restarts. Name comes from a manually maintained list which is fed through a configuration file.

}

if r != nil {
Member

I think those metrics will work a bit better if we register them below ^^

Member Author

🤦‍♂ Thanks for pointing that out.

r.MustRegister(
c.changesCounter,
c.errorCounter,
c.refreshCounter,
c.hashringNodesGauge,
c.hashringTenantsGauge,
)
}
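
The cardinality question in the review above can be illustrated with a small standard-library sketch. It parses a hashring configuration and lists the distinct names, which are the only values the `name` label can take. The JSON tags, the sample configuration, and the `labelValues` helper are assumptions made for illustration; only the `Hashring`, `Tenants`, and `Endpoints` field names come from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HashringConfig mirrors the three fields the diff reads from each entry.
// The JSON tags are an assumption made for this sketch.
type HashringConfig struct {
	Hashring  string   `json:"hashring"`
	Tenants   []string `json:"tenants"`
	Endpoints []string `json:"endpoints"`
}

// labelValues (hypothetical helper) returns the distinct hashring names in
// the order they first appear, i.e. the possible values of the "name" label.
func labelValues(raw []byte) ([]string, error) {
	var cfg []HashringConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	seen := make(map[string]struct{})
	var names []string
	for _, c := range cfg {
		if _, ok := seen[c.Hashring]; ok {
			continue
		}
		seen[c.Hashring] = struct{}{}
		names = append(names, c.Hashring)
	}
	return names, nil
}

func main() {
	// Hypothetical configuration with two hashrings; all values are made up.
	raw := []byte(`[
		{"hashring": "hashring-a", "tenants": ["tenant-1"], "endpoints": ["node-0:10901", "node-1:10901"]},
		{"hashring": "hashring-b", "tenants": ["tenant-2", "tenant-3"], "endpoints": ["node-2:10901"]}
	]`)
	names, err := labelValues(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Each of the two gauges exposes one series per distinct name.
	fmt.Printf("%d series per gauge: %v\n", len(names), names)
}
```

Because the names come from a hand-maintained file, the number of series per gauge stays equal to the number of configured hashrings and does not grow with restarts.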

@@ -172,6 +188,11 @@ func (cw *ConfigWatcher) refresh(ctx context.Context) {
// Save the last known configuration.
cw.last = config

for _, c := range config {
cw.hashringNodesGauge.WithLabelValues(c.Hashring).Set(float64(len(c.Endpoints)))
cw.hashringTenantsGauge.WithLabelValues(c.Hashring).Set(float64(len(c.Tenants)))
}

select {
case <-ctx.Done():
return
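
One property of the update loop in `refresh` is worth spelling out: it sets a value for every hashring in the new configuration but never deletes anything. The sketch below models the two gauges with plain maps to show the effect. The `gauges` type and the `update` and `resetAndUpdate` functions are hypothetical stand-ins, not the Prometheus client API and not code from this PR. With the real client, `GaugeVec` offers `Reset()` and `DeleteLabelValues(...)` for the same purpose. Whether stale series matter in practice depends on how often hashrings are removed or renamed.

```go
package main

import "fmt"

// HashringConfig carries the three fields the diff reads from each entry.
type HashringConfig struct {
	Hashring  string
	Tenants   []string
	Endpoints []string
}

// gauges is a hypothetical stand-in for a labelled gauge: name -> value.
type gauges map[string]float64

// update mirrors the loop in the diff: one Set per configured hashring and
// no deletion, so names that vanish from the config keep their last value.
func update(nodes, tenants gauges, config []HashringConfig) {
	for _, c := range config {
		nodes[c.Hashring] = float64(len(c.Endpoints))
		tenants[c.Hashring] = float64(len(c.Tenants))
	}
}

// resetAndUpdate clears both maps first, so only hashrings present in the
// current config remain afterwards.
func resetAndUpdate(nodes, tenants gauges, config []HashringConfig) {
	for k := range nodes {
		delete(nodes, k)
	}
	for k := range tenants {
		delete(tenants, k)
	}
	update(nodes, tenants, config)
}

func main() {
	nodes, tenants := gauges{}, gauges{}

	// Made-up configurations: the second one drops "hashring-b".
	first := []HashringConfig{
		{Hashring: "hashring-a", Tenants: []string{"t1"}, Endpoints: []string{"n0", "n1"}},
		{Hashring: "hashring-b", Tenants: []string{"t2"}, Endpoints: []string{"n2"}},
	}
	second := first[:1]

	update(nodes, tenants, first)
	update(nodes, tenants, second)
	fmt.Println("without reset:", len(nodes), "names tracked")

	resetAndUpdate(nodes, tenants, second)
	fmt.Println("with reset:", len(nodes), "names tracked")
}
```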