Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[large-tiles] Additional metrics added #2720

Merged
merged 12 commits into from
Nov 3, 2020
Merged

Conversation

gediminasgu
Copy link
Collaborator

What this PR does / why we need it:
This PR adds useful metrics for large-tiles process monitoring.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

// HandleCounterResets is temporarily used to force counter reset handling logics on the processed series.
// TODO: remove once we have metrics type stored in the metadata.
HandleCounterResets bool
MetricsScope tally.Scope
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would this make more sense as an instrument.Options?

Copy link
Collaborator Author

@gediminasgu gediminasgu Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing about the tally.Scope is that I can set tags for all methods below (scope). With instrument.Options I didn't find how to do it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iOpts := instrument.NewOptions()
// To get scope
scope := iOpts.MetricsScope()
// To set scope on opts
iOpts = iOpts.SetMetricsScope(scope.With("tags...."))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seems more conventional to put instrument options into options structs rather than the raw scopes

@@ -1693,6 +1694,7 @@ func (n *dbNamespace) aggregateTiles(
ctx, sourceNs.ID(), sourceShard.ID(), blockReaders, sourceBlockVolumes, opts, nsCtx.Schema)

processedTileCount += shardProcessedTileCount
processedShards.Inc(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this always increment? Probably better to break this out into success/error metrics? Also might be useful to add a metric for attemptedShards that would increment at the start of this loop to catch scenarios where we can't read the shard for whatever reason

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if the shard fails, the whole job will fail. So kind of failed job count = failed shard count w/o any additional information. Does it really make sense to duplicate it?

Comment on lines 1181 to 1183
opts.MetricsScope.Counter("aggregation.errors").Inc(1)
} else {
opts.MetricsScope.Counter("aggregation.success").Inc(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having two different metrics here, should it be one metric with a success tag?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to follow the convention which I saw elsewhere: https://github.com/m3db/m3/blob/gg/large-tiles-metrics/src/x/instrument/methods.go#L298
So, which one we should follow? :)

@gediminasgu gediminasgu merged commit 83fb11f into master Nov 3, 2020
@gediminasgu gediminasgu deleted the gg/large-tiles-metrics branch November 3, 2020 09:06
soundvibe added a commit that referenced this pull request Nov 5, 2020
* master:
  [read_index_segments] Always validate index segment checksums before reading/validating contents (#2835)
  [query] Return additional warnings in /query{,_range} endpoints (#2836)
  Add a check for seriesIter.Err (#2834)
  [tools] Add concurrent read_index_segments validation option (#2833)
  [query] Add non-ready namespaces to Clusters interface and use in /namespace/ready endpoint (#2828)
  [query] Tests for when function argument is an expression (#2809)
  [large-tiles] Additional metrics added (#2720)
  [query] Refactor groupByNodes and implement `nodes` arg in asPercent (#2816)
  [read_index_segments] Add log that outlines which segment being processed (#2825)
  [aggregator] Handle follower canLead() for not yet flushed windows (#2818)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants