-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[large-tiles] Additional metrics added #2720
Conversation
src/dbnode/storage/types.go
Outdated
// HandleCounterResets is temporarily used to force counter reset handling logics on the processed series. | ||
// TODO: remove once we have metrics type stored in the metadata. | ||
HandleCounterResets bool | ||
MetricsScope tally.Scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would this make more sense as an instrument.Options
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good thing about the tally.Scope
is that I can set tags for all methods below (scope). With instrument.Options
I didn't find how to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iOpts := instrument.NewOptions()
// To get scope
scope := iOpts.MetricsScope()
// To set scope on opts
iOpts = iOpts.SetMetricsScope(scope.With("tags...."))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just seems more conventional to put instrument options into options structs rather than the raw scopes
@@ -1693,6 +1694,7 @@ func (n *dbNamespace) aggregateTiles( | |||
ctx, sourceNs.ID(), sourceShard.ID(), blockReaders, sourceBlockVolumes, opts, nsCtx.Schema) | |||
|
|||
processedTileCount += shardProcessedTileCount | |||
processedShards.Inc(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this always increment? Probably better to break this out into success/error metrics? Also might be useful to add a metric for attemptedShards
that would increment at the start of this loop to catch scenarios where we can't read the shard for whatever reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if the shard fails, the whole job will fail. So kind of failed job count = failed shard count w/o any additional information. Does it really make sense to duplicate it?
src/dbnode/storage/database.go
Outdated
opts.MetricsScope.Counter("aggregation.errors").Inc(1) | ||
} else { | ||
opts.MetricsScope.Counter("aggregation.success").Inc(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having two different metrics here, should it be one metric with a success tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to follow the convention which I saw elsewhere: https://github.com/m3db/m3/blob/gg/large-tiles-metrics/src/x/instrument/methods.go#L298
So, which one we should follow? :)
# Conflicts: # src/dbnode/storage/namespace.go # src/dbnode/storage/types.go
# Conflicts: # src/dbnode/storage/types.go
* master: [read_index_segments] Always validate index segment checksums before reading/validating contents (#2835) [query] Return additional warnings in /query{,_range} endpoints (#2836) Add a check for seriesIter.Err (#2834) [tools] Add concurrent read_index_segments validation option (#2833) [query] Add non-ready namespaces to Clusters interface and use in /namespace/ready endpoint (#2828) [query] Tests for when function argument is an expression (#2809) [large-tiles] Additional metrics added (#2720) [query] Refactor groupByNodes and implement `nodes` arg in asPercent (#2816) [read_index_segments] Add log that outlines which segment being processed (#2825) [aggregator] Handle follower canLead() for not yet flushed windows (#2818)
What this PR does / why we need it:
This PR adds useful metrics for large-tiles process monitoring.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: