Instrument query path #3182
Conversation
Add additional metrics to better understand the bottlenecks in the query path.
Codecov Report
@@ Coverage Diff @@
## master #3182 +/- ##
=======================================
Coverage 72.2% 72.3%
=======================================
Files 1086 1087 +1
Lines 100558 100584 +26
=======================================
+ Hits 72675 72758 +83
+ Misses 22830 22781 -49
+ Partials 5053 5045 -8
LGTM
}
}

type queryMetrics struct {
Can we move this into the limits package? It seems we typically want the by-range / by-cardinality combo together.
Actually, I think it's strange that they were in the limits package; I moved them to the index package. Good call on reusing them, though.
@@ -875,6 +904,12 @@ type fetchTaggedResultsIter struct {
	tagEncoder      serialize.TagEncoder
	iOpts           instrument.Options
	instrumentClose func(error)
	startTime       time.Time
Where is this set?
Also, what's the difference between this and fetchStart?
Ugh, it's not set anywhere. Clearly a test is needed for the metrics. I renamed this to dataReadStart to make it clearer.
@@ -826,6 +850,11 @@ func (s *service) fetchTaggedIter(ctx context.Context, req *rpc.FetchTaggedReque
	tagEncoder:      tagEncoder,
	iOpts:           s.opts.InstrumentOptions(),
	instrumentClose: instrumentClose,
	totalDocsCount:  queryResult.Results.TotalDocsCount(),
	nowFn:           s.nowFn,
	fetchStart:      startTime,
Is this supposed to include the index lookup? It looks like it does (line 835).
Yes, it's the total time of the fetch API call.
src/dbnode/storage/index/metrics.go
Outdated
	cardinalityHists []*cardinalityHist
}

func (bm *queryCardinality) Record(seriesCount int, queryRuntime time.Duration) {
This should be docsCount, yeah? Maybe we should rename all this "cardinality" metrics naming to "queryDocsCount"?
LGTM
* master: (30 commits)
  [dbnode] Use go context to cancel index query workers after timeout (#3194)
  [aggregator] Fix change ActivePlacement semantics on close (#3201)
  [aggregator] Simplify (Active)StagedPlacement API (#3199)
  [aggregator] Checking if metadata is set to default should not cause copying (#3198)
  [dbnode] Remove readers and writer from aggregator API (#3122)
  [aggregator] Avoid large copies in entry rollup comparisons by making them more inline-friendly (#3195)
  [dbnode] Re-add aggregator doc limit update (#3137)
  [m3db] Do not close reader in filterFieldsIterator.Close() (#3196)
  Revert "Remove disk series read limit (#3174)" (#3193)
  [instrument] Improve sampled timer and stopwatch performance (#3191)
  Omit unset fields in metadata json (#3189)
  [dbnode] Remove left-over code in storage/bootstrap/bootstrapper (#3190)
  [dbnode][coordinator] Support match[] in label endpoints (#3180)
  Instrument the worker pool with the wait time (#3188)
  Instrument query path (#3182)
  [aggregator] Remove indirection, large copy from unaggregated protobuf decoder (#3186)
  [aggregator] Sample timers completely (#3184)
  [aggregator] Reduce error handling overhead in rawtcp server (#3183)
  [aggregator] Move shardID calculation out of critical section (#3179)
  Move instrumentation cleanup to FetchTaggedResultIterator Close() (#3173)
  ...
Add additional metrics to better understand the bottlenecks in the query path.
What this PR does / why we need it:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: