Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument query path #3182

Merged
merged 15 commits into from
Feb 7, 2021
Merged

Instrument query path #3182

merged 15 commits into from
Feb 7, 2021

Conversation

ryanhall07
Copy link
Collaborator

Add additional metrics to better understand the bottlenecks in the query
path.

What this PR does / why we need it:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


Add additional metrics to better understand the bottlenecks in the query
path.
@ryanhall07 ryanhall07 marked this pull request as ready for review February 6, 2021 03:09
@codecov
Copy link

codecov bot commented Feb 6, 2021

Codecov Report

Merging #3182 (9c424e2) into master (8840b10) will increase coverage by 0.0%.
The diff coverage is 83.2%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3182   +/-   ##
=======================================
  Coverage    72.2%    72.3%           
=======================================
  Files        1086     1087    +1     
  Lines      100558   100584   +26     
=======================================
+ Hits        72675    72758   +83     
+ Misses      22830    22781   -49     
+ Partials     5053     5045    -8     
Flag Coverage Δ
aggregator 75.8% <ø> (-0.1%) ⬇️
cluster 85.0% <ø> (ø)
collector 84.3% <ø> (ø)
dbnode 78.7% <82.6%> (+<0.1%) ⬆️
m3em 74.4% <ø> (ø)
m3ninx 73.2% <100.0%> (+<0.1%) ⬆️
metrics 20.0% <ø> (ø)
msg 74.0% <ø> (-0.3%) ⬇️
query 67.3% <ø> (+0.1%) ⬆️
x 80.3% <ø> (-0.1%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8840b10...d89dc3a. Read the comment docs.

Copy link
Collaborator

@wesleyk wesleyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
}

type queryMetrics struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this into the limits package? Seems like we typically want the combo of by range / by cardinality together

Copy link
Collaborator Author

@ryanhall07 ryanhall07 Feb 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually i think it's strange they are in the limits package. moved them to the index package. good call on reusing though.

@@ -875,6 +904,12 @@ type fetchTaggedResultsIter struct {
tagEncoder serialize.TagEncoder
iOpts instrument.Options
instrumentClose func(error)
startTime time.Time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this set?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also what's the difference between this and fetchStart?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh it's not. clearly a test is needed for the metrics.

renamed this to dataReadStart to make it more clear.

@@ -826,6 +850,11 @@ func (s *service) fetchTaggedIter(ctx context.Context, req *rpc.FetchTaggedReque
tagEncoder: tagEncoder,
iOpts: s.opts.InstrumentOptions(),
instrumentClose: instrumentClose,
totalDocsCount: queryResult.Results.TotalDocsCount(),
nowFn: s.nowFn,
fetchStart: startTime,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this supposed to include the index lookup? Looks like it does (line 835)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. it's total time of the fetch api call

cardinalityHists []*cardinalityHist
}

func (bm *queryCardinality) Record(seriesCount int, queryRuntime time.Duration) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be docsCount yeah? Maybe we should rename all this "cardinality" metrics naming to "queryDocsCount" metrics?

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ryanhall07 ryanhall07 merged commit a628eb5 into master Feb 7, 2021
@ryanhall07 ryanhall07 deleted the rhall-instrument-index branch February 7, 2021 23:44
soundvibe added a commit that referenced this pull request Feb 10, 2021
* master: (30 commits)
  [dbnode] Use go context to cancel index query workers after timeout (#3194)
  [aggregator] Fix change ActivePlacement semantics on close (#3201)
  [aggregator] Simplify (Active)StagedPlacement API (#3199)
  [aggregator] Checking if metadata is set to default should not cause copying (#3198)
  [dbnode] Remove readers and writer from aggregator API (#3122)
  [aggregator] Avoid large copies in entry rollup comparisons by making them more inline-friendly (#3195)
  [dbnode] Re-add aggregator doc limit update (#3137)
  [m3db] Do not close reader in filterFieldsIterator.Close() (#3196)
  Revert "Remove disk series read limit (#3174)" (#3193)
  [instrument] Improve sampled timer and stopwatch performance (#3191)
  Omit unset fields in metadata json (#3189)
  [dbnode] Remove left-over code in storage/bootstrap/bootstrapper (#3190)
  [dbnode][coordinator] Support match[] in label endpoints (#3180)
  Instrument the worker pool with the wait time (#3188)
  Instrument query path (#3182)
  [aggregator] Remove indirection, large copy from unaggregated protobuf decoder (#3186)
  [aggregator] Sample timers completely (#3184)
  [aggregator] Reduce error handling overhead in rawtcp server (#3183)
  [aggregator] Move shardID calculation out of critical section (#3179)
  Move instrumentation cleanup to FetchTaggedResultIterator Close() (#3173)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants