Instrument query path #3182
Conversation
Add additional metrics to better understand the bottlenecks in the query path.
Codecov Report
@@ Coverage Diff @@
## master #3182 +/- ##
=======================================
Coverage 72.2% 72.3%
=======================================
Files 1086 1087 +1
Lines 100558 100584 +26
=======================================
+ Hits 72675 72758 +83
+ Misses 22830 22781 -49
+ Partials 5053 5045 -8
LGTM
}
}

type queryMetrics struct {
Can we move this into the limits package? It seems we typically want the by-range / by-cardinality combo together.
Actually, I think it's strange that they were in the limits package; I moved them to the index package. Good call on reusing them, though.
@@ -875,6 +904,12 @@ type fetchTaggedResultsIter struct {
	tagEncoder      serialize.TagEncoder
	iOpts           instrument.Options
	instrumentClose func(error)
	startTime       time.Time
Where is this set?
Also, what's the difference between this and fetchStart?
Ugh, it's not set anywhere. Clearly a test is needed for the metrics. I renamed this to dataReadStart to make it clearer.
@@ -826,6 +850,11 @@ func (s *service) fetchTaggedIter(ctx context.Context, req *rpc.FetchTaggedReque
	tagEncoder:      tagEncoder,
	iOpts:           s.opts.InstrumentOptions(),
	instrumentClose: instrumentClose,
	totalDocsCount:  queryResult.Results.TotalDocsCount(),
	nowFn:           s.nowFn,
	fetchStart:      startTime,
Is this supposed to include the index lookup? It looks like it does (line 835).
Yes, it's the total time of the fetch API call.
src/dbnode/storage/index/metrics.go
Outdated
	cardinalityHists []*cardinalityHist
}

func (bm *queryCardinality) Record(seriesCount int, queryRuntime time.Duration) {
This should be docsCount, yeah? Maybe we should rename all this "cardinality" metrics naming to "queryDocsCount"?
LGTM
* master: (30 commits)
  [dbnode] Use go context to cancel index query workers after timeout (#3194)
  [aggregator] Fix change ActivePlacement semantics on close (#3201)
  [aggregator] Simplify (Active)StagedPlacement API (#3199)
  [aggregator] Checking if metadata is set to default should not cause copying (#3198)
  [dbnode] Remove readers and writer from aggregator API (#3122)
  [aggregator] Avoid large copies in entry rollup comparisons by making them more inline-friendly (#3195)
  [dbnode] Re-add aggregator doc limit update (#3137)
  [m3db] Do not close reader in filterFieldsIterator.Close() (#3196)
  Revert "Remove disk series read limit (#3174)" (#3193)
  [instrument] Improve sampled timer and stopwatch performance (#3191)
  Omit unset fields in metadata json (#3189)
  [dbnode] Remove left-over code in storage/bootstrap/bootstrapper (#3190)
  [dbnode][coordinator] Support match[] in label endpoints (#3180)
  Instrument the worker pool with the wait time (#3188)
  Instrument query path (#3182)
  [aggregator] Remove indirection, large copy from unaggregated protobuf decoder (#3186)
  [aggregator] Sample timers completely (#3184)
  [aggregator] Reduce error handling overhead in rawtcp server (#3183)
  [aggregator] Move shardID calculation out of critical section (#3179)
  Move instrumentation cleanup to FetchTaggedResultIterator Close() (#3173)
  ...
Add additional metrics to better understand the bottlenecks in the query path.
What this PR does / why we need it:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: