[dbnode] Re-add aggregator doc limit update #3137
Conversation
src/dbnode/storage/index/block.go
Outdated
lastField []byte
lastFieldIsValid bool
reuseLastEntry bool
fieldsAdded, termsAdded bool
newFieldAdded / newTermAdded?
src/dbnode/storage/index/block.go
Outdated
@@ -234,7 +234,6 @@ func NewBlock(
		iopts,
	)

	aggAdded := opts.InstrumentOptions().MetricsScope().Counter("aggregate-added-counter")
why remove this?
This metric is essentially a copy of the total aggregate results metric here, so not too useful
src/dbnode/storage/index/block.go
Outdated
@@ -711,27 +715,58 @@ func (b *block) aggregateWithSpan(
	}

	field, term := iter.Current()
	batch, numAdded = b.appendFieldAndTermToBatch(batch, field, term, iterateTerms)
so the problem here was that numAdded included duplicates?
Added a writeup of the problem on the other comment; hopefully that explains what was wrong.
if results.EnforceLimits() {
	if err := b.docsLimit.Inc(len(batch), source); err != nil {
		return false, err
if lastField == nil {
so we're essentially only incrementing the docs limit if we're working on a new field?
So essentially what was happening previously was this:

- Incoming field/term tuples were added to the batch [here].
  - If terms were included in the query (e.g. label/foo/values) and the incoming field matched the previous field, it would just alloc and append the term onto the existing entry's terms. This was the underlying problem that took out our test cluster when cardinality exploded, as it would allocate a huge slice of tags, as well as the tags themselves, here. This is because the batch size is calculated as len(terms), which caused the batch max to apply only to new (different) label names.
  - Once either the batch was full (note that for a label/foo/values request this would never happen, since the batch length would be at most 1) or the incoming entries were exhausted, we'd increment the docs limit with len(terms).

Although there were no changes to how the docs limit was incremented, because the batch size calculation was changed from len(terms) to roughly len(terms) + len(all entries), we'd increment the docs limit far more often. That is a big problem when the terms are mostly for a single field, as is always the case with label/foo/values, which is probably the most widely used metadata query endpoint; in essence we'd be reporting a doc increase for every 256 terms, rather than once per query as previously.
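To make the accounting difference concrete, here is a minimal, purely illustrative Go sketch (none of these names are the real block.go identifiers, and the real code flushes batches rather than looping like this): it shows why counting a document only when a new field starts yields a single docs-limit increment for a label/foo/values-style query, while counting per appended term (or per 256-term batch flush) scales with term cardinality.

```go
// Illustrative sketch only; batchEntry and appendFieldAndTerm are hypothetical
// stand-ins for the batching behaviour described above.
package main

import "fmt"

type batchEntry struct {
	field string
	terms []string
}

// appendFieldAndTerm mimics the batching behaviour: a term for the same field
// as the previous tuple is appended to the existing entry rather than
// starting a new one. It reports whether a new field entry was created.
func appendFieldAndTerm(batch []batchEntry, field, term string) ([]batchEntry, bool) {
	if n := len(batch); n > 0 && batch[n-1].field == field {
		batch[n-1].terms = append(batch[n-1].terms, term)
		return batch, false
	}
	return append(batch, batchEntry{field: field, terms: []string{term}}), true
}

func main() {
	// A label/foo/values-style result: a single field with many terms.
	var batch []batchEntry
	perFieldCount := 0 // docs-limit usage counted once per new field
	perTermCount := 0  // docs-limit usage scaling with terms (roughly what per-batch flushing produced)

	for i := 0; i < 1000; i++ {
		var newField bool
		batch, newField = appendFieldAndTerm(batch, "foo", fmt.Sprintf("value-%d", i))
		if newField {
			perFieldCount++
		}
		perTermCount++
	}

	fmt.Println("per-field increments:", perFieldCount) // 1 for the whole query
	fmt.Println("per-term increments:", perTermCount)   // 1000, inflating the limit
}
```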
Codecov Report
@@ Coverage Diff @@
## master #3137 +/- ##
==========================================
+ Coverage 65.7% 72.3% +6.5%
==========================================
Files 234 1087 +853
Lines 24512 100786 +76274
==========================================
+ Hits 16127 72893 +56766
- Misses 7333 22829 +15496
- Partials 1052 5064 +4012
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
if ns == nil {
	return &usageMetrics{}
nit: Shouldn't this just return an error and the caller should return an error if construction fails?
For tests we should probably just update callsites to always pass a non-nil ident.ID for the namespace.
Yeah, you're probably right; this felt really wonky.
Will remove this; to be honest, we don't really get a lot out of per-namespace metrics here, so I'll just drop namespace from the tags entirely.
Resolving in a following PR.
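For illustration only, a rough sketch of the error-returning constructor shape described in the nit above (the package placement, the docsMatched counter, and the metric name are assumptions, not the real code; as noted in the reply, the actual resolution went a different way and dropped the namespace tag entirely):

```go
// Sketch: package placement is assumed, not the actual location of usageMetrics.
package index

import (
	"errors"

	"github.com/m3db/m3/src/x/ident"
	"github.com/m3db/m3/src/x/instrument"
	"github.com/uber-go/tally"
)

// usageMetrics is a stand-in for the struct in the diff; the single counter
// field here is illustrative.
type usageMetrics struct {
	docsMatched tally.Counter
}

// newUsageMetrics sketches the suggested constructor: fail construction on a
// nil namespace ID instead of silently returning an empty usageMetrics.
func newUsageMetrics(ns ident.ID, iopts instrument.Options) (*usageMetrics, error) {
	if ns == nil {
		return nil, errors.New("cannot create usage metrics: nil namespace ID")
	}
	scope := iopts.MetricsScope().Tagged(map[string]string{"namespace": ns.String()})
	return &usageMetrics{
		docsMatched: scope.Counter("docs-matched"), // illustrative metric name
	}, nil
}
```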
if err := r.addFieldWithLock(field.Name, field.Value); err != nil {
	return fmt.Errorf("unable to add document [%+v]: %w", document, err)
// NB: cannot insert more than max docs, so that acts as the upper bound here.
remainingInserts := remainingDocs
Could this not just be:

```go
remainingInserts := remainingDocs
if r.aggregateOpts.SizeLimit != 0 {
	remainingInserts = int(math.Min(float64(remainingInserts), float64(r.aggregateOpts.SizeLimit-r.size)))
}
```
LGTM
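As an aside, an integer-only variant of that clamp avoids the float64 round-trip; Go had no builtin integer min at the time, so the sketch below assumes a small local helper (minInt is hypothetical, not an existing function in the codebase), reusing the fields from the suggestion above.

```go
// minInt is a hypothetical local helper; Go's builtin min did not exist yet.
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// NB: cannot insert more than max docs, so that acts as the upper bound here.
remainingInserts := remainingDocs
if r.aggregateOpts.SizeLimit != 0 {
	remainingInserts = minInt(remainingInserts, r.aggregateOpts.SizeLimit-r.size)
}
```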
* master: (30 commits)
  - [dbnode] Use go context to cancel index query workers after timeout (#3194)
  - [aggregator] Fix change ActivePlacement semantics on close (#3201)
  - [aggregator] Simplify (Active)StagedPlacement API (#3199)
  - [aggregator] Checking if metadata is set to default should not cause copying (#3198)
  - [dbnode] Remove readers and writer from aggregator API (#3122)
  - [aggregator] Avoid large copies in entry rollup comparisons by making them more inline-friendly (#3195)
  - [dbnode] Re-add aggregator doc limit update (#3137)
  - [m3db] Do not close reader in filterFieldsIterator.Close() (#3196)
  - Revert "Remove disk series read limit (#3174)" (#3193)
  - [instrument] Improve sampled timer and stopwatch performance (#3191)
  - Omit unset fields in metadata json (#3189)
  - [dbnode] Remove left-over code in storage/bootstrap/bootstrapper (#3190)
  - [dbnode][coordinator] Support match[] in label endpoints (#3180)
  - Instrument the worker pool with the wait time (#3188)
  - Instrument query path (#3182)
  - [aggregator] Remove indirection, large copy from unaggregated protobuf decoder (#3186)
  - [aggregator] Sample timers completely (#3184)
  - [aggregator] Reduce error handling overhead in rawtcp server (#3183)
  - [aggregator] Move shardID calculation out of critical section (#3179)
  - Move instrumentation cleanup to FetchTaggedResultIterator Close() (#3173)
  - ...
Reverts #3133 and uses the regression test suite from #3135 to apply a backwards-compatible doc limit while maintaining the per-query controls that had been reverted.
Review guide: commit 77100c4 is a revert, so it may be easier to look at the diff from then on.