
Restrict the number of results processed per index worker #3269

Merged: 26 commits merged into master from rhall-worker-pool-iter on Mar 1, 2021

Conversation

@ryanhall07 (Collaborator) commented Feb 23, 2021

What this PR does / why we need it:

A new MaxResultsPerPermit option is introduced to cap how many index
results an index worker can process at a time. If the max is exceeded, the
index worker must yield the permit back and acquire it again
(potentially waiting) to continue processing the results.

This cap ensures large queries don't dominate the finite number of index
workers allowed to run concurrently and lock out smaller queries. The
idea is that users set the max large enough that the vast majority of typical
queries can finish with only a single permit acquisition.
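As a rough illustration (not the actual M3 code), the acquire/process/yield loop could look like this in Go, with Permits and ResultIterator as hypothetical stand-ins for the real permit and iterator types:

import "context"

// Permits and ResultIterator are hypothetical stand-ins for the real types.
type Permits interface {
	Acquire(ctx context.Context) error
	Release()
}

type ResultIterator interface {
	More() bool
	ProcessNext()
}

// processWithPermits processes at most maxResultsPerPermit results per permit
// acquisition; a large query yields the permit back and re-acquires it
// (potentially waiting) before continuing.
func processWithPermits(ctx context.Context, permits Permits, iter ResultIterator, maxResultsPerPermit int) error {
	for iter.More() {
		if err := permits.Acquire(ctx); err != nil {
			return err
		}
		for processed := 0; iter.More() && processed < maxResultsPerPermit; processed++ {
			iter.ProcessNext()
		}
		// Yield the permit so queued (possibly smaller) queries can run.
		permits.Release()
	}
	return nil
}
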
Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


@ryanhall07 force-pushed the rhall-worker-pool-iter branch 3 times, most recently from a05f145 to 706d4c6 on February 23, 2021 19:14
@ryanhall07 changed the title from "WIP" to "Restrict the number of results processed per index worker" on Feb 23, 2021
@ryanhall07 marked this pull request as ready for review on February 23, 2021 19:34
codecov bot commented Feb 23, 2021

Codecov Report

Merging #3269 (b7ab449) into master (73bac90) will decrease coverage by 0.0%.
The diff coverage is 81.3%.


@@            Coverage Diff            @@
##           master    #3269     +/-   ##
=========================================
- Coverage    72.5%    72.4%   -0.1%     
=========================================
  Files        1099     1101      +2     
  Lines      101504   101562     +58     
=========================================
- Hits        73616    73607      -9     
- Misses      22830    22866     +36     
- Partials     5058     5089     +31     
Flag Coverage Δ
aggregator 76.5% <ø> (+<0.1%) ⬆️
cluster 84.9% <ø> (+<0.1%) ⬆️
collector 84.3% <ø> (ø)
dbnode 78.8% <81.3%> (-0.2%) ⬇️
m3em 74.4% <ø> (ø)
m3ninx 73.5% <ø> (ø)
metrics 19.9% <ø> (ø)
msg 74.2% <ø> (-0.2%) ⬇️
query 67.3% <ø> (ø)
x 80.5% <ø> (ø)

Flags with carried forward coverage won't be shown.


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73bac90...1337a6d.

}
permits.Release()
}
blockIter.searchTime += blockIter.iter.SearchDuration()
Collaborator Author
A nice plus of this change is that we no longer have to acquire locks to update the timing info on the shared results. Each goroutine has its own timing information, and the sub-results are added together when all queries are done.
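Roughly, that lock-free pattern looks like this sketch (field and function names are illustrative, not the real types): each goroutine owns its own timing values, which are summed once after all block iterators finish.

import "time"

// iterTiming is owned by a single worker goroutine, so no lock is needed
// while the query is running.
type iterTiming struct {
	searchTime  time.Duration
	processTime time.Duration
}

// mergeTimings folds per-goroutine timings into one total once all block
// iterators have finished.
func mergeTimings(timings []iterTiming) iterTiming {
	var total iterTiming
	for _, t := range timings {
		total.searchTime += t.searchTime
		total.processTime += t.processTime
	}
	return total
}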

Collaborator
+1

@@ -64,6 +64,9 @@ const (
aggregateResultsEntryArrayPoolSize = 256
aggregateResultsEntryArrayPoolCapacity = 256
aggregateResultsEntryArrayPoolMaxCapacity = 256 // Do not allow grows, since we know the size

// defaultResultsPerPermit sets the default index results that can be processed per permit acquired.
defaultResultsPerPermit = 10000
Collaborator Author
Not sure what a good default is. I'm hesitant to make the default really large or to turn it off, since we changed the access pattern for acquiring workers: now all blocks eagerly attempt to acquire workers in parallel, so without some kind of max, larger queries would dominate even more.

select {
case f.permits <- struct{}{}:
default:
panic("more permits released than acquired")
Collaborator Author
Not sure how we feel about panics in the code base. FWIW, the standard Go semaphore panics for the same reason. This can only happen due to a logic bug, which should be caught with a test.

Collaborator
Yeah, seems sane, although we usually try to avoid panics. Alternatively we could make it return an error and let queries start failing.
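For comparison, a sketch of that error-returning alternative (hypothetical; the merged code keeps the panic, and the type name is assumed from the surrounding diff):

import "errors"

var errReleaseWithoutAcquire = errors.New("more permits released than acquired")

type fixedPermits struct {
	permits chan struct{}
}

// Release hands the permit back; instead of panicking on an accounting bug it
// returns an error, so the offending query fails rather than crashing the process.
func (f *fixedPermits) Release() error {
	select {
	case f.permits <- struct{}{}:
		return nil
	default:
		return errReleaseWithoutAcquire
	}
}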

@wesleyk (Collaborator) left a comment

awesome, LGTM!

src/cmd/services/m3dbnode/config/config.go (outdated; resolved)
src/dbnode/storage/index.go (resolved)
QueryOptions{SeriesLimit: 3},
results,
10,
Collaborator
how are we deciding on the limit here?

Collaborator Author
Pretty arbitrary; high enough for the tests to pass with a single iteration.

src/dbnode/storage/index/types.go (outdated; resolved)
@@ -775,15 +814,6 @@ func TestLimits(t *testing.T) {
requireExhaustive: false,
expectedErr: "",
},
{
name: "no limits",
Collaborator
why is this no longer relevant?

Collaborator Author
It doesn't make sense that exhaustive would be false if there is no limit; it just seems made up for the test.

src/dbnode/storage/limits/permits/fixed_permits_test.go (outdated; resolved)
// make sure the query hasn't been canceled
if queryCanceled() {
// acquire a permit before kicking off the goroutine to process the iterator. this limits the number of
// concurrent goroutines to # of permits + large queries that needed multiple iterations to finish.
Collaborator
💯
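Reusing the hypothetical Permits and ResultIterator interfaces from the sketch under the PR description, the acquire-before-spawn pattern described in this comment looks roughly like the following (error handling in the goroutine elided for brevity):

import (
	"context"
	"sync"
)

// dispatchBlockIters acquires a permit before spawning each iterator goroutine,
// so in-flight goroutines stay close to the number of permits, plus any large
// queries that are between permit acquisitions.
func dispatchBlockIters(ctx context.Context, permits Permits, iters []ResultIterator, maxResultsPerPermit int) error {
	var wg sync.WaitGroup
	for _, iter := range iters {
		if err := permits.Acquire(ctx); err != nil {
			wg.Wait()
			return err
		}
		wg.Add(1)
		go func(it ResultIterator) {
			defer wg.Done()
			// Process the first batch under the permit acquired above.
			for processed := 0; it.More() && processed < maxResultsPerPermit; processed++ {
				it.ProcessNext()
			}
			permits.Release()
			// Large queries re-acquire permits for any remaining results.
			if it.More() {
				_ = processWithPermits(ctx, permits, it, maxResultsPerPermit)
			}
		}(iter)
	}
	wg.Wait()
	return nil
}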

@@ -387,6 +387,14 @@ type IndexConfiguration struct {
// as they are very CPU-intensive (regex and FST matching).
MaxQueryIDsConcurrency int `yaml:"maxQueryIDsConcurrency" validate:"min=0"`

// MaxResultsPerPermit is the maximum index results a query can process after obtaining a permit. If a query needs
Collaborator
Should we split this up into different values for 1) regular queries and 2) aggregate queries?

Aggregate queries, when scoped with match[]=..., are relatively more expensive per iteration, since they have to do a postings list intersection with each aggregate term that is progressed to, as per:

fti.current.term, fti.current.postings = fti.termIter.Current()
if fti.restrictByPostings == nil {
// No restrictions.
return true, nil
}
bitmap, ok := roaring.BitmapFromPostingsList(fti.current.postings)
if !ok {
return false, errUnpackBitmapFromPostingsList
}
// Check term is part of at least some of the documents we're
// restricted to providing results for based on intersection
// count.
// Note: IntersectionCount is significantly faster than intersecting and
// counting results and also does not allocate.
if n := fti.restrictByPostings.IntersectionCount(bitmap); n > 0 {
// Matches, this is next result.
return true, nil
}
(term progression)
and
field, pl := fieldIter.Current()
if !fti.opts.allow(field) {
continue
}
if fti.restrictByPostings == nil {
// No restrictions.
fti.current.field = field
return true
}
bitmap, ok := roaring.BitmapFromPostingsList(pl)
if !ok {
fti.err = errUnpackBitmapFromPostingsList
return false
}
// Check field is part of at least some of the documents we're
// restricted to providing results for based on intersection
// count.
// Note: IntersectionCount is significantly faster than intersecting and
// counting results and also does not allocate.
if n := fti.restrictByPostings.IntersectionCount(bitmap); n < 1 {
// No match, not next result.
continue
}
// Matches, this is next result.
fti.current.field = field
return true
(field progression)

Collaborator
I suppose we could do this in a follow-up? Might be too much to add to the scope of this PR.

Collaborator Author
That's pretty easy to do now: just two different config values.

Comment on lines +461 to +463
docs, series := iter.Counts()
b.metrics.queryDocsMatched.RecordValue(float64(docs))
b.metrics.querySeriesMatched.RecordValue(float64(series))
Collaborator
Looks good 👍

Comment on lines 422 to 428
// MaxResultsPerWorkerConfiguration configures the max results per index worker.
type MaxResultsPerWorkerConfiguration struct {
// Fetch is the max for fetch queries.
Fetch int `yaml:"fetch"`
// Aggregate is the max for aggregate queries.
Aggregate int `yaml:"aggregate"`
}
Collaborator
Is this used anymore? Seems like it can be removed.

@robskillington (Collaborator) left a comment

LGTM other than removing MaxResultsPerWorkerConfiguration

@ryanhall07 merged commit 973caaf into master on Mar 1, 2021
@ryanhall07 deleted the rhall-worker-pool-iter branch on March 1, 2021 16:48
@@ -561,6 +528,9 @@ func (b *block) QueryWithIter(
}
}

iter.AddSeries(size)
Contributor
@ryanhall07 @robskillington Just verifying with you, as I've read this code multiple times and I think I found a bug:
size is the number of elements in results, as effectively returned here:

func (r *results) AddDocuments(batch []doc.Document) (int, int, error) {
	r.Lock()
	err := r.addDocumentsBatchWithLock(batch)
	size := r.resultsMap.Len()
	docsCount := r.totalDocsCount + len(batch)
	r.totalDocsCount = docsCount
	r.Unlock()
	return size, docsCount, err
}

Since results only keeps growing (I haven't seen any reset of this object), when you call iter.Counts() to get seriesMatched you will get a wrong number, because AddSeries(size) keeps adding the running total rather than the delta. Maybe the correct way is to measure results before and after adding to get that delta?
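A sketch of the delta-based fix suggested above (hypothetical; the real AddDocuments returns (int, int, error)): measure the map size before and after adding the batch and report the difference, so the caller can pass only the newly added series to iter.AddSeries.

func (r *results) AddDocuments(batch []doc.Document) (size, docsCount, added int, err error) {
	r.Lock()
	sizeBefore := r.resultsMap.Len()
	err = r.addDocumentsBatchWithLock(batch)
	size = r.resultsMap.Len()
	added = size - sizeBefore // delta of new series from this batch
	docsCount = r.totalDocsCount + len(batch)
	r.totalDocsCount = docsCount
	r.Unlock()
	// Caller: iter.AddSeries(added) instead of iter.AddSeries(size).
	return size, docsCount, added, err
}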

Collaborator Author
Yeah, this looks like a bug.
