[aggregator] Simplify (Active)StagedPlacement API #3199

Merged

vdarulis merged 6 commits into master from v/asp_atomic

Feb 9, 2021

vdarulis (Collaborator) commented Feb 9, 2021

What this PR does / why we need it:

This simplifies the staged placement API and significantly reduces lock contention, which matters because the placement is fetched on every metric write.
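A minimal sketch of the general technique, assuming a read-mostly value that is replaced wholesale on update: readers do a single atomic load instead of taking the `sync.RWMutex` read lock that the diff removes, so the hot path no longer touches a shared reader counter. The names `placement`, `placementHolder`, `Update` and `Snapshot` are hypothetical and do not mirror the m3 API; the diff appears to use a third-party atomic package, while this sketch uses the standard library's `sync/atomic`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// placement is a hypothetical, immutable stand-in for one staged placement.
type placement struct {
	cutoverNanos int64
	version      int
}

// placementHolder publishes an immutable []placement. Writers replace the
// whole slice; readers never take a lock.
type placementHolder struct {
	placements atomic.Value // always holds a []placement
}

func newPlacementHolder(ps []placement) *placementHolder {
	h := &placementHolder{}
	h.placements.Store(ps)
	return h
}

// Update publishes a new snapshot. The caller must not modify ps afterwards.
func (h *placementHolder) Update(ps []placement) {
	h.placements.Store(ps)
}

// Snapshot is the hot-path read: a single atomic load, no RLock/RUnlock.
func (h *placementHolder) Snapshot() []placement {
	return h.placements.Load().([]placement)
}

func main() {
	h := newPlacementHolder([]placement{{version: 1}})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = len(h.Snapshot()) // concurrent readers, no lock
			}
		}()
	}

	// Copy-on-write: build a fresh slice instead of appending in place.
	cur := h.Snapshot()
	next := append(append([]placement(nil), cur...), placement{cutoverNanos: 100, version: 2})
	h.Update(next)

	wg.Wait()
	fmt.Println(len(h.Snapshot())) // 2
}
```

The trade-off is that every update allocates a new slice, which is cheap when updates are rare and reads happen on every write.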

name                                           old time/op  new time/op  delta
StagedPlacementWatcherActiveStagedPlacement    38.1ns ± 4%  19.4ns ±11%  -48.91%  (p=0.000 n=6+10)
StagedPlacementWatcherActiveStagedPlacement-3  83.6ns ± 1%  24.3ns ±16%  -70.89%  (p=0.001 n=5+10)
StagedPlacementWatcherActiveStagedPlacement-6  69.4ns ± 7%  21.0ns ±12%  -69.74%  (p=0.000 n=6+10)
StagedPlacementWatcherActiveStagedPlacement-8  63.1ns ± 4%  16.3ns ±11%  -74.14%  (p=0.000 n=6+10)
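The delta column is benchstat's relative change in mean time per operation, (new - old) / old. The `-3`, `-6` and `-8` suffixes are the GOMAXPROCS value each run used (Go appends `-N` to a benchmark name when GOMAXPROCS is not 1), presumably set with something like `go test -bench . -cpu 1,3,6,8`. The sketch below recomputes the delta from the rounded means shown in the table; because benchstat works from the raw samples, the results land within about 0.2 percentage points of the printed deltas rather than matching exactly.

```go
package main

import "fmt"

// row holds the rounded mean times (ns/op) from one line of the table.
type row struct {
	name          string
	before, after float64
}

// deltaPct returns the relative change in percent: (after - before) / before * 100.
func deltaPct(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	rows := []row{
		{"StagedPlacementWatcherActiveStagedPlacement", 38.1, 19.4},
		{"StagedPlacementWatcherActiveStagedPlacement-3", 83.6, 24.3},
		{"StagedPlacementWatcherActiveStagedPlacement-6", 69.4, 21.0},
		{"StagedPlacementWatcherActiveStagedPlacement-8", 63.1, 16.3},
	}
	for _, r := range rows {
		fmt.Printf("%-46s %7.2f%%\n", r.name, deltaPct(r.before, r.after))
	}
}
```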

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

Does this PR require updating code package or user-facing documentation?:

vdarulis added 5 commits

February 9, 2021 13:47

  [aggregator] Reduce locking in staged placement management (7a3ed29)
  partly refactor watcher (301044d)
  complete refactor
  add bench (dce17db)
  update callers (82b5598)

codecov bot commented Feb 9, 2021 (edited)

Codecov Report

Merging #3199 (3a93768) into master (7bf4a4e) will decrease coverage by 0.0%.
The diff coverage is 85.9%.

@@            Coverage Diff            @@
##           master    #3199     +/-   ##
=========================================
- Coverage    72.2%    72.2%   -0.1%     
=========================================
  Files        1087     1087             
  Lines      100714   100686     -28     
=========================================
- Hits        72816    72774     -42     
- Misses      22838    22846      +8     
- Partials     5060     5066      +6
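As a cross-check on the Codecov numbers, a small sketch assuming Codecov's coverage ratio is hits / (hits + misses + partials): in both columns hits, misses and partials sum to the line count, and the ratios work out to about 72.30% on master and 72.28% with #3199, a drop of roughly 0.02 percentage points. The `totals` type is purely illustrative.

```go
package main

import "fmt"

// totals mirrors one column of the Codecov diff (illustrative type).
type totals struct {
	lines, hits, misses, partials int
}

// ratio returns coverage in percent, assuming hits / (hits + misses + partials).
func (t totals) ratio() float64 {
	return float64(t.hits) / float64(t.hits+t.misses+t.partials) * 100
}

// consistent reports whether hits + misses + partials equals the line count.
func (t totals) consistent() bool {
	return t.hits+t.misses+t.partials == t.lines
}

func main() {
	master := totals{lines: 100714, hits: 72816, misses: 22838, partials: 5060}
	pr := totals{lines: 100686, hits: 72774, misses: 22846, partials: 5066}
	fmt.Printf("master %.3f%% (consistent=%v)\n", master.ratio(), master.consistent())
	fmt.Printf("#3199  %.3f%% (consistent=%v)\n", pr.ratio(), pr.consistent())
	fmt.Printf("change %+.3f points\n", pr.ratio()-master.ratio())
}
```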

Flag	Coverage Δ
aggregator	`75.9% <100.0%> (+<0.1%)`	⬆️
cluster	`84.9% <84.0%> (-0.2%)`	⬇️
collector	`84.3% <ø> (ø)`
dbnode	`78.7% <ø> (+<0.1%)`	⬆️
m3em	`74.4% <ø> (ø)`
m3ninx	`73.2% <ø> (ø)`
metrics	`20.0% <ø> (ø)`
msg	`73.5% <ø> (-0.7%)`	⬇️
query	`67.2% <ø> (ø)`
x	`80.2% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7bf4a4e...3a93768.

mway reviewed

src/cluster/placement/staged_placement.go Outdated

@@ -34,38 +33,36 @@ import (
               var (
               	errNoApplicablePlacement       = errors.New("no applicable placement found")
               	errActiveStagedPlacementClosed = errors.New("active staged placement is closed")
+              	errPlacementInvalidType        = errors.New("type assertion failed, corrupt placement")

mway (Collaborator) Feb 9, 2021

nit: type assertion is an implementation detail, this can probably just be "corrupt placement"

src/cluster/placement/staged_placement.go Outdated

               )
-              type activeStagedPlacement struct {
-              	sync.RWMutex
+              var _noPlacements Placements

mway (Collaborator) Feb 9, 2021

nit: group this with the errors, just put a break between 'em

src/cluster/placement/staged_placement.go Outdated

Comment on lines 42 to 48

+              	placements            atomic.Value
               	version               int
               	nowFn                 clock.NowFn
               	onPlacementsAddedFn   OnPlacementsAddedFn
               	onPlacementsRemovedFn OnPlacementsRemovedFn
-              	expiring atomic.Int32
-              	closed   bool
-              	doneFn   DoneFn
+              	expiring              atomic.Int32
+              	closed                atomic.Bool

mway (Collaborator) Feb 9, 2021

probably not much value in memory-aligning, so mind sorting these?

src/cluster/placement/staged_placement.go

               		return errActiveStagedPlacementClosed
               	}
               	if p.onPlacementsRemovedFn != nil {
-              		p.onPlacementsRemovedFn(p.placements)
+              		pl, ok := p.placements.Load().(Placements)

mway (Collaborator) Feb 9, 2021

do you think it's worth adding a quick note that we understand pl is still subject to mutability races, and that this is expected given the historical design, or something to that effect?
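A minimal sketch of the close path this thread discusses, assuming the placements live in an `atomic.Value` and `closed` is an atomic boolean as in the diff. `CompareAndSwap` makes close idempotent without a lock, and the comment records the caveat raised here: the atomic load only protects the swap of the slice itself, so the elements handed to the callback are shared with concurrent readers. `stagedHolder`, `onRemoved` and `placement` are hypothetical; the error names and messages follow the diff.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var (
	errActiveStagedPlacementClosed = errors.New("active staged placement is closed")
	errPlacementInvalidType        = errors.New("corrupt placement")
)

// placement is a hypothetical stand-in for a real placement.
type placement struct{ version int }

// stagedHolder is a hypothetical, cut-down version of the struct in the diff.
type stagedHolder struct {
	closed     atomic.Bool
	onRemoved  func([]placement)
	placements atomic.Value // holds []placement
}

// Close marks the holder closed exactly once and passes the last published
// placements to the removal callback.
func (h *stagedHolder) Close() error {
	// Only the first caller flips false -> true; later calls get an error.
	if !h.closed.CompareAndSwap(false, true) {
		return errActiveStagedPlacementClosed
	}
	if h.onRemoved == nil {
		return nil
	}
	// NB: the load itself is atomic, but the slice it returns is shared with
	// concurrent readers, so its elements are still open to mutation races.
	// This is expected given the historical design.
	pl, ok := h.placements.Load().([]placement)
	if !ok {
		return errPlacementInvalidType
	}
	h.onRemoved(pl)
	return nil
}

func main() {
	removed := 0
	h := &stagedHolder{onRemoved: func(ps []placement) { removed = len(ps) }}
	h.placements.Store([]placement{{version: 1}, {version: 2}})
	err := h.Close()
	fmt.Println(err, removed) // <nil> 2
	fmt.Println(h.Close())    // active staged placement is closed
}
```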

src/cluster/placement/staged_placement.go Outdated

-              func (p *activeStagedPlacement) activePlacementWithLock(timeNanos int64) (Placement, error) {
-              	if p.closed {
+              	if p.closed.Load() {

mway (Collaborator) Feb 9, 2021

nit: do this before loading placements
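A sketch of the ordering suggested in this nit: check the closed flag before touching the placements, so a closed instance returns early without a load or type assertion. The lookup assumes placements are sorted by ascending cutover time and that the active one is the latest whose cutover is not after `timeNanos`; that rule, like the names `activePlacement`, `stagedHolder` and `cutoverNanos`, is an assumption for illustration rather than a copy of the m3 code. The error names follow the diff.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var (
	errNoApplicablePlacement       = errors.New("no applicable placement found")
	errActiveStagedPlacementClosed = errors.New("active staged placement is closed")
	errPlacementInvalidType        = errors.New("corrupt placement")
)

// placement is a hypothetical stand-in with only a cutover time.
type placement struct{ cutoverNanos int64 }

type stagedHolder struct {
	closed     atomic.Bool
	placements atomic.Value // holds []placement sorted by cutoverNanos
}

// activePlacement returns the placement in effect at timeNanos.
func (h *stagedHolder) activePlacement(timeNanos int64) (placement, error) {
	// Cheap early exit first: no load or type assertion once closed.
	if h.closed.Load() {
		return placement{}, errActiveStagedPlacementClosed
	}
	pl, ok := h.placements.Load().([]placement)
	if !ok {
		return placement{}, errPlacementInvalidType
	}
	// Scan from the newest placement back to the first one already cut over.
	for i := len(pl) - 1; i >= 0; i-- {
		if pl[i].cutoverNanos <= timeNanos {
			return pl[i], nil
		}
	}
	return placement{}, errNoApplicablePlacement
}

func main() {
	h := &stagedHolder{}
	h.placements.Store([]placement{{cutoverNanos: 100}, {cutoverNanos: 200}})
	p, err := h.activePlacement(150)
	fmt.Println(p.cutoverNanos, err) // 100 <nil>
	_, err = h.activePlacement(50)
	fmt.Println(err) // no applicable placement found
}
```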

mway approved these changes


feedback (3a93768)

abliqo reviewed

src/cluster/placement/staged_placement.go

@@ -34,38 +33,36 @@ import (
               var (
               	errNoApplicablePlacement       = errors.New("no applicable placement found")
               	errActiveStagedPlacementClosed = errors.New("active staged placement is closed")
+              	errPlacementInvalidType        = errors.New("corrupt placement")
+              	_noPlacements Placements

abliqo (Collaborator) Feb 9, 2021

ultra-nit: _emptyPlacements would read better.

abliqo approved these changes

vdarulis merged commit 0526c0f into master

vdarulis deleted the v/asp_atomic branch

February 9, 2021 21:57

soundvibe added a commit that referenced this pull request

Merge branch 'master' into linasn/bootstrap-profiler (885e8f1)

* master: (30 commits)
  [dbnode] Use go context to cancel index query workers after timeout (#3194)
  [aggregator] Fix change ActivePlacement semantics on close (#3201)
  [aggregator] Simplify (Active)StagedPlacement API (#3199)
  [aggregator] Checking if metadata is set to default should not cause copying (#3198)
  [dbnode] Remove readers and writer from aggregator API (#3122)
  [aggregator] Avoid large copies in entry rollup comparisons by making them more inline-friendly (#3195)
  [dbnode] Re-add aggregator doc limit update (#3137)
  [m3db] Do not close reader in filterFieldsIterator.Close() (#3196)
  Revert "Remove disk series read limit (#3174)" (#3193)
  [instrument] Improve sampled timer and stopwatch performance (#3191)
  Omit unset fields in metadata json (#3189)
  [dbnode] Remove left-over code in storage/bootstrap/bootstrapper (#3190)
  [dbnode][coordinator] Support match[] in label endpoints (#3180)
  Instrument the worker pool with the wait time (#3188)
  Instrument query path (#3182)
  [aggregator] Remove indirection, large copy from unaggregated protobuf decoder (#3186)
  [aggregator] Sample timers completely (#3184)
  [aggregator] Reduce error handling overhead in rawtcp server (#3183)
  [aggregator] Move shardID calculation out of critical section (#3179)
  Move instrumentation cleanup to FetchTaggedResultIterator Close() (#3173)
  ...
