-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: use SetOptions()
when reusing Pebble iterators
#79291
storage: use SetOptions()
when reusing Pebble iterators
#79291
Conversation
SetOptions()
when reusing PebbleSetOptions()
when reusing Pebble iterators
@jbowens Note the removed |
de70b32
to
fb4c0b0
Compare
34e6ab0
to
4798ae2
Compare
a797bea
to
0289491
Compare
@jbowens I'm getting a test failure here ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a pass and . Would be good for either Jackson or Sumeer to give it a thumbs-up too.
Reviewed 16 of 16 files at r1, 3 of 3 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @jbowens, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbowens Note the removed pebbleIterator.setBounds() optimization that avoided calling Iterator.SetBounds() if the bounds did not change. I think we'll need to move this kind of optimization down into SetOptions() -- some options changes may not need the iterator to be repositioned or invalidated, but only Pebble can control that since we're not able to change individual options.
All changes require the Iterator to be repositioned (including changes to masking, key types, bounds). I don't think it's possible to move this optimization down into Pebble. Check out this comment: https://github.com/cockroachdb/pebble/blob/c8e7d4149ada9f4a06d8ae44b04b25618019bc4e/iterator.go#L1672-L1681
@jbowens I'm getting a test failure here (TestEngineTimeBound) that I think is caused by SetOptions() not replacing TableFilter. I'm essentially creating an iterator with a TableFilter set, and then cloning this iterator and calling SetOptions() on it with a new TableFilter. But it seems like the old TableFilter closure gets called on the cloned iterator. Does that check out?
Sorry, this is my mistake. I overlooked this. Filed (cockroachdb/pebble#1621) I'm not sure what we can do for the TableFilter
since it's just a closure, and we cannot determine whether or not the closure changed. We can (and should) detect whether the block property filters changed. Now that we have block property filters in 22.1, do we need to use TableFilter
? Block property filters are strictly more powerful and also allow filtering out whole tables. Maybe we can't because sstables were created in 21.2 that still exist in 22.2 would not be subject to filtering, since they wouldn't have block property annotations.
Reviewed 1 of 16 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911 and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes require the Iterator to be repositioned (including changes to masking, key types, bounds). I don't think it's possible to move this optimization down into Pebble. Check out this comment: https://github.com/cockroachdb/pebble/blob/c8e7d4149ada9f4a06d8ae44b04b25618019bc4e/iterator.go#L1672-L1681
I see, thanks. I'll run some preliminary benchmarks to see what effect this has, and what can be done about it.
Sorry, this is my mistake. I overlooked this. Filed (cockroachdb/pebble#1621) I'm not sure what we can do for the TableFilter since it's just a closure, and we cannot determine whether or not the closure changed.
Why do we need to compare? Can't we unconditionally replace the existing table filter? If the concern is tearing down the iterator stacks then I suppose a nil
check would suffice? In the majority of cases this will be nil
.
Now that we have block property filters in 22.1, do we need to use TableFilter? Block property filters are strictly more powerful and also allow filtering out whole tables. Maybe we can't because sstables were created in 21.2 that still exist in 22.2 would not be subject to filtering, since they wouldn't have block property annotations.
I suppose the only option here would be a migration that rewrites the properties. I guess it's too late in the game to get that into 22.1? Seems unfortunate to have to maintain different code paths in 22.2 for this.
Worst case, we can just prevent reuse of time-bound iterators -- that's what we do today anyway.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911 and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the code deletion. I'm not sure how important the set bounds optimization is, but we should make sure to circle back on that (thanks for recording the TODO).
If the concern is tearing down the iterator stacks then I suppose a nil check would suffice? In the majority of cases this will be nil.
Sounds good. SetOptions
will now [just merged in Pebble] replace the iterator stack if filters are non-nil before or after the SetOptions
call.
I suppose the only option here would be a migration that rewrites the properties. I guess it's too late in the game to get that into 22.1? Seems unfortunate to have to maintain different code paths in 22.2 for this.
Yeah, a migration would also be intense. Block property filters are new in 22.1, so this migration would effectively need to rewrite every sstable in the store. I don't think we'd want to block 22.1 cluster version finalization on that, in which case we'd need to support the different code path in 22.2 anyways. We could do the blocking migration in 22.2 with the expectation that most of the sstables within a store should've been written by 22.1. A quick 21.2->22.1->22.2 upgrade sequence would still need to rewrite almost the entirety of its stores though.
Reviewed 3 of 16 files at r1, 1 of 3 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @aliher1911 and @sumeerbhola)
4798ae2
to
599fe1b
Compare
0289491
to
fc07f22
Compare
This patch removes the `MVCCIterator.SetUpperBound()` method, which was only used by `MVCCIterator.ClearIterRange()`. `ClearIterRange` now also creates the iterator itself, instead of having it passed in. This ensures the iterator is created with appropriate bounds and options, which becomes even more important with the introduction of range keys. If further flexibility is needed, we can consider passing an `IterOptions` struct instead of a key span. Release note: None
Previously, `pebbleReadOnly` and `pebbleBatch` only supported a single concurrent iterator of a given type. Attempts to create an additional iterator would panic with "iterator already in use". This is particularly problematic because a subsequent change will make more iterators reusable (specifically TBI iterators), such that e.g. `MVCCIncrementalIterator` will be creating two concurrent reusable iterators and trigger this panic. This patch supports multiple iterators by creating new cloned iterators if the existing iterator is in use. This preserves the pinned engine state for the cloned iterators too. Release note: None
fc07f22
to
827766a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the concern is tearing down the iterator stacks then I suppose a nil check would suffice? In the majority of cases this will be nil.
Sounds good. SetOptions will now [just merged in Pebble] replace the iterator stack if filters are non-nil before or after the SetOptions call.
Awesome, thanks for the quick turnaround! Tests pass now, but I'm going to run a few benchmarks before merging.
Yeah, a migration would also be intense. Block property filters are new in 22.1, so this migration would effectively need to rewrite every sstable in the store. I don't think we'd want to block 22.1 cluster version finalization on that, in which case we'd need to support the different code path in 22.2 anyways. We could do the blocking migration in 22.2 with the expectation that most of the sstables within a store should've been written by 22.1. A quick 21.2->22.1->22.2 upgrade sequence would still need to rewrite almost the entirety of its stores though.
Good point, probably better to wait for 22.2 (or later). This will work for now.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aliher1911, @jbowens, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to run a few benchmarks before merging.
We're taking an ~8% hit on MVCC gets, using a batch which clones the batch iterator on every get. This is almost entirely due to the removed setOptions()
optimization. It's probably more than we can get away with merging to master, so I'll see if I can claw it back somehow.
name old time/op new time/op delta
MVCCGet_Pebble/versions=1/valueSize=8-24 2.60µs ± 1% 2.83µs ± 1% +8.91% (p=0.000 n=9+10)
MVCCGet_Pebble/versions=10/valueSize=8-24 3.90µs ± 0% 4.23µs ± 1% +8.46% (p=0.000 n=7+9)
MVCCGet_Pebble/versions=100/valueSize=8-24 10.2µs ± 4% 10.5µs ± 2% +3.33% (p=0.003 n=10+10)
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aliher1911, @jbowens, @nicktrav, and @sumeerbhola)
827766a
to
f6068b7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extended the setBounds()
check to cover all options, which didn't turn out too bad. Got the performance diff down to about 2%:
name old time/op new time/op delta
MVCCGet_Pebble/versions=1/valueSize=8-24 2.59µs ± 1% 2.63µs ± 1% +1.52% (p=0.000 n=18+20)
MVCCGet_Pebble/versions=10/valueSize=8-24 3.91µs ± 1% 4.00µs ± 1% +2.42% (p=0.000 n=18+19)
MVCCGet_Pebble/versions=100/valueSize=8-24 9.77µs ± 5% 9.95µs ± 5% +1.78% (p=0.012 n=19+19)
I think that's acceptable -- this is in the tens of nanoseconds, so I don't think it will show up on end-to-end benchmarks. We'll have to do another optimization pass once MVCC range tombstones are fully implemented anyway.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aliher1911, @jbowens, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aliher1911, @jbowens, @nicktrav, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the latest set of changes -
Reviewed 3 of 8 files at r4, 1 of 3 files at r5, 4 of 4 files at r6, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @sumeerbhola)
pkg/storage/pebble_iterator.go, line 149 at r6 (raw file):
// We don't generate a pebble.Options for comparison, because we want to reuse // the byte slices of the existing p.options for any key encoding. optsChanged := p.options.OnlyReadGuaranteedDurable != (durability == GuaranteedDurability) ||
Is there a risk of introducing "brittleness" here in that we add some new options and they aren't taken into account here? Given this is just a performance optimization rather than a correctness issue, it's probably fine, just wanted to mention it.
f6068b7
to
b6ac05e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closed the performance gap by rearranging the predicates.
name old time/op new time/op delta
MVCCGet_Pebble/versions=1/valueSize=8-24 2.59µs ± 1% 2.60µs ± 1% +0.42% (p=0.013 n=18+19)
MVCCGet_Pebble/versions=10/valueSize=8-24 3.91µs ± 1% 3.91µs ± 1% ~ (p=0.597 n=18+20)
MVCCGet_Pebble/versions=100/valueSize=8-24 9.77µs ± 5% 9.41µs ± 4% -3.77% (p=0.000 n=19+20)
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aliher1911, @nicktrav, and @sumeerbhola)
pkg/storage/pebble_iterator.go, line 149 at r6 (raw file):
Previously, nicktrav (Nick Travers) wrote…
Is there a risk of introducing "brittleness" here in that we add some new options and they aren't taken into account here? Given this is just a performance optimization rather than a correctness issue, it's probably fine, just wanted to mention it.
Yes, that's a very real risk here. And it's true that this is "just" a performance optimization, but forgetting to handle a new option here can cause that option to not take effect if we take the fast path.
However, every option that's passed down to a Pebble iterator must have logic in this function to translate from a CRDB IterOptions
to a Pebble IterOptions
, so the hope is that someone adding logic for an option here will see this condition and update it. That's probably sufficient?
Iterator reuse now relies on `Pebble.SetOptions()` to configure the reused Pebble iterator. This allows a wider range of iterators to be reused, since previously only the bounds could be changed on existing iterators. Release note: None
b6ac05e
to
b5e4c15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aliher1911, @erikgrinaker, and @sumeerbhola)
pkg/storage/pebble_iterator.go, line 149 at r6 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Yes, that's a very real risk here. And it's true that this is "just" a performance optimization, but forgetting to handle a new option here can cause that option to not take effect if we take the fast path.
However, every option that's passed down to a Pebble iterator must have logic in this function to translate from a CRDB
IterOptions
to a PebbleIterOptions
, so the hope is that someone adding logic for an option here will see this condition and update it. That's probably sufficient?
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
-- commits, line 25 at r5:
Can you elaborate on this statement about MVCCIncrementalIterator? Won't we continue to have one tbi that sees both point and range keys and another regular iter that also sees both kinds of keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
Previously, sumeerbhola wrote…
Can you elaborate on this statement about MVCCIncrementalIterator? Won't we continue to have one tbi that sees both point and range keys and another regular iter that also sees both kinds of keys?
Yes, that's right. There will be no changes to MVCCIncrementalIterator
in that regard.
This comment refers to the subsequent SetOptions()
commit in this PR. Previously, TBIs were not reusable because we had no way to change the TableFilter
on an existing iterator. But now that we have SetOptions()
we can reuse iterators for TBIs.
However, pebbleBatch
and pebbleReadOnly
did not support creating multiple concurrent reusable iterators, and would panic with "iterator already in use" because of this check here:
https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/pebble_batch.go#L231-L233
MVCCIncrementalIterator needs to create two concurrent iterators (one regular and one TBI). This used to work as a side-effect of TBIs not being reusable, since it bypassed that check in pebbleBatch
and created an entirely new iterator. But now that TBIs are reusable, it triggers that assertion.
This patch removes this artificial restriction of only allowing a single concurrent iterator, by cloning a new iterator if the primary one is already in use.
80991: backupccl: Cosmetic changes to backup_processor.go to improve clarity. r=benbardin a=benbardin Release note: None Renames a few things, and moves the sstsink into its own file. 81141: ui: add TTL metrics dashboard r=rafiss,matthewtodd a=otan Release note: None example: <img width="981" alt="image" src="https://user-images.githubusercontent.com/3646147/167403258-c3d546ab-a6af-4088-a090-87cd0ed8bbe7.png"> 81194: storage: remove `Engine.SetUpperBound` r=jbowens a=erikgrinaker Partial resubmit of #79291, which was merged into the `mvcc-range-tombstones` branch. --- **storage: make `Engine.ClearIterRange` construct iterator internally** `Engine.ClearIterRange` now creates the iterator itself, instead of having it passed in. This ensures the iterator is created with appropriate bounds and options, which becomes even more important with the introduction of range keys. It will also allow removal of `Engine.SetUpperBound` in a subsequent commit. If further flexibility is needed, we can consider passing an `IterOptions` struct with appropriate bounds instead of a key span. Release note: None **storage: remove `Engine.SetUpperBound`** This patch removes the `Engine.SetUpperBound` method, as it is no longer in use. Release note: None 81195: storage: allow multiple iterators in `pebbleReadOnly` and `pebbleBatch` r=jbowens a=erikgrinaker Partial resubmit of #79291, which was merged into the mvcc-range-tombstones branch. --- Previously, `pebbleReadOnly` and `pebbleBatch` only supported a single concurrent iterator of a given type, as this iterator was reused between `NewMVCCIterator` calls. Attempts to create an additional iterator would panic with "iterator already in use". Time-bound iterators were excluded from this reuse, and always created from scratch, so they sidestepped this limitation. However, we will soon make these iterators reusable too via a new `SetOptions()` method in Pebble. When TBIs also become reusable, this will cause `MVCCIncrementalIterator` to panic with "iterator already in use", because it always creates two concurrent iterators: one regular and one time-bound. This patch therefore ensures all `Reader` implementations support multiple concurrent iterators, by cloning the existing cached iterator if it is already in use. This also preserves the pinned engine state for the new iterator. Release note: None 81295: outliers: introduce a composite "any" detector r=matthewtodd a=matthewtodd This gives us a place to hang the upcoming histogram-based detector coming in #79451. Release note: None Co-authored-by: Ben Bardin <bardin@cockroachlabs.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Note to reviewers: this is for the
mvcc-range-tombstones
feature branch.storage: remove
MVCCIterator.SetUpperBound
and tweakClearIterRange
This patch removes the
MVCCIterator.SetUpperBound()
method, which wasonly used by
MVCCIterator.ClearIterRange()
.ClearIterRange
now also creates the iterator itself, instead of havingit passed in. This ensures the iterator is created with appropriate
bounds and options, which becomes even more important with the
introduction of range keys. If further flexibility is needed, we can
consider passing an
IterOptions
struct instead of a key span.Release note: None
storage: handle multiple iterators in
pebbleReadOnly
andpebbleBatch
Previously,
pebbleReadOnly
andpebbleBatch
only supported a singleconcurrent iterator of a given type. Attempts to create an additional
iterator would panic with "iterator already in use".
This is particularly problematic because a subsequent change will make
more iterators reusable (specifically TBI iterators), such that e.g.
MVCCIncrementalIterator
will be creating two concurrent reusableiterators and trigger this panic.
This patch supports multiple iterators by creating new cloned iterators
if the existing iterator is in use. This preserves the pinned engine
state for the cloned iterators too.
Release note: None
storage: use
SetOptions()
when reusing Pebble iteratorsIterator reuse now relies on
Pebble.SetOptions()
to configure thereused Pebble iterator. This allows a wider range of iterators to be
reused, since previously only the bounds could be changed on existing
iterators.
Release note: None
Jira issue: CRDB-14701