-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage/engine: lift Go MVCC implementation above Iterator interface #42210
storage/engine: lift Go MVCC implementation above Iterator interface #42210
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable overall, though I'm saddened about the need to switch from SeekLT
to SeekReverse
. Is it urgent to get this in? Perhaps someone on the Storage team can find time to change SeekReverse
to SeekLT
everywhere. (Or perhaps you can). There are only a few uses the last time I checked.
Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/storage/engine/pebble_iterator.go, line 245 at r1 (raw file):
// The seek key may have been past the last key. If so, position the // iterator at the last key. if !p.iter.Valid() && !p.iter.Last() {
I think this can be:
if !p.iter.Valid() {
p.iter.Last()
return
}
If the iterator isn't valid, we position it at the last key. Regardless of the validity of the iterator at that point, we return as there is nothing else to do.
pkg/storage/engine/pebble_iterator.go, line 251 at r1 (raw file):
// The new key could either be greater or equal to the supplied key. // Backtrack one step if it is greater. p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], key)
We already encoded key
to keyBuf
in pebbleIterator.Seek
. Perhaps that is using too much of an implementation detail. Rather than encoding key
, it would probably be better to decode p.iter.Key()
here as doing so doesn't involve a memory copy. I realize you're copying the existing pattern in this code, but I only just noticed this.
pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):
binary.LittleEndian.PutUint32(p.repr[startIdx:], uint32(len(value))) binary.LittleEndian.PutUint32(p.repr[startIdx+4:], uint32(lenKey)) encodeKeyToBuf(p.repr[startIdx+kvLenSize:startIdx+kvLenSize+lenKey], key, lenKey)
I'm surprised this isn't any slower compared to the previous code which was just doing a copy.
pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):
// SeekReverse positions the iterator at the last key that is less than or // equal to key AND strictly less than IterOptions.UpperBound. p.parent.SeekReverse(key)
This is unfortunate. I really dislike the semantics of SeekReverse
. I'd like to do a big rename of Iterator.Seek
to Iterator.SeekGE
and Iterator.SeekReverse
to Iterator.SeekLT
and change the semantics of SeekReverse
to match pebble.Iterator.SeekLT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only a few uses the last time I checked.
Actually, outside of test code, there is only a single use of SeekReverse
: StateLoader.LoadLastIndex
.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
This addresses a comment from cockroachdb#42210. The commit also adjusts the reverse seek (now SeekLT) to be exclusive instead of inclusive. This matches the API exposed by Pebble and eliminates the need for the bug fix in 79b4c77.
Thanks for the review Peter.
Nope, I'm just using it for my lock-table prototype. Although I might try to push it through later this week since I'd rather it not rot and it's a good change on its own. I'll take a close look at the performance impact of the extra memcpys when I do so.
Agreed. I had the same thought while making this change. It was a shame that the interface didn't line up with Pebble. How does #42390 look? |
42390: storage/engine: rename Iterator.{Seek,SeekReverse} to {SeekGE,SeekLT} r=petermattis a=nvanbenschoten This addresses a comment from #42210. The commit also adjusts the reverse seek (now SeekLT) to be exclusive instead of inclusive. This matches the API exposed by Pebble and eliminates the need for the bug fix in 79b4c77. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased on #42390. Running some numbers now and looking at profiles to see if there's any perf we can win back.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)
pkg/storage/engine/pebble_iterator.go, line 245 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I think this can be:
if !p.iter.Valid() { p.iter.Last() return }
If the iterator isn't valid, we position it at the last key. Regardless of the validity of the iterator at that point, we return as there is nothing else to do.
No longer necessary.
pkg/storage/engine/pebble_iterator.go, line 251 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
We already encoded
key
tokeyBuf
inpebbleIterator.Seek
. Perhaps that is using too much of an implementation detail. Rather than encodingkey
, it would probably be better to decodep.iter.Key()
here as doing so doesn't involve a memory copy. I realize you're copying the existing pattern in this code, but I only just noticed this.
No longer necessary.
pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm surprised this isn't any slower compared to the previous code which was just doing a copy.
encodeKeyToBuf
isn't really doing all that much more than a copy, but I agree, I'm surprised it's not at least a bit slower. Do you think it's worth adding an UnsafeRawKey
method to the Iterator
interface to address this? I'm somewhat opposed because it makes key translations in intermediate iterators harder to do.
pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This is unfortunate. I really dislike the semantics of
SeekReverse
. I'd like to do a big rename ofIterator.Seek
toIterator.SeekGE
andIterator.SeekReverse
toIterator.SeekLT
and change the semantics ofSeekReverse
to matchpebble.Iterator.SeekLT
.
Addressed elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)
pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
encodeKeyToBuf
isn't really doing all that much more than a copy, but I agree, I'm surprised it's not at least a bit slower. Do you think it's worth adding anUnsafeRawKey
method to theIterator
interface to address this? I'm somewhat opposed because it makes key translations in intermediate iterators harder to do.
I made just this portion of the change on master
and it had the following impact:
name old time/op new time/op delta
MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8 52.8µs ± 2% 53.3µs ± 1% ~ (p=0.123 n=10+10)
name old speed new speed delta
MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8 1.52MB/s ± 2% 1.50MB/s ± 1% -0.99% (p=0.074 n=10+10)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Addressed elsewhere.
I'm not seeing your changes here. Did you forget to push?
56f99c4
to
2bde3fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)
pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm not seeing your changes here. Did you forget to push?
No, I hadn't pushed yet. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I made just this portion of the change on
master
and it had the following impact:name old time/op new time/op delta MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8 52.8µs ± 2% 53.3µs ± 1% ~ (p=0.123 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8 1.52MB/s ± 2% 1.50MB/s ± 1% -0.99% (p=0.074 n=10+10)
Ack. Thanks for checking. I'm ok with this size of an effect. This is sort of worst-case as the keys in the benchmark are small.
pkg/storage/engine/pebble_mvcc_scanner.go, line 330 at r3 (raw file):
} p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], p.curMVCCKey()) p.err = p.intents.Set(p.keyBuf, p.curValue, nil)
We don't expect to scan many intents, right? If we do, we could use the SetDeferred
API to encode directly into the batch repr.
pkg/storage/engine/pebble_mvcc_scanner.go, line 596 at r3 (raw file):
// SeekReverse positions the iterator at the last key that is less than or // equal to key AND strictly less than IterOptions.UpperBound.
This comment is now incorrect.
This commit removes the MVCCGet and MVCCScan methods from engine.Iterator and uses the rest of the interface to implement these methods as free functions. This restructuring allows the MVCC operations to support polymorphism of the iterator, which is what the code was intending to do when originally written. The code was moved to the current structure as a way of avoiding cgo calls when using RocksDB's iterator implementation. This is an important optimization when using RocksDB (but not Pebble) so the commit retains it through optional specializations of MVCCGet and MVCCScan. RocksDB's iterator implements this specialization but Pebble's does not need to. This isn't quite ready for a review. I'm mainly pushing it in case others want to take a look. It will be used to get the prototype of cockroachdb#41720 up and running. Benchmarks show about a 0-1% performance regression due to this change. More testing should be done if we actually want to productionize this.
2bde3fa
to
876ceec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @petermattis)
pkg/storage/engine/pebble_mvcc_scanner.go, line 330 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
We don't expect to scan many intents, right? If we do, we could use the
SetDeferred
API to encode directly into the batch repr.
No, scanning over an intent should be fairly rare and we don't expect to scan very many even when we do. And if we do find an intent then an extra memcopy is probably the least of our concerns, so I don't think it's worth complicating the code to avoid the copy.
For reference though, this is what that change would look like:
diff --git a/pkg/storage/engine/pebble_mvcc_scanner.go b/pkg/storage/engine/pebble_mvcc_scanner.go
index 20d38b9f75..e8b287171f 100644
--- a/pkg/storage/engine/pebble_mvcc_scanner.go
+++ b/pkg/storage/engine/pebble_mvcc_scanner.go
@@ -326,11 +326,13 @@ func (p *pebbleMVCCScanner) getAndAdvance() bool {
// that lie before the resume key.
return false
}
- p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], p.curMVCCKey())
- p.err = p.intents.Set(p.keyBuf, p.curValue, nil)
- if p.err != nil {
- return false
- }
+ curKey := p.curMVCCKey()
+ curKeyLen := curKey.Len()
+ deferredOp := p.intents.SetDeferred(curKeyLen, len(p.curValue))
+ encodeKeyToBuf(deferredOp.Key, curKey, curKeyLen)
+ copy(deferredOp.Value, p.curValue)
+ // NB: the batch is not indexed, obviating the need to call
+ // deferredOp.Finish.
return p.seekVersion(prevTS, false)
}
@@ -341,11 +343,14 @@ func (p *pebbleMVCCScanner) getAndAdvance() bool {
// intent. Note that this will trigger an error on the Go
// side. We continue scanning so that we can return all of the
// intents in the scan range.
- p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], p.curMVCCKey())
- p.err = p.intents.Set(p.keyBuf, p.curValue, nil)
- if p.err != nil {
- return false
- }
+ curKey := p.curMVCCKey()
+ curKeyLen := curKey.Len()
+ deferredOp := p.intents.SetDeferred(curKeyLen, len(p.curValue))
+ encodeKeyToBuf(deferredOp.Key, curKey, curKeyLen)
+ copy(deferredOp.Value, p.curValue)
+ // NB: the batch is not indexed, obviating the need to call
+ // deferredOp.Finish.
+
return p.advanceKey()
}
pkg/storage/engine/pebble_mvcc_scanner.go, line 596 at r3 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This comment is now incorrect.
Good catch, done.
42210: storage/engine: lift Go MVCC implementation above Iterator interface r=nvanbenschoten a=nvanbenschoten This commit removes the `MVCCGet` and `MVCCScan` methods from `engine.Iterator` and uses the rest of the interface to implement these methods as free functions. This restructuring allows the MVCC operations to support polymorphism of the iterator, which is what the code was intending to do when originally written. The code was moved to the current structure as a way of avoiding cgo calls when using RocksDB's iterator implementation. This is an important optimization when using RocksDB (but not Pebble) so the commit retains it through optional specializations of `MVCCGet` and `MVCCScan`. RocksDB's iterator implements this specialization but Pebble's does not need to. This isn't quite ready for a review. I'm mainly pushing it in case others want to take a look. It will be used to get the prototype of #41720 up and running. Benchmarks show about a 0-1% performance regression due to this change. More testing should be done if we actually want to productionize this. cc. @petermattis @itsbilal Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
This commit removes the
MVCCGet
andMVCCScan
methods fromengine.Iterator
and uses the rest of the interface to implement these methods as free functions. This restructuring allows the MVCC operations to support polymorphism of the iterator, which is what the code was intending to do when originally written.The code was moved to the current structure as a way of avoiding cgo calls when using RocksDB's iterator implementation. This is an important optimization when using RocksDB (but not Pebble) so the commit retains it through optional specializations of
MVCCGet
andMVCCScan
. RocksDB's iterator implements this specialization but Pebble's does not need to.This isn't quite ready for a review. I'm mainly pushing it in case others want to take a look. It will be used to get the prototype of #41720 up and running. Benchmarks show about a 0-1% performance regression due to this change. More testing should be done if we actually want to productionize this.
cc. @petermattis @itsbilal