Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/engine: lift Go MVCC implementation above Iterator interface #42210

Merged
merged 1 commit into from
Nov 13, 2019

Conversation

nvanbenschoten
Copy link
Member

This commit removes the MVCCGet and MVCCScan methods from engine.Iterator and uses the rest of the interface to implement these methods as free functions. This restructuring allows the MVCC operations to support polymorphism of the iterator, which is what the code was intending to do when originally written.

The code was moved to the current structure as a way of avoiding cgo calls when using RocksDB's iterator implementation. This is an important optimization when using RocksDB (but not Pebble) so the commit retains it through optional specializations of MVCCGet and MVCCScan. RocksDB's iterator implements this specialization but Pebble's does not need to.

This isn't quite ready for a review. I'm mainly pushing it in case others want to take a look. It will be used to get the prototype of #41720 up and running. Benchmarks show about a 0-1% performance regression due to this change. More testing should be done if we actually want to productionize this.

cc. @petermattis @itsbilal

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable overall, though I'm saddened about the need to switch from SeekLT to SeekReverse. Is it urgent to get this in? Perhaps someone on the Storage team can find time to change SeekReverse to SeekLT everywhere. (Or perhaps you can). There are only a few uses the last time I checked.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


pkg/storage/engine/pebble_iterator.go, line 245 at r1 (raw file):

	// The seek key may have been past the last key. If so, position the
	// iterator at the last key.
	if !p.iter.Valid() && !p.iter.Last() {

I think this can be:

if !p.iter.Valid() {
  p.iter.Last()
  return
}

If the iterator isn't valid, we position it at the last key. Regardless of the validity of the iterator at that point, we return as there is nothing else to do.


pkg/storage/engine/pebble_iterator.go, line 251 at r1 (raw file):

	// The new key could either be greater or equal to the supplied key.
	// Backtrack one step if it is greater.
	p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], key)

We already encoded key to keyBuf in pebbleIterator.Seek. Perhaps that is using too much of an implementation detail. Rather than encoding key, it would probably be better to decode p.iter.Key() here as doing so doesn't involve a memory copy. I realize you're copying the existing pattern in this code, but I only just noticed this.


pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):

	binary.LittleEndian.PutUint32(p.repr[startIdx:], uint32(len(value)))
	binary.LittleEndian.PutUint32(p.repr[startIdx+4:], uint32(lenKey))
	encodeKeyToBuf(p.repr[startIdx+kvLenSize:startIdx+kvLenSize+lenKey], key, lenKey)

I'm surprised this isn't any slower compared to the previous code which was just doing a copy.


pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):

	// SeekReverse positions the iterator at the last key that is less than or
	// equal to key AND strictly less than IterOptions.UpperBound.
	p.parent.SeekReverse(key)

This is unfortunate. I really dislike the semantics of SeekReverse. I'd like to do a big rename of Iterator.Seek to Iterator.SeekGE and Iterator.SeekReverse to Iterator.SeekLT and change the semantics of SeekReverse to match pebble.Iterator.SeekLT.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only a few uses the last time I checked.

Actually, outside of test code, there is only a single use of SeekReverse: StateLoader.LoadLastIndex.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Nov 12, 2019
This addresses a comment from cockroachdb#42210.

The commit also adjusts the reverse seek (now SeekLT) to be exclusive instead of
inclusive. This matches the API exposed by Pebble and eliminates the need for
the bug fix in 79b4c77.
@nvanbenschoten
Copy link
Member Author

Thanks for the review Peter.

Is it urgent to get this in?

Nope, I'm just using it for my lock-table prototype. Although I might try to push it through later this week since I'd rather it not rot and it's a good change on its own. I'll take a close look at the performance impact of the extra memcpys when I do so.

This is unfortunate. I really dislike the semantics of SeekReverse. I'd like to do a big rename of Iterator.Seek to Iterator.SeekGE and Iterator.SeekReverse to Iterator.SeekLT and change the semantics of SeekReverse to match pebble.Iterator.SeekLT.

Agreed. I had the same thought while making this change. It was a shame that the interface didn't line up with Pebble. How does #42390 look?

craig bot pushed a commit that referenced this pull request Nov 12, 2019
42390: storage/engine: rename Iterator.{Seek,SeekReverse} to {SeekGE,SeekLT} r=petermattis a=nvanbenschoten

This addresses a comment from #42210.

The commit also adjusts the reverse seek (now SeekLT) to be exclusive instead of
inclusive. This matches the API exposed by Pebble and eliminates the need for
the bug fix in 79b4c77.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased on #42390. Running some numbers now and looking at profiles to see if there's any perf we can win back.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)


pkg/storage/engine/pebble_iterator.go, line 245 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think this can be:

if !p.iter.Valid() {
  p.iter.Last()
  return
}

If the iterator isn't valid, we position it at the last key. Regardless of the validity of the iterator at that point, we return as there is nothing else to do.

No longer necessary.


pkg/storage/engine/pebble_iterator.go, line 251 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We already encoded key to keyBuf in pebbleIterator.Seek. Perhaps that is using too much of an implementation detail. Rather than encoding key, it would probably be better to decode p.iter.Key() here as doing so doesn't involve a memory copy. I realize you're copying the existing pattern in this code, but I only just noticed this.

No longer necessary.


pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm surprised this isn't any slower compared to the previous code which was just doing a copy.

encodeKeyToBuf isn't really doing all that much more than a copy, but I agree, I'm surprised it's not at least a bit slower. Do you think it's worth adding an UnsafeRawKey method to the Iterator interface to address this? I'm somewhat opposed because it makes key translations in intermediate iterators harder to do.


pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This is unfortunate. I really dislike the semantics of SeekReverse. I'd like to do a big rename of Iterator.Seek to Iterator.SeekGE and Iterator.SeekReverse to Iterator.SeekLT and change the semantics of SeekReverse to match pebble.Iterator.SeekLT.

Addressed elsewhere.

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)


pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

encodeKeyToBuf isn't really doing all that much more than a copy, but I agree, I'm surprised it's not at least a bit slower. Do you think it's worth adding an UnsafeRawKey method to the Iterator interface to address this? I'm somewhat opposed because it makes key translations in intermediate iterators harder to do.

I made just this portion of the change on master and it had the following impact:

name                                                old time/op    new time/op    delta
MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8    52.8µs ± 2%    53.3µs ± 1%    ~     (p=0.123 n=10+10)

name                                                old speed      new speed      delta
MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8  1.52MB/s ± 2%  1.50MB/s ± 1%  -0.99%  (p=0.074 n=10+10)

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Addressed elsewhere.

I'm not seeing your changes here. Did you forget to push?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/mvccIter branch from 56f99c4 to 2bde3fa Compare November 12, 2019 21:26
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)


pkg/storage/engine/pebble_mvcc_scanner.go, line 597 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm not seeing your changes here. Did you forget to push?

No, I hadn't pushed yet. Done.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


pkg/storage/engine/pebble_mvcc_scanner.go, line 76 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I made just this portion of the change on master and it had the following impact:

name                                                old time/op    new time/op    delta
MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8    52.8µs ± 2%    53.3µs ± 1%    ~     (p=0.123 n=10+10)

name                                                old speed      new speed      delta
MVCCScan_Pebble/rows=10/versions=100/valueSize=8-8  1.52MB/s ± 2%  1.50MB/s ± 1%  -0.99%  (p=0.074 n=10+10)

Ack. Thanks for checking. I'm ok with this size of an effect. This is sort of worst-case as the keys in the benchmark are small.


pkg/storage/engine/pebble_mvcc_scanner.go, line 330 at r3 (raw file):

		}
		p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], p.curMVCCKey())
		p.err = p.intents.Set(p.keyBuf, p.curValue, nil)

We don't expect to scan many intents, right? If we do, we could use the SetDeferred API to encode directly into the batch repr.


pkg/storage/engine/pebble_mvcc_scanner.go, line 596 at r3 (raw file):

	// SeekReverse positions the iterator at the last key that is less than or
	// equal to key AND strictly less than IterOptions.UpperBound.

This comment is now incorrect.

This commit removes the MVCCGet and MVCCScan methods from engine.Iterator and
uses the rest of the interface to implement these methods as free functions.
This restructuring allows the MVCC operations to support polymorphism of the
iterator, which is what the code was intending to do when originally written.

The code was moved to the current structure as a way of avoiding cgo calls when
using RocksDB's iterator implementation. This is an important optimization when
using RocksDB (but not Pebble) so the commit retains it through optional
specializations of MVCCGet and MVCCScan. RocksDB's iterator implements this
specialization but Pebble's does not need to.

This isn't quite ready for a review. I'm mainly pushing it in case others want
to take a look. It will be used to get the prototype of cockroachdb#41720 up and running.
Benchmarks show about a 0-1% performance regression due to this change. More
testing should be done if we actually want to productionize this.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/mvccIter branch from 2bde3fa to 876ceec Compare November 13, 2019 00:41
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @petermattis)


pkg/storage/engine/pebble_mvcc_scanner.go, line 330 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

We don't expect to scan many intents, right? If we do, we could use the SetDeferred API to encode directly into the batch repr.

No, scanning over an intent should be fairly rare and we don't expect to scan very many even when we do. And if we do find an intent then an extra memcopy is probably the least of our concerns, so I don't think it's worth complicating the code to avoid the copy.

For reference though, this is what that change would look like:

diff --git a/pkg/storage/engine/pebble_mvcc_scanner.go b/pkg/storage/engine/pebble_mvcc_scanner.go
index 20d38b9f75..e8b287171f 100644
--- a/pkg/storage/engine/pebble_mvcc_scanner.go
+++ b/pkg/storage/engine/pebble_mvcc_scanner.go
@@ -326,11 +326,13 @@ func (p *pebbleMVCCScanner) getAndAdvance() bool {
                        // that lie before the resume key.
                        return false
                }
-               p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], p.curMVCCKey())
-               p.err = p.intents.Set(p.keyBuf, p.curValue, nil)
-               if p.err != nil {
-                       return false
-               }
+               curKey := p.curMVCCKey()
+               curKeyLen := curKey.Len()
+               deferredOp := p.intents.SetDeferred(curKeyLen, len(p.curValue))
+               encodeKeyToBuf(deferredOp.Key, curKey, curKeyLen)
+               copy(deferredOp.Value, p.curValue)
+               // NB: the batch is not indexed, obviating the need to call
+               // deferredOp.Finish.

                return p.seekVersion(prevTS, false)
        }
@@ -341,11 +343,14 @@ func (p *pebbleMVCCScanner) getAndAdvance() bool {
                // intent. Note that this will trigger an error on the Go
                // side. We continue scanning so that we can return all of the
                // intents in the scan range.
-               p.keyBuf = EncodeKeyToBuf(p.keyBuf[:0], p.curMVCCKey())
-               p.err = p.intents.Set(p.keyBuf, p.curValue, nil)
-               if p.err != nil {
-                       return false
-               }
+               curKey := p.curMVCCKey()
+               curKeyLen := curKey.Len()
+               deferredOp := p.intents.SetDeferred(curKeyLen, len(p.curValue))
+               encodeKeyToBuf(deferredOp.Key, curKey, curKeyLen)
+               copy(deferredOp.Value, p.curValue)
+               // NB: the batch is not indexed, obviating the need to call
+               // deferredOp.Finish.
+
                return p.advanceKey()
        }

pkg/storage/engine/pebble_mvcc_scanner.go, line 596 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This comment is now incorrect.

Good catch, done.

craig bot pushed a commit that referenced this pull request Nov 13, 2019
42210: storage/engine: lift Go MVCC implementation above Iterator interface r=nvanbenschoten a=nvanbenschoten

This commit removes the `MVCCGet` and `MVCCScan` methods from `engine.Iterator` and uses the rest of the interface to implement these methods as free functions. This restructuring allows the MVCC operations to support polymorphism of the iterator, which is what the code was intending to do when originally written.

The code was moved to the current structure as a way of avoiding cgo calls when using RocksDB's iterator implementation. This is an important optimization when using RocksDB (but not Pebble) so the commit retains it through optional specializations of `MVCCGet` and `MVCCScan`. RocksDB's iterator implements this specialization but Pebble's does not need to.

This isn't quite ready for a review. I'm mainly pushing it in case others want to take a look. It will be used to get the prototype of #41720 up and running. Benchmarks show about a 0-1% performance regression due to this change. More testing should be done if we actually want to productionize this.

cc. @petermattis @itsbilal 

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Nov 13, 2019

Build succeeded

@craig craig bot merged commit 876ceec into cockroachdb:master Nov 13, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/mvccIter branch December 27, 2019 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants