-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to Slice 2.0 #18471
Update to Slice 2.0 #18471
Conversation
e27c294
to
538d329
Compare
ReadBuffer buffer = buffers[0]; | ||
int shortsRemaining = length; | ||
while (shortsRemaining > 0) { | ||
ensureReadable(min(Long.BYTES, shortsRemaining * Short.BYTES)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this line. If there are, say, 100 shorts remaining, this would do:
ensureReadable(min(8, 200))
-> ensureReadable(8)
Also, what's the significance of Long.BYTES
here? Why would this method care about having at least 8 readable bytes, assuming that's the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely agree. This is the style of the other methods in this class I had to ask @arhimondr how this code works, and this is what he said:
Hmm, it does look weird. I need to read it carefully to tell you more.
It basically tells that "at least" 8 bytes (or less if remaining bytes is less than 8" should be available
If not available - it will read an entire encryption / compression block
I think we all agree it need some renaming
int currentIndex = sourceIndex; | ||
int shortsRemaining = length; | ||
while (shortsRemaining > 0) { | ||
ensureCapacityFor(min(Long.BYTES, shortsRemaining * Short.BYTES)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
return XxHash64.hash(Slices.wrappedIntArray(positionLinks)); | ||
long hash = 0; | ||
for (int positionLink : positionLinks) { | ||
hash = XxHash64.hash(hash, positionLink); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably doesn't matter, but be aware that this is not equivalent to the old code in terms of semantics and performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add an update()
method that accepts a long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I agree to both comments. The code should be fine as this is only used during spilling, and only used as a checksum.
return createSlicesBlock(IPADDRESS, generateListWithNulls(positionCount, nullRate, () -> randomSlice(random, 16))); | ||
} | ||
|
||
private static Slice randomSlice(Random random, int length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this to Slice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add this to Slices. I tooks a quick look at the code base, and there are like 20ish places that do this.
// there might be a value split across the buffers | ||
Slice slice = null; | ||
if (remaining != 0) { | ||
slice = Slices.allocate(Integer.BYTES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to cache this one-int buffer (in the class, or for the whole method) to avoid potentially reallocating on every boundary? Not sure how often remaining < Integer.BYTES
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added tempBuffer8
as a field variable and use it in int[] and long[] methods
@@ -414,19 +413,19 @@ public static long hash64(Slice data) | |||
long k = 0; | |||
switch (data.length() - current) { | |||
case 7: | |||
k ^= ((long) UnsafeSlice.getByteUnchecked(data, current + 6) & 0xff) << 48; | |||
k ^= ((long) data.getByteUnchecked(current + 6) & 0xff) << 48; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice.getByte()
and Slice.getByteUnchecked()
should be equivalent now, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getByte
performs a bounds check on the Slice's view of the underlying byte[].
lib/trino-orc/src/main/java/io/trino/orc/metadata/statistics/BloomFilter.java
Outdated
Show resolved
Hide resolved
return XxHash64.hash(Slices.wrappedIntArray(positionLinks)); | ||
long hash = 0; | ||
for (int positionLink : positionLinks) { | ||
hash = XxHash64.hash(hash, positionLink); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add an update()
method that accepts a long
Description
This PR updates to the proposed Slice 2.0 apis.
This PR is based on #18461
Release notes
(x) This is not user-visible or docs only and no release notes are required.