Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize allocations and copies for SortedRanges.insert when it is effectively an append #4189

Merged
merged 8 commits into from
Jul 17, 2023

Conversation

jcferretti
Copy link
Member

@jcferretti jcferretti commented Jul 14, 2023

Related to DHE https://github.com/deephaven-ent/iris/pull/651 and https://deephaven.atlassian.net/browse/DH-15310

The testing for this change also found a pre-existing bug in SortedRangesTyped.java.
The method ensureCanAppend is missing a check to ensure a packed range (eg for SortedRangesShort) doesn't overflow.
DB tests began failing because with the new code this old bug was being hit.

     @Override
     protected final SortedRanges ensureCanAppend(final int newLastPos, final long unpackedNewLastKey, final boolean writeCheck) {
         final long absUnpackedLastKey = Math.abs(unpackedNewLastKey);
+        if (!fitsForAppend(absUnpackedLastKey)) {
+            return null;
+        }

@jcferretti jcferretti added this to the July 2023 milestone Jul 14, 2023
@jcferretti jcferretti self-assigned this Jul 14, 2023
@jcferretti jcferretti enabled auto-merge (squash) July 14, 2023 17:01
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some pedantic suggestions, an optimization question, and a suggestion for further optimization. Change looks correct as-is, however.

// we know other can't be empty.
if (!USE_RANGES_ARRAY || isEmpty() || last() < other.first()) {
if (last() < other.first()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can optimize the same way if our last range overlaps other's last range, right? Worth a little more complexity?

Copy link
Member Author

@jcferretti jcferretti Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but punted due to complexity; I'd like the delta to go to Bard and Genesis so on the enterprise PR I think that is a good tradeoff. We could just do more here in the community PR, and also port that to vermillion in enterprise once the Bard fix rolls forward to vermillion... but the truth is I shouldn't be working on any of this due to priorities, so I don't want to put another couple of hours just now. Is not obvious to me how common that additional optimization would be hit either; maybe one of you guys Ryan or Charles can make an argument from the perspective of how some particular operation works (eg, byExternal for merges, join in other cases or whatever).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a reasonable answer. I think the case you've optimized is common. I'm certain we could also handle the overlapping case at the cost of a bit more complexity, but I'm not sure that a single overlapping range is much more common than general overlapping.

Comment on lines 2148 to 2150
for (int otherPosToRead = otherFirstPosToRead; otherPosToRead < other.count; ++otherPosToRead) {
result.unpackedSet(ourPosToWrite++, other.unpackedGet(otherPosToRead));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a candidate for an array copy, but I assume it's written this way to address different offsets and different internal widths?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because essentially we are not guaranteed that our two sortedRanges are of the same underlying "type" (eg if they are SortedRangesInt or SortedRangesShort, the unpack part may apply an offset and other fun).
We "could" [TM] optimize to array copy if both are SortedRangesLong (no offset to apply in that case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. This is exactly what I expected.

jcferretti and others added 2 commits July 14, 2023 16:09
…rtedranges/SortedRanges.java

Co-authored-by: Ryan Caudy <rcaudy@gmail.com>
…rtedranges/SortedRanges.java

Co-authored-by: Ryan Caudy <rcaudy@gmail.com>
rcaudy
rcaudy previously approved these changes Jul 14, 2023
@jcferretti jcferretti merged commit c00b8cb into deephaven:main Jul 17, 2023
10 checks passed
@jcferretti jcferretti deleted the cfs-sr-mergeappend branch July 17, 2023 13:36
@github-actions github-actions bot locked and limited conversation to collaborators Jul 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants