refactor: Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet #18331
Do you think we can eventually move completely to binview?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18331      +/-   ##
==========================================
+ Coverage   79.78%   79.88%   +0.09%
==========================================
  Files        1497     1495       -2
  Lines      200424   200199     -225
  Branches     2844     2867      +23
==========================================
+ Hits       159916   159925       +9
+ Misses      39983    39728     -255
- Partials      525      546      +21
I don't necessarily see a reason why not.
That would save unneeded casts and remove a lot of code.
I converted everything now to use the
// Verify the invariants
#[cfg(debug_assertions)]
{
    // @TODO: Enable this. This is currently bugged with concatenate.
Is our information incorrect? Where does this happen? Should I fix this in a separate PR?
This is happening in the legacy::concatenate kernel.
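For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a debug-only invariant check. The excerpt does not show which invariant the PR verifies, so this assumes it is a cached byte total that must match the sum of the view lengths. `ViewArraySketch` and `recomputed_total_bytes_len` are hypothetical names, not Polars types or methods.

```rust
/// Hypothetical stand-in for a view array; NOT the Polars type.
pub struct ViewArraySketch {
    /// Length in bytes of each value (stand-in for the real views).
    pub lengths: Vec<u32>,
    /// Cached sum of all value lengths, supplied by the caller.
    pub total_bytes_len: usize,
}

impl ViewArraySketch {
    /// Recompute the byte total from the per-value lengths.
    pub fn recomputed_total_bytes_len(&self) -> usize {
        self.lengths.iter().map(|&len| len as usize).sum()
    }

    /// Build the array from parts that the caller claims are consistent.
    pub fn new(lengths: Vec<u32>, total_bytes_len: usize) -> Self {
        let out = Self { lengths, total_bytes_len };

        // Assumed invariant: the cached total equals the recomputed total.
        // The block is compiled only when debug assertions are enabled,
        // so release builds skip the O(n) recomputation.
        #[cfg(debug_assertions)]
        {
            assert_eq!(
                out.total_bytes_len,
                out.recomputed_total_bytes_len(),
                "cached total_bytes_len disagrees with the views"
            );
        }

        out
    }
}
```

With a check like this enabled, a kernel that passes along a stale total would panic in debug builds at construction time, which is one way a bug like the one mentioned for concatenate could surface.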
)?
.collect_n(filter)?
},
// These are all converted to View variants before.
Nice!
@@ -618,6 +651,9 @@ impl MutableBinaryViewArray<[u8]> {
let buffer_idx = self.completed_buffers().len() as u32;
let in_progress_buffer_offset = self.in_progress_buffer.len();

self.total_bytes_len += sum_length;
Should total_bytes_len not be added outside of the branch?
@@ -639,6 +675,8 @@ impl MutableBinaryViewArray<[u8]> {
view
}));
} else if max_length <= View::MAX_INLINE_SIZE as usize {
self.total_bytes_len += sum_length;
Ah, I see you do that here. Maybe we can do it only once before the branches.
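A rough sketch of what "only once before the branches" could look like, under stated assumptions. `ViewBuilderSketch`, `extend_values`, and the constant value 12 are hypothetical and only illustrate the shape of the change. The real code lives in `MutableBinaryViewArray` and uses `View::MAX_INLINE_SIZE`. The sketch also assumes that every branch appends all of the incoming values, which is what makes the hoist safe.

```rust
/// Assumed inline threshold for illustration; the diff uses `View::MAX_INLINE_SIZE`.
pub const MAX_INLINE_SIZE: usize = 12;

/// Hypothetical stand-in for a view-array builder; NOT the Polars type.
#[derive(Default)]
pub struct ViewBuilderSketch {
    /// Length of each pushed value (stand-in for the real views).
    pub lengths: Vec<u32>,
    /// Shared buffer that holds values too long to be stored inline.
    pub buffer: Vec<u8>,
    /// Running sum of all value lengths in bytes.
    pub total_bytes_len: usize,
}

impl ViewBuilderSketch {
    /// Append a batch of values, updating the byte counter a single time.
    pub fn extend_values(&mut self, values: &[&[u8]]) {
        let sum_length: usize = values.iter().map(|v| v.len()).sum();
        let max_length = values.iter().map(|v| v.len()).max().unwrap_or(0);

        // Both branches below append every value, so the counter update
        // does not depend on which branch runs and can be hoisted here.
        self.total_bytes_len += sum_length;

        if max_length <= MAX_INLINE_SIZE {
            // Every value fits inline: nothing is copied into the buffer.
            self.lengths.extend(values.iter().map(|v| v.len() as u32));
        } else {
            // At least one value is long: copy the long ones into the buffer.
            for v in values {
                if v.len() > MAX_INLINE_SIZE {
                    self.buffer.extend_from_slice(v);
                }
                self.lengths.push(v.len() as u32);
            }
        }
    }
}
```

Hoisting the update leaves a single place where the counter changes, so a branch added later cannot forget it. The trade-off is that it is only correct while all branches append exactly the same set of values.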
7127577 to 9dbf0ba