-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add append_many to dictionary arrays to allow adding repeated values #6534
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you've given thought to instead providing a way to manually append keys and manually add/lookup values in the dictionary? I think this would be more flexible, and would avoid needing to add nullable variants as an added bonus. The append_key method would then be an obvious candidate to be made inline
pub fn append(&mut self, value: impl AsRef<T::Native>) -> Result<K::Native, ArrowError> { | ||
self.append_many(value, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could instead extract the entry logic into a private member function, I'd be less concerned about potential performance regressions that way, LLVM does funny things with integer loops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
@@ -208,7 +208,21 @@ where | |||
/// value is appended to the values array. | |||
/// | |||
/// Returns an error if the new index would overflow the key type. | |||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation for this, is there some benchmark you've used? Given how problematic dictionary codegen already is, because everything is generated for each key-value combination, I'm rather wary of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to remove it
pub fn append_option_many(&mut self, value: Option<impl AsRef<T::Native>>, count: usize) { | ||
match value { | ||
None => { | ||
for _ in 0..count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.rs/arrow-array/latest/arrow_array/builder/struct.PrimitiveBuilder.html#method.append_nulls will be significantly faster
@tustvold inspired the the existence of https://docs.rs/arrow-array/latest/arrow_array/builder/struct.PrimitiveBuilder.html#method.append_nulls I added methods such that we can now push the repetition all the way down into the buffers, which I think is as efficient as it's going to get. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API looks good to me, just some merge conflicts and one redundant API addition
arrow-buffer/src/builder/null.rs
Outdated
/// Appends `n` `true`s into the builder | ||
/// to indicate that these `n` items are not null. | ||
#[inline] | ||
pub fn append_n_non_null(&mut self, n: usize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this method already exists a few lines up
82c74d0
to
877e44a
Compare
Merge conflicts fixed and API removed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a slight naming nit, otherwise looks good
@@ -202,6 +202,13 @@ impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> { | |||
self.values_builder.append(v); | |||
} | |||
|
|||
/// Appends a value of type `T` into the builder `n` times | |||
#[inline] | |||
pub fn append_value_repeated(&mut self, v: T::Native, n: usize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is kind of sad that the naming of this can't be consistent, but I don't really have any better suggestions. Perhaps append_value_n
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I noticed the names are getting a bit messy and mixed up. I'm happy to go with that suggestion.
I never addressed this:
|
Closes #6529