Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support list_view_array & large_list_view layout and basic construction #5576

Closed
wants to merge 5 commits into from

Conversation

Kikkon
Copy link
Contributor

@Kikkon Kikkon commented Mar 31, 2024

Which issue does this PR close?

Closes #5501 .

Rationale for this change

What changes are included in this PR?

Add the basic implementation of ListView and LargeListView, as well as unit testing.

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 31, 2024
@Kikkon Kikkon changed the title Support live_view_array layout and basic construction Support live_view_array & large_list_view layout and basic construction Mar 31, 2024
@Kikkon Kikkon force-pushed the feat/impl_list_view branch from 095a678 to 677bd2e Compare March 31, 2024 14:33
use std::ops::Deref;

#[derive(Debug, Clone)]
pub struct SizeBuffer<O: ArrowNativeType>(ScalarBuffer<O>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this really adds anything over ScalarBuffer?

Comment on lines 475 to 476
let value_offsets = unsafe { get_offsets(&data) };
let value_sizes = unsafe { get_sizes(&data) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsound as the OffsetBuffer constraints are not applicable to view types

assert_eq!(list_array.len(), 0)
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we possibly use the constructors in these tests instead of ArrayData, this should be much easier to follow and maintain

DataType::ListView
};

/// Returns the data type of the list view array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to be incorrect

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this, I only took a quick look as I am very far behind on reviews from being away and then sick, but this mostly looks good. I think the major changes needed are to remove the uses of OffsetBuffer and probably remove SizeBuffer as well, and just use ScalarBuffer.

@tustvold tustvold changed the title Support live_view_array & large_list_view layout and basic construction Support list_view_array & large_list_view layout and basic construction Apr 3, 2024
@Kikkon
Copy link
Contributor Author

Kikkon commented Apr 4, 2024

@tustvold 🚀 Thank you very much for your review, I will make the modifications based on the comments later this week.

@Kikkon Kikkon force-pushed the feat/impl_list_view branch from e26ac34 to 955eb2b Compare April 6, 2024 13:56
@Kikkon Kikkon requested a review from tustvold April 6, 2024 15:05
@Kikkon
Copy link
Contributor Author

Kikkon commented Apr 10, 2024

Thank you for this, I only took a quick look as I am very far behind on reviews from being away and then sick, but this mostly looks good. I think the major changes needed are to remove the uses of OffsetBuffer and probably remove SizeBuffer as well, and just use ScalarBuffer.

@tustvold If you have some free time, perhaps you could help review it again. 😄

@tustvold
Copy link
Contributor

It is on my list but I probably won't get to it until Friday

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow review, I am a little concerned that this PR is very large and appears to copy-pasta a lot of the ListArray code incorrectly.

Perhaps we might split this up into smaller pieces, something like:

  • Add basic ArrayData construction and validation
  • Add ArrayData equality
  • Add GenericListViewArray
  • Add GenericListViewBuilder

As it stands it is quite hard to guage what is tested and what isn't, and there are definitely a number of code paths that are simply incorrect

arrow-array/src/array/list_view_array.rs Outdated Show resolved Hide resolved
)));
}
}
if len != sizes.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test should be moved ahead of the iteration above

/// Returns ith value of this list view array.
/// # Safety
/// Caller must ensure that the index is within the array bounds
pub unsafe fn value_unchecked(&self, i: usize) -> ArrayRef {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does not appear to be correct (and could probably do with a test)

arrow-array/src/array/list_view_array.rs Outdated Show resolved Hide resolved
}

fn is_empty(&self) -> bool {
self.value_offsets.len() <= 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not appear to be correct

///
/// Panics if the length of [`Self::values`] exceeds `OffsetSize::MAX`
#[inline]
pub fn append(&mut self, is_valid: bool, size: usize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size should be inferred from the length of the values_builder, much like it is for GenericListBuilder

Comment on lines +105 to +109
/// Create a new [`ScalarBuffer`] containing a single 0 value
pub fn new_empty() -> Self {
let buffer = MutableBuffer::from_len_zeroed(std::mem::size_of::<T>());
Self::from(buffer.into_buffer())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not necessary, and is incorrect as it creates a ScalarBuffer that isn't empty

/// Returns a reference to the data in `buffer` as a typed slice
/// after validating. The returned slice is guaranteed to have at
/// least `len` entries.
fn typed_sizes<T: ArrowNativeType + num::Num>(&self) -> Result<&[T], ArrowError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't necessary

@@ -937,11 +972,23 @@ impl ArrayData {
self.validate_offsets::<i32>(values_data.len)?;
Ok(())
}
DataType::ListView(field) => {
let values_data = self.get_single_valid_child_data(field.data_type())?;
self.validate_offsets::<i32>(values_data.len)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validates the offsets w.r.t to the ListArray semantics, this is incorrect

@@ -929,6 +944,26 @@ impl ArrayData {
Ok(())
}

fn validate_sizes<T: ArrowNativeType + num::Num + std::fmt::Display>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be incorrect

@Kikkon
Copy link
Contributor Author

Kikkon commented Apr 15, 2024

Sorry for the slow review, I am a little concerned that this PR is very large and appears to copy-pasta a lot of the ListArray code incorrectly.

Perhaps we might split this up into smaller pieces, something like:

  • Add basic ArrayData construction and validation
  • Add ArrayData equality
  • Add GenericListViewArray
  • Add GenericListViewBuilder

As it stands it is quite hard to guage what is tested and what isn't, and there are definitely a number of code paths that are simply incorrect

@tustvold Sounds good. I'll split it into several smaller pull requests separately. I'll ping you again once I'm ready.

@Kikkon Kikkon marked this pull request as draft April 15, 2024 11:03
@Kikkon Kikkon closed this Apr 18, 2024
@Kikkon Kikkon deleted the feat/impl_list_view branch April 18, 2024 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ListViewArray and LargeListViewArray implementation and layout and basic construction
2 participants