Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove casts of structs to record batch #2033

Merged
merged 3 commits into from
Jan 6, 2024

Conversation

Blajda
Copy link
Collaborator

@Blajda Blajda commented Jan 4, 2024

Description

Fixes an issue where the writer attempts to convert a Arrow Struct into a RecordBatch. This cannot be done since it will drop the validity array and would prevents structs with a value of null from being stored correctly.

This PR also extends the predicate representation for struct field access, list index access, and list range access.

Related Issue(s)

@github-actions github-actions bot added binding/rust Issues for the Rust crate crate/core labels Jan 4, 2024
@Blajda Blajda changed the title fix!: remove casts of structs to record batch fix: remove casts of structs to record batch Jan 4, 2024
@Blajda Blajda marked this pull request as ready for review January 4, 2024 03:16
Copy link
Member

@rtyler rtyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar enough with this code so I would like to ask @roeap to help here 😢

@liamphmurphy
Copy link
Contributor

Hi all, I reported the linked issue. I pulled down the changes locally and can confirm delete's are now working for me as expected!

let col = batch.column_by_name(f.name()).unwrap();

.map(|field| {
let col = struct_array.column_by_name(field.name()).unwrap();
if let (DataType::Struct(_), DataType::Struct(child_fields)) =
Copy link
Collaborator

@ion-elgreco ion-elgreco Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one branch missing I guess and that's if the data type is List(Struct), so perhaps we need to add, such kind of check:

else if (DataType::List(_), DataType::List(to_inner_field)) { 
    match to_inner_field.data_type()  {
         Struct(child_fields) => .... do struct casting here,
         _ => ... continue normal flow
}
}

sorry for awful formatting (doing this on a phone)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a follow up PR. Primary scope for this is to remove the panic / unsoundness that occurred with null-able structs. If a List<Struct> is provided and LHS does not match RHS then a proper error value is returned by cast_with_options.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright let's have this in a separate PR :)

Copy link
Collaborator

@ion-elgreco ion-elgreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just left one comment on a possible branch we may need to cover

@Blajda Blajda merged commit 815c965 into delta-io:main Jan 6, 2024
20 checks passed
r3stl355 pushed a commit to r3stl355/delta-rs that referenced this pull request Jan 10, 2024
# Description
Fixes an issue where the writer attempts to convert a Arrow `Struct`
into a `RecordBatch`. This cannot be done since it will drop the
validity array and would prevents structs with a value of `null` from
being stored correctly.

This PR also extends the predicate representation for struct field
access, list index access, and list range access.

# Related Issue(s)
- closes delta-io#2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate crate/core
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rust panicking through Python library when a delete predicate uses a nullable field
4 participants