Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix names and nullability when creating RecordBatch from MapArray #1258

Merged
merged 3 commits into from
Apr 9, 2023

Conversation

balbok0
Copy link
Contributor

@balbok0 balbok0 commented Apr 3, 2023

Description

When creating a RecordBatch with one of the columns being a MapArray, there are issues with naming of sub-columns, as well as nullability of the "elements/key_value" sub-column.

Related Issue(s)

Documentation

See https://www.rustexplorer.com/b/o7bfm4 for a breaking example.
I initially thought that the issue is specifically for array with 0-length maps, but nullable mismatch happens even when all of the map elements have values.

@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Apr 3, 2023
@rtyler
Copy link
Member

rtyler commented Apr 3, 2023

Thanks for the pull request! @balbok0 would you be able to take your failing example and turn that into a unit or integration test so we don't have this regression in the future?

@wjones127
Copy link
Collaborator

I’d like for us to fix this, but I also hope we can find a way to simultaneously support older PyArrow versions in the Python bindings.

@balbok0
Copy link
Contributor Author

balbok0 commented Apr 4, 2023

@wjones127 are there some specific pyarrow versions you have in mind? I wasn't able to find a specific arrow-cpp/pyarrow version that changed the naming scheme.

@balbok0
Copy link
Contributor Author

balbok0 commented Apr 5, 2023

Added failing example as a test.

Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good. I was worried about PyArrow and other languages because the Arrow standard specifies field names as "key" not "keys" and "value" not "values" (arrow-rs deviates from this). But the PyArrow C Data Interface import actually ignores these field names on map types anyways, so it shouldn't matter that they are different. So we can ignore that worry.

I have a few small nits on the tests, but otherwise looks good to go.

rust/src/delta_arrow.rs Outdated Show resolved Hide resolved
rust/src/delta_arrow.rs Outdated Show resolved Hide resolved
@wjones127
Copy link
Collaborator

Oh and could you rebase? We just fixed some of the deprecation warnings.

@balbok0
Copy link
Contributor Author

balbok0 commented Apr 9, 2023

Thanks for comments! I applied changes and rebased. Lmk if there is anything else that needs to done before merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not matching column names when creating a RecordBatch from MapArray
3 participants