Support substrait serialization for ScalarValue::Utf8View and ScalarValue::BinaryView #12118

Closed
Tracked by #11752
alamb opened this issue Aug 22, 2024 · 5 comments · Fixed by #12199

@alamb
Contributor

alamb commented Aug 22, 2024

Is your feature request related to a problem or challenge?

Part of #11752

We are trying to enable DataFusion to use StringViewArray by default. If we do that, ScalarValue::Utf8View and ScalarValue::BinaryView are more likely to appear in plans.

Describe the solution you'd like

Thus we need to ensure ScalarValue::Utf8View and ScalarValue::BinaryView can be serialized with the datafusion-substrait crate.

Describe alternatives you've considered

I recommend adding coverage for ScalarValue::Utf8View and ScalarValue::BinaryView to the tests here:

round_trip_type(DataType::Utf8)?;
round_trip_type(DataType::LargeUtf8)?;

Then update the code until the tests pass.

Additional context

No response

@alamb alamb added the enhancement New feature or request label Aug 22, 2024
@alamb alamb changed the title from "ddd" to "Support substrait serialization for ScalarValue::Utf8View and ScalarValue::BinaryView" Aug 22, 2024
@alamb
Contributor Author

alamb commented Aug 22, 2024

@XiangpengHao has a draft PR here: #11898

@alamb
Contributor Author

alamb commented Aug 23, 2024

I think @wiedld is planning to look at this soon -- maybe @Blizzara will have some ideas if we run into questions about how to map the types to/from the substrait types

@wiedld
Contributor

wiedld commented Aug 23, 2024

take

@wiedld
Contributor

wiedld commented Aug 27, 2024

There is a long discussion over here about the type system in substrait. The summary outcome (also reflected in the spec here) is that we have a logical type "string" and different variations for the physical type (e.g. utf8 vs largeutf8 vs utf8view).

I have passing tests using this approach; I want to check code coverage on a few things before putting up the PR (it looks like the largeutf8 variation was not fully implemented everywhere, so I want to make sure we have full test coverage).
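
To make the "one logical type, several physical variations" idea concrete, here is a minimal, self-contained sketch. It does not use the real substrait or datafusion-substrait APIs: `ArrowString`, `SubstraitString`, the `*_VARIATION` constants and their numeric values, and the three functions are all hypothetical names chosen for illustration. It only shows the shape of the mapping and of a round-trip check similar in spirit to the `round_trip_type` tests mentioned above.

```rust
// Hypothetical model of the "logical type + physical variation" idea.
// None of these names come from the substrait or datafusion crates.

/// Arrow-side physical string layouts discussed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowString {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Simplified stand-in for a Substrait type: a single logical "string"
/// kind plus a numeric reference that selects the physical variation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SubstraitString {
    type_variation_reference: u32,
}

// Assumed reference values, picked only for this illustration.
const DEFAULT_VARIATION: u32 = 0;
const LARGE_VARIATION: u32 = 1;
const VIEW_VARIATION: u32 = 2;

/// Producer side: every Arrow layout maps to the same logical type,
/// distinguished only by its variation reference.
fn to_substrait(t: ArrowString) -> SubstraitString {
    let type_variation_reference = match t {
        ArrowString::Utf8 => DEFAULT_VARIATION,
        ArrowString::LargeUtf8 => LARGE_VARIATION,
        ArrowString::Utf8View => VIEW_VARIATION,
    };
    SubstraitString { type_variation_reference }
}

/// Consumer side: recover the Arrow layout, rejecting unknown variations
/// instead of silently falling back to the default.
fn from_substrait(s: SubstraitString) -> Result<ArrowString, String> {
    match s.type_variation_reference {
        DEFAULT_VARIATION => Ok(ArrowString::Utf8),
        LARGE_VARIATION => Ok(ArrowString::LargeUtf8),
        VIEW_VARIATION => Ok(ArrowString::Utf8View),
        other => Err(format!("unsupported string type variation: {other}")),
    }
}

/// Serialize then deserialize; a correct mapping returns its input.
fn round_trip(t: ArrowString) -> Result<ArrowString, String> {
    from_substrait(to_substrait(t))
}
```

Under these assumptions, `round_trip(ArrowString::Utf8View)` returns `Ok(ArrowString::Utf8View)`. A variation that is handled on the producer side but missing from the consumer side would surface as an `Err` in such a check, which is the kind of gap the coverage review above is meant to catch.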

@Blizzara
Contributor

There is a long discussion over here about the type system in substrait. The summary outcome (also reflected in the spec here) is that we have a logical type "string" and different variations for the physical type (e.g. utf8 vs largeutf8 vs utf8view).

Yup, this aligns with my understanding - the PR looks good from a cursory review!
