Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial set of commits #1

Merged
merged 284 commits into from
May 26, 2022
Merged

Initial set of commits #1

merged 284 commits into from
May 26, 2022

Conversation

jvanstraten
Copy link
Collaborator

Replaces substrait-io/substrait#155, refer to that thread for details.

jacques-n and others added 30 commits May 9, 2022 12:33
- remove timestamp_tz parameter
- add uuid (exists in iceberg & trino)
- add more range information for types.
- move simple, compound and variations to wip status
- Clarify interval ranges and change interval arrow type.

Co-authored-by: Weston Pace <weston.pace@gmail.com>
* initial example of embedded protobuf
* bump mkdocs-protobuf version
- Add boolean type and literals
- Add a typed null literal
- Fix map type definition
- Add java option to generate multiple files.
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Updates to ideally support majority of tpch queries

- Remove aggregate expressions type from generalized expressions. (only allow aggregate expressions as root expressions for aggregation)
- Update function mapping to support options
- Remove named structs from type unions (should only be used in special places as root, not in arbitrary hierarchy)
- Add project, join, fetch, aggregate, sort, set logical relational operations.
- Introduce key scalar and aggregate functions in functions yaml.
- Remove old extensions docs
- Add nullability handling and type parsing syntax.

Address substrait-io#42, substrait-io#43, substrait-io#44
…trait-io#49)

* Proposed updates to support required and optional enumerations.
* Apply suggestions from code review

Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Add List literal message type

* Add list field to oneof literal
Various YAML cleanups:

* Remove duplicate anchor
* Make whitespace consistent
* Add required space after map colon
* Make spacing more consistent
* Add yamllint config
* Add yamllint job to PR workflow
* Format aggregate_functions yaml
* Format extension_types yaml
* Format scalar_functions yaml
* Remove unnecessary brackets
* Format type_variations
* Use slightly more compact repr for some types fields
* Increase yaml line length restriction to 120
* Fix mkdocs.yml
* Wrap description
* Revert back to bracket syntax
* Unwrap lines
* Overriding the C# namespace as Io.Substrait is an odd namespace (Io is not an organization)

* Added Substrait.Protobuf C# namespace to more files
The join type enum is in JoinRel but not defined as an actual field in there.
* Update functions to flatter format.

This closes substrait-io#65
- Change the slack invite link to be a variable within mkdocs.yml
- Update the link for another 30 days.

This closes substrait-io#78
* Add NamedRel

* Move NamedRel out and rename to RelRoot

* Fix doc

* Fix numbering

* Remove newline

* Fix reuse bug
* Add schema for yaml extensions

- update existing extensions to validate against schema
- add an example "unknown" extension to illustrate simplified possibilities.

* Add github job to validate yaml extensions.

* fix gh workflow

* Address review comments.

* remove extraneous workflow op.
* Clarify/solidify extensions

- Clarify extensions including documentation, updated binary representation, distinction between simple and advanced.
- Provide documentation on serialization and more serialization examples, including details around surrogate keys and pointers
- Update all fields that are anchors/reference to use uint32 as opposed to complex structures. Update field names to better clarify behavior
- Add extension capabilities to protobuf for advanced extensions (along with supporting documentation)
- Update type variations proto definition and extension definition pattern to be consistent with other extension types.
- Reorder binary serialization to be first in nav given it's greater maturity over text.
- Remove HintKeyValue and replace with AdvancedExtension use.

Co-authored-by: Weston Pace <weston.pace@gmail.com>
* Add proto format CI check
* Fix formatting

Co-authored-by: Jacques Nadeau <jacques@apache.org>
* Minor expression cleanups

- Add java package name for all proto files
- Add an explicit CAST expression
- Update the varchar literal to specify it's length
- Move AggregateFunction, AggregationPhase and SortField out of the Expression message (top-level in expression.proto).
- Remove AggregationPhase from aggregate rel since it is specified on the individual aggregation function bindings
- Add support for a filter as part of an aggregation measure (SQL2003)
- Cleanup some markdown linebreaks
- Add structure for a return program (complex return operations)
…ubstrait-io#98)

* Rename Switch/If expressions to reflect their use case better

Also use `Literal` in case value to restrict according to the use case

* Use Literal for SwitchExpression if field
@jacques-n
Copy link
Contributor

@jvanstraten, looks like this is breaking the main build. Can you see if you can address?

@jvanstraten
Copy link
Collaborator Author

jvanstraten commented May 26, 2022

Seems unrelated to CI or anything the merge did; instead it looks like protobuf broke the world again. Google's idea of code generator vs runtime version is not dissimilar from "any color, as long as it's black," and I guess they've diverged, at least on CI. I could work around it pretty easily, but it's not going to fix the root cause, especially if we want to actually push the validator to PyPI; the generator version would be whatever is on CI, whereas the runtime version is whatever the user happens to have installed on their system, so 💥 unless the stars align. I'll have to think about this, I guess.

ETA info dump:

  • Python protobuf went from 3.20.1 to 4.21.0, incrementing both major and minor. That threw me for a loop. So I guess that's not as bad as I thought; they at least marked this as a breaking change. A naive but correct solution, I suppose, would thus be to pin the Python dependencies to version 3.x.x. But I'd rather stay up-to-date.

  • For simplicity to users building from source, all code generation has been relying on a pre-built version of protoc shipped with the prost-build rust crate. I know this is complete abuse of prost-build, but it was the most user-friendly solution I could come up with because it just worked.

    • prost-build was pinned to version 0.9, which shipped with protoc 3.15.8.
    • Python protobuf 4.21.0 requires at least protoc 3.19.0 (thankfully, they don't seem to require an exact version match in Python, like they are in C++).
    • prost-build and friends have been updated to 0.10 since I pinned it. Now, however, they no longer ship pre-built binaries, instead requiring a fully functional C++/CMake toolchain to compile from scratch during the Rust build (though it just uses the system protoc if one is found). So upgrading to 0.10 would mean that the dependencies for building the Rust validator explode from just needing stable Rust to also needing a full C++ toolchain, or having protoc in $PATH. However, the shipped sources are for 3.19.4, so they should at least be good enough for now.
    • prost-build is working on removing their dependency on protoc entirely.

    tl;dr updating prost-build to 0.10 would be a serious regression in build dependencies (though arguably someone working with Substrait will probably already have an up-to-date protobuf install), and beyond 0.10 (once released) will break this workflow entirely.

  • Guess I'll go with upgrading prost-build to 0.10 for now. Beyond that I guess I'll either have to roll my own platform detection + protoc binary download, or make this the user's problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.