Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-16782: [Format] Add REE definitions to FlatBuffers #14176

Merged
merged 28 commits into from
Jan 12, 2023

Conversation

zagto
Copy link
Contributor

@zagto zagto commented Sep 20, 2022

No description provided.

@github-actions
Copy link

@zeroshade zeroshade marked this pull request as ready for review December 19, 2022 19:44
@zeroshade zeroshade changed the title ARROW-16782: [Format] Add RLE definitions to FlatBuffers ARROW-16782: [Format] Add REE definitions to FlatBuffers Dec 19, 2022
@zeroshade zeroshade requested review from rok, pitrou and kou December 20, 2022 15:34
@kou kou requested review from alamb and jorgecarleitao December 21, 2022 07:33
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FBS change looks good to me -- thank you @zagto

Is it possible to incorporate the documentation from #13333 into this pR as well so that the new changes are documented?

For anyone following along, see https://lists.apache.org/thread/539scy67qom5t2fkkd1m6fvh5htvwo3s discussion for more context

cc @zeroshade

@pitrou
Copy link
Member

pitrou commented Jan 3, 2023

Is it possible to incorporate the documentation from #13333 into this pR as well so that the new changes are documented?

+1 on this. It will also make reviewing easier.

@zeroshade
Copy link
Member

@alamb @pitrou I rebased the docs from #13333 into this PR (to maintain the commit history from @zagto) and then updated them with the new name being "Run-End encoded" rather than "Run-Length encoded". Let me know if there's anything else needed to add.

@zeroshade
Copy link
Member

zeroshade commented Jan 4, 2023

@github-actions crossbow submit preview-docs

EDIT: this PR was old, I've rebased it now and am going to make this comment again so it should work now.

@github-actions
Copy link

github-actions bot commented Jan 4, 2023

Unable to match any tasks for `preview-docs`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3840303309

format/Schema.fbs Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
@zeroshade
Copy link
Member

@westonpace I've updated with a line prescribing the standard names for the children and updated the example layout to add the underscore so that it uses those standard names run_ends and values.

Once I get some explicit approvals on this I'll merge it in.

docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
docs/source/format/Columnar.rst Outdated Show resolved Hide resolved
format/Schema.fbs Show resolved Hide resolved
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
zeroshade and others added 2 commits January 10, 2023 10:59
Co-authored-by: Weston Pace <weston.pace@gmail.com>
@lidavidm
Copy link
Member

General question here: how are we going to handle versioning of the Arrow specification? Are we going to merge this and down the line release an "Arrow specification 1.1.0"/MetadataVersion.V6 once we also have types like StringView, or are we going to bump these versions for each new type?

@zeroshade
Copy link
Member

That is a very good question @lidavidm.

What do you think @westonpace @pitrou @alamb @ianmcook ?

@pitrou
Copy link
Member

pitrou commented Jan 11, 2023

I think we only increase MetadataVersion when we make incompatible changes to the Flatbuffers definition, which is not the case here.

As for the format version, we should perhaps bump it:

/// Format Version History.
/// Version 1.0 - Forward and backwards compatibility guaranteed.
/// Version 1.1 - Add Decimal256 (No format release).
/// Version 1.2 (Pending)- Add Interval MONTH_DAY_NANO

@zeroshade
Copy link
Member

@pitrou I can bump it in the docs in this PR, but do we need to do a vote on the mailing list or anything to make the new version official?

@pitrou
Copy link
Member

pitrou commented Jan 11, 2023

I don't think a new vote is needed, though I don't know for sure.

@lidavidm
Copy link
Member

I suppose the version bump is implicit in adding a new data type? Looking at the Git history/the MONTH_DAY_NANO PR, the versions were just bumped when we added the type. (...And 1.2 is still "pending", guess we should take that out now and put 1.3 as adding REE.)

@zeroshade
Copy link
Member

zeroshade commented Jan 11, 2023

I've updated the Schema.fbs and Columnar.rst files for the version and regenerated files to pick up the new comments.

I've got two approvals here, so unless someone comments otherwise I'll merge this after EOD today, to give people a chance to add any final comments. Thanks everyone!

format/Schema.fbs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed only the changes to docs/source/format/Columnar.rst and format/Schema.fbs but they look great to me. Thank you to everyone who has been working on this proposal -- it is super exciting to see it so close

@@ -19,8 +19,9 @@

/// Format Version History.
/// Version 1.0 - Forward and backwards compatibility guaranteed.
/// Version 1.1 - Add Decimal256 (No format release).
/// Version 1.2 (Pending)- Add Interval MONTH_DAY_NANO
/// Version 1.1 - Add Decimal256.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@ursabot
Copy link

ursabot commented Jan 13, 2023

Benchmark runs are scheduled for baseline = d0ffedb and contender = 3146588. 3146588 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.99% ⬆️0.24%] test-mac-arm
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.84% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3146588a ec2-t3-xlarge-us-east-2
[Finished] 3146588a test-mac-arm
[Finished] 3146588a ursa-i9-9960x
[Finished] 3146588a ursa-thinkcentre-m75q
[Finished] d0ffedbe ec2-t3-xlarge-us-east-2
[Failed] d0ffedbe test-mac-arm
[Finished] d0ffedbe ursa-i9-9960x
[Finished] d0ffedbe ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants