GH-37453: [C++][Parquet] Performance fix for WriteBatch #37454

Merged (2 commits) on Aug 30, 2023

@adamreeve adamreeve commented Aug 30, 2023

Rationale for this change

Reduces the time taken for TypedColumnWriter::WriteBatch, which regressed with #35230

What changes are included in this PR?

This change computes the value for pages_change_on_record_boundaries once when a TypedColumnWriter is constructed rather than on every call to WriteBatch.
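The pattern described above can be sketched roughly as follows. This is a minimal illustration and not the actual Arrow code: `PropsSketch`, `WriterSketch`, `PageIndexEnabled`, and the column path string are hypothetical stand-ins, and the assumption that the flag comes from a per-column property lookup is made only for the example. The point is that the lookup runs once in the constructor and the hot `WriteBatch` path reads a cached `bool`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for writer properties; NOT the real Parquet API.
class PropsSketch {
 public:
  void EnablePageIndex(const std::string& column_path) {
    page_index_[column_path] = true;
  }

  // Stands in for a per-column lookup that is too costly for a hot path.
  // The counter only exists so the example can show how often it runs.
  bool PageIndexEnabled(const std::string& column_path) const {
    ++lookup_count_;
    auto it = page_index_.find(column_path);
    return it != page_index_.end() && it->second;
  }

  int lookup_count() const { return lookup_count_; }

 private:
  std::map<std::string, bool> page_index_;
  mutable int lookup_count_ = 0;
};

// Hypothetical writer: the flag is computed once, at construction.
class WriterSketch {
 public:
  WriterSketch(const PropsSketch& props, const std::string& column_path)
      : pages_change_on_record_boundaries_(props.PageIndexEnabled(column_path)) {}

  // Hot path: reads the cached flag instead of querying the properties.
  void WriteBatch(int64_t num_values) {
    assert(num_values >= 0);
    if (pages_change_on_record_boundaries_) {
      ++boundary_aware_batches_;  // placeholder for record-boundary handling
    }
    values_written_ += num_values;
  }

  bool pages_change_on_record_boundaries() const {
    return pages_change_on_record_boundaries_;
  }
  int64_t values_written() const { return values_written_; }
  int64_t boundary_aware_batches() const { return boundary_aware_batches_; }

 private:
  const bool pages_change_on_record_boundaries_;
  int64_t values_written_ = 0;
  int64_t boundary_aware_batches_ = 0;
};
```

In this sketch, constructing one `WriterSketch` and calling `WriteBatch` many times triggers the property lookup only once, which is the effect the change aims for.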

Are these changes tested?

This doesn't change behaviour so should be covered by existing tests.

Are there any user-facing changes?

No

Compute the value for pages_change_on_record_boundaries
on construction rather than on each call to WriteBatch,
as recomputing it every time adds a lot of overhead.
@adamreeve adamreeve requested a review from wgtmac as a code owner August 30, 2023 05:03
@github-actions
⚠️ GitHub issue #37453 has been automatically assigned in GitHub to PR creator.

@mapleFU mapleFU left a comment

cc @wgtmac

@wgtmac wgtmac left a comment

Thanks for fixing this!

@pitrou pitrou left a comment

Thanks a lot @adamreeve . LGTM.

@pitrou pitrou merged commit f40bf77 into apache:main Aug 30, 2023
@pitrou pitrou removed the "awaiting committer review" label Aug 30, 2023
@adamreeve adamreeve deleted the fix_page_index_perf branch August 30, 2023 21:29
@conbench-apache-arrow
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit f40bf77.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…#37454)

### Rationale for this change

Reduces the time taken for `TypedColumnWriter::WriteBatch`, which regressed with apache#35230 

### What changes are included in this PR?

This change computes the value for `pages_change_on_record_boundaries` once when a `TypedColumnWriter` is constructed rather than on every call to `WriteBatch`.

### Are these changes tested?

This doesn't change behaviour so should be covered by existing tests.

### Are there any user-facing changes?

No
* Closes: apache#37453

Authored-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Development

Successfully merging this pull request may close these issues.

[C++][Parquet] Performance regression in TypedColumnWriter::WriteBatch