-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-36819: [R] Use RunWithCapturedR for reading Parquet files #37274
Conversation
|
I'm not familiar enough with Naively, if |
@ursabot please benchmark |
Benchmark runs are scheduled for commit 775db90. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
I believe the fact that the parquet reader issues reads on the calling (or any non-IO) thread is considered a bug (#30496). Good catch on issuing all read calls in the same way! |
Thanks for your patience. Conbench analyzed the 6 benchmarking runs that have been run so far on PR commit 775db90. There were 3 benchmark results indicating a performance regression:
The full Conbench report has more details. |
AFAIK all of the This for the plain Parquet reader, not the Parquet scanning through dataset, which I think has its own logic of threading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @paleolimbot !
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b5d36e9. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…pache#37274) ### Rationale for this change When we first added RunWithCapturedR to support reading files from R connections, none of the Parquet tests seemed to call R from another thread. Because RunWithCapturedR comes with some complexity, I didn't add it anywhere it wasn't strictly needed. A recent StackOverflow post exposed that reading very large parquet files do use multiple threads and thus need RunWithCapturedR. ### What changes are included in this PR? The two most common calls to read a parquet in which a user might trigger this failure are now wrapped in RunWithCapturedR. ### Are these changes tested? The changes are tested in the current suite. ### Are there any user-facing changes? No. * Closes: apache#36819 Lead-authored-by: Dewey Dunnington <dewey@voltrondata.com> Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
…pache#37274) ### Rationale for this change When we first added RunWithCapturedR to support reading files from R connections, none of the Parquet tests seemed to call R from another thread. Because RunWithCapturedR comes with some complexity, I didn't add it anywhere it wasn't strictly needed. A recent StackOverflow post exposed that reading very large parquet files do use multiple threads and thus need RunWithCapturedR. ### What changes are included in this PR? The two most common calls to read a parquet in which a user might trigger this failure are now wrapped in RunWithCapturedR. ### Are these changes tested? The changes are tested in the current suite. ### Are there any user-facing changes? No. * Closes: apache#36819 Lead-authored-by: Dewey Dunnington <dewey@voltrondata.com> Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
Rationale for this change
When we first added RunWithCapturedR to support reading files from R connections, none of the Parquet tests seemed to call R from another thread. Because RunWithCapturedR comes with some complexity, I didn't add it anywhere it wasn't strictly needed. A recent StackOverflow post exposed that reading very large parquet files do use multiple threads and thus need RunWithCapturedR.
What changes are included in this PR?
The two most common calls to read a parquet in which a user might trigger this failure are now wrapped in RunWithCapturedR.
Are these changes tested?
The changes are tested in the current suite.
Are there any user-facing changes?
No.