GpuBatchScanExec partitions should be marked transient #5897

Merged
merged 1 commit into from
Jun 23, 2022


jlowe
Contributor

@jlowe jlowe commented Jun 23, 2022

SPARK-32168 marked the partitions member as @transient, but that change was never ported to the Spark 3.1.x version of GpuBatchScanExec. As a result, whenever the query plan is inadvertently serialized, the partition objects are serialized along with it. This can significantly increase the serialized task binary size when the partition objects are relatively large, as is the case with Apache Iceberg partitions.
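
For illustration only, here is a minimal sketch of the effect using plain Java serialization. It is not the actual GpuBatchScanExec code: FakePartition, ScanWithoutTransient, ScanWithTransient, and TransientDemo are hypothetical stand-ins, and the payload sizes are arbitrary.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for a large partition object (e.g. an Iceberg partition).
case class FakePartition(payload: Array[Byte])

// Hypothetical plan node that serializes its partitions along with itself.
class ScanWithoutTransient(val partitions: Seq[FakePartition]) extends Serializable

// Same node, but the partitions field is skipped during serialization.
class ScanWithTransient(@transient val partitions: Seq[FakePartition]) extends Serializable

object TransientDemo {
  // Serializes obj with Java serialization and returns the number of bytes written.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    // 100 partitions carrying 10 KiB each; sizes chosen only for the demo.
    val parts = Seq.fill(100)(FakePartition(new Array[Byte](10 * 1024)))
    println(s"without @transient: ${serializedSize(new ScanWithoutTransient(parts))} bytes")
    println(s"with @transient:    ${serializedSize(new ScanWithTransient(parts))} bytes")
  }
}
```

In this sketch the non-transient variant should serialize to roughly the combined size of the payloads, while the transient variant stays small because the field is omitted. The transient field is null after deserialization, so this only suits members that are not needed on the receiving side.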

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
@jlowe jlowe added this to the Jun 20 - Jul 8 milestone Jun 23, 2022
@jlowe jlowe self-assigned this Jun 23, 2022
@jlowe
Contributor Author

jlowe commented Jun 23, 2022

build

@jlowe jlowe merged commit 1abfc8f into NVIDIA:branch-22.08 Jun 23, 2022
@jlowe jlowe deleted the batchexec-transient-partitions branch June 23, 2022 19:48
@sameerz sameerz added the Spark 3.1+ Bugs only related to Spark 3.1 or higher label Jun 24, 2022
3 participants