Reconsider Parquet read buffer size #11162

Closed
jlowe opened this issue Jul 9, 2024 · 1 comment
Labels
duplicate (This issue or pull request already exists), reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

Comments

jlowe (Member) commented on Jul 9, 2024

ParquetPartitionReaderBase uses the parquet.read.allocation.size config to control the size of the copy buffer used when reading from the Parquet input stream. The default is 8MB, which is large, especially when the multithreaded reader runs many reader threads. With many reader threads performing Parquet reads concurrently, these per-thread buffers can add up to a nontrivial amount of heap memory and trigger OOMs. We should take empirical measurements in various environments to see whether we can significantly lower this default.
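To put rough numbers on the concern, here is a small Python sketch (not the plugin's Scala code). It assumes each reader thread holds exactly one copy buffer of the configured size at the same time (the worst case) and treats the 8MB default as 8 MiB. The helper names `copy_stream` and `worst_case_buffer_bytes`, the thread counts, and the 8 GiB heap are all hypothetical. `copy_stream` also shows the kind of fixed-size buffered copy whose buffer size a measurement harness would vary.

```python
import io

MIB = 1024 * 1024
# Assumption: the "8MB" default mentioned above is treated as 8 MiB.
DEFAULT_ALLOCATION_SIZE = 8 * MIB


def copy_stream(src, dst, buffer_size: int = DEFAULT_ALLOCATION_SIZE) -> int:
    """Copy src to dst through one reusable buffer of buffer_size bytes.

    The buffer is allocated once per call, standing in for a per-thread
    copy buffer. Returns the number of bytes copied.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    buf = bytearray(buffer_size)
    view = memoryview(buf)
    copied = 0
    while True:
        n = src.readinto(buf)  # fills up to buffer_size bytes; 0 at EOF
        if not n:
            break
        dst.write(view[:n])  # write only the bytes actually read
        copied += n
    return copied


def worst_case_buffer_bytes(reader_threads: int,
                            buffer_size: int = DEFAULT_ALLOCATION_SIZE) -> int:
    """Heap pinned by copy buffers if every reader thread holds one at once."""
    if reader_threads < 0 or buffer_size < 0:
        raise ValueError("arguments must be non-negative")
    return reader_threads * buffer_size


if __name__ == "__main__":
    # Sanity check the copy with 1 MiB of sample data and a 64 KiB buffer.
    data = bytes(range(256)) * 4096
    out = io.BytesIO()
    assert copy_stream(io.BytesIO(data), out, buffer_size=64 * 1024) == len(data)
    assert out.getvalue() == data

    heap = 8 * 1024 * MIB  # hypothetical 8 GiB heap
    for threads in (20, 64, 128):  # hypothetical reader thread counts
        for size_mib in (8, 1):
            total = worst_case_buffer_bytes(threads, size_mib * MIB)
            print(f"{threads:>3} threads x {size_mib} MiB = {total // MIB:>4} MiB "
                  f"({100 * total / heap:.2f}% of an 8 GiB heap)")
```

Under these assumptions, 128 threads at 8 MiB each pin 1 GiB (one eighth of the hypothetical 8 GiB heap), while a 1 MiB buffer drops that to 128 MiB. Whether a smaller buffer costs read throughput is exactly what the empirical measurements would need to show.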

jlowe added the "? - Needs Triage" label on Jul 9, 2024
mattahrens added the "reliability" label and removed the "? - Needs Triage" label on Jul 16, 2024
sameerz added the "duplicate" label on Jul 23, 2024
sameerz (Collaborator) commented on Jul 23, 2024

Duplicate of issue #9269

sameerz closed this as not planned (duplicate) on Jul 23, 2024