S3 readahead can lead to extra memory pressure and delays in certain cases #5649

Open
malhotrashivam opened this issue Jun 21, 2024 · 1 comment
Labels: bug (Something isn't working), parquet (Related to the Parquet integration), s3, triage

Comments

@malhotrashivam
Contributor

malhotrashivam commented Jun 21, 2024

Consider the following experiment:

  • Parquet file size: 20 GB
  • Memory size: 8 GB
  • Time to do select:
    - Fragment size = 5 MB, Read ahead = 0 : 35 seconds
    - Fragment size = 5 MB, Read ahead = 32 : 212 seconds

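This shows that in certain situations, such as this one where the file is larger than the available memory, read ahead can actually make things slower (here roughly 6x, 212 seconds versus 35 seconds) and lead to more timeouts.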

As part of this issue, we should investigate this further and experiment with a heuristic approach, for example turning off read ahead once 90% of the memory is already in use.
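
To make the 90% idea concrete, here is a minimal sketch of what such a heuristic could look like. None of these names exist in the codebase: `ReadAheadThrottle`, `effectiveReadAhead`, and `currentHeapUsage` are hypothetical, and the heap-usage supplier is injected only so the policy can be exercised without real memory pressure. It assumes "memory" means the JVM heap as reported by `Runtime`.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical helper: disables read ahead once heap usage crosses a threshold. */
final class ReadAheadThrottle {
    private final int configuredReadAhead;   // e.g. 32, as in the experiment above
    private final double heapUsageThreshold; // e.g. 0.9 for "90% of the memory"
    private final DoubleSupplier heapUsage;  // returns used heap as a fraction of max heap

    ReadAheadThrottle(int configuredReadAhead, double heapUsageThreshold, DoubleSupplier heapUsage) {
        if (configuredReadAhead < 0) {
            throw new IllegalArgumentException("configuredReadAhead must be >= 0");
        }
        if (heapUsageThreshold <= 0.0 || heapUsageThreshold > 1.0) {
            throw new IllegalArgumentException("heapUsageThreshold must be in (0, 1]");
        }
        this.configuredReadAhead = configuredReadAhead;
        this.heapUsageThreshold = heapUsageThreshold;
        this.heapUsage = heapUsage;
    }

    /** Used heap divided by max heap, based on java.lang.Runtime. */
    static double currentHeapUsage() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** Returns 0 (no read ahead) at or above the threshold, else the configured value. */
    int effectiveReadAhead() {
        return heapUsage.getAsDouble() >= heapUsageThreshold ? 0 : configuredReadAhead;
    }
}
```

A caller would construct it as `new ReadAheadThrottle(32, 0.9, ReadAheadThrottle::currentHeapUsage)` and consult `effectiveReadAhead()` before issuing each batch of requests. One caveat: used heap as seen by `Runtime` includes garbage that has not been collected yet and buffers that are only softly reachable, so this is a coarse signal and may switch read ahead off earlier than necessary.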

Detected during #5613

@malhotrashivam added the bug, triage, parquet, and s3 labels on Jun 21, 2024
@malhotrashivam added this to the 5. Backlog milestone on Jun 21, 2024
@malhotrashivam self-assigned this on Jun 21, 2024
@devinrsmith
Member

One potential idea would be to develop heuristics around com.sun.management.GarbageCollectionNotificationInfo and package them in a common library that others could easily poll. Another would be some sort of gauge on how many read ahead requests are created but cannot be fulfilled later because GC cleared the SoftReference.
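
As a rough illustration of the second idea, the gauge could be as small as the sketch below. It assumes read ahead buffers are held through `SoftReference<byte[]>`, as the comment above implies. The class and method names (`ReadAheadGauge`, `recordIssued`, `resolve`, `lostFraction`) are made up for illustration and are not an existing API.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical gauge: tracks read ahead buffers that were issued, used, or lost to GC. */
final class ReadAheadGauge {
    private final LongAdder issued = new LongAdder();
    private final LongAdder hits = new LongAdder();
    private final LongAdder lostToGc = new LongAdder();

    /** Wraps a fetched fragment in a SoftReference and counts it as issued. */
    SoftReference<byte[]> recordIssued(byte[] fragment) {
        issued.increment();
        return new SoftReference<>(fragment);
    }

    /** Returns the buffer if it is still reachable; otherwise counts it as lost and returns null. */
    byte[] resolve(SoftReference<byte[]> ref) {
        byte[] data = ref.get();
        if (data == null) {
            lostToGc.increment(); // the fetch was wasted; the caller has to read again
        } else {
            hits.increment();
        }
        return data;
    }

    long issued() {
        return issued.sum();
    }

    long lostToGc() {
        return lostToGc.sum();
    }

    /** Fraction of resolved read ahead buffers that had already been cleared. */
    double lostFraction() {
        long lost = lostToGc.sum();
        long resolved = hits.sum() + lost;
        return resolved == 0 ? 0.0 : (double) lost / resolved;
    }
}
```

A persistently high `lostFraction()` would suggest that read ahead is fetching data the JVM cannot afford to keep, which is one possible signal for reducing or disabling it.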
