fix: due to buffering in the underlying stream reads are limited in size #25

Merged 1 commit on Jan 16, 2024


@wolfeidau wolfeidau commented Jan 9, 2024

Currently, if you call `Read` with a buffer larger than a few KB, the stream returns only those few KB. Because the reader abstraction discards the stream between `Read` operations, this results in many API calls.
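To make the short-read behaviour concrete, here is a minimal sketch. It does not use s3iofs or the AWS SDK: `chunkReader` is a hypothetical wrapper, and the 4 KiB limit is an assumed stand-in for "a few k". It shows that a single `Read` may return far less than the buffer size, while repeated reads on the same stream (here via `io.ReadFull`) fill the buffer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// chunkReader returns at most max bytes per Read call, mimicking a buffered
// network stream. It is a hypothetical type for illustration only.
type chunkReader struct {
	r   io.Reader
	max int
}

func (c *chunkReader) Read(p []byte) (int, error) {
	if len(p) > c.max {
		p = p[:c.max]
	}
	return c.r.Read(p)
}

// singleRead issues one Read with a buffer of bufSize bytes and reports how
// many bytes came back.
func singleRead(data []byte, bufSize int) int {
	r := &chunkReader{r: bytes.NewReader(data), max: 4096} // assumed limit
	n, _ := r.Read(make([]byte, bufSize))
	return n
}

// fullRead keeps reading from the same stream until the buffer is full.
func fullRead(data []byte, bufSize int) int {
	r := &chunkReader{r: bytes.NewReader(data), max: 4096} // assumed limit
	n, _ := io.ReadFull(r, make([]byte, bufSize))
	return n
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB of zeroes stands in for an object
	fmt.Println(singleRead(data, 64*1024)) // 4096
	fmt.Println(fullRead(data, 64*1024))   // 65536
}
```

The `io.Reader` contract allows `Read` to return fewer than `len(p)` bytes, so the short read itself is legal; the cost only appears when each call also triggers a new request.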

To mitigate this, I moved back to retaining the stream from an optimistic get object request, which is performed when the file is opened. This stream is then used by subsequent `Read` calls and is closed when the file is closed.
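The difference between the two approaches can be sketched roughly as follows. This is not the s3iofs implementation: `fakeStore`, `chunkedBody`, `perReadFile`, and `retainedFile` are hypothetical names, and the 4 KiB chunk size is an assumption. The sketch only counts how many get requests each strategy makes while reading a 64 KiB object.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const chunk = 4096 // assumed per-Read limit of the underlying stream

// fakeStore is a hypothetical object store that counts get requests.
type fakeStore struct {
	data []byte
	gets int
}

// chunkedBody yields at most chunk bytes per Read, like a buffered stream.
type chunkedBody struct{ r *bytes.Reader }

func (b *chunkedBody) Read(p []byte) (int, error) {
	if len(p) > chunk {
		p = p[:chunk]
	}
	return b.r.Read(p)
}

func (b *chunkedBody) Close() error { return nil }

// get opens a stream over the object starting at offset.
func (s *fakeStore) get(offset int64) io.ReadCloser {
	s.gets++
	return &chunkedBody{r: bytes.NewReader(s.data[offset:])}
}

// perReadFile opens a fresh stream for every Read and discards it afterwards.
type perReadFile struct {
	store  *fakeStore
	offset int64
}

func (f *perReadFile) Read(p []byte) (int, error) {
	if f.offset >= int64(len(f.store.data)) {
		return 0, io.EOF
	}
	body := f.store.get(f.offset)
	defer body.Close()
	n, err := body.Read(p)
	f.offset += int64(n)
	return n, err
}

// retainedFile keeps the stream opened at "open" time until Close.
type retainedFile struct{ body io.ReadCloser }

func openRetained(s *fakeStore) *retainedFile { return &retainedFile{body: s.get(0)} }

func (f *retainedFile) Read(p []byte) (int, error) { return f.body.Read(p) }
func (f *retainedFile) Close() error               { return f.body.Close() }

// drain reads r to the end with a 32 KiB buffer and returns the byte count.
func drain(r io.Reader) int {
	buf := make([]byte, 32*1024)
	total := 0
	for {
		n, err := r.Read(buf)
		total += n
		if err != nil {
			return total
		}
	}
}

// countGets reports the number of get requests made by each strategy.
func countGets(size int) (perRead, retained int) {
	a := &fakeStore{data: make([]byte, size)}
	drain(&perReadFile{store: a})

	b := &fakeStore{data: make([]byte, size)}
	f := openRetained(b)
	drain(f)
	f.Close()
	return a.gets, b.gets
}

func main() {
	perRead, retained := countGets(64 * 1024)
	fmt.Println(perRead, retained) // 16 1
}
```

With the stream retained, the request count no longer depends on how many bytes each `Read` happens to return.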

This PR adds tests that illustrate the issue.

One thing to note: calling `Seek` closes the initial get object stream, because that stream does not support seeking. Reads after a `Seek` use `ReadAt` instead, which results in more API calls.
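A rough sketch of that trade-off, under stated assumptions: `fakeStore` and `file` are hypothetical types, not the s3iofs ones, and the store's `ReadAt` stands in for a ranged request. Sequential reads share the stream opened with the file; after `Seek`, every `Read` costs one request.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// fakeStore is a hypothetical object store that counts requests.
type fakeStore struct {
	data     []byte
	requests int
}

// get opens a sequential stream over the whole object (one request).
func (s *fakeStore) get() io.ReadCloser {
	s.requests++
	return io.NopCloser(bytes.NewReader(s.data))
}

// ReadAt stands in for a ranged request; each call counts as one request.
func (s *fakeStore) ReadAt(p []byte, off int64) (int, error) {
	s.requests++
	return bytes.NewReader(s.data).ReadAt(p, off)
}

// file reads from the retained stream until Seek is called, then falls back
// to ranged reads at the current offset.
type file struct {
	store  *fakeStore
	body   io.ReadCloser // nil once Seek has dropped the stream
	offset int64
}

func open(s *fakeStore) *file { return &file{store: s, body: s.get()} }

func (f *file) Read(p []byte) (int, error) {
	var n int
	var err error
	if f.body != nil {
		n, err = f.body.Read(p)
	} else {
		n, err = f.store.ReadAt(p, f.offset)
	}
	f.offset += int64(n)
	return n, err
}

func (f *file) Seek(offset int64, whence int) (int64, error) {
	var abs int64
	switch whence {
	case io.SeekStart:
		abs = offset
	case io.SeekCurrent:
		abs = f.offset + offset
	case io.SeekEnd:
		abs = int64(len(f.store.data)) + offset
	default:
		return 0, errors.New("seek: invalid whence")
	}
	if abs < 0 {
		return 0, errors.New("seek: negative position")
	}
	if f.body != nil {
		// The sequential stream cannot be repositioned, so drop it.
		f.body.Close()
		f.body = nil
	}
	f.offset = abs
	return abs, nil
}

// demo returns the request count before and after a Seek.
func demo() (beforeSeek, afterSeek int) {
	store := &fakeStore{data: make([]byte, 1024)}
	f := open(store)
	buf := make([]byte, 10)
	f.Read(buf)
	f.Read(buf)
	beforeSeek = store.requests
	f.Seek(100, io.SeekStart)
	f.Read(buf)
	f.Read(buf)
	afterSeek = store.requests
	return beforeSeek, afterSeek
}

func main() {
	before, after := demo()
	fmt.Println(before, after) // 1 3
}
```

Callers that only read sequentially therefore keep the single-request path; callers that seek pay one request per read.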

@wolfeidau wolfeidau merged commit 78f80c6 into master Jan 16, 2024
1 check passed
@wolfeidau wolfeidau deleted the fix_large_reads_not_working branch January 16, 2024 01:22
wolfeidau added a commit to wolfeidau/iceberg-go that referenced this pull request Jan 16, 2024
This resolves issues with endpoint changes in the SDK, see aws/aws-sdk-go-v2#2370, and upgrades s3iofs which has been upgraded to improve performance, see wolfeidau/s3iofs#25.
nastra pushed a commit to apache/iceberg-go that referenced this pull request Jan 16, 2024
This resolves issues with endpoint changes in the SDK, see aws/aws-sdk-go-v2#2370, and upgrades s3iofs which has been upgraded to improve performance, see wolfeidau/s3iofs#25.