fix: due to buffering in the underlying stream reads are limited in size #25

wolfeidau · 2024-01-09T02:50:31Z

Currently, if you call `Read` with a buffer larger than a few KB, the stream returns only those few KB. Because the reader abstraction discards the stream between `Read` operations, reading a large object results in many API calls.

To mitigate this, I moved back to retaining the stream from an optimistic get object call, which is performed when the file is opened. Subsequent `Read` calls use this stream, and it is closed when the file is closed.

This PR adds tests that illustrate the issue.

One thing to note: calling `Seek` closes the initial get object stream, because that stream doesn't support seeking. Reads after that use `ReadAt` instead, which results in more API calls.


This resolves issues with endpoint changes in the SDK (see aws/aws-sdk-go-v2#2370) and upgrades s3iofs, which has been updated to improve performance (see wolfeidau/s3iofs#25).

wolfeidau force-pushed the fix_large_reads_not_working branch from ed9ea38 to 1e05bc6 on January 16, 2024 01:19

wolfeidau merged commit 78f80c6 into master Jan 16, 2024
1 check passed

wolfeidau deleted the fix_large_reads_not_working branch January 16, 2024 01:22

wolfeidau mentioned this pull request Jan 16, 2024

chore(deps): upgrade all the AWS SDK v2 deps, and s3iofs apache/iceberg-go#50

Merged
