Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added support to write parquet files to S3 #5914

Merged
merged 19 commits into from
Aug 15, 2024

Conversation

malhotrashivam
Copy link
Contributor

@malhotrashivam malhotrashivam commented Aug 7, 2024

Added a new OutputStream to write to S3 and moved all parquet writing code from using SeekableByteChannels to OutputStreams.

@malhotrashivam malhotrashivam added feature request New feature or request parquet Related to the Parquet integration DocumentationNeeded ReleaseNotesNeeded Release notes are needed s3 labels Aug 7, 2024
@malhotrashivam malhotrashivam added this to the 0.36.0 milestone Aug 7, 2024
@malhotrashivam malhotrashivam self-assigned this Aug 7, 2024
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick first past.

private CompletableFuture<UploadPartResponse> future;

OutgoingRequest(final int partSize) {
buffer = ByteBuffer.allocate(partSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could think about taking these from a pool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these buffers are longer lived and we allocate a fixed maximum number of buffers to be used in a circular manner, I didn't use a pool here. But I am open to the idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it would be a good idea. Maybe better in a follow-on PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private CompletableFuture<UploadPartResponse> future;

OutgoingRequest(final int partSize) {
buffer = ByteBuffer.allocate(partSize);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these buffers are longer lived and we allocate a fixed maximum number of buffers to be used in a circular manner, I didn't use a pool here. But I am open to the idea.

py/server/deephaven/experimental/s3.py Outdated Show resolved Hide resolved
py/server/deephaven/parquet.py Outdated Show resolved Hide resolved
chipkent
chipkent previously approved these changes Aug 7, 2024
Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python LGTM

private CompletableFuture<UploadPartResponse> future;

OutgoingRequest(final int partSize) {
buffer = ByteBuffer.allocate(partSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it would be a good idea. Maybe better in a follow-on PR.

chipkent
chipkent previously approved these changes Aug 13, 2024
Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py LGTM

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a shallower-than-normal review for purpose of expediency. I can dig in further to specific areas if we think there is value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: did not review.

@malhotrashivam malhotrashivam merged commit bda75ef into deephaven:main Aug 15, 2024
16 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 15, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#294

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
DocumentationNeeded feature request New feature or request parquet Related to the Parquet integration ReleaseNotesNeeded Release notes are needed s3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants