
[S3 upload] API to upload outputstream #1268

Closed
16srana opened this issue Aug 8, 2017 · 8 comments
Labels: guidance Question that needs advice or information.

16srana commented Aug 8, 2017

The current API in the SDK accepts either a File or an InputStream to upload content, which is more of a pull mechanism: the SDK pulls the content from the source while uploading to S3.
Could we have an API that works more like a push mechanism, i.e. one that hands back an OutputStream to which we can push any content?

We are actually using S3 to distribute large files (several GBs) to our users, which we generate based on each user's request. Currently we save the file locally and then upload it; ideally we would like to generate/write the zip file directly to S3.

millems (Contributor) commented Aug 10, 2017

Can you describe your use case in a little more depth? You'd like us to return an OutputStream to which you can push content?

Something like this? (Ignore syntax for now, just trying to understand)

SdkOutputStream<PutObjectResponse> contentStream = s3.initiatePutObject(request);
contentStream.write("My Data".getBytes());
PutObjectResponse response = contentStream.complete();

millems self-assigned this Aug 10, 2017
millems (Contributor) commented Aug 11, 2017

See also: #1139 (comment)

16srana (Author) commented Aug 25, 2017

Sorry for the late reply...
@millems Yes, the pseudocode above is what I expected; issue #1139 has a similar ask as well.
Can we expect this API? If yes, then when?
In my service I have to generate huge zip files, ranging from 2 to 12 GB in size, which I currently have to write to the filesystem first and then upload to S3. I would really love to merge those steps:

SdkOutputStream<PutObjectResponse> contentStream = s3.initiatePutObject(request);
ZipOutputStream os = new ZipOutputStream(contentStream.getOutputStream());
os.putNextEntry(new ZipEntry("my-data.txt")); // a zip entry must be opened before writing
os.write("My Data".getBytes());
os.finish(); // write the zip trailer without closing the underlying stream
PutObjectResponse response = contentStream.complete();

I will be wrapping the contentStream in my zip output stream object.
I am expecting the following benefits:

  1. Less memory use.
  2. Less time until the file is available to my users.
  3. Fewer IO operations.

shorea (Contributor) commented Aug 25, 2017

Would a PipedInputStream and PipedOutputStream work for you? You'd still need to know the content length in this case and set it on the PutObjectRequest, otherwise the SDK will buffer the content into memory.
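
Something like this, roughly (a sketch only; the bucket name, key, and payload are placeholders, not tested code):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipedUploadSketch {
    public static void main(String[] args) throws IOException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        byte[] content = "My Data".getBytes();

        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        // Writer thread pushes bytes into the pipe; the SDK pulls them
        // back out of the PipedInputStream on the calling thread.
        Thread writer = new Thread(() -> {
            try {
                out.write(content);
                out.close(); // signals end-of-stream to the reader
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        // The content length must be set up front, otherwise the SDK
        // buffers the whole stream into memory to compute it.
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(content.length);

        s3.putObject(new PutObjectRequest("my-bucket", "my-key", in, metadata));
    }
}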

If you don't know the content length, then I think your best option is to do a multipart upload and buffer each part into memory. The minimum part size is 5 MB, so memory consumption is manageable. I don't believe S3 supports chunked encoding, which would allow for dynamic content; if that's desired, you'll have to make a feature request to the S3 service team.
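
A rough sketch of that multipart approach with the v1 client (untested; the bucket, key, and the fill helper are just for illustration, and real code should also abort the upload on failure):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MultipartUploadSketch {

    // Streams 'source' to S3 in 5 MB parts, buffering one part at a time.
    public static void upload(AmazonS3 s3, String bucket, String key, InputStream source) throws IOException {
        final int partSize = 5 * 1024 * 1024; // S3 minimum part size (except for the last part)
        String uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

        List<PartETag> etags = new ArrayList<>();
        byte[] buffer = new byte[partSize];
        int partNumber = 1;
        int read;
        while ((read = fill(source, buffer)) > 0) {
            UploadPartRequest part = new UploadPartRequest()
                    .withBucketName(bucket)
                    .withKey(key)
                    .withUploadId(uploadId)
                    .withPartNumber(partNumber++)
                    .withInputStream(new ByteArrayInputStream(buffer, 0, read))
                    .withPartSize(read);
            etags.add(s3.uploadPart(part).getPartETag());
        }
        // Real code should call abortMultipartUpload on error to avoid orphaned parts.
        s3.completeMultipartUpload(new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
    }

    // Reads until the buffer is full or the stream ends; returns the byte count.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        int n;
        while (total < buf.length && (n = in.read(buf, total, buf.length - total)) > 0) {
            total += n;
        }
        return total;
    }
}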

16srana (Author) commented Aug 28, 2017

@shorea I have tried the piped input/output streams. It involves creating a reader thread and a writer thread, synchronizing them, and messaging between them; that wasn't very effective and had a few performance hits as well. There is also a lot of boilerplate code, so I am already considering this as my backup option.

I was hoping for an option from the SDK itself, but since you have mentioned that the support isn't there in S3 itself, I think I will have to rely on workarounds.
I was reading the S3 upload documentation, and it does mention that if the length of the content is unknown, the upload happens in a single thread, which would take forever for large files.

shorea (Contributor) commented Aug 28, 2017

Yeah, if we don't know the content length we have to buffer the contents into memory, which is obviously not ideal. Doing a multipart upload makes the buffering less of a problem but adds complexity to the upload.

shorea (Contributor) commented Sep 5, 2017

Going to close this and open a feature request in our V2 repo. I think there are a couple of things we can do to make streaming easier via TransferManager.

shorea (Contributor) commented Sep 5, 2017

aws/aws-sdk-java-v2#139

shorea closed this as completed Sep 5, 2017
srchase added the guidance label and removed the Question label Jan 4, 2019