Get "java.lang.OutOfMemoryError" when using TransferManager.uploadDirectory to upload a 81G directory #1271

Closed
longi5rui opened this issue Aug 10, 2017 · 5 comments
Assignees: millems
Labels: guidance (Question that needs advice or information.)

Comments


longi5rui commented Aug 10, 2017

Hi, below is my issue:

Issue:
I used TransferManager.uploadDirectory in aws-sdk-java to transfer a big directory (81G) to an S3 bucket, and got "java.lang.OutOfMemoryError". I counted and there are 3,015,427 files in that directory. By tracing the stack I can see this seems to be due to TransferManager.listFiles(), which recursively lists all files before uploading them.

Trace Stack:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.io.UnixFileSystem.resolve(UnixFileSystem.java:108)
	at java.io.File.<init>(File.java:262)
	at java.io.File.listFiles(File.java:1212)
	at com.amazonaws.services.s3.transfer.TransferManager.listFiles(TransferManager.java:1344)
	at com.amazonaws.services.s3.transfer.TransferManager.listFiles(TransferManager.java:1349)
	at com.amazonaws.services.s3.transfer.TransferManager.listFiles(TransferManager.java:1349)
	at com.amazonaws.services.s3.transfer.TransferManager.listFiles(TransferManager.java:1349)
        ... (many more identical frames) ...

Question:

  1. Is it appropriate to use TransferManager.uploadDirectory() to upload such a large directory (81G) containing such a large number of files? I thought TransferManager would handle this, but I am not sure...
  2. Should I use different JVM settings? I am not sure which settings to use, and I am a newbie at this.
  3. Any suggestions would be appreciated. For example, should I iterate over every file in that directory myself and upload them separately, or use multipart uploading? I am not sure, and I didn't find anything about this in your docs.

Thank you so much!

@millems millems self-assigned this Aug 10, 2017
millems (Contributor) commented Aug 10, 2017

uploadDirectory will definitely have trouble with directories that contain a really large number of files, because all of the file names are loaded into memory. Because we're built on Java 6, we don't have Java 7's DirectoryStream API, which would let us load only some of the file names into memory at a time.

As a workaround, your suggestion of iterating the files yourself (using DirectoryStream) would work, but that's not very fun for you.
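A minimal sketch of that workaround, assuming Java 7+ and SDK 1.11.x (the directory path, bucket name, and key scheme are all illustrative). Since DirectoryStream itself only lists a single directory, this sketch uses NIO's Files.walkFileTree for the recursive case; both visit entries lazily instead of materializing the whole listing the way TransferManager.listFiles() does:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class LazyDirectoryUpload {
    public static void main(String[] args) throws IOException {
        final TransferManager tm = TransferManagerBuilder.defaultTransferManager();
        final Path root = Paths.get("/data/backup"); // illustrative source directory
        final String bucket = "my-backup-bucket";    // illustrative bucket name

        // Visit files one at a time; nothing close to the full 3M-entry
        // listing is ever held in memory.
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                String key = root.relativize(file).toString();
                try {
                    // Waiting per file keeps the sketch simple; a bounded queue
                    // of pending Upload handles would give more parallelism.
                    tm.upload(bucket, key, file.toFile()).waitForCompletion();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        tm.shutdownNow();
    }
}
```

Waiting for each upload serializes the transfers, but TransferManager still splits individual large files into multipart uploads where the size warrants it.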

Is this a one-off upload or something you'll be building into a service?

longi5rui (Author) commented

Thanks @millems. I get the idea, and I think DirectoryStream may help me. By the way, I am working on a project that backs up such large directories, with a large number of files, to S3.

millems (Contributor) commented Aug 11, 2017

I gotcha. If it's ever just a one-off thing, the AWS CLI might handle it better for now (it has a mechanism for synchronizing a directory to S3). I've added a note that we should fix this in our 2.x version, which won't be limited to Java 6.
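(The CLI mechanism in question is `aws s3 sync`, e.g. `aws s3 sync /data/backup s3://my-backup-bucket/` with illustrative paths; it walks the tree incrementally and only copies files that are missing or changed at the destination.)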

Let me know if you have any trouble getting it to work the "hard" way. We could consider adding an overload that takes an Iterable instead of the current List. That would allow us to load the files from a DirectoryStream that you provide.
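Until an overload like that exists, one possible middle ground is a sketch along these lines, assuming SDK 1.11.x and a flat (non-nested) directory, with the path, bucket name, and batch size all illustrative: drain the DirectoryStream in fixed-size batches and hand each batch to the existing TransferManager.uploadFileList, so only one batch of file names is in memory at a time.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class BatchedUpload {
    public static void main(String[] args) throws IOException, InterruptedException {
        TransferManager tm = TransferManagerBuilder.defaultTransferManager();
        Path dir = Paths.get("/data/backup"); // illustrative, non-nested directory
        String bucket = "my-backup-bucket";   // illustrative bucket name
        int batchSize = 1000;                 // bounds how many names are in memory

        List<File> batch = new ArrayList<File>(batchSize);
        // DirectoryStream yields entries lazily rather than all at once.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and other non-files
                }
                batch.add(p.toFile());
                if (batch.size() == batchSize) {
                    // uploadFileList parallelizes uploads within the batch.
                    tm.uploadFileList(bucket, "", dir.toFile(), batch).waitForCompletion();
                    batch.clear();
                }
            }
        }
        if (!batch.isEmpty()) {
            tm.uploadFileList(bucket, "", dir.toFile(), batch).waitForCompletion();
        }
        tm.shutdownNow();
    }
}
```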

dagnir (Contributor) commented Sep 5, 2017

Going to go ahead and close this due to inactivity. Please feel free to reopen if @millems's suggestion didn't work for you and you want to pursue this as a feature request.

@dagnir dagnir closed this as completed Sep 5, 2017
longi5rui (Author) commented

Hi guys,

I need to reopen this issue. I followed your suggestions and used DirectoryStream to iterate over the file system myself, and then used TransferManager.upload(bucket, key, file) to upload each file. And I kept getting this error (Java AWS SDK):

AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed....

And then every thread just died gradually. I saw other issues related to this problem, like here.

Any suggestions on how to solve this when using the Java SDK? I cannot find any option to retry uploads on TransferManager.upload(bucket, key, file) or PutObjectRequest.

Thanks a lot.
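For what it's worth, retries live on the underlying S3 client rather than on TransferManager.upload itself; a minimal sketch, assuming SDK 1.11.x (the retry count and timeout values are illustrative):

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class RetryingClient {
    public static TransferManager build() {
        ClientConfiguration config = new ClientConfiguration()
                .withMaxErrorRetry(10)          // retry failed requests up to 10 times
                .withConnectionTimeout(30_000)  // ms allowed to establish a connection
                .withSocketTimeout(60_000);     // ms of socket inactivity before timing out

        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withClientConfiguration(config)
                .build();

        return TransferManagerBuilder.standard()
                .withS3Client(s3)
                .build();
    }
}
```

The SDK's default retry policy should already retry S3's RequestTimeout responses (the "socket connection was not read from or written to" error above); raising maxErrorRetry gives slow uploads more chances before the SDK gives up.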

@srchase srchase added guidance Question that needs advice or information. and removed Question labels Jan 4, 2019