Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FutureManager hangs indefinitely in the case of a bad file. #30

Closed
TheBoysAreZachInTown opened this issue Apr 8, 2015 · 26 comments · Fixed by #35
Closed

FutureManager hangs indefinitely in the case of a bad file. #30

TheBoysAreZachInTown opened this issue Apr 8, 2015 · 26 comments · Fixed by #35

Comments

@TheBoysAreZachInTown
Copy link

In the process of trying to do a TransferManager download, I found a bug with the FutureTransfer.listenFor code. In the simple example of

FutureTransfer.listenFor {
  transferManager.download("bucket", "location", new File("foobar")
}

where "foobar" is a file with permissions of 400, instead of throwing a FileNotFoundException with bad permissions, it just hangs indefinitely. No amount of mapping, or making use of waitForCompletion() seemed to help, leading to the conclusion that the Future is never finishing. I looked a little into the source code and found that in the listenFor function, there are cases where the 'progressEvent' will never fire. In this case, since the progress hadn't changed because the File failed before the, and the transfer wasn't done, it just hung.

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

Hi @marcinx27, could you try running your example with logging at debug level for com.github.dwhjames.awswrap.s3.FutureTransfer and see if anything useful is logged?

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

And I assume if you call
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/Transfer.html#waitForCompletion()
directly on the download object, you get the expected exception?

I’m guessing the download does not fail immediately, as listenFor checks for
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/Transfer.html#isDone()
after attaching the progress listener. (Only fails once a write is attempted to the file?)

And here I’m assuming that isDone() will eventually return true.

@TheBoysAreZachInTown
Copy link
Author

@dwhjames yeah. If I called it directly and it threw the exception. But, if I do it like

FutureTransfer.listenFor {
  transferManager.download("bucket", "key", new File("foobar")
} map ( result => result.waitForCompletion())

and then later do an Await.result on that future, it times out.

@TheBoysAreZachInTown
Copy link
Author

Also, I'm working on figuring out how to get debug level logging working for com.github.dwhjames.awswrap.s3.FutureTransfer , so when I get that finished, I'll post the debug messages.

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

With regards to logging, take a look at the logback config in the scratch project:
https://github.com/dwhjames/aws-wrap/blob/master/scratch/src/main/resources/logback.xml

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

If you capture the exception with
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/Transfer.html#waitForException()
is it a FileNotFoundException directly? Or wrapped in an AWS exception type?

And after that blocking call completes, .isDone() returns true? And .getState() returns …?

@TheBoysAreZachInTown
Copy link
Author

It's wrapped by com.amazonaws.AmazonClientException: Unable to store object contents to disk .

And that block never finishes. The future returned by FutureTransfer.listenFor is never complete, so it can never hit the map to run anything on the object the future is wrapping.

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

val d = transferManager.download("bucket", "key", new File("foobar"))
val e = d.waitForException()
val b = d.isDone()
val s = d.getState()

What are the values of b and s?

@TheBoysAreZachInTown
Copy link
Author

b is true
s is Failed

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

I think we are somewhat at the mercy of what the Java SDK chooses to publish to the progress listener. Have a look up the call stack from that exception:

  1. https://github.com/aws/aws-sdk-java/blob/3baab096839f6848eed40940fb7b6f519a8e0580/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/ServiceUtils.java#L297-L300
  2. https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/DownloadCallable.java#L162-L168
  3. https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/DownloadCallable.java#L84-L99
  4. https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/DownloadImpl.java#L113-L119

Interestingly, there also seems to be a difference between uploads and download in how events are handled. I have a suspicion that the failure event might actually get fired when uploading from a non-existent file… in contrast to downloading to a non-existent file.

@TheBoysAreZachInTown
Copy link
Author

Oh. So, if I understand the Amazon code correctly, the IOException is getting caught and immediately throwing an AmazonClientException. But this causes the Download state to be Failed, and thus it doesn't fire a progress event.

If I'm understanding that correctly, then in s3.scala in the listenFor function, should there also be a

if(transfer.getState == TransferState.Failed)

and then handle the failed state (a p failure could work, but you've got documentation stating that the Future will always succeed, so I don't know if you want to change that or not)?

@dwhjames
Copy link
Owner

dwhjames commented Apr 8, 2015

listenFor already tests that condition by calling .isDone().

See
https://github.com/aws/aws-sdk-java/blob/cf8174ab5fdd9e5f012b1657835fc30456622598/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/AbstractTransfer.java#L74-L78

There is an intentional race in listenFor.

  1. it attaches a listener to the transfer
  2. it tests isDone()

Crucially, 2. is performed after 1. so that if the progress listener was attached after the transfer already finished (and thus the listener doesn’t receive any events) then the final status will still be caught with .isDone().

If .isDone() is not true at this point, then the listener has already been attached, so will receive all events from this point forward.

In the execution that you are encountering, the transfer state is not in a done state at the point immediately after attaching the progress listener. But as it appears that the SDK fails to signal the failure progress event, the future will never complete.

Note that the catch block that sets the failure state is in call of a Callable, so
https://github.com/aws/aws-sdk-java/blob/cf8174ab5fdd9e5f012b1657835fc30456622598/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/DownloadCallable.java#L80-L105
is running in another thread. Hence the race, and hence why despite the download failing ‘immediately’, it is not ‘immediately enough’, so .isDone() is still false at the point after attaching the progress listener.

So, I’m somewhat at a loss about how to fix this on the side of aws-wrap. I think it might be worth filing a bug on the SDK, as it seems bad that failure events are not being propagated.

@TheBoysAreZachInTown
Copy link
Author

@dwhjames right, right. That makes sense. I forgot that isDone() only gets called once as a way to stop the race condition of the process being finished before a listener gets attached. This seems like an AWS bug to me. Thanks for the help, though.

Since I need to be able to do the operation of getting an object to a file, not an array of bytes, it seems like the only work around I have now is to just extend the AWS-Wrap to include getObject(GetObjectRequest, File)

@dwhjames
Copy link
Owner

dwhjames commented Apr 9, 2015

Ah, I haven’t published since your PR #29. I can do that today, if you’d like?

I think TransferManager is definitely the way to go in terms of efficiency. IIRC you will get progress events on errors on the AWS side, so I suppose you can work around the client-side issue by defensively checking file system status?

Also, if you just use the direct wrapping of getObject, depending on what executor you are passing in (or just using the default

this(awsCredentialsProvider, clientConfiguration, Executors.newFixedThreadPool(clientConfiguration.getMaxConnections, new S3ThreadFactory()))
) you are going to tie up a thread per transfer.

I think TransferManager is pretty efficient with its thread usage (despite blocking I/O). You could copy its pattern of dedicating a timer thread to poll. You can see this in

So an alternative implementation of listenFor could have a single thread scheduled to poll for N transfers and complete Scala promises on transfer state changes.

@TheBoysAreZachInTown
Copy link
Author

Um. Give me a few hours before you publish. I noticed that I didn't implement the getObject with a file, and I would like to have that in there just in case I get too frustrated with hacking around the TransferManager and want to use the standard implementation. I'll work on it now and push by like 2:30 EST.

@dwhjames
Copy link
Owner

dwhjames commented Apr 9, 2015

No rush, I won’t have access to a machine I can publish from until late today.

@dwhjames
Copy link
Owner

dwhjames commented Apr 9, 2015

I’m not sure if this crosses the icky line, but one solution for this problem is to cast the com.amazonaws.services.s3.transfer.Transfer to com.amazonaws.services.s3.transfer.internal.AbstractTransfer and attach a TransferStateChangeListener rather than a progress listener…

@TheBoysAreZachInTown
Copy link
Author

That might be a workable thing. I'll probably look into going that way in a little bit. For right now, I think I'm going to go with just straight AmazonS3ScalaClient stuff, and then try and transition the things that I can over to the TransferManager when everything's stable.

@dwhjames
Copy link
Owner

This approach seems to work. See 087cbdf...4615ada

@TheBoysAreZachInTown
Copy link
Author

@dwhjames that seems pretty reasonable. I dig it. I'm going to flesh out what I've got going on now with the S3Client first, but I'm absolutely going to look into the new TransferManager stuff that you're working on when I'm at least at a point where everything works.

@dwhjames
Copy link
Owner

@marcinx27 did you submit an issue to https://github.com/aws/aws-sdk-java/ ? I couldn’t see anything.

I’m going to refine the listenFor stuff a little further—try to consolidate the two approaches. I’ll open a PR and ping you when I’ve polish it.

@TheBoysAreZachInTown
Copy link
Author

@dwhjames I didn't. I was going to, got side tracked, and then kind of forgot about doing it. And definitely let me know when you've got that new listenFor code down. I'm stoked to see what you do with it.

@dwhjames
Copy link
Owner

Let me know when you do. I’d like to track it.

dwhjames added a commit that referenced this issue Apr 22, 2015
Cast to internal type `AbstractTransfer` so that a
low-level state change listener can be attached.
In light of issue #30, this provides a more foolproof
implementation of listening to transfer events.
There is now a three-way race to complete the
promise that signals the completion/termination of
the transfer.

This adds an integration test for s3, using the
fakes3 ruby gem.

closes #30
@dwhjames
Copy link
Owner

@marcinx27, I just closed this from #35

@dwhjames
Copy link
Owner

@marcinx27, I released #35 as part of v0.7.2

@TheBoysAreZachInTown
Copy link
Author

@dwhjames Cool, man. That's awesome. When I find a free second I want to go over the new release and check it out. I'm sure it's great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants