-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: A stuck when the client fail to get DoneCallback #1637
Conversation
LGTM |
Do we know why the doneCallback isn't being called? Is there any chance of a memory leak from this because resources aren't being cleaned up? e.g. is there any chance we're swallowing an exception here and need a catch block to set connectionFinalStatus Line 622 in 3a33f49
Otherwise LGTM |
I don't exactly know, what we saw is the connector stuck for days in this: https://b.corp.google.com/issues/230501926#comment3 According to Reuven: Reuven Lax, Wed 11:37 AM What do you mean by swallow an exception on L622? |
@@ -505,6 +506,14 @@ private void waitForDoneCallback() { | |||
} | |||
Uninterruptibles.sleepUninterruptibly(100, TimeUnit.MILLISECONDS); | |||
} | |||
this.lock.lock(); | |||
if (connectionFinalStatus == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super familiar w/java, but do we need the try {} finallly {lock.unlock();} around this block like I see in other places, or is that unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
🤖 I have created a release *beep* *boop* --- ## [2.13.0](v2.12.2...v2.13.0) (2022-05-05) ### Features * add support to a few more specific StorageErrors for the Write API ([#1563](#1563)) ([c26091e](c26091e)) * next release from main branch is 2.12.2 ([#1624](#1624)) ([b2aa2a4](b2aa2a4)) ### Bug Fixes * A stuck when the client fail to get DoneCallback ([#1637](#1637)) ([3baa84e](3baa84e)) * Fix a possible NULL PTR after introduced timeout on waitForDone ([#1638](#1638)) ([e1c6ded](e1c6ded)) ### Dependencies * update dependency com.google.cloud:google-cloud-bigquery to v2.10.10 ([#1623](#1623)) ([54b74b8](54b74b8)) * update dependency org.apache.avro:avro to v1.11.0 ([#1632](#1632)) ([b47eea0](b47eea0)) ### Documentation * **samples:** update WriteComittedStream sample code to match best practices ([#1628](#1628)) ([5d4c7e1](5d4c7e1)) * **sample:** update WriteToDefaultStream sample to match best practices ([#1631](#1631)) ([73ddd7b](73ddd7b)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Add a timeout of one minute waiting for done callback to be called. Same timeout as client close.
The donecallback mainly gives back the server side error status, so it is not critical. In Dataflow connector, we saw hang because the DoneCallback is lost and we wait forever on it.
Stack trace in b/230501926