Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce Logging of Errors in GCS Rest RetriesTests #50761

Merged

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Jan 8, 2020

It's impossible to tell why #50754 fails without this change.
We're failing to close the exchange somewhere and there is no
write timeout in the GCS SDK (something to look into separately)
only a read timeout on the socket so if we're failing on an assertion without
reading the full request body (at least into the read-buffer ... this test works
in some other spots that intentionally don't fully drain the requestBody because the body is small enough to fit into the read-buffer I think ... that's why it's only failing for the large write ... I left the small write spots as is for that reason for now) we're locking up waiting forever on write0.

This change ensure the exchange is closed in the tests where we could lock up
on a write and logs the failure so we can find out what broke #50754.

It's impossible to tell why elastic#50754 fails without this change.
We're failing to close the `exchange` somewhere and there is no
write timeout in the GCS SDK (something to look into separately)
only a read timeout on the socket so if we're failing on an assertion without
reading the full request body (at least into the read-buffer) we're locking up
waiting forever on `write0`.

This change ensure the `exchange` is closed in the tests where we could lock up
on a write and logs the failure so we can find out what broke elastic#50754.
@original-brownbear original-brownbear added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.6.0 labels Jan 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Copy link
Member Author

original-brownbear commented Jan 8, 2020

@tlrx it doesn't end :( but getting closer :)

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tlrx it doesn't end :( but getting closer :)

I know - so sorry about that. Once you got bored I'll take the relay :)

@original-brownbear original-brownbear merged commit 95fcee2 into elastic:master Jan 9, 2020
@original-brownbear original-brownbear deleted the 50754-logging branch January 9, 2020 08:43
@original-brownbear
Copy link
Member Author

Thanks Tanguy! :)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 9, 2020
It's impossible to tell why elastic#50754 fails without this change.
We're failing to close the `exchange` somewhere and there is no
write timeout in the GCS SDK (something to look into separately)
only a read timeout on the socket so if we're failing on an assertion without
reading the full request body (at least into the read-buffer) we're locking up
waiting forever on `write0`.

This change ensure the `exchange` is closed in the tests where we could lock up
on a write and logs the failure so we can find out what broke elastic#50754.
original-brownbear added a commit that referenced this pull request Jan 9, 2020
It's impossible to tell why #50754 fails without this change.
We're failing to close the `exchange` somewhere and there is no
write timeout in the GCS SDK (something to look into separately)
only a read timeout on the socket so if we're failing on an assertion without
reading the full request body (at least into the read-buffer) we're locking up
waiting forever on `write0`.

This change ensure the `exchange` is closed in the tests where we could lock up
on a write and logs the failure so we can find out what broke #50754.
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
It's impossible to tell why elastic#50754 fails without this change.
We're failing to close the `exchange` somewhere and there is no
write timeout in the GCS SDK (something to look into separately)
only a read timeout on the socket so if we're failing on an assertion without
reading the full request body (at least into the read-buffer) we're locking up
waiting forever on `write0`.

This change ensure the `exchange` is closed in the tests where we could lock up
on a write and logs the failure so we can find out what broke elastic#50754.
@original-brownbear original-brownbear restored the 50754-logging branch August 6, 2020 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test Issues or PRs that are addressing/adding tests v7.6.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants