Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: do not timeout when wait is true #4254

Merged
merged 1 commit into from
Jan 22, 2016
Merged

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Jan 21, 2016

Current V2 watch waits by encoding URL with wait=true.
When a client sets 'no-sync', it requests directly to
proxy and the proxy redirects it by cloning the request
object, which leads to cancel the original request when
it times out and the cloned request gets closed prematurely.

This fixes #3894 by querying
the original client request in order to not use context timeout
when 'wait=true'.

@gyuho gyuho changed the title client: do not timeout when wait is true [WIP] client: do not timeout when wait is true Jan 21, 2016
@gyuho gyuho changed the title [WIP] client: do not timeout when wait is true client: do not timeout when wait is true Jan 21, 2016
@heyitsanthony
Copy link
Contributor

would it be worthwhile to have an e2e test for this / proxying?

@gyuho
Copy link
Contributor Author

gyuho commented Jan 21, 2016

@heyitsanthony Good idea. Will add e2e test for this case. Thanks,

@gyuho gyuho changed the title client: do not timeout when wait is true *: do not timeout when wait is true Jan 21, 2016
@@ -442,9 +443,16 @@ func (c *simpleHTTPClient) Do(ctx context.Context, act httpAction) (*http.Respon
return nil, nil, err
}

isWait := false
if req != nil {
if req.URL != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& instead of nesting if's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will fix. thanks

@heyitsanthony
Copy link
Contributor

lgtm

@gyuho
Copy link
Contributor Author

gyuho commented Jan 21, 2016

@heyitsanthony Just added e2e test that only fails without this patch.

--- FAIL: TestBasicOpsV2CtlWatchWithProxyNoSync (6.91s)
    etcd_test.go:190: failed watch (Error:  client: etcd cluster is unav)ilable or misconfigured
FAIL
exit status 1

With this patch, e2e test passes:

PTAL. Thanks!

@@ -42,6 +44,7 @@ func TestBasicOpsNoTLS(t *testing.T) {
isPeerTLS: false,
initialToken: "new",
},
etcdProcessBasePort,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have a new base port argument? the same ports were reusable across tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed like test was hanging for watch cases when using the same ports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds broken; a process might be hanging

@xiang90
Copy link
Contributor

xiang90 commented Jan 22, 2016

@gyuho This change seems to be OK. But please update the documentation accordingly.

defer to @heyitsanthony on the testing changes.

@gyuho
Copy link
Contributor Author

gyuho commented Jan 22, 2016

@xiang90 Now understand what you were talking about. I will make the comment clearer. And will update tests addressing Anthony's feedback.

Thanks,

go func() {
defer wg.Done()
args = append(args, "watch", key)
proc, err := spawnCmd(args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a proc.Close() (spawnWithExpect will do it for you)

@xiang90
Copy link
Contributor

xiang90 commented Jan 22, 2016

@gyuho @heyitsanthony

Basically the whole long polling behavior is broken for the proxy case when proxy does not forward response header or response in time. The request header is designed for watch request in the first place. Or we have no way to know if the remote machine is hang or not. I am not even sure if we still need request header timeout or not.

}()

endpoint, be := "", ""
for _, p := range epc.procs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this loop should probably be replaced by calling methods like (e *EtcdProcessCluster) Members() []*etcdProcess and (e *EtcdProcessCluster) Proxies() []*etcdProcess

@gyuho gyuho force-pushed the check_wait branch 2 times, most recently from c595075 to bc58090 Compare January 22, 2016 02:24
@gyuho gyuho changed the title *: do not timeout when wait is true [wip] *: do not timeout when wait is true Jan 22, 2016
Current V2 watch waits by encoding URL with wait=true.
When a client sets 'no-sync', it requests directly to
proxy and the proxy redirects it by cloning the request
object, which leads to cancel the original request when
it times out and the cloned request gets closed prematurely.

This fixes coreos#3894 by querying
the original client request in order to not use context timeout
when 'wait=true'.
@gyuho gyuho changed the title [wip] *: do not timeout when wait is true *: do not timeout when wait is true Jan 22, 2016
@gyuho
Copy link
Contributor Author

gyuho commented Jan 22, 2016

I think e2e test is currently broken. #4257
I locally tested the e2e and confirmed that this fixes the issue #3894.

Let's fix the e2e issue in a separate PR.
Once we figure out the goroutine leak issue, I will rewrite the test.

@gyuho
Copy link
Contributor Author

gyuho commented Jan 22, 2016

@heyitsanthony Could you take a look at client.go again? I will write the end to end test in a separate PR. Need to figure out why it was hanging. Thanks!

@gyuho gyuho changed the title *: do not timeout when wait is true client: do not timeout when wait is true Jan 22, 2016
@heyitsanthony
Copy link
Contributor

lgtm

@gyuho
Copy link
Contributor Author

gyuho commented Jan 22, 2016

Thanks!

gyuho added a commit that referenced this pull request Jan 22, 2016
client: do not timeout when wait is true
@gyuho gyuho merged commit 6413c96 into etcd-io:master Jan 22, 2016
@gyuho gyuho deleted the check_wait branch January 31, 2016 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

etcdctl watch to a etcd in proxy mode with TLS to cluster fails
3 participants