Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve proxy log for retrying an unavailable endpoint #3564

Merged
merged 1 commit into from
Sep 25, 2015
Merged

Improve proxy log for retrying an unavailable endpoint #3564

merged 1 commit into from
Sep 25, 2015

Conversation

gouyang
Copy link
Contributor

@gouyang gouyang commented Sep 21, 2015

Fixes #3541

@@ -126,7 +126,7 @@ func timedUnavailabilityFunc(wait time.Duration) func(*endpoint) {
return func(ep *endpoint) {
time.AfterFunc(wait, func() {
ep.Available = true
log.Printf("proxy: marked endpoint %s available", ep.URL.String())
log.Printf("proxy: marked endpoint %s available to retry it until refresh interval time arrived.", ep.URL.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we improve this message? I cannot understand it well still.

Maybe: ... after refresh interval to hope the endpoint back online

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will give up the endpoint after refresh interval.
How about "marked endpoint %s available to give it another try".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will give up the endpoint after refresh interval.

What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would not try the endpoint after the refresh interval, doesn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. it fails to get response from the endpoint
  2. it marks endpoint as unavailable
  3. it will not try the endpoint in refresh interval
  4. it marks endpoint as available to hope it back online

So it will try the endpoint after the refresh interval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the fact is:

  1. it fails to get response from the endpoint
  2. it marks endpoint as unavailable
  3. it marks endpoint as available
  4. it try the endpoint after 5s(failure-wait)
  5. it repeat steps 1-4 6 times
  6. it goes out of the loop when the refresh interval (30s) is arrived

A full log looks like below:

./etcd -proxy on --listen-client-urls http://localhost:8002 --initial-cluster 'default=http://localhost:2380,default=http://localhost:7001'
2015-09-23 09:22:55.166076 I | etcdmain: etcd Version: 2.2.0+git
2015-09-23 09:22:55.166164 I | etcdmain: Git SHA: 89acdd6
2015-09-23 09:22:55.166178 I | etcdmain: Go Version: go1.4.2
2015-09-23 09:22:55.166195 I | etcdmain: Go OS/Arch: linux/amd64
2015-09-23 09:22:55.166210 I | etcdmain: setting maximum number of CPUs to 1, total number of available CPUs is 4
2015-09-23 09:22:55.166235 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2015-09-23 09:22:55.166511 I | etcdmain: proxy: using peer urls [http://localhost:2380 http://localhost:7001] 
2015-09-23 09:22:55.167775 E | etcdmain: failed to notify systemd for readiness
2015-09-23 09:22:55.167821 I | etcdmain: proxy: listening for client requests on http://localhost:8002
2015-09-23 09:24:56.652827 I | proxy: failed to direct request to http://localhost:4001: dial tcp 127.0.0.1:4001: connection refused
2015-09-23 09:24:56.652855 I | proxy: marked endpoint http://localhost:4001 unavailable
2015-09-23 09:24:56.653014 I | proxy: failed to direct request to http://localhost:2379: dial tcp 127.0.0.1:2379: connection refused
2015-09-23 09:24:56.653026 I | proxy: marked endpoint http://localhost:2379 unavailable
2015-09-23 09:24:56.653040 I | proxy: unable to get response from 2 endpoint(s)
2015-09-23 09:24:57.654024 I | proxy: zero endpoints currently available
2015-09-23 09:24:58.654589 I | proxy: zero endpoints currently available
2015-09-23 09:24:59.655133 I | proxy: zero endpoints currently available
2015-09-23 09:25:00.656550 I | proxy: zero endpoints currently available
2015-09-23 09:25:01.653012 I | proxy: marked endpoint http://localhost:4001 available
2015-09-23 09:25:01.653221 I | proxy: marked endpoint http://localhost:2379 available
2015-09-23 09:25:01.657350 I | proxy: failed to direct request to http://localhost:4001: dial tcp 127.0.0.1:4001: connection refused
2015-09-23 09:25:01.657384 I | proxy: marked endpoint http://localhost:4001 unavailable
2015-09-23 09:25:01.657536 I | proxy: failed to direct request to http://localhost:2379: dial tcp 127.0.0.1:2379: connection refused
2015-09-23 09:25:01.657559 I | proxy: marked endpoint http://localhost:2379 unavailable
2015-09-23 09:25:01.657571 I | proxy: unable to get response from 2 endpoint(s)
2015-09-23 09:25:02.658089 I | proxy: zero endpoints currently available
2015-09-23 09:25:03.658649 I | proxy: zero endpoints currently available
2015-09-23 09:25:04.659187 I | proxy: zero endpoints currently available
2015-09-23 09:25:05.659737 I | proxy: zero endpoints currently available
2015-09-23 09:25:06.657566 I | proxy: marked endpoint http://localhost:4001 available
2015-09-23 09:25:06.657639 I | proxy: marked endpoint http://localhost:2379 available
2015-09-23 09:25:06.660500 I | proxy: failed to direct request to http://localhost:4001: dial tcp 127.0.0.1:4001: connection refused
2015-09-23 09:25:06.660535 I | proxy: marked endpoint http://localhost:4001 unavailable
2015-09-23 09:25:06.660684 I | proxy: failed to direct request to http://localhost:2379: dial tcp 127.0.0.1:2379: connection refused
2015-09-23 09:25:06.660702 I | proxy: marked endpoint http://localhost:2379 unavailable
2015-09-23 09:25:06.660714 I | proxy: unable to get response from 2 endpoint(s)
2015-09-23 09:25:07.661248 I | proxy: zero endpoints currently available
2015-09-23 09:25:08.661791 I | proxy: zero endpoints currently available
2015-09-23 09:25:09.662331 I | proxy: zero endpoints currently available
2015-09-23 09:25:10.662861 I | proxy: zero endpoints currently available
2015-09-23 09:25:11.660711 I | proxy: marked endpoint http://localhost:4001 available
2015-09-23 09:25:11.660912 I | proxy: marked endpoint http://localhost:2379 available
2015-09-23 09:25:11.663666 I | proxy: failed to direct request to http://localhost:4001: dial tcp 127.0.0.1:4001: connection refused
2015-09-23 09:25:11.663683 I | proxy: marked endpoint http://localhost:4001 unavailable
2015-09-23 09:25:11.663832 I | proxy: failed to direct request to http://localhost:2379: dial tcp 127.0.0.1:2379: connection refused
2015-09-23 09:25:11.663847 I | proxy: marked endpoint http://localhost:2379 unavailable
2015-09-23 09:25:11.663863 I | proxy: unable to get response from 2 endpoint(s)
2015-09-23 09:25:12.664438 I | proxy: zero endpoints currently available
2015-09-23 09:25:13.664995 I | proxy: zero endpoints currently available
2015-09-23 09:25:14.665557 I | proxy: zero endpoints currently available
2015-09-23 09:25:15.666097 I | proxy: zero endpoints currently available
2015-09-23 09:25:16.663843 I | proxy: marked endpoint http://localhost:4001 available
2015-09-23 09:25:16.664041 I | proxy: marked endpoint http://localhost:2379 available
2015-09-23 09:25:16.666930 I | proxy: failed to direct request to http://localhost:4001: dial tcp 127.0.0.1:4001: connection refused
2015-09-23 09:25:16.666950 I | proxy: marked endpoint http://localhost:4001 unavailable
2015-09-23 09:25:16.667087 I | proxy: failed to direct request to http://localhost:2379: dial tcp 127.0.0.1:2379: connection refused
2015-09-23 09:25:16.667101 I | proxy: marked endpoint http://localhost:2379 unavailable
2015-09-23 09:25:16.667112 I | proxy: unable to get response from 2 endpoint(s)
2015-09-23 09:25:17.667707 I | proxy: zero endpoints currently available
2015-09-23 09:25:18.668332 I | proxy: zero endpoints currently available
2015-09-23 09:25:19.669113 I | proxy: zero endpoints currently available
2015-09-23 09:25:20.669719 I | proxy: zero endpoints currently available
2015-09-23 09:25:21.667121 I | proxy: marked endpoint http://localhost:4001 available
2015-09-23 09:25:21.667299 I | proxy: marked endpoint http://localhost:2379 available
2015-09-23 09:25:21.670711 I | proxy: failed to direct request to http://localhost:4001: dial tcp 127.0.0.1:4001: connection refused
2015-09-23 09:25:21.670738 I | proxy: marked endpoint http://localhost:4001 unavailable
2015-09-23 09:25:21.670878 I | proxy: failed to direct request to http://localhost:2379: dial tcp 127.0.0.1:2379: connection refused
2015-09-23 09:25:21.670893 I | proxy: marked endpoint http://localhost:2379 unavailable
2015-09-23 09:25:21.670915 I | proxy: unable to get response from 2 endpoint(s)
2015-09-23 09:25:22.671437 I | proxy: zero endpoints currently available
2015-09-23 09:25:23.672617 I | proxy: zero endpoints currently available
2015-09-23 09:25:24.673179 I | proxy: zero endpoints currently available
2015-09-23 09:25:25.173348 W | etcdserver: could not get cluster response from http://localhost:2380: Get http://localhost:2380/members: dial tcp 127.0.0.1:2380: connection refused
2015-09-23 09:25:25.173485 W | etcdserver: could not get cluster response from http://localhost:7001: Get http://localhost:7001/members: dial tcp 127.0.0.1:7001: connection refused
2015-09-23 09:25:25.173499 W | etcdmain: proxy: could not retrieve cluster information from the given urls
2015-09-23 09:25:25.673707 I | proxy: zero endpoints currently available
2015-09-23 09:25:26.670882 I | proxy: marked endpoint http://localhost:4001 available
2015-09-23 09:25:26.671082 I | proxy: marked endpoint http://localhost:2379 available
2015-09-23 09:25:26.674240 I | proxy: zero endpoints currently available
2015-09-23 09:25:27.674785 I | proxy: zero endpoints currently available
2015-09-23 09:25:28.675317 I | proxy: zero endpoints currently available
2015-09-23 09:25:29.675892 I | proxy: zero endpoints currently available
2015-09-23 09:25:30.676479 I | proxy: zero endpoints currently available
2015-09-23 09:25:31.677025 I | proxy: zero endpoints currently available
2015-09-23 09:25:32.677583 I | proxy: zero endpoints currently available
2015-09-23 09:25:33.678114 I | proxy: zero endpoints currently available
2015-09-23 09:25:34.678676 I | proxy: zero endpoints currently available
2015-09-23 09:25:35.679222 I | proxy: zero endpoints currently available
2015-09-23 09:25:36.679781 I | proxy: zero endpoints currently available
2015-09-23 09:25:37.680309 I | proxy: zero endpoints currently available
2015-09-23 09:25:38.680848 I | proxy: zero endpoints currently available
2015-09-23 09:25:39.681380 I | proxy: zero endpoints currently available
2015-09-23 09:25:40.681958 I | proxy: zero endpoints currently available
2015-09-23 09:25:41.682491 I | proxy: zero endpoints currently available
2015-09-23 09:25:42.683049 I | proxy: zero endpoints currently available
2015-09-23 09:25:43.683595 I | proxy: zero endpoints currently available
^C2015-09-23 09:25:44.362578 N | osutil: received interrupt signal, shutting down...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood the meaning of refresh interval, and should use failure-wait time instead. Let rephrase what I said.

proxy keeps some cached endpoints, and set availability on them to avoid meaningless retry. At the meantime, it refreshes the endpoints every refresh interval.

So the endpoint attempt workflow is

  1. proxy fails to get response from the endpoint
  2. proxy marks endpoint as unavailable
  3. proxy will not try the endpoint in failure-wait time
  4. proxy marks endpoint as available to hope it back online

The refresh interval is a separate thing. It is used to refresh cached endpoints by fetching the advertised client urls from etcd cluster. The proxy in example may cache endpoint for refresh interval(30s). When it finds that it cannot refresh from etcd cluster, it thinks no endpoint exists at this point, and clears the cache that may be available to redirect requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the log entry, removed the confused refresh interval. Please review again.

@yichengq
Copy link
Contributor

@gouyang And could you use proxy: improve log for retrying an unavailable endpoint as commit title? Ref: https://github.com/coreos/etcd/blob/master/CONTRIBUTING.md#format-of-the-commit-message

Fixes #3541

Signed-off-by: Guohua ouyang <guohuaouyang@gmail.com>
@yichengq
Copy link
Contributor

LGTM. Thanks for the PR and contribution!

yichengq added a commit that referenced this pull request Sep 25, 2015
Improve proxy log for retrying an unavailable endpoint
@yichengq yichengq merged commit dff702b into etcd-io:master Sep 25, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants