feat: enable etcd health-check #4191
…e apisix fail Signed-off-by: yiyiyimu <wosoyoung@gmail.com>
…nto fix/etcd-retry
…rks in stream mode Signed-off-by: yiyiyimu <wosoyoung@gmail.com>
…not been blocked Signed-off-by: yiyiyimu <wosoyoung@gmail.com>
Signed-off-by: yiyiyimu <wosoyoung@gmail.com>
Things are a bit weird right now 😕 Before working on this fix, I tested that killing one node of an etcd cluster makes apisix fail, as the related issues describe. However, when I test this scenario again on the master branch, apisix is not affected by the stopped etcd node and runs normally. I checked the recent commits and it seems none of them touch this problem. Could someone help me reproduce the problem? My test: run apisix and etcd in Kubernetes, delete one etcd node, and apisix on the master branch keeps running normally. So I don't know yet whether anything went wrong.
Waiting for a new lua-resty-etcd release: api7/lua-resty-etcd#129
…nto fix/etcd-retry
…st etcd endpoint Signed-off-by: yiyiyimu <wosoyoung@gmail.com>
Waiting for api7/lua-resty-etcd#131. Done.
…oint choose problem in etcd (https://github.com/api7/lua-resty-etcd/pull/131#discussion_r655804238) Signed-off-by: yiyiyimu <wosoyoung@gmail.com>
apisix/core/config_etcd.lua
Outdated
if string.find(err, err_etcd_unhealthy_all) then
    local reconnected = false
    while err and not reconnected do
        local backoff_duration, backoff_factor, backoff_step = 1, 2, 10
Will a step of 10 be too large? A wait of 1024 seconds is too long.
I'm not very familiar with production environments. Do you have a recommendation, such as 8, which gives around 4 minutes at most? @tokers
Maybe 6 is enough. Also, since we are in a timer, keeping an Nginx timer alive for a long time is not good practice, as it might cause a memory leak.
Thanks for the suggestion!
Changed to use an outside counter up to 32, to avoid keeping the Nginx timer alive for too long. PTAL @tokers
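For reference, the numbers in this thread follow from the starting values in the hunk above (duration 1, factor 2): the longest wait is 2 raised to the step count. Below is a minimal sketch of that arithmetic and of a capped backoff. The helper names `max_delay` and `backoff_delays` are hypothetical and are not the code in this PR, and treating 32 as a cap in seconds is only an assumption about how the outside counter is used.

```lua
-- Hypothetical helpers for illustration only; not taken from config_etcd.lua.

-- Longest wait of an uncapped exponential backoff: base * factor ^ step.
local function max_delay(base, factor, step)
    return base * factor ^ step
end

-- Delays of a capped backoff: start at `base` seconds, multiply by `factor`
-- after every retry, and stop growing once `cap` seconds is reached.
local function backoff_delays(base, factor, cap, retries)
    local delays = {}
    local delay = base
    for i = 1, retries do
        delays[i] = delay
        -- grow the delay, but never beyond the cap
        delay = math.min(delay * factor, cap)
    end
    return delays
end

-- Compare the step values discussed in the review (10, 8 and 6).
for _, step in ipairs({10, 8, 6}) do
    print(step, max_delay(1, 2, step))
end

-- Assumed cap of 32 seconds, eight retries.
print(table.concat(backoff_delays(1, 2, 32, 8), ", "))
```

With a cap, each individual wait stays short no matter how many retries happen, which matches the concern about not keeping one Nginx timer alive for a long time.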
What this PR does / why we need it:
fix #3673
fix #3937.
Import health check of lua-resty-etcd, so
TODO:
Pre-submission checklist: