-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry to download maxmind DB if it fails #7242
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Welcome @ssh2go! |
Hi @ssh2go. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
bf68878
to
be0e060
Compare
Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
/check-cla |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look into the suggestions and also sign the CLA
cmd/nginx/flags.go
Outdated
return false, nil, err | ||
} | ||
klog.InfoS("downloading maxmind GeoIP2 databases") | ||
if err := nginx.DownloadGeoLite2DB(); err != nil { | ||
klog.InfoS("trying to download maxmind GeoIP2 databases...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this change? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
cmd/nginx/flags.go
Outdated
if err := nginx.DownloadGeoLite2DB(); err != nil { | ||
klog.InfoS("trying to download maxmind GeoIP2 databases...") | ||
defaultRetry := wait.Backoff{ | ||
Steps: 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Let's make this configurable as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, see arg --maxmind-retries-count
. Default value is 1, any lower value will be changed to 1 as well
cmd/nginx/flags.go
Outdated
retries := 0 | ||
|
||
err = wait.ExponentialBackoff(defaultRetry, func() (bool, error) { | ||
err := nginx.DownloadGeoLite2DB() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we move all of this to DownloadGeoLite2DB instead of adding to this main function in flags.
ingress-nginx/internal/nginx/maxmind.go
Line 63 in b1c8e30
func DownloadGeoLite2DB() error { |
Also, I would like to keep here simple, if possible. I do not opose myself of changing DownloadGeoLite2DB signature to pass the delay and retries as arguments, but infact we should deal an error as an error and not rely too much on the syscall in this case :)
If we have an error and the error happens here, we should just return to the caller function an error downloading the database, and parse this on the inner functions IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all the download retry logic into DownloadGeoLite2DB func
Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
internal/nginx/maxmind.go
Outdated
return err | ||
} | ||
MaxmindEditionFiles = append(MaxmindEditionFiles, dbName+dbExtension) | ||
func DownloadGeoLite2DB(period time.Duration, attempts int) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the order of these two parameters should be reversed, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
/check-cla |
internal/nginx/maxmind.go
Outdated
MaxmindEditionFiles = append(MaxmindEditionFiles, dbName+dbExtension) | ||
func DownloadGeoLite2DB(attempts int, period time.Duration) error { | ||
if attempts < 1 { | ||
attempts = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can set it to MaxmindRetriesCount
variable, instead of hard-coding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that point. The only thing - we cannot use MaxmindRetriesCount
because we are filling that variable with the value of according command line argument, and using it later as a first parameter in func DownloadGeoLite2DB
. So I've added extra constant const minimumRetriesCount = 1
, to avoid the hardcoding in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 yep. I missed it.
internal/nginx/maxmind.go
Outdated
Jitter: 0.1, | ||
} | ||
if period == time.Duration(0) { | ||
defaultRetry.Steps = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ok-to-test
Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
/approve Just fix the conflict in go.sum, and once @tao12345666333 is happy he can issue a lgtm and this gets merged :) |
/assign
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delay. I think it could be merged. Thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rikatz, ssh2go, tao12345666333 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
* Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
* Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
* Drop v1beta1 from ingress nginx (#7156) * Drop v1beta1 from ingress nginx Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix intorstr logic in controller Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * fixing admission Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * more intorstr fixing * correct template rendering Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix e2e tests for v1 api Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix gofmt errors * This is finally working...almost there... Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Re-add removed validation of AdmissionReview * Prepare for v1.0.0-alpha.1 release Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Update changelog and matrix table for v1.0.0-alpha.1 (#7274) Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * add docs for syslog feature (#7219) * Fix link to e2e-tests.md in developer-guide (#7201) * Use ENV expansion for namespace in args (#7146) Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does. * chart: using Helm builtin capabilities check (#7190) Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> * Update proper default value for HTTP2MaxConcurrentStreams in Docs (#6944) It should be 128 as documented in https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L780 * Fix MaxWorkerOpenFiles calculation on high cores nodes (#7107) * Fix MaxWorkerOpenFiles calculation on high cores nodes * Add e2e test for rlimit_nofile * Fix doc for max-worker-open-files * ingress/tcp: add additional error logging on failed (#7208) * Add file containing stable release (#7313) * Handle named (non-numeric) ports correctly (#7311) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Updated v1beta1 to v1 as its deprecated (#7308) * remove mercurial from build (#7031) * Retry to download maxmind DB if it fails (#7242) * Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Release v1.0.0-alpha.1 * Add changelog for v1.0.0-alpha.2 * controller: ignore non-service backends (#7332) * controller: ignore non-service backends Signed-off-by: Carlos Panato <ctadeu@gmail.com> * update per feedback Signed-off-by: Carlos Panato <ctadeu@gmail.com> * fix: allow scope/tcp/udp configmap namespace to altered (#7161) * Lower webhook timeout for digital ocean (#7319) * Lower webhook timeout for digital ocean * Set Digital Ocean value controller.admissionWebhooks.timeoutSeconds to 29 * update OWNERS and aliases files (#7365) (#7366) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Downgrade Lua modules for s390x (#7355) Downgrade Lua modules to last known working version. * Fix IngressClass logic for newer releases (#7341) * Fix IngressClass logic for newer releases Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Change e2e tests for the new IngressClass presence * Fix chart and admission tests Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix helm chart test Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix reviews * Remove ingressclass code from admission * update tag to v1.0.0-beta.1 * update readme and changelog for v1.0.0-beta.1 * Release v1.0.0-beta.1 - helm and manifests (#7422) * Change the order of annotation just to trigger a new helm release (#7425) * [cherry-pick] Add dev-v1 branch into helm releaser (#7428) * Add dev-v1 branch into helm releaser (#7424) * chore: add link for artifacthub.io/prerelease annotations Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Ricardo Katz <rikatz@users.noreply.github.com> * k8s job ci pipeline for dev-v1 br v1.22.0 (#7453) * k8s job ci pipeline for dev-v1 br v1.22.0 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * k8s job ci pipeline for dev-v1 br v1.21.2 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * remove v1.21.1 version Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * Add controller.watchIngressWithoutClass config option (#7459) Signed-off-by: Akshit Grover <akshit.grover2016@gmail.com> * Release new helm chart with certgen fixed (#7478) * Update go version, modules and remove ioutil * Release new helm chart with certgen fixed * changed appversion, chartversion, TAG, image (#7490) * Fix CI conflict * Fix CI conflict * Fix build.sh from rebase process * Fix controller_test post rebase Co-authored-by: Tianhao Guo <rggth09@gmail.com> Co-authored-by: Ray <61553+rctay@users.noreply.github.com> Co-authored-by: Bill Cassidy <cassid4@gmail.com> Co-authored-by: Jintao Zhang <tao12345666333@163.com> Co-authored-by: Sathish Ramani <rsathishx87@gmail.com> Co-authored-by: Mansur Marvanov <nanorobocop@gmail.com> Co-authored-by: Matt1360 <568198+Matt1360@users.noreply.github.com> Co-authored-by: Carlos Tadeu Panato Junior <ctadeu@gmail.com> Co-authored-by: Kundan Kumar <kundan.kumar@india.nec.com> Co-authored-by: Tom Hayward <thayward@infoblox.com> Co-authored-by: Sergey Shakuto <sshakuto@infoblox.com> Co-authored-by: Tore <tore.lonoy@gmail.com> Co-authored-by: Bouke Versteegh <info@boukeversteegh.nl> Co-authored-by: Shahid <shahid@us.ibm.com> Co-authored-by: James Strong <strong.james.e@gmail.com> Co-authored-by: Long Wu Yuan <longwuyuan@gmail.com> Co-authored-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Neha Lohia <nehapithadiya444@gmail.com> Co-authored-by: Akshit Grover <akshit.grover2016@gmail.com>
@ssh2go I believe this PR is causing issues, please see #8059 #8059 (comment) |
* Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com>
* Drop v1beta1 from ingress nginx (kubernetes#7156) * Drop v1beta1 from ingress nginx Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix intorstr logic in controller Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * fixing admission Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * more intorstr fixing * correct template rendering Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix e2e tests for v1 api Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix gofmt errors * This is finally working...almost there... Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Re-add removed validation of AdmissionReview * Prepare for v1.0.0-alpha.1 release Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Update changelog and matrix table for v1.0.0-alpha.1 (kubernetes#7274) Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * add docs for syslog feature (kubernetes#7219) * Fix link to e2e-tests.md in developer-guide (kubernetes#7201) * Use ENV expansion for namespace in args (kubernetes#7146) Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does. * chart: using Helm builtin capabilities check (kubernetes#7190) Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> * Update proper default value for HTTP2MaxConcurrentStreams in Docs (kubernetes#6944) It should be 128 as documented in https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L780 * Fix MaxWorkerOpenFiles calculation on high cores nodes (kubernetes#7107) * Fix MaxWorkerOpenFiles calculation on high cores nodes * Add e2e test for rlimit_nofile * Fix doc for max-worker-open-files * ingress/tcp: add additional error logging on failed (kubernetes#7208) * Add file containing stable release (kubernetes#7313) * Handle named (non-numeric) ports correctly (kubernetes#7311) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Updated v1beta1 to v1 as its deprecated (kubernetes#7308) * remove mercurial from build (kubernetes#7031) * Retry to download maxmind DB if it fails (kubernetes#7242) * Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Remove hardcoded value Signed-off-by: Sergey Shakuto <sshakuto@infoblox.com> * Release v1.0.0-alpha.1 * Add changelog for v1.0.0-alpha.2 * controller: ignore non-service backends (kubernetes#7332) * controller: ignore non-service backends Signed-off-by: Carlos Panato <ctadeu@gmail.com> * update per feedback Signed-off-by: Carlos Panato <ctadeu@gmail.com> * fix: allow scope/tcp/udp configmap namespace to altered (kubernetes#7161) * Lower webhook timeout for digital ocean (kubernetes#7319) * Lower webhook timeout for digital ocean * Set Digital Ocean value controller.admissionWebhooks.timeoutSeconds to 29 * update OWNERS and aliases files (kubernetes#7365) (kubernetes#7366) Signed-off-by: Carlos Panato <ctadeu@gmail.com> * Downgrade Lua modules for s390x (kubernetes#7355) Downgrade Lua modules to last known working version. * Fix IngressClass logic for newer releases (kubernetes#7341) * Fix IngressClass logic for newer releases Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Change e2e tests for the new IngressClass presence * Fix chart and admission tests Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix helm chart test Signed-off-by: Ricardo Pchevuzinske Katz <ricardo.katz@gmail.com> * Fix reviews * Remove ingressclass code from admission * update tag to v1.0.0-beta.1 * update readme and changelog for v1.0.0-beta.1 * Release v1.0.0-beta.1 - helm and manifests (kubernetes#7422) * Change the order of annotation just to trigger a new helm release (kubernetes#7425) * [cherry-pick] Add dev-v1 branch into helm releaser (kubernetes#7428) * Add dev-v1 branch into helm releaser (kubernetes#7424) * chore: add link for artifacthub.io/prerelease annotations Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Ricardo Katz <rikatz@users.noreply.github.com> * k8s job ci pipeline for dev-v1 br v1.22.0 (kubernetes#7453) * k8s job ci pipeline for dev-v1 br v1.22.0 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * k8s job ci pipeline for dev-v1 br v1.21.2 Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * remove v1.21.1 version Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com> * Add controller.watchIngressWithoutClass config option (kubernetes#7459) Signed-off-by: Akshit Grover <akshit.grover2016@gmail.com> * Release new helm chart with certgen fixed (kubernetes#7478) * Update go version, modules and remove ioutil * Release new helm chart with certgen fixed * changed appversion, chartversion, TAG, image (kubernetes#7490) * Fix CI conflict * Fix CI conflict * Fix build.sh from rebase process * Fix controller_test post rebase Co-authored-by: Tianhao Guo <rggth09@gmail.com> Co-authored-by: Ray <61553+rctay@users.noreply.github.com> Co-authored-by: Bill Cassidy <cassid4@gmail.com> Co-authored-by: Jintao Zhang <tao12345666333@163.com> Co-authored-by: Sathish Ramani <rsathishx87@gmail.com> Co-authored-by: Mansur Marvanov <nanorobocop@gmail.com> Co-authored-by: Matt1360 <568198+Matt1360@users.noreply.github.com> Co-authored-by: Carlos Tadeu Panato Junior <ctadeu@gmail.com> Co-authored-by: Kundan Kumar <kundan.kumar@india.nec.com> Co-authored-by: Tom Hayward <thayward@infoblox.com> Co-authored-by: Sergey Shakuto <sshakuto@infoblox.com> Co-authored-by: Tore <tore.lonoy@gmail.com> Co-authored-by: Bouke Versteegh <info@boukeversteegh.nl> Co-authored-by: Shahid <shahid@us.ibm.com> Co-authored-by: James Strong <strong.james.e@gmail.com> Co-authored-by: Long Wu Yuan <longwuyuan@gmail.com> Co-authored-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Neha Lohia <nehapithadiya444@gmail.com> Co-authored-by: Akshit Grover <akshit.grover2016@gmail.com>
We are using ingress-nginx in our project and in some cases we get the error like:
At the same time, when we are trying to download that database manually a bit later, all works fine. This is because network connection become available a bit later than service starts. This issue is particularly common when using a service mesh proxy sidecar such as linkerd.
What this PR does / why we need it:
We can add extra delay on start, or try to download the database few times with a short delay. The last approach is proposed in that fix. User can define the download timeout for maxmind database, and the service will try to download it if that operation has been failed first time. The result in logs looks like:
Types of changes
Which issue/s this PR fixes
How Has This Been Tested?
Tested it manually, also added new unit test
TestMaxmindRetryDownload
to cover the new code.Checklist: