nginx-ingress pod crashloop if it can not redownload GeoIP2 database #8059
Comments
@jsalatiel: This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the … label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug
Please add more information. It would help to explain why this is a bug in the ingress-nginx controller if the download of the controller image, or of a dependency like the GeoIP DB, is broken because of a firewall.
The container image is downloaded and runs, but the logs show that the GeoIP DB download fails, so the container never reaches the ready state.
If your firewall is blocking the download, it's obviously not a bug in the controller. Can you explain what you want the controller to do if your firewall is blocking the download of the GeoIP DB?
Hi @longwuyuan, sorry, I probably did not express myself the right way. I used the firewall example only to show how the bug could be reproduced. I have no firewall blocking that.
That will require some research into how MaxMind database use and updates work in the controller. I think your expectation could be valid, but I don't know for sure. Would you be able to submit a PR?
Thanks,
Long Wu Yuan
On 12/24/21 7:11 PM, jsalatiel wrote:
The real problem is this: three times in the last couple of months, the download.maxmind.com URL was down while I was doing maintenance on my cluster, and because of that the nginx controller could not start, EVEN THOUGH it had already downloaded the databases on a previous run and they were saved on a persistent volume. If a previous database exists AND nginx cannot update that database on container start, that should not be fatal, IMHO.
@theunrealgeek, have you worked with MaxMind GeoIP DB internals before?
But I think the problem here is somewhere else, probably in the controller code that runs before nginx starts up, where the initial download happens. I haven't looked at that part of the code yet.
Updating the GeoIP DB seems to be on the critical path, and I suppose that is absolutely correct, because the alternative is to bring up the controller with a potentially outdated GeoIP DB.
That would be like running the controller with an updated GeoIP DB when possible, and ALSO with an outdated GeoIP DB as a fallback, which basically breaks all the features that depend on GeoIP.
Personally, I don't think it's a bug at all. Just as a healthy network is a requirement, updating the GeoIP DB on each init is a hard requirement, and a failure there should stop the controller from running.
But all of the above are just opinions, and I could be completely wrong, as I have no exposure to GeoIP.
Thanks,
Long Wu Yuan
On 12/25/21 10:25 AM, Aditya Kamath wrote:
have you worked with maxmind geoip db internals before
Not really with the internals of the MaxMind DB itself.
I don't think running with a potentially outdated GeoIP DB is really a problem. If that GeoIP DB is in the persistent volume, it is the one that was actively in use before the nginx pod was restarted or killed. Also, since a database is present, any nginx rules relying on GeoIP will work anyway. And this is much better than being unable to recover your cluster after a node failure (without disabling the GeoIP download) when MaxMind, an external site, is offline. Of course, this is also just my opinion.
There are multiple aspects here:
(1) Running the controller only after a fresh download of the GeoIP DB, hence ensuring the latest IP list.
(2) Attempting an update of the GeoIP DB.
(3) Running the pod even after a failed attempt, on the assumption that the previous successful update is good enough to continue.
(4) The existence or absence of a copy of the latest updated GeoIP DB in some accessible location.
I can't code either, so I can't attempt a PR here. @theunrealgeek will help out with thoughts on the code.
I am not in favor of changing code when it's not broken. So far there is no data showing a broken controller related to the GeoIP DB for multiple users of the GeoIP DB, because no other user has reported it.
Thanks,
Long Wu Yuan
On 12/25/21 6:26 PM, jsalatiel wrote:
I can't code, so I can't submit a PR.
Running into this same issue when running locally for dev after upgrading from … This issue started in release …
The current behavior is to exit fatally when a MaxMind license is defined and the database cannot be downloaded. There are various reasons this is desirable, such as compliance requirements that certain countries be blocked. One way to work around a MaxMind outage is to run a MaxMind mirror or caching proxy for your organization. MaxMind recommends this as a best practice to minimize load on their infrastructure, and it works well for us. Another solution could be some way for you to declare that download failures are allowed in your systems, or allowed only when a file is already present. Maybe a new config option could be added to let you define this policy.
I don't believe that, when the database could not be downloaded, nginx would disable MaxMind in … My post #8059 (comment) does not show a controlled exit of the process, but a full crash with a stack trace. Why would a stack trace be the desired behavior?
There was no retry attempt in your test case because …
It does indeed make a controlled call; see ingress-nginx/cmd/nginx/main.go, lines 66 to 67 at commit 2aa3420.
Stack traces are desirable so that you can trace the problem back in the code. Sometimes reading the relevant line of code can help you understand how to change your config to a valid one. What do you think about my idea of allowing a MaxMind DB failure policy to be configured?
"Another solution could be some way for you to declare that download failures are allowed in your systems, or allowed only when a file is already present. Maybe a new config option could be added to allow you to define this policy." For me, this would be the best solution.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
NGINX Ingress controller version: 1.1.0
Kubernetes version: v1.21.7
Environment:
Kernel (uname -a): 5.4.0-91-generic
What happened:
If, for any reason, nginx-ingress-controller fails to download the GeoIP2 database from https://download.maxmind.com/app/geoip_download... during pod start, the pod will crash-loop, even if a previously downloaded database is already present in /etc/nginx/geoip on a persistent volume.
What you expected to happen:
It should fail to start only if it cannot download the GeoIP2 database AND cannot find a previously downloaded database in /etc/nginx/geoip. As things stand, if there is any problem with the MaxMind website, the ingress controllers will refuse to start.
How to reproduce it:
/kind bug