
The new v1.0.0 IngressClass handling logic makes a zero-downtime Ingress controller upgrade hard for users #7502

Closed
janosi opened this issue Aug 17, 2021 · 7 comments · Fixed by #7609
Labels
kind/bug · needs-priority · needs-triage
Milestone
v1.0.1

Comments

@janosi
Contributor

janosi commented Aug 17, 2021

NGINX Ingress controller version: v1.0.0 vs 0.4x

Kubernetes version (use kubectl version):
1.19, 1.20, 1.21, 1.22

Environment:

  • Cloud provider or hardware configuration: Not relevant, the problem is generic for all users

  • OS (e.g. from /etc/os-release): Not relevant

  • Kernel (e.g. uname -a): Not relevant

  • Install tools: Not relevant

  • Basic cluster related info:

    • kubectl version 1.20, any kubectl that supports v1 Ingress
  • How was the ingress-nginx-controller installed: Not relevant

What happened:

The 0.4x version of ingress-nginx-controller requires that ingressClass.spec.controller has the fixed value k8s.io/ingress-nginx. In order to shard Ingresses between two ingress-nginx-controller deployments, there must be two IngressClasses of the following form:

    metadata.Name: C
    spec.Controller: "k8s.io/ingress-nginx"

    metadata.Name: D
    spec.Controller: "k8s.io/ingress-nginx"
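
Spelled out as complete manifests, a minimal sketch of such a pair could look like this (assuming the networking.k8s.io/v1 API; C and D are just the placeholder names used above):

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: C
    spec:
      controller: k8s.io/ingress-nginx
    ---
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: D
    spec:
      controller: k8s.io/ingress-nginx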

The 0.4x version of ingress-nginx-controller uses the metadata.Name field of the IngressClass to identify which Ingresses it should process.

There were two ingress-nginx-controller deployments on the cluster with version 0.48.1. Controller A was configured to watch IngressClass C. Controller B was configured to watch IngressClass D.

An Ingress resource refers to an IngressClass (and thus to a processing ingress-nginx-controller instance) via its ingress.spec.ingressClassName field. On the cluster, Ingress E had ingress.spec.ingressClassName=C and Ingress F had ingress.spec.ingressClassName=D.
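
As a minimal sketch, Ingress E could look like this (the host and backend service names are hypothetical; only ingressClassName matters for the sharding):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: E
    spec:
      ingressClassName: C   # ties this Ingress to Controller A via IngressClass C
      rules:
      - host: e.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: svc-e
                port:
                  number: 80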

As a result, Controller A processed Ingress E and Controller B processed Ingress F; neither controller processed the other's Ingress. This was OK: Ingresses were sharded between the controllers as expected. This setup has been in use since Kubernetes 1.18.

A new v1.0.0-beta2 ingress-nginx-controller instance was deployed on the same cluster to replace Controller A in the long run. I wanted to run the old and new controllers in parallel to test and verify that the new controller works OK. But the new v1.0.0-beta2 ingress-nginx-controller immediately processed both Ingress E and Ingress F.

That is, the new v1.0.0 controller cannot differentiate Ingresses based on the existing IngressClasses and the Ingresses that refer to those IngressClasses.

What you expected to happen:

As a user I would like to re-use at least my existing Ingresses during an upgrade to v1.0.0. I have had v1 Ingresses since 1.19, so the restriction that the new ingress controller supports only v1 Ingresses does not affect me. For this reason I expect my existing Ingresses to keep working.

But this is not possible with the new IngressClass handling logic in v1.0.0. I would have to create new IngressClasses for v1.0.0 on my cluster, and if I want to run the old (e.g. v0.48.1) and new (v1.0.0) controllers in parallel on the same cluster, I would also have to duplicate my Ingresses and configure the new ones to refer to the new IngressClasses.

The root cause is that the v1.0.0 Ingress controller uses the ingressClass.spec.controller field to identify the IngressClasses that it owns. Because the old IngressClasses must have the value k8s.io/ingress-nginx in that field, the v1.0.0 controller will process all old IngressClasses on the cluster with that controller value.
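
To illustrate, a sketch of the matching in the scenario above (--controller-class is shown with what I understand to be its v1.0.0 default value):

    # v1.0.0 controller args (excerpt):
    /nginx-ingress-controller --controller-class=k8s.io/ingress-nginx
    # IngressClass C: spec.controller = k8s.io/ingress-nginx  -> claimed
    # IngressClass D: spec.controller = k8s.io/ingress-nginx  -> claimed
    # so both Ingress E and Ingress F end up processed by the new controller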

How to reproduce it:

  • Create 2 IngressClass resources on the cluster so that their .spec.controller field has the value k8s.io/ingress-nginx
  • Deploy 2 v0.48.1 ingress-nginx-controllers on the same cluster so that they have different ingress-class parameters. One deployment should refer to the first IngressClass like --ingress-class=<ingressclass_1.metadata.name>. The other deployment should refer to the second IngressClass like --ingress-class=<ingressclass_2.metadata.name>. (A sketch of these flags follows this list.)
  • Create 2 Ingresses. One of them should refer to the first IngressClass in its .spec.ingressClassName. The other Ingress should refer to the other IngressClass.
  • Verify that each ingress-nginx-controller processes only the Ingress that refers to its IngressClass
  • Deploy a new v1.0.0 ingress-nginx-controller on the cluster
  • Observe that the new v1.0.0 ingress-nginx-controller processes both Ingresses
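
A minimal sketch of the two v0.48.1 deployments' container args, as referenced in the second step (placeholders as above; other flags omitted):

    # Controller 1 (v0.48.1), sharded to the first IngressClass:
    /nginx-ingress-controller --ingress-class=<ingressclass_1.metadata.name>
    # Controller 2 (v0.48.1), sharded to the second IngressClass:
    /nginx-ingress-controller --ingress-class=<ingressclass_2.metadata.name>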

Anything else we need to know:

Slack discussion: https://kubernetes.slack.com/archives/CANQGM8BA/p1629105520296900

/kind bug

@janosi janosi added the kind/bug Categorizes issue or PR as related to a bug. label Aug 17, 2021
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 17, 2021
@k8s-ci-robot
Contributor

@janosi: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@janosi
Contributor Author

janosi commented Aug 17, 2021

@rikatz @strongjz Please check this issue. I created it per our agreement at the SIG Network Ingress meeting on 17th August 2021.

@rikatz rikatz added this to the v1.0.1 milestone Aug 21, 2021
@rikatz
Contributor

rikatz commented Aug 22, 2021

So, my proposal is:

  • If we prioritize the annotation over the class, does this work? If so, the correction is easy and we can change this behavior in a future feature release (like 1.1 or 1.2)
  • If not, we should think of a better approach. In my personal opinion, rolling back to the old ingress class behavior (watching the name instead of the controller spec) is doable, but would require a huge amount of effort.

So with the first proposal, you can, for example:

  • Annotate every ingress object with something like "nginx"
  • Tell the controller "watch every ingress object that has the annotation ingressclass = bla" (a hypothetical sketch follows this list)
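
For illustration, a metadata excerpt assuming the deprecated kubernetes.io/ingress.class annotation is the one being prioritized:

    metadata:
      name: example-ingress                     # hypothetical name
      annotations:
        kubernetes.io/ingress.class: "nginx"    # would take precedence over spec.ingressClassName under this proposal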

Does this work?

@strongjz @tao12345666333 wdyt?

@janosi
Contributor Author

janosi commented Aug 23, 2021

I am afraid prioritizing the annotation would not help :( The annotation has been deprecated since 1.18, and for that reason many users have started using the ingressClassName field instead. Those Ingresses do not have the annotation. That is, we would end up requesting a mass Ingress update from the users, and that update would not be a long-term solution either, as the annotation is deprecated.

I wonder about that huge effort: I am not 100% familiar with the code. My understanding so far was that the controller watches IngressClass resources based on the spec.Controller field, but after that everything works based on the metadata.Name field.
My understanding is that these checks would have to be changed to check metadata.Name instead of spec.Controller:

    // The same Spec.Controller comparison appears at three separate places in the
    // controller source; per the above, each would check metadata.Name instead:
    if ingressclass.Spec.Controller != icConfig.Controller {

    if ingressclass.Spec.Controller != icConfig.Controller {

    if cic.Spec.Controller != icConfig.Controller {

Another alternative I can think of is:

  • a new config option is introduced in a previous 0.4x version of the controller. With the new option a user can tell the controller to ignore the ingressClass.spec.Controller field value. This way the 0.4x controller would not check that spec.Controller is "k8s.io/ingress-nginx".
  • once the new 0.4x controller is deployed, the user can change the ingressClass.spec.Controller value so that the field has a different value in each IngressClass instance, i.e. the IngressClass becomes compatible with the v1.0.0 Ingress controller (see the sketch after this list)
  • after these steps the user can start updating the controller to v1.0.0
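
A rough sketch of the second step with kubectl (note that spec.controller is immutable in the v1 API, so "changing" it means recreating the IngressClass; the new controller value k8s.io/ingress-nginx-d is hypothetical):

    kubectl get ingressclass D -o yaml > class-d.yaml
    # edit class-d.yaml: set spec.controller to e.g. k8s.io/ingress-nginx-d
    kubectl delete ingressclass D
    kubectl apply -f class-d.yaml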

@benjamin-tucker

Doing this worked for me. Superficially, it fixed all broken old & new ingresses on my dev system before too many people noticed - YMMV.

controller:
  extraArgs:
    ingress-class: nginx-private

The process in the container is:

     6 www-data  0:01 /nginx-ingress-controller --election-id=ingress-controller-leader --controller-class=k8s.io/nginx-private --configmap=kube-system-ingress/kube-system-ingress-internal-controller --ingress-class=nginx-private

@rikatz
Contributor

rikatz commented Sep 7, 2021

@janosi PTAL #7609

Now you can specify --ingress-class-by-name and the controller will watch IngressClasses by name (I guess this was the old behavior), but it also watches by .spec.controller if it cannot find a match by name and the flag is enabled.
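
For the scenario above, a sketch of how the new flag could be wired (the flag is from #7609; the class name follows the earlier example; other args omitted):

    # v1.0.0+ controller, claiming IngressClass C by its name:
    /nginx-ingress-controller --ingress-class=C --ingress-class-by-name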

@alexvaque

alexvaque commented Sep 15, 2021

Great! As @benjamin-tucker mentioned here, I used "controller.extraArgs" and it works for me too:

controller:
  extraArgs:
    ingress-class: nginx-private

Thank you
