
The new v1.0.0 IngressClass handling logic makes a zero-downtime Ingress controller upgrade hard for users #7502

Closed
janosi opened this issue Aug 17, 2021 · 7 comments · Fixed by #7609
Labels
kind/bug · needs-priority · needs-triage
Milestone
v1.0.1

Comments

@janosi
Contributor

janosi commented Aug 17, 2021

NGINX Ingress controller version: v1.0.0 vs 0.4x

Kubernetes version (use kubectl version):
1.19, 1.20, 1.21, 1.22

Environment:

  • Cloud provider or hardware configuration: Not relevant, the problem is generic for all users

  • OS (e.g. from /etc/os-release): Not relevant

  • Kernel (e.g. uname -a): Not relevant

  • Install tools: Not relevant

  • Basic cluster related info:

    • kubectl version 1.20, any kubectl that supports v1 Ingress
  • How was the ingress-nginx-controller installed: Not relevant

What happened:

The 0.4x version of ingress-nginx-controller requires that ingressClass.spec.controller has the fixed value k8s.io/ingress-nginx. In order to shard Ingresses between two ingress-nginx-controller deployments, there must be two IngressClasses of the following form:

    metadata.Name: C
    spec.Controller: "k8s.io/ingress-nginx"

    metadata.Name: D
    spec.Controller: "k8s.io/ingress-nginx"
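
Spelled out as complete manifests, a minimal sketch of such a pair could look like this (assuming the networking.k8s.io/v1 API; C and D are just the placeholder names used above):

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: C
    spec:
      controller: k8s.io/ingress-nginx
    ---
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: D
    spec:
      controller: k8s.io/ingress-nginx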

The 0.4x version of ingress-nginx-controller uses the metadata.Name field of the IngressClass to identify which Ingresses it should process.

There were two ingress-nginx-controller deployments on the cluster with version 0.48.1. Controller A was configured to watch IngressClass C. Controller B was configured to watch IngressClass D.

An Ingress resource refers to an IngressClass (and thus to a processing ingress-nginx-controller instance) via its ingress.spec.ingressClassName field. On the cluster, Ingress E had ingress.spec.ingressClassName=C and Ingress F had ingress.spec.ingressClassName=D.
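
As a minimal sketch, Ingress E could look like this (the host and backend service names are hypothetical; only ingressClassName matters for the sharding):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: E
    spec:
      ingressClassName: C   # ties this Ingress to Controller A via IngressClass C
      rules:
      - host: e.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: svc-e
                port:
                  number: 80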

As a result, Controller A processed Ingress E and Controller B processed Ingress F; neither controller processed the other's Ingress. This was OK: Ingresses were sharded between the controllers as expected. This setup has been in use since Kubernetes 1.18.

A new v1.0.0-beta2 ingress-nginx-controller instance was deployed on the same cluster to replace Controller A in the long run. I wanted to run the old and new controllers in parallel to test and verify that the new controller works OK. But the new v1.0.0-beta2 ingress-nginx-controller immediately processed both Ingress E and Ingress F.

That is, the new v1.0.0 controller cannot differentiate Ingresses based on the existing IngressClasses and the Ingresses that refer to those IngressClasses.

What you expected to happen:

As a user I would like to re-use at least my existing Ingresses during an upgrade to v1.0.0. I have had v1 Ingresses since 1.19, so the restriction that the new ingress controller supports only v1 Ingresses does not affect me. For this reason I expect my existing Ingresses to keep working.

But this is not possible with the new IngressClass handling logic in v1.0.0. I would have to create new IngressClasses for v1.0.0 on my cluster, and if I want to run the old (e.g. v0.48.1) and new (v1.0.0) controllers in parallel on the same cluster, I would also have to duplicate my Ingresses and configure the new ones to refer to the new IngressClasses.

The root cause is that the v1.0.0 Ingress controller uses the ingressClass.spec.controller field to identify the IngressClasses that it owns. Because the old IngressClasses must have the value k8s.io/ingress-nginx in that field, the v1.0.0 controller will process all old IngressClasses on the cluster with that controller value.
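
To illustrate, a sketch of the matching in the scenario above (--controller-class is shown with what I understand to be its v1.0.0 default value):

    # v1.0.0 controller args (excerpt):
    /nginx-ingress-controller --controller-class=k8s.io/ingress-nginx
    # IngressClass C: spec.controller = k8s.io/ingress-nginx  -> claimed
    # IngressClass D: spec.controller = k8s.io/ingress-nginx  -> claimed
    # so both Ingress E and Ingress F end up processed by the new controller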

How to reproduce it:

  • Create 2 IngressClass resources on the cluster so that their .spec.controller field has the value k8s.io/ingress-nginx
  • Deploy 2 v0.48.1 ingress-nginx-controllers on the same cluster so that they have different ingress-class parameters. One deployment should refer to the first IngressClass like --ingress-class=<ingressclass_1.metadata.name>. The other deployment should refer to the second IngressClass like --ingress-class=<ingressclass_2.metadata.name>. (A sketch of these flags follows this list.)
  • Create 2 Ingresses. One of them should refer to the first IngressClass in its .spec.ingressClassName. The other Ingress should refer to the other IngressClass.
  • Verify that each ingress-nginx-controller processes only the Ingress that refers to its IngressClass
  • Deploy a new v1.0.0 ingress-nginx-controller on the cluster
  • Observe that the new v1.0.0 ingress-nginx-controller processes both Ingresses
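
A minimal sketch of the two v0.48.1 deployments' container args, as referenced in the second step (placeholders as above; other flags omitted):

    # Controller 1 (v0.48.1), sharded to the first IngressClass:
    /nginx-ingress-controller --ingress-class=<ingressclass_1.metadata.name>
    # Controller 2 (v0.48.1), sharded to the second IngressClass:
    /nginx-ingress-controller --ingress-class=<ingressclass_2.metadata.name>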

Anything else we need to know:

Slack discussion: https://kubernetes.slack.com/archives/CANQGM8BA/p1629105520296900

/kind bug

@janosi janosi added the kind/bug Categorizes issue or PR as related to a bug. label Aug 17, 2021
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 17, 2021
@k8s-ci-robot
Contributor

@janosi: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@janosi
Contributor Author

janosi commented Aug 17, 2021

@rikatz @strongjz Please check this issue. I created it per our agreement at the SIG Network Ingress meeting on 17th August 2021.

@rikatz rikatz added this to the v1.0.1 milestone Aug 21, 2021
@rikatz
Contributor

rikatz commented Aug 22, 2021

So, my proposal is:

  • If we prioritize the annotation over the class, does this work? If so, the correction is easy and we can change this behavior in a future feature release (like 1.1 or 1.2)
  • If not, we should think of a better approach. In my personal opinion, rolling back to the old ingress class behavior (watching the name instead of the controller spec) is doable, but would require a huge amount of effort.

So with the first proposal, you can, for example:

  • Annotate every ingress object with something like "nginx"
  • Tell the controller "watch every ingress object that has the annotation ingressclass = bla" (a hypothetical sketch follows this list)
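
For illustration, a metadata excerpt assuming the deprecated kubernetes.io/ingress.class annotation is the one being prioritized:

    metadata:
      name: example-ingress                     # hypothetical name
      annotations:
        kubernetes.io/ingress.class: "nginx"    # would take precedence over spec.ingressClassName under this proposal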

Does this work?

@strongjz @tao12345666333 wdyt?

@janosi
Contributor Author

janosi commented Aug 23, 2021

I am afraid prioritizing the annotation would not help :( The annotation has been deprecated since 1.18, and for that reason many users have started using the ingressClassName field instead. Those Ingresses do not have the annotation. That is, we would end up requesting a mass Ingress update from the users, and that update would not be a long-term solution either, as the annotation is deprecated.

I wonder about that huge effort: I am not 100% familiar with the code. My understanding so far was that the controller watches IngressClass resources based on the spec.Controller field, but after that everything works based on the metadata.Name field.
My understanding is that these checks would have to be changed to check metadata.Name instead of spec.Controller:

    // The same Spec.Controller comparison appears at three separate places in the
    // controller source; per the above, each would check metadata.Name instead:
    if ingressclass.Spec.Controller != icConfig.Controller {

    if ingressclass.Spec.Controller != icConfig.Controller {

    if cic.Spec.Controller != icConfig.Controller {

Another alternative I can think of is:

  • a new config option is introduced in a previous 0.4x version of the controller. With the new option a user can tell the controller to ignore the ingressClass.spec.Controller field value. This way the 0.4x controller would not check that spec.Controller is "k8s.io/ingress-nginx".
  • once the new 0.4x controller is deployed, the user can change the ingressClass.spec.Controller value so that the field has a different value in each IngressClass instance, i.e. the IngressClass becomes compatible with the v1.0.0 Ingress controller (see the sketch after this list)
  • after these steps the user can start updating the controller to v1.0.0
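
A rough sketch of the second step with kubectl (note that spec.controller is immutable in the v1 API, so "changing" it means recreating the IngressClass; the new controller value k8s.io/ingress-nginx-d is hypothetical):

    kubectl get ingressclass D -o yaml > class-d.yaml
    # edit class-d.yaml: set spec.controller to e.g. k8s.io/ingress-nginx-d
    kubectl delete ingressclass D
    kubectl apply -f class-d.yaml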

@benjamin-tucker

Doing this worked for me. Superficially, it fixed all broken old & new ingresses on my dev system before too many people noticed - YMMV.

controller:
  extraArgs:
    ingress-class: nginx-private

The process in the container is:

     6 www-data  0:01 /nginx-ingress-controller --election-id=ingress-controller-leader --controller-class=k8s.io/nginx-private --configmap=kube-system-ingress/kube-system-ingress-internal-controller --ingress-class=nginx-private

@rikatz
Contributor

rikatz commented Sep 7, 2021

@janosi PTAL #7609

Now you can specify --ingress-class-by-name and the controller will watch IngressClasses by name (I guess this was the old behavior), but it also watches by .spec.controller if it cannot find a match by name and the flag is enabled.
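
For the scenario above, a sketch of how the new flag could be wired (the flag is from #7609; the class name follows the earlier example; other args omitted):

    # v1.0.0+ controller, claiming IngressClass C by its name:
    /nginx-ingress-controller --ingress-class=C --ingress-class-by-name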

@alexvaque

alexvaque commented Sep 15, 2021

Great! As @benjamin-tucker mentioned here, I used "controller.extraArgs" and it works for me too:

controller:
  extraArgs:
    ingress-class: nginx-private

Thank you
