
[Kong Ingress Controller Chart] High Load on k8s API with KongUpstreamPolicy and Multiple Ingress Controllers #6270

Closed
kirillfomichev opened this issue Jun 28, 2024 · 7 comments · Fixed by #6421
Labels: bug (Something isn't working), release/required (it is required that this be resolved before releasing)

@kirillfomichev

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When using a KongUpstreamPolicy in a Kubernetes cluster with two ingress controllers, the configuration is cyclically overwritten: both ingress controllers compete for the same KongUpstreamPolicy resource, leading to high load on the k8s API.

Expected Behavior

The ingress controllers should handle the KongUpstreamPolicy resource without cyclic overwrites, avoiding unnecessary load on the k8s API.

Steps To Reproduce

- Set up a Kubernetes cluster.
- Deploy two ingress controllers via the Helm chart (see the values sketch below).
- Set the log level to debug.
- Create a KongUpstreamPolicy custom resource for the ingress controllers.
- Observe the cyclic configuration overwrite and the high load on the k8s API.
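
For example, the two controllers could be installed as two releases of the kong/kong Helm chart with distinct ingress classes. A minimal values sketch (the release names and ingressClass values are assumptions for illustration, not taken from this report):

# values-a.yaml, e.g. `helm install kong-a kong/kong -f values-a.yaml`
ingressController:
  enabled: true
  ingressClass: kong-a
---
# values-b.yaml, e.g. `helm install kong-b kong/kong -f values-b.yaml`
ingressController:
  enabled: true
  ingressClass: kong-b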

Kong Ingress Controller version

Kong (version 3.6), KIC (version 3.2)

Kubernetes version

No response

Anything else?

No response

@kirillfomichev kirillfomichev added the bug Something isn't working label Jun 28, 2024
@dweber019

dweber019 commented Jul 4, 2024

Problem:
In our case, status.ancestors.*.controllerName keeps switching between the two ingress controllers' names.

Analysis:
I think the issue is in the following code https://github.com/Kong/kubernetes-ingress-controller/blob/main/internal/controllers/configuration/kongupstreampolicy_controller.go#L59-L110, where there is no check whether the ingress controller is responsible for the KongUpstreamPolicy. For example, https://github.com/Kong/kubernetes-ingress-controller/blob/main/internal/controllers/configuration/kongupstreampolicy_controller.go#L69 could verify not only that the annotation exists, but also that the Service is used by an HTTPRoute managed by the current ingress controller.
The only way to answer this is to check whether the Service is referenced by an Ingress or HTTPRoute that the current ingress controller is responsible for. Am I right here, or are there other indicators?

This causes a lot of requests to the k8s API and the Kong admin API, as ALL ingress controllers in a single k8s cluster are fighting over the same resource, even when they are not responsible for it.

The reason this still works when translating KongUpstreamPolicies is that at https://github.com/Kong/kubernetes-ingress-controller/blob/main/internal/dataplane/kongstate/kongstate.go#L243 only Services the current ingress controller is responsible for are available, and the Service namespace is then used at https://github.com/Kong/kubernetes-ingress-controller/blob/main/internal/dataplane/kongstate/kongupstreampolicy.go#L58 to fetch the KongUpstreamPolicy.

Suggestion:
I would extend doesObjectReferUpstreamPolicy, for example at https://github.com/Kong/kubernetes-ingress-controller/blob/main/internal/controllers/configuration/kongupstreampolicy_controller.go#L69, to check whether the ingress controller is responsible for the Service or KongServiceFacade, by relating the HTTPRoute or Ingress backendRefs to the current ingress controller via the kubernetes.io/ingress.class annotation or the parentRefs.

Does this sound right?
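
To make the object chain concrete, here is a minimal example of the resources such a check would have to walk (all names are made up; the konghq.com/upstream-policy annotation is the documented way a Service opts into a KongUpstreamPolicy, and the HTTPRoute's parentRefs are what tie it to one specific controller):

apiVersion: configuration.konghq.com/v1beta1
kind: KongUpstreamPolicy
metadata:
  name: session-stickiness
  namespace: demo
spec:
  algorithm: round-robin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: demo
  annotations:
    konghq.com/upstream-policy: session-stickiness   # opts this Service into the policy
spec:
  selector:
    app: httpbin
  ports:
    - port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin
  namespace: demo
spec:
  parentRefs:
    - name: kong-a            # Gateway reconciled by only one of the two controllers
  rules:
    - backendRefs:
        - name: httpbin
          port: 80

Only the controller that reconciles the HTTPRoute's parentRef (or the matching Ingress class) should consider itself an ancestor of the policy; today both controllers do, which is why the status keeps flapping.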

This would leave at least one question open: if for some reason the reference to a KongUpstreamPolicy previously referenced by anything (e.g. a Service) is removed, the KongUpstreamPolicy status will be stale and wrong. But this issue also applies to the current stable version of KIC.

PR:
As #3745 isn't fixed yet, should I open a PR? I think this is a rather important bug and I don't know what the fastest approach is.

@dweber019

Don't know if the --watch-namespace flag from https://docs.konghq.com/kubernetes-ingress-controller/latest/reference/cli-arguments/ would help here, but a workaround could be to give the ingress controller access only to the namespace it should be responsible for, for k8s kinds like Service, HTTPRoute, KongServiceFacade, and KongUpstreamPolicy.
I would not do it for KongUpstreamPolicy without the other kinds, as this will probably fill up your logs with https://github.com/Kong/kubernetes-ingress-controller/blob/main/internal/controllers/configuration/kongupstreampolicy_controller.go#L229.
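
If --watch-namespace does turn out to help, it could presumably be set through the Helm chart. A sketch, assuming the kong/kong chart's convention of turning ingressController.env keys into CONTROLLER_* environment variables (which KIC maps to its CLI flags); the namespace name is only an example:

ingressController:
  env:
    # rendered as CONTROLLER_WATCH_NAMESPACE, i.e. the --watch-namespace flag
    watch_namespace: ch-kong-dev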

@dweber019

We were now forced to take action, but the suggested workaround of using a ClusterRole and a local RoleBinding did not work, as the ingress controller threw errors:

W0726 15:14:48.440265 1 reflector.go:547] pkg/mod/k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232: failed to list *v1beta1.KongUpstreamPolicy: kongupstreampolicies.configuration.konghq.com is forbidden: User "system:serviceaccount:ch-kong-dev:kong-serviceaccount" cannot list resource "kongupstreampolicies" in API group "configuration.konghq.com" at the cluster scope
E0726 15:14:48.440320 1 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232: Failed to watch *v1beta1.KongUpstreamPolicy: failed to list *v1beta1.KongUpstreamPolicy: kongupstreampolicies.configuration.konghq.com is forbidden: User "system:serviceaccount:ch-kong-dev:kong-serviceaccount" cannot list resource "kongupstreampolicies" in API group "configuration.konghq.com" at the cluster scope

We are now testing whether we can use something like

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kong-ingress-namespaced-dev
rules:
  - apiGroups:
      - configuration.konghq.com
    resources:
      - kongupstreampolicies
    resourceNames:
      - ch-kong-dev-session-stickiness
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - configuration.konghq.com
    resources:
      - kongupstreampolicies/status
    resourceNames:
      - ch-kong-dev-session-stickiness
    verbs:
      - get
      - patch
      - update

and bind this role only to the service accounts responsible for the resources (see the example binding below). This is cumbersome and needs whitelist management of resources.
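
Such a binding might look like the following (a sketch; since list/watch has to work at cluster scope, a ClusterRoleBinding is used here, with the ClusterRole name and service account taken from above):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kong-ingress-namespaced-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kong-ingress-namespaced-dev
subjects:
  - kind: ServiceAccount
    name: kong-serviceaccount
    namespace: ch-kong-dev   # the dev controller's service account, as seen in the log above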

I'll report back whether this worked.

@dweber019

The above workaround worked after removing the resourceNames from the first rule; in the end we had

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kong-ingress-namespaced-dev
rules:
  - apiGroups:
      - configuration.konghq.com
    resources:
      - kongupstreampolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - configuration.konghq.com
    resources:
      - kongupstreampolicies/status
    verbs:
      - get
  - apiGroups:
      - configuration.konghq.com
    resources:
      - kongupstreampolicies/status
    resourceNames:
      - ch-kong-dev-session-stickiness
    verbs:
      - patch
      - update

This is still throwing errors like

{"level":"error","time":"2024-07-29T11:09:19Z","logger":"controllers.KongUpstreamPolicy","msg":"Reconciler error","reconcileID":"2998f359-de26-4e0a-bb06-680521652fd4","error":"kongupstreampolicies.configuration.konghq.com \"ch-kong-int-session-stickiness\" is forbidden: User \"system:serviceaccount:ch-kong-test:kong-serviceaccount\" cannot patch resource \"kongupstreampolicies/status\" in API group \"configuration.konghq.com\" in the namespace \"ch-kong-httpbin-int\""}

But this error happens only once every few minutes (around 5m) and the load on the k8s API server is reduced.

@randmonkey randmonkey self-assigned this Aug 6, 2024
@randmonkey
Contributor

@dweber019 Thanks for pointing it out and your work on resolving it. I'll create the PR following your approach.

@dweber019

Awesome, thanks @randmonkey.
Could you link the PR to this issue?

@randmonkey
Contributor

@dweber019 PR is here: #6421. We will include it in the upcoming KIC 3.3.0 release.
