
Kong routes requests to targets deleted via Kong Ingress controller #6312

Closed
zackery-parkhurst opened this issue Jul 10, 2024 · 4 comments
Labels: bug (Something isn't working), pending author feedback

Comments

@zackery-parkhurst

zackery-parkhurst commented Jul 10, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Whenever a scale-down event happens on a Kubernetes pod, whether from an HPA scaling down or a deployment deleting pods as part of its update strategy, Kong sends requests to the pod after it has already been deleted. The result is a request that hangs for 60 seconds until Kong eventually gives up and responds with a 504.

Notice in these logs that the pod is deleted and the Kong Ingress Controller then updates the configuration in Kong, but a request comes through in between and hangs for 60 seconds.

[screenshot: logs showing the pod deletion followed by the KIC configuration update]

Expected Behavior

Whenever a pod is deleted, whether through an HPA downscale event or a deployment rollout, Kong should immediately stop using the target IP.

However, there is a delay between the pod being deleted and Kong updating its configuration, and any requests that come through during that time just hang and time out.

Steps To Reproduce

1. Install Kong Ingress Controller
2. Create an HPA for a service (a minimal sketch follows these steps)
3. Have the HPA scale down the service's pods
4. Have a request come into Kong after a pod has been deleted but before Kong updates its configuration
5. Witness Kong use the stale IP
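
For reference, a minimal sketch of the kind of HPA used in step 2; the names and thresholds are illustrative assumptions, not taken from this issue:

```yaml
# Hypothetical HPA that scales the example deployment down when CPU is low,
# triggering the scale-down events described above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```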

Kong Ingress Controller version

KIC image - kong/kubernetes-ingress-controller:3.0
Kong image - kong:3.5

Kubernetes version

Server Version: v1.28.9-eks-036c24b

Anything else?

The only other thing I would point out is that Kong does not appear to sync the configuration until after the pod has been deleted. That is odd, as I thought Kong watches the Endpoints objects for changes, and the Endpoints object is updated the moment the pod is marked for deletion.

So theoretically Kong should update itself before the pod is actually gone: a pod has a default termination grace period of 30 seconds, and the Endpoints object is updated as soon as the pod is marked for deletion, which should trigger Kong to start updating at that point.

On a final note: whether Kong updates its configuration as soon as the pod is marked for deletion or only after it is fully deleted, there is always some latency between the pod being deleted (or marked for deletion) and KIC updating its upstream targets and syncing that with Kong. So technically there is always a window in which a request can come in, hit a stale target, and sit waiting for 60 seconds (or however long the timeout is).

How can this be avoided, or at least rectified so it does not cause problems? Our issue is that Kong spends 60 seconds waiting to time out on an IP that no longer exists; clients end up timing out, or Kong times out and returns a 504.

zackery-parkhurst added the bug (Something isn't working) label on Jul 10, 2024
@randmonkey
Contributor

@zackery-parkhurst KIC does not watch for HPA events; it watches Services and Endpoints. Once KIC notices that a pod is deleted (or marked as not ready, which is reflected as a change on the related Endpoints), it updates its own cache immediately. For configuring Kong, however, KIC syncs on a 3-second period, since a full sync of the Kong configuration is a heavy operation. You can set CONTROLLER_PROXY_SYNC_SECONDS to configure the period at which configuration is applied to Kong.
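
A minimal sketch of setting that variable on the controller container; the Deployment structure and container name here are illustrative assumptions, not taken from this thread:

```yaml
# Hypothetical excerpt of the KIC Deployment: shorten the proxy sync
# period from the default 3 seconds to 1 second.
spec:
  template:
    spec:
      containers:
        - name: ingress-controller
          env:
            - name: CONTROLLER_PROXY_SYNC_SECONDS
              value: "1"
```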

@zackery-parkhurst
Author

@randmonkey Thank you for the response.

I understand that KIC watches the Endpoints to update the upstream targets. It's the 3-second sync period that is causing my issues.

When a pod is deleted or marked for deletion, it takes KIC ~3 seconds to update Kong, which means any requests that come in during that time are sent to a target that is being shut down.

And if the request does not finish before the app shuts down its server, the request just hangs until the Kong timeout is reached and Kong kills it with a 504.

Thank you for the information on CONTROLLER_PROXY_SYNC_SECONDS. We could shorten it, but that would probably increase the load on Kong and therefore increase the latency Kong takes to process requests, and it would still leave a gap from when the Endpoints change to when KIC actually updates the gateway, i.e. the 1 second it would then take KIC to sync changes to Kong.

So what would be the recommended way to handle this situation, if there is no way to prevent Kong from sending a request to a pod that is being shut down/terminated?
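
One general Kubernetes mitigation for this window, sketched here as an assumption rather than as a recommendation made in this thread, is to delay the application's shutdown with a preStop hook so the pod keeps serving while KIC and Kong catch up to the Endpoints change:

```yaml
# Hypothetical pod spec for the backend application: sleep before SIGTERM
# so requests routed during the KIC/Kong sync delay can still complete.
# Names and the 10-second value are illustrative; the image must provide
# a `sleep` binary for this exec hook to work.
spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]
```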

Last thing:
I tried researching how to accomplish this but could not find any way to do it other than manually setting upstream targets via the Admin API. Right now we use Ingress objects that point at a ClusterIP Service, and Kong automatically updates the target endpoints. Is there a way to have KIC configure upstream targets with either the cluster IP or the Service name from the Ingress object instead?
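
For context, a minimal sketch of the kind of configuration being asked about, assuming KIC's `ingress.kubernetes.io/service-upstream` Service annotation is the relevant mechanism (an assumption, not confirmed in this thread):

```yaml
# Hypothetical Service annotated so Kong targets the cluster IP instead of
# individual pod IPs; pod-level load balancing is then left to kube-proxy.
apiVersion: v1
kind: Service
metadata:
  name: example-app
  annotations:
    ingress.kubernetes.io/service-upstream: "true"
spec:
  selector:
    app: example-app
  ports:
    - port: 80
      targetPort: 8080
```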

@zackery-parkhurst
Author

That is exactly what I was looking for. Thank you!
