
Kong routes requests to targets deleted via Admin API #8757

Closed
mkornatz opened this issue May 4, 2022 · 9 comments
Labels
core/admin-api core/balancer task/needs-investigation Requires investigation and reproduction before classifying it as a bug or not.

Comments

@mkornatz

mkornatz commented May 4, 2022

Is there an existing issue for this?

Kong version ($ kong version)

2.8.1

Current Behavior

When deleting a target from an upstream via the Admin API, we see a recurring issue where the Kong Gateway proxy nodes continue to route traffic to the target. The Admin API shows the correct state (the target is removed), but we can confirm from the request logs that traffic is still being routed to the target long after the target is removed.
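For reference, the deletion is a plain Admin API call of this shape (a hypothetical example; the Admin API address, upstream name, and target address are placeholders, not our real values):

curl -i -X DELETE http://localhost:8001/upstreams/my-upstream/targets/10.0.0.2:80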

Our current workaround has been to restart the proxy nodes when we detect this issue; that resolves it, but it causes other side effects, such as cutting off long-running API requests.

About 1 second after the target is removed via the DELETE /upstreams/{upstream_id}/targets/{target_id} Admin API endpoint, we see log entries like the following coming from the Kong Gateway. Note: these log lines may be out of order due to how our logging system collects entries with the same timestamp.

2022/05/02 15:10:39 [error] 1108#0: *20612 [lua] events.lua:194: do_handlerlist(): worker-events: event callback failed; source=balancer, event=targets, pid=1115 error='/usr/local/openresty/lualib/resty/lock.lua:153: API disabled in the context of log_by_lua*
log_by_lua(nginx-kong.conf:136):2: in main chunk', data={"operation":"create","entity":{"upstream":{"id":"943f01f4-60d0-4435-933d-c76adbbe7060"}}} while logging request, client: REDACTED, server: kong, request: "POST / HTTP/1.1", upstream: "http://REDACTED:80/", host: "REDACTED"
/usr/local/share/lua/5.1/kong/init.lua:1380: in function 'log'
/usr/local/share/lua/5.1/kong/runloop/handler.lua:1651: in function 'after'
...l/share/lua/5.1/kong/runloop/balancer/healthcheckers.lua:169: in function 'report_http_status'
/usr/local/share/lua/5.1/resty/healthcheck.lua:582: in function 'report_http_status'
/usr/local/share/lua/5.1/resty/healthcheck.lua:560: in function 'run_mutexed_fn'
[C]: in function 'pcall'
/usr/local/share/lua/5.1/resty/healthcheck.lua:676: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:640>
/usr/local/share/lua/5.1/resty/healthcheck.lua:1242: in function 'raise_event'
/usr/local/share/lua/5.1/resty/worker/events.lua:361: in function 'post'
/usr/local/share/lua/5.1/resty/worker/events.lua:219: in function 'do_event_json'
/usr/local/share/lua/5.1/resty/worker/events.lua:185: in function 'do_handlerlist'
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/kong/runloop/handler.lua:240: in function </usr/local/share/lua/5.1/kong/runloop/handler.lua:235>
/usr/local/share/lua/5.1/kong/runloop/balancer/targets.lua:154: in function 'on_target_event'
.../local/share/lua/5.1/kong/runloop/balancer/balancers.lua:198: in function 'create_balancer'
.../local/share/lua/5.1/kong/runloop/balancer/balancers.lua:105: in function 'create_balancer_exclusive'
/usr/local/share/lua/5.1/kong/cache/init.lua:168: in function 'fetch_targets'
/usr/local/share/lua/5.1/resty/mlcache.lua:685: in function 'get'
/usr/local/openresty/lualib/resty/lock.lua:153: in function 'lock'
[C]: in function 'sleep'
stack traceback:

Intermixed with those errors above, we also see several log entries like the following related to the given target/upstream.

1108#0: *26698 [lua] healthcheckers.lua:171: report_http_status(): [healthchecks] failed reporting status: true while logging request

This issue is intermittent and doesn't always occur. It seems to be most prevalent when the targets being deleted are already having issues and have become unresponsive or take a very long time (30s) to respond to requests.

Expected Behavior

When a target is removed from an upstream, we expect that traffic will no longer be routed to it. With eventual Kong worker consistency, we don't expect this change to be applied immediately. However, we do expect it to take effect within several seconds of the target being removed.

Steps To Reproduce

We are still working to reproduce the issue in an isolated environment. However, in our production environment we can confirm the following steps produce the behavior (a sketch of the equivalent Admin API calls follows the list).

1. In an environment with multiple gateway proxy nodes (at least 2)
2. Create a service and upstream and configure active and passive health checks with a standard configuration
3. Add target_1 and target_2 to the upstream
4. Force target_2 to become unresponsive or take a long time to respond to requests
5. Delete target_2 from the upstream
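For context, a rough sketch of steps 2–5 as Admin API calls (a hypothetical example; the Admin API address, names, ports, and health-check values are placeholders rather than our exact configuration):

# 2. Create an upstream with active and passive health checks, and a service pointing at it
curl -s -X POST http://localhost:8001/upstreams \
  -H 'Content-Type: application/json' \
  -d '{"name":"my-upstream","healthchecks":{"active":{"healthy":{"interval":5},"unhealthy":{"interval":5}},"passive":{"unhealthy":{"http_failures":3}}}}'
curl -s -X POST http://localhost:8001/services --data name=my-service --data host=my-upstream
# 3. Add target_1 and target_2
curl -s -X POST http://localhost:8001/upstreams/my-upstream/targets --data target=target_1:80
curl -s -X POST http://localhost:8001/upstreams/my-upstream/targets --data target=target_2:80
# 4. Make target_2 unresponsive or very slow to respond (done outside Kong)
# 5. Delete target_2
curl -s -X DELETE http://localhost:8001/upstreams/my-upstream/targets/target_2:80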

Anything else?

  • We are using the following config values. We have tried multiple variations of these configs with no change in behavior.
KONG_DB_UPDATE_FREQUENCY: "5"
KONG_WORKER_STATE_UPDATE_FREQUENCY: "5"
KONG_WORKER_CONSISTENCY: "eventual"
  • In our environment, we often have targets that become unresponsive or start serving incorrect data when they are under heavy load. A background job in a separate system is responsible for deleting these targets when they get into a problematic state. This often leads to targets being added and removed every 5 minutes or so.
  • Both the admin and proxy nodes are running in a Kubernetes cluster
  • The logs included in the report above seem to be the only relevant logs
  • The upstream is configured to use both active and passive checks for the targets
@ADD-SP added the core/balancer, task/needs-investigation, and core/admin-api labels on May 5, 2022
@23henne

23henne commented May 9, 2022

I'm wondering if it might be the same issue as #8517, or at least related to it. But in your case it at least logs a more or less meaningful error message.

@mkornatz
Author

@23henne Thanks for pointing me to that other issue. It does indeed look similar. I'm still having trouble reproducing in a non-production environment, but if I can, I'll also mention it in the other issue.

@uvish

uvish commented May 25, 2022

Same issue here on Kong 2.8.1: the logs show requests routed to the old IP, but the Admin API /targets endpoint shows only one target, with the new IP.
Update: on checking the DB we found multiple entries for the same upstreams, one in the old format and one in the new format. After clearing all the older entries the problem was fixed. This might have occurred because of an incomplete migration.
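For anyone wanting to check for the same thing, a query along these lines should surface duplicate target rows per upstream (a hypothetical sketch assuming the default Postgres schema, where targets live in a targets table keyed by upstream_id; the connection string is a placeholder):

psql "$KONG_PG_URI" -c "
  SELECT upstream_id, target, count(*) AS duplicate_rows
  FROM targets
  GROUP BY upstream_id, target
  HAVING count(*) > 1;"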

@esatterwhite

This may be related to #8976

@locao
Contributor

locao commented Jul 8, 2022

Hi @mkornatz! Thanks for your report. The log you shared seems to be missing the stack traceback itself. Can you share the lines that follow stack traceback:?

@ybce

ybce commented Jul 8, 2022

Hello @locao, I'm Matt's colleague and I believe that is the stack traceback. As Matt mentioned: "Note: these log lines may be out of order due to how our logging system collects entries with the same timestamp."

@altnti

altnti commented Jul 25, 2022

Same issue; Kong version is 2.1.4.

We used the Kong Admin API to add one target (10...:47179) and remove another target (10...:57593).
One of the Kong nodes continues to send requests to the removed target (10...:57593).
Fetching the target list via the Admin API shows the correct state.
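For reference, the output below came from a call of this shape (the Admin API address is a placeholder added here for illustration):

curl -s http://localhost:8001/upstreams/fadc44a1-550e-410f-be86-de67ce1a3acb/targets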
{
  "next": null,
  "data": [
    {
      "created_at": 1658732450.015,
      "id": "a2c1f98b-2142-43b4-b489-eb223b9ae970",
      "tags": null,
      "weight": 100,
      "target": "10...:47179",    # new target (is live)
      "upstream": {
        "id": "fadc44a1-550e-410f-be86-de67ce1a3acb"
      }
    }
  ]
}
The issue may fix itself automatically when the upstream is updated.
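If that is the case, one possible way to nudge the balancer without restarting the proxy nodes is to touch the upstream via the Admin API (a hypothetical sketch; the Admin API address is a placeholder, and we have not confirmed here that this forces a rebuild):

# Hypothetical: re-apply an existing field on the upstream to trigger an update event
curl -s -X PATCH http://localhost:8001/upstreams/fadc44a1-550e-410f-be86-de67ce1a3acb \
  --data slots=10000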

@raoxiaoyan
Contributor

The related PR is #10384. Closing this issue.

@1madhu9

1madhu9 commented Apr 7, 2024

Hi Team,
We are observing the exact same issue in Kong version 3.2.2 as well: when deleting a target from an upstream via the Admin API, we see a recurring issue where the Kong Gateway proxy nodes continue to route traffic to the target. We had instances in our production environment where some of the Kong nodes were proxying requests to an upstream target that had been terminated more than 2 days earlier.
The Admin API shows the correct state of all targets (the target is removed), but from the request logs we can see that traffic is still being proxied to the target long after the target is removed.

We are trying to reproduce the issue in a non-production environment, but we haven't been able to do so yet. We will share more details if we manage to reproduce it.

Kong configuration:
Version: 3.2.2
Mode: traditional mode with Postgres DB
