[backport -> release/3.4.x] fix(dns): Eliminate asynchronous timer in syncQuery() to prevent hang #12038

Merged: bungle merged 2 commits into release/3.4.x from backport-11900-to-release/3.4.x on Nov 16, 2023

@chobits (Contributor) commented Nov 16, 2023

Backport of #11900 and #11386 to release/3.4.x.

Summary

  1. fix(dns): eliminate asynchronous timer in syncQuery() to prevent deadlock risk (#11900)
    * Revert "fix(conf): set default value of `dns_no_sync` to `on` (#11869)"
     
    This reverts commit 3be2513a60b9f5f0a89631ff17c202e6113981c0.

    * fix(dns): introduce the synchronous query in syncQuery() to prevent hang risk

    Originally, the first request to `syncQuery()` triggered an asynchronous timer
    event, which added the risk of the thread pool hanging.

    With this patch, a cold (uncached) DNS query is always performed synchronously
    in the current thread if the current phase supports yielding.
  2. fix(dns): fix retry and timeout handling (#11386)
    - Stop retrying in dns/client.lua and let the resolver handle retries.
      This change also makes it possible to disable retries, which was
      previously not possible.
    - Be more faithful to the timeouts set by the user. Previously, the
      configured timeout applied only to the final request sent to the DNS
      server, while asynchronous requests could wait longer, which was not
      transparent to the user.
    - When the DNS server fails, stop trying other query types. Previously,
      after an (intermediate) failure to query one record type (say "SRV"),
      the client would try the next record type (say "A"), succeed, and
      return the contents of the "A" record even if the "SRV" record pointed
      to a different address.
    - Change the domain names used for testing the DNS client to the
      kong-gateway-testing.link zone, which is controlled by the Kong Gateway
      team.

    Fixes https://github.com/Kong/kong/issues/10182
    KAG-2300
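To make the syncQuery() change concrete, here is a simplified model of the pattern it describes: the first caller that misses the cache runs the query itself in its own thread, and concurrent callers for the same name wait for that result, so no separate timer has to be scheduled for the query to make progress. This is a Python sketch under stated assumptions, not Kong's Lua code: `SingleFlightResolver`, `lookup`, `query_fn`, and `wait_timeout` are hypothetical names, threads stand in for OpenResty coroutines, and the behavior in phases that cannot yield is not modeled.

```python
import threading
from typing import Callable, Dict, List, Tuple


class SingleFlightResolver:
    """Hypothetical model: the first caller for a cold name queries inline;
    concurrent callers for the same name wait on that caller's result."""

    def __init__(self, query_fn: Callable[[str], List[str]], wait_timeout: float = 2.0):
        self._query_fn = query_fn          # performs the actual DNS query (assumed)
        self._wait_timeout = wait_timeout  # how long followers wait for the leader
        self._lock = threading.Lock()
        self._cache: Dict[str, List[str]] = {}
        self._inflight: Dict[str, Tuple[threading.Event, dict]] = {}

    def lookup(self, name: str) -> List[str]:
        with self._lock:
            if name in self._cache:        # warm path: answer from cache
                return self._cache[name]
            entry = self._inflight.get(name)
            leader = entry is None
            if leader:                     # cold path: this caller becomes the leader
                entry = (threading.Event(), {})
                self._inflight[name] = entry
        done, box = entry

        if leader:
            # The leader runs the query in its own thread; no timer is scheduled.
            try:
                box["result"] = self._query_fn(name)
                with self._lock:
                    self._cache[name] = box["result"]
            except Exception as exc:
                box["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[name]
                done.set()                 # wake every follower waiting on this name
        elif not done.wait(self._wait_timeout):
            raise TimeoutError(f"timed out waiting for in-flight query for {name}")

        if "error" in box:
            raise box["error"]
        return box["result"]
```

Because the waiting callers depend only on a thread that is already running, a busy or exhausted timer pool cannot leave them stuck, which is the hang risk the patch removes.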
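The retry and timeout items can be read as one rule: the user's settings bound the whole lookup, and retries happen in exactly one place. A minimal sketch, assuming a hypothetical `send_fn(name, timeout)` that performs one attempt and raises `TimeoutError` when it expires; the total budget of `attempts * per_attempt_timeout` is an illustrative choice, not necessarily the formula used in dns/client.lua.

```python
import time
from typing import Callable, List


def query_with_budget(
    name: str,
    send_fn: Callable[[str, float], List[str]],
    per_attempt_timeout: float,
    attempts: int = 1,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Hypothetical model: try at most `attempts` times and never exceed a
    total budget of attempts * per_attempt_timeout seconds.
    attempts=1 means no retries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    deadline = clock() + attempts * per_attempt_timeout
    last_error: Exception = TimeoutError(f"no attempt made for {name}")
    for _ in range(attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget exhausted: stop instead of waiting longer than configured
        try:
            # Each attempt gets the configured timeout, capped by what is left.
            return send_fn(name, min(per_attempt_timeout, remaining))
        except TimeoutError as exc:
            last_error = exc  # only timeouts are retried in this sketch
    raise last_error
```

Setting `attempts=1` performs a single attempt, which corresponds to the "retries disabled" case; the injectable `clock` exists only to make the budget logic easy to test.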
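The change to stop trying other query types after a server failure can be sketched as follows. This assumes hypothetical `NameNotFound` and `ServerFailure` exceptions and a `query_type_fn(name, qtype)` callable; the default order of SRV, then A, then AAAA is illustrative only. A definitive negative answer moves on to the next record type, while a server failure ends the lookup.

```python
from typing import Callable, List, Sequence


class NameNotFound(Exception):
    """The server answered, but has no record of this type (hypothetical)."""


class ServerFailure(Exception):
    """The server could not be queried or returned an error (hypothetical)."""


def resolve_by_type(
    name: str,
    query_type_fn: Callable[[str, str], List[str]],
    order: Sequence[str] = ("SRV", "A", "AAAA"),
) -> List[str]:
    """Try record types in `order`; only a definite 'not found' falls through."""
    for qtype in order:
        # ServerFailure is deliberately not caught: after a failed SRV query we
        # cannot know whether an A answer would match what the SRV record points to.
        try:
            return query_type_fn(name, qtype)
        except NameNotFound:
            continue  # a definitive negative answer: trying the next type is safe
    raise NameNotFound(f"no {'/'.join(order)} records for {name}")
```

Letting `ServerFailure` propagate trades some availability for correctness: a transient SRV failure now surfaces as an error instead of silently returning an A record that may point somewhere else.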

Checklist

  • The Pull Request has tests
  • A changelog file has been created under changelog/unreleased/kong, or the skip-changelog label has been added to the PR if a changelog is unnecessary (see README.md).
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE


hanshuebner and others added 2 commits November 16, 2023 10:28
…adlock risk (#11900)

Fix FTI-5348

---------

Co-authored-by: Datong Sun <datong.sun@konghq.com>
@chobits force-pushed the backport-11900-to-release/3.4.x branch from 0dc54d1 to 9158663 on November 16, 2023 02:29
@chobits changed the title from "Backport 11900 to release/3.4.x" to "[backport -> release/3.4.x] fix(dns): Eliminate asynchronous timer in syncQuery() to prevent hang" on Nov 16, 2023
@chobits requested a review from outsinre on November 16, 2023 02:31
@chobits removed the request for review from outsinre on November 16, 2023 06:37
@bungle merged commit e9640b6 into release/3.4.x on Nov 16, 2023
25 checks passed
@bungle deleted the backport-11900-to-release/3.4.x branch on November 16, 2023 09:21
5 participants