This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Temporary ttl=0 returned on edge case by Route 53 #51

Closed
Tieske opened this issue Jul 30, 2018 · 1 comment · Fixed by #56
@Tieske
Member

Tieske commented Jul 30, 2018

See Kong/kong#3641 (comment) for a possible fix. The problem causes the Balancer to temporarily switch from the proper record (A) to a virtual SRV one, because ttl=0 is detected.

@Tieske Tieske self-assigned this Jul 30, 2018
@Tieske
Member Author

Tieske commented Aug 23, 2018

The root cause of the issue is that the Amazon Route 53 nameserver appears to round or truncate the remaining ttl for a query to 0. This is an Amazon bug, since the nameserver should never report ttl=0 for a record that has a non-0 ttl.

Since the Amazon nameserver reports the "remaining" ttl of the record in its own cache, the DNS client will automatically "zoom in" on the moment the record expires. This is especially true when a system is under load, so that a DNS request is made in practically every fraction of a second. When our own local cache expires, we immediately fire a query, and automatically hit that edge on the nameserver. Since the timing has to be precise, we don't always hit the edge, but the user reported it happening every 2 minutes. So it seems that when the remaining time calculated by the nameserver is close to 0 (but not exactly 0, since in that case the nameserver would invalidate its own cache), it rounds or truncates the value to 0 (an integer cast, maybe). As a result, it reports ttl=0 for a record with a non-0 ttl.
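The suspected mechanism can be modelled in a few lines. This is a sketch under the assumption, taken from the "integer cast, maybe" guess above, that the nameserver truncates the remaining time; `TruncatingNameserver` and its `query` method are hypothetical and not part of any real library. Because the client caches the answer for the truncated value, its own copy expires during the final second of the nameserver's copy, never later, so a client that re-queries right away lands in exactly the window where the reported ttl is 0.

```python
class TruncatingNameserver:
    """Hypothetical model of a caching nameserver that reports the remaining
    TTL of its cached copy, truncated to whole seconds (assumed behaviour)."""

    def __init__(self, original_ttl: int):
        self.original_ttl = original_ttl
        self.cached_at = None  # when the record entered the server's cache

    def query(self, now: float) -> int:
        if self.cached_at is None or now - self.cached_at >= self.original_ttl:
            # Cache miss or expired entry: re-fetch and report the full TTL.
            self.cached_at = now
            return self.original_ttl
        remaining = self.original_ttl - (now - self.cached_at)
        # Integer cast: anything under 1 second left is reported as ttl=0.
        return int(remaining)


server = TruncatingNameserver(60)
server.query(0.0)             # another client primes the server cache at t=0
first = server.query(0.5)     # 59.5 s left, reported as 59
# Our client caches the answer for 59 s and re-queries shortly after its own
# copy expires, at t=59.75: 0.25 s before the server's copy expires.
second = server.query(59.75)  # 0.25 s left, reported as 0
print(first, second)          # 59 0
```

Under this model, whether the client actually sees ttl=0 depends on its re-query delay being shorter than the fractional part that was truncated away, which fits the observation that the edge is hit often but not always.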

This in turn causes the load balancer to switch behaviour and do a DNS query on every request, which shouldn't happen.

Tieske added a commit that referenced this issue Aug 23, 2018
See #51

Some servers will report ttl=0 when they are on the very edge
of their own cached ttl. This should never happen for a record
that has a non-0 ttl.

This fix makes sure we require ttl=0 to be reported twice in a row
before we switch the load balancer's behaviour.

Fixes #51
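The rule in the commit message can be sketched as a small counter. `TtlZeroGuard` and `observe` are hypothetical names, not the API of the actual fix in #56, and the sketch leaves out how the client handles the single ttl=0 answer in the meantime.

```python
class TtlZeroGuard:
    """Hypothetical helper: treat a hostname as ttl=0 only after `threshold`
    consecutive DNS answers reported ttl=0 (the commit uses two in a row)."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.consecutive_zero = 0

    def observe(self, ttl: int) -> bool:
        """Record one DNS answer; return True if the load balancer should
        switch to querying DNS on every request."""
        if ttl == 0:
            self.consecutive_zero += 1
        else:
            # Any non-0 ttl resets the streak, so an isolated ttl=0 from a
            # nameserver at the edge of its cache is ignored.
            self.consecutive_zero = 0
        return self.consecutive_zero >= self.threshold


guard = TtlZeroGuard()
print([guard.observe(ttl) for ttl in (59, 0, 60, 0, 0)])
# [False, False, False, False, True]
```

The trade-off is that a record which really does have ttl=0 is detected one answer later, in exchange for ignoring an isolated ttl=0 reported at the edge of the nameserver's cache.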
Tieske added a commit that referenced this issue Aug 27, 2018