Temporary `ttl=0` returned on edge case by Route 53 #51

Tieske · 2018-07-30T05:35:32Z

for a possible fix. The problem causes the Balancer to temporarily switch from the proper record (A) to a virtual SRV one, because ttl=0 is detected.


Tieske · 2018-08-23T13:38:53Z

The root cause of the issue is that the Amazon Route 53 nameserver appears to round or truncate the remaining ttl for a query to 0. This is an Amazon bug, since a nameserver should never report ttl=0 for a record that has a non-0 ttl.

Since the Amazon nameserver reports the "remaining" ttl it has in its own cache, the DNS client will automatically "zoom in" on the edge where the record expires. This is especially true when a system is under load, such that a DNS request is made in every possible fraction of a second. When our own local cache expires, we immediately fire a query and automatically hit the edge on the nameserver. Since the timing has to be precise we don't always hit the edge, but the user reported it happening every 2 minutes. So it seems that when the remaining time calculated by the nameserver is close to 0 (but not exactly 0, since it would invalidate its own cache in that case), it rounds or truncates the value to 0 (an integer cast, maybe). This causes it to report ttl=0 for a non-0 ttl record.
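To make the suspected mechanism concrete, here is a minimal Python sketch. It is not Route 53's actual behaviour (which is unknown here); `reported_ttl` is a hypothetical model of a caching nameserver that truncates the remaining ttl to an integer, and the timestamps are made up for illustration.

```python
def reported_ttl(record_ttl: float, cached_at: float, now: float) -> int:
    """Remaining ttl as a hypothetical caching nameserver might report it.

    Assumption: the server truncates the remaining time with an integer
    cast, which is only a guess at what the real server does.
    """
    remaining = record_ttl - (now - cached_at)
    if remaining <= 0:
        # The server would invalidate and refresh its own cache entry here.
        raise LookupError("cache entry expired on the nameserver")
    return int(remaining)  # truncation: 0.7 s remaining becomes 0


# The nameserver cached a 60 s record at t=0. The client first asks at t=10.3.
first_query_at = 10.3
first_ttl = reported_ttl(60, 0.0, first_query_at)

# The client caches the answer for exactly the reported ttl and re-queries
# the moment its own cache expires, which lands just before the server's edge.
second_query_at = first_query_at + first_ttl
second_ttl = reported_ttl(60, 0.0, second_query_at)

print(first_ttl, second_ttl)
```

Because the client's expiry time is derived from the server's truncated value, the follow-up query falls inside the last second of the server's cache window, which is where a truncated remaining ttl becomes 0.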

This in turn causes the loadbalancer to switch behaviour and do a DNS query on every request, which shouldn't happen.

See #51. Some servers will report ttl=0 when they are on the very edge of their own cached ttl. This should never happen for a record that has a non-0 ttl. This fix requires ttl=0 to be reported twice in a row before we switch the loadbalancer. Fixes #51
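The "twice in a row" rule described above could look roughly like the following Python sketch. This is not the code merged in #56; the class `TtlZeroGuard` and its `observe` method are hypothetical names used only to illustrate the idea of ignoring a single stray ttl=0.

```python
class TtlZeroGuard:
    """Treat a record as ttl=0 only after consecutive ttl=0 answers.

    Hypothetical helper: a single ttl=0 answer is assumed to be the
    nameserver edge case and is ignored.
    """

    def __init__(self, required: int = 2) -> None:
        self.required = required  # consecutive ttl=0 answers needed
        self.streak = 0           # ttl=0 answers seen in a row so far

    def observe(self, ttl: int) -> bool:
        """Record one DNS answer; return True if ttl=0 mode should be active."""
        if ttl == 0:
            self.streak += 1
        else:
            self.streak = 0  # any non-0 ttl resets the count
        return self.streak >= self.required


guard = TtlZeroGuard()
# One stray ttl=0 between normal answers does not trigger the switch;
# two ttl=0 answers in a row do.
results = [guard.observe(ttl) for ttl in [30, 0, 30, 0, 0]]
print(results)
```

The trade-off in this sketch is that a record that genuinely has ttl=0 is only recognised one query later than before.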

Tieske self-assigned this Jul 30, 2018

Tieske mentioned this issue Aug 23, 2018

fix(balancer) fix accidental ttl=0 switches #56

Merged

Tieske closed this as completed in #56 Aug 27, 2018

onematchfox mentioned this issue May 28, 2021

Kong upstream health flip-flopping due to TTL=0 handling #131

Open
