singleInFlight can cause all future DNS requests to fail #1449

jameshartig · 2023-04-09T01:26:14Z

Currently SingleInflight groups in-flight requests by the message only, not by the server address. I see the appeal: servers should return the same answer, so why not group them? However, if one server is down, this has the unintended consequence of causing all future DNS requests to fail, including those sent to a healthy server.

Contrived example code

package main

import (
	"log"
	"time"

	"github.com/miekg/dns"
)

func main() {
	server := &dns.Server{
		Addr: ":9999",
		Net:  "udp",
		Handler: dns.HandlerFunc(func(w dns.ResponseWriter, r *dns.Msg) {
			m := new(dns.Msg)
			m.SetRcode(r, dns.RcodeSuccess)
			w.WriteMsg(m)
		}),
	}

	cfg := dns.ClientConfig{
		Servers: []string{
			"127.0.0.1:9999",
			"127.0.0.1:100",
		},
		Timeout: 1,
	}

	// Let's pretend that the first server is temporarily down for an upgrade or
	// restart by purposefully calling ListenAndServe late...

	newMsg := func() *dns.Msg {
		m := new(dns.Msg)
		m.SetQuestion("test.test.", dns.TypeA)
		return m
	}

	c := new(dns.Client)
	c.SingleInflight = true
	timeout := time.Duration(cfg.Timeout) * time.Second
	c.DialTimeout = timeout
	c.ReadTimeout = timeout
	c.WriteTimeout = timeout

	query := func() {
		for _, srv := range cfg.Servers {
			_, _, err := c.Exchange(newMsg(), srv)
			if err == nil {
				log.Printf("success from %v\n", srv)
				return
			}
			log.Printf("error from %v: %v\n", srv, err)
		}
	}

	go query()

	go func() {
		panic(server.ListenAndServe())
	}()

	for range time.Tick(500 * time.Millisecond) {
		go query()
	}
}

When running the above code you'll see:

2023/04/08 21:14:32 error from 127.0.0.1:9999: read udp 127.0.0.1:52618->127.0.0.1:9999: i/o timeout
2023/04/08 21:14:32 error from 127.0.0.1:9999: read udp 127.0.0.1:52618->127.0.0.1:9999: i/o timeout
2023/04/08 21:14:32 error from 127.0.0.1:9999: read udp 127.0.0.1:52618->127.0.0.1:9999: i/o timeout
2023/04/08 21:14:33 error from 127.0.0.1:100: read udp 127.0.0.1:52621->127.0.0.1:100: i/o timeout
2023/04/08 21:14:33 error from 127.0.0.1:100: read udp 127.0.0.1:52621->127.0.0.1:100: i/o timeout
2023/04/08 21:14:33 error from 127.0.0.1:9999: read udp 127.0.0.1:52621->127.0.0.1:100: i/o timeout
2023/04/08 21:14:33 error from 127.0.0.1:100: read udp 127.0.0.1:52621->127.0.0.1:100: i/o timeout
2023/04/08 21:14:33 error from 127.0.0.1:9999: read udp 127.0.0.1:52621->127.0.0.1:100: i/o timeout
2023/04/08 21:14:34 error from 127.0.0.1:9999: read udp 127.0.0.1:60130->127.0.0.1:100: i/o timeout
2023/04/08 21:14:34 error from 127.0.0.1:100: read udp 127.0.0.1:60130->127.0.0.1:100: i/o timeout
2023/04/08 21:14:34 error from 127.0.0.1:100: read udp 127.0.0.1:60130->127.0.0.1:100: i/o timeout
2023/04/08 21:14:34 error from 127.0.0.1:9999: read udp 127.0.0.1:60130->127.0.0.1:100: i/o timeout
2023/04/08 21:14:35 error from 127.0.0.1:9999: read udp 127.0.0.1:60134->127.0.0.1:100: i/o timeout
2023/04/08 21:14:35 error from 127.0.0.1:100: read udp 127.0.0.1:60134->127.0.0.1:100: i/o timeout
2023/04/08 21:14:35 error from 127.0.0.1:100: read udp 127.0.0.1:60134->127.0.0.1:100: i/o timeout
2023/04/08 21:14:35 error from 127.0.0.1:9999: read udp 127.0.0.1:60134->127.0.0.1:100: i/o timeout

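Every query fails forever even though only the second server is "down". This is because, by the time a query tries the first server, an earlier request to the second server is still in flight and timing out, so the new request is grouped with it and receives its error. You can see this in the log lines where the error reported "from 127.0.0.1:9999" is actually a read from 127.0.0.1:100. When that request times out, the goroutine then tries the second server itself, which keeps a failing request in flight for the next query. The only way out is to wait for the second server to start working again or to kill the program.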

A simple fix would be to include the server's address in the group key. Alternatively, exchangeWithConnContext could check to see if the error returned from Do was for a different endpoint and immediately try again if so? But that could cause the Exchange function to take up to 2*Timeout which isn't ideal.

Finally, when SingleInflight is true, Exchange still calls Dial even if it will end up re-using an existing answer and just throwing away the connection without even using it. Any thoughts on tidying that up at the same time?
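
A minimal sketch of the first fix suggested above (including the server's address in the group key). This is not the library's actual key; the `question` and `inflightKey` types and the `newInflightKey` helper are hypothetical names used only for illustration:

```go
package main

import "fmt"

// question is a simplified, hypothetical stand-in for the fields of a DNS
// question that would identify "the same query".
type question struct {
	Name   string
	Qtype  uint16
	Qclass uint16
}

// inflightKey is a hypothetical group key. Because it includes the server
// address, identical questions sent to different servers are not grouped.
type inflightKey struct {
	Server string
	question
}

// newInflightKey builds the key for one (server, question) pair.
func newInflightKey(server string, q question) inflightKey {
	return inflightKey{Server: server, question: q}
}

func main() {
	q := question{Name: "test.test.", Qtype: 1, Qclass: 1} // type A, class IN

	up := newInflightKey("127.0.0.1:9999", q)
	down := newInflightKey("127.0.0.1:100", q)

	// The keys differ, so a request timing out against the down server
	// would not be shared with a request to the healthy one.
	fmt.Println(up == down) // prints false
}
```

Since the struct is comparable it could be used directly as a map key, which avoids building a string for every request.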


tmthrgd · 2023-04-09T02:00:32Z

I somewhat feel the whole SingleInflight code is too fragile and we might be better off removing it. It also has issues with contexts IIRC. It's also something callers can implement if they need it.
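
For callers who do want this behaviour, a rough sketch of what a caller-side version might look like, using only the standard library. Everything here (`group`, `call`, `key`, `errTimeout`) is a hypothetical name, the real `Exchange` call is replaced by plain functions so the example runs without a network, and context cancellation and panics inside `fn` are not handled:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTimeout stands in for the i/o timeout a down server would produce.
var errTimeout = errors.New("i/o timeout")

// call tracks one in-flight request and the result shared with its waiters.
type call struct {
	wg      sync.WaitGroup
	waiters int
	val     string
	err     error
}

// group coalesces concurrent calls that use the same key.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// key combines server and question so that different servers never share.
func key(server, question string) string { return server + "|" + question }

// do runs fn once per key at a time. Callers that arrive while fn is running
// wait for it and get the same result with shared set to true.
func (g *group) do(k string, fn func() (string, error)) (val string, shared bool, err error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[k]; ok {
		c.waiters++
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, true, c.err
	}
	c := new(call)
	c.wg.Add(1)
	g.calls[k] = c
	g.mu.Unlock()

	c.val, c.err = fn()

	// Forget the call before waking waiters so later callers start fresh.
	g.mu.Lock()
	delete(g.calls, k)
	g.mu.Unlock()
	c.wg.Done()
	return c.val, false, c.err
}

func main() {
	var g group
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	// A request to the "down" server that is stuck until released.
	go func() {
		defer close(done)
		g.do(key("127.0.0.1:100", "test.test."), func() (string, error) {
			close(started)
			<-release
			return "", errTimeout
		})
	}()
	<-started

	// Same question, different server: it runs on its own and succeeds.
	val, shared, err := g.do(key("127.0.0.1:9999", "test.test."), func() (string, error) {
		return "ok", nil
	})
	fmt.Println(val, shared, err)

	close(release)
	<-done
}
```

In a real program the functions passed to `do` would wrap the actual exchange with the server, and the dial would only happen inside them, so waiters never open a connection they will not use.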

miekg · 2023-04-10T17:52:23Z

[ Quoting ***@***.***> in "Re: [miekg/dns] singleInFlight can ..." ]

I somewhat feel the whole SingleInflight code is too fragile and we might be better off removing it. It also has issues with contexts IIRC. It's also something callers can implement if they need it.

Agreed, was a fun idea to add (10+ years ago), but too many subtle things. I think we can noop the functionality?


tmthrgd mentioned this issue Apr 18, 2023

Remove SingleInflight support from Client #1454

Merged

miekg closed this as completed in #1454 Apr 27, 2023

amery mentioned this issue Jul 22, 2023

in-flight support darvaza-proxy/resolver#34

Closed

amery added a commit to darvaza-proxy/resolver that referenced this issue Dec 5, 2023

remove deprecated usage of dns.Client#SingleInflight

cf20709

dns.Client.SingleInflight is a no-op. See github.com/miekg/dns/issues/1449 Signed-off-by: Alejandro Mery <amery@jpi.io>

amery added a commit to darvaza-proxy/resolver that referenced this issue Dec 6, 2023

remove deprecated usage of dns.Client#SingleInflight

de7e115

dns.Client.SingleInflight is a no-op. See github.com/miekg/dns/issues/1449 Signed-off-by: Alejandro Mery <amery@jpi.io>

wallrj mentioned this issue Jan 26, 2024

Stop using the deprecated SingleInflight field of miekg/dns cert-manager/cert-manager#6669

Merged
