Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolver: less debug #2475

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

resolver: less debug #2475

wants to merge 2 commits into from

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Oct 21, 2019

Observed lots of debug messages from resolver (and associated high CPU
usage, part of which was because of excessive logging) when debug logging
is turned on. Here is some data from a real system:

$ journalctl -u docker --since="2019-10-21 21:00:00" --until="2019-10-21 22:00:00" | wc -l
188621
$ journalctl -u docker --since="2019-10-21 21:00:00" --until="2019-10-21 22:00:00" | grep -E 'Name To resolve: |\[resolver\] ' | wc -l
186319

So, it was almost 200000 lines logger for just one hour, and about 99% of that
are from resolver. While this might be the peculiarity of a particular setup,
the number of such messages still seem way too excessive.

Remove the ones that appear very often and are not errors.

@kolyshkin kolyshkin force-pushed the resolver-debug branch 2 times, most recently from 8b72ab8 to 999e703 Compare October 21, 2019 22:54
@kolyshkin
Copy link
Contributor Author

@arkodg @selansen PTAL

@kolyshkin
Copy link
Contributor Author

@thaJeztah @binman-docker @ingshtrom PTAL

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should also look into caching results (could be a can of worms, but perhaps if there's a good implementation already)

@@ -452,9 +449,6 @@ func (r *resolver) ServeDNS(w dns.ResponseWriter, query *dns.Msg) {
logrus.Warnf("[resolver] connect failed: %s", err)
continue
}
queryType := dns.TypeToString[query.Question[0].Qtype]
logrus.Debugf("[resolver] query %s (%s) from %s, forwarding to %s:%s", name, queryType,
extConn.LocalAddr().String(), proto, extDNS.IPStr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we still have some debugging about this case (I know this has sometimes been useful when debugging if requests are actually forwarded, and if so, to which DNS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, tcpdump for the rescue.

r.backend.HandleQueryResp(h.Name, ip)
}
}
if resp.Answer == nil || answers == 0 {
logrus.Debugf("[resolver] external DNS %s:%s did not return any %s records for %q", proto, extDNS.IPStr, queryType, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for this one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@kolyshkin
Copy link
Contributor Author

Wondering if we should also look into caching results (could be a can of worms, but perhaps if there's a good implementation already)

Lack of caching is an issue, too, but I think it's entirely separate one (also much more complicated)

Copy link
Collaborator

@selansen selansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arkodg
Copy link
Contributor

arkodg commented Oct 22, 2019

@kolyshkin is there a way to ratelimit these logs

@thaJeztah
Copy link
Member

@kolyshkin is there a way to ratelimit these logs

Doesn't seem like logrus implements it out of the box, but could possibly be implemented; sirupsen/logrus#560 sirupsen/logrus#304

@kolyshkin
Copy link
Contributor Author

is there a way to ratelimit these logs

@arkodg in fact some logs are already rate-limited; see #1062.

But in this case it does not make sense to rate-limit the logs, since every line provide different information (different names or IPs), so if you try to rate-limit, you'll end up seeing only some of them, which does not make sense.

IOW, it makes sense to rate-limit a warning message like we ran out of disk space (if it may appear too frequent), but it does not make sense to rate-limit a message like processing query %s (as %s is different every time).

Or maybe I don't quite understand your proposal; can you elaborate?

@arkodg
Copy link
Contributor

arkodg commented Oct 25, 2019

@kolyshkin I'm just very reluctant to delete these logs

Can we add a compile time parameter / global const enableDebugLogs = false to reduce logging for now

@kolyshkin
Copy link
Contributor Author

@kolyshkin I'm just very reluctant to delete these logs

@arkodg I don't see any value in logging every DNS request (and I guess most of this info is easy to obtain using tcpdump/wireshark).

Can we add a compile time parameter / global const enableDebugLogs = false to reduce logging for now

Yes but IMHO this will look ugly. OK let me try it.

Observed lots of debug messages from resolver (and associated high CPU
usage because of lots of logging) when debug logging is turned on. Here
is some data from a real system:

> $ journalctl -u docker --since="2019-10-21 21:00:00"  --until="2019-10-21 22:00:00"  | wc -l
> 188621
> $ journalctl -u docker --since="2019-10-21 21:00:00"  --until="2019-10-21 22:00:00"  | grep -E 'Name To resolve: |\[resolver\] ' | wc -l
> 186319

So, it was about 200000 lines logger for just one hour, and about 99%
of that are from resolver. While this might be the peculiarity of
a particular setup, the number of such messages still seem way too
excessive.

Remove the ones that are not errors. In case those are needed, one have
to recompile this package with debug_log build tag.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Use more `debugf` in places that look definitely like developer's debug
which is most probably useless to an end user.

Promote some debug errors that looks more like warnings to use Warnf
rather than Debugf.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

@arkodg I have modified this PR according to your comments; please take another look

@kolyshkin
Copy link
Contributor Author

@arkodg PTAL

@cpuguy83
Copy link
Member

Note we have migrated this codebase over to github.com/moby/moby/libnetwork.
We are not accepting PR's on this repo anymore except for backports to be included in moby 20.10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants