-
Notifications
You must be signed in to change notification settings - Fork 882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter xtables lock warnings when firewalld is active #2135
Conversation
iptables/iptables.go
Outdated
func filterOutput(output []byte, args ...string) []byte { | ||
// ignore iptables' message about xtables lock | ||
if strings.Contains(string(output), xLockWaitMsg) { | ||
logrus.Warnf("Xtables contention detected during: %s", strings.Join(args, " ")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to still expose the seconds spent waiting for the lock I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying just add "spent up to 1s waiting for lock" to the warning? Or did you want to time it somehow in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the suggestion is to dump the original output
message as part of the warning i.e. the whole string corresponding to https://git.netfilter.org/iptables/tree/iptables/xshared.c?h=v1.6.0#n261.
Codecov Report
@@ Coverage Diff @@
## master #2135 +/- ##
========================================
Coverage ? 40.6%
========================================
Files ? 139
Lines ? 22457
Branches ? 0
========================================
Hits ? 9119
Misses ? 12002
Partials ? 1336
Continue to review full report at Codecov.
|
@ctelfer it looks like the https://git.netfilter.org/iptables/tree/iptables/xshared.c#n307 To handle this for now, perhaps, just match against the prefix We probably need to figure out something to stop depending on the error strings. |
@ddebroy Good catch w.r.t. the changing message. I've added the original output to the warning and have slightly truncated the match string. (which was quite long to being with for something that is being searched using string.Contains()) I'm not convinced that the warning is a good idea, TBH. We may well be in contention all the time in real deployments and just not see it because we've been filtering it. In other words, could be perfectly natural. W.r.t. how to do this "right": well, flagging any (erm .. most) output from iptables as an error regardless of the operation exit code is hacky in my opinion. I'd be curious as to what conditions the code was trying to catch that a non-zero exit code wouldn't. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
iptables/iptables.go
Outdated
func filterOutput(output []byte, args ...string) []byte { | ||
// ignore iptables' message about xtables lock | ||
if strings.Contains(string(output), xLockWaitMsg) { | ||
logrus.Warnf("xtables contention detected while running [%s]: %q", strings.Join(args, " "), string(output)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you meant regarding the warning, should we follow another approach for the warning, like if the operation takes more than x seconds to execute than we print it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm.. Yes, I could see that being a good compromise. Gives us more control over the results in any case. I'll write it up and update.
iptables/iptables.go
Outdated
if strings.Contains(string(output), xLockWaitMsg) { | ||
// Maximum duration that an iptables operation can take | ||
// before flagging a warning. | ||
const opWarnTimeMsec = 2000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move here the time.Millisecond moltiplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also put 2 * time.Second
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that would be cleaner.
iptables/iptables.go
Outdated
|
||
func filterOutput(start time.Time, output []byte, args ...string) []byte { | ||
// Flag operations that have taken a long time to complete | ||
if time.Now().Sub(start) > opWarnTimeMsec*time.Millisecond { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use time.Since()
// Since returns the time elapsed since t.
// It is shorthand for
time.Now().Sub(t).
func Since(t Time) Duration {
return Now().Sub(t)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's useful. Ok.
This gets filtered for raw iptables calls, but not from calls made through firewalld. The patch just ensures consistency of operation. It also adds a warning when xtables contention detected and truncates the search string slightly as it appears that the suffix will be changing in the near future. Signed-off-by: Chris Telfer <ctelfer@docker.com>
Addressed comments and squashed the commits. |
func filterOutput(start time.Time, output []byte, args ...string) []byte { | ||
// Flag operations that have taken a long time to complete | ||
if time.Since(start) > opWarnTime { | ||
logrus.Warnf("xtables contention detected while running [%s]: %q", strings.Join(args, " "), string(output)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will merge this one, but maybe would be interesting to print also the time.Since in the log to have an idea of how long it took.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It turns out that when running under RHEL 7.3 (and possibly other distributions), Linux can be configured with firewalld, but still run into contention on the xtables lock when performing iptables operations even though use of firewalld doesn't specify the
--wait
flag for iptables. An example of this as seen in e2e tests is as follows:The first line of the output shows that iptables programming was going through firewalld. The second line consists of output from the iptables command along with the error which is
(<nil>)
. Now, theRawCombinedOutput()
function callsRaw()
which in turn dispatches between issuing the command either using thePassthrough()
function for firewalld execution or theraw()
function for direct fork/exec execution.Passthrough()
generates the first (debug) line in the output above.Raw()
returns both the output from the command and anerror
value which is non-nil presumably if the command exits with a non-zero exit code indicating failure. The output above therefore indicates that the command exited with a zero-status (successfully) but generated output indicating that it had to wait for 1s to try to acquire the iptables lock.Now
RawCombinedOutput()
returns only an error code and it returns an error if eitherRaw()
returns an error OR if the output from iptables is non-empty. This means that the iptables warning above ends up causingRawCombinedOutput()
to return an error when in fact none was present. This is not an unknown situation as the code inraw()
specifically scans the output that it gets from the iptables operation looking for the xtables lock message. If it sees it, it returns empty output instead. This prevents flagging a false-positive error. However, the firewalld path using thePassthrough()
does not perform this filtering and so causes this false-positive error.One might, at this point, ask whether this is just a spurious log problem or a genuine error given that the operation did, in fact, complete successfully. Unfortunately, the operations that were failing above were due to programming ingress rules using iptables. There were actually a series of iptables modifications that are all needed for correct ingress redirection. If one of the early or middle operations is caught as a failure, libnetwork will not try to program the remaining ones. This leaves the system in an inconsistent state.
This PR consists of two commits. The first simply replicates the output filter from the
raw()
path on thePassthrough()
path and should avoid the error. The second was added based on a suggestion from @fcrisciani and it adds a warning when the xtables lock contention is detected. This makes the fact that the contention is occurring more visible so as to provide better visibility that there may be other issues in this area.