-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move the iptables setup for embedded DNS into a reexec process #1115
Conversation
Signed-off-by: Santhosh Manohar <santhosh@docker.com>
LGTM |
Hi, We tested this patch and it does not seem to have resolved the issue. With the patch applied to Docker 1.11.0, we see still threads end up in a different network namespace around 3.8% of the time. Thanks, |
@samuelkarp I couldn't hit the issue with the change. Can you share some details about your test ? Was it happening more frequently without this patch ? Are you testing only with containers on user defined networks or with default bridge as well ? |
@sanimej The test basically runs and stops 10 containers in parallel using the default bridge, then checks how many network namespaces exist for the Docker daemon process. The repro script is very similar to this (not exactly the same; my colleague who wrote the script and ran the tests hasn't sent me his most recent changes yet). |
@samuelkarp Thanks. This PR specifically addresses the namespace issue that happens when setting up the embedded DNS server, which does the more syscall intensive iptable oprations. Embedded DNS server is used only for containers on user defined networks. Chances of hitting this issue for default bridge network is much lower; as you have observed even with a script tailored to hit the problem. libnetwork code also has a protection (see the usage of InitOSContext() call) to make sure all network operations happen in the right network namespace. So its less likely to cause user visible symptoms. But its definitely better to avoid this issue completely. We will address it in 1.12 |
@sanimej Thanks. Are the issues we're seeing where daemon threads end up in other network namespaces while using the default bridge caused by a different bug? If so, is there an issue tracking it that I can follow? We initially missed the note about the non-default network in #1113, but with our repro script based on the reproduction steps provided in that issue we're still seeing threads end up in different network namespaces. |
@samuelkarp When bringing up a new container libnetwork code has to set into the container's network namesapce to bring up the interfaces, configure the routes etc. If the golang runtime creates a new OS thread when its executing in the container namespace that triggers the problem you see. We can use #1113 to track this issue as well. I will remove it from this PR. |
@sanimej Thank you! Is there any possibility the remaining thread/namespace issues can be addressed prior to 1.12.0 (maybe 1.11.1)? |
LGTM |
@samuelkarp Can you please create a new issue for the namespace problem with the docker default network ? My success rate in hitting the issue has been less than yours. We will try to get this fixed in 1.11.1; but given the short window for the patch release we may not be able to make it. This PR specifically addresses the namespace issue in setting up the embedded DNS server. So we will go ahead and merge this and close #1113. |
Related to #1113
For the gory details please see the excellent analysis provided by the submitters in the bug report.
Gist of the issue: Go doesn't give any control over how & when the underlying OS threads are created and managed. When we have already set into the net namespace of a container if go runtime creates a new thread its namespaces will be inherited from the current OS thread; in this case the net namespace of the container. Once this happens there is no way to set the namespaces back to the root namespace. This issue gets exposed sometimes when embedded DNS server sets up the iptables because the exec.Run creates go routines and the iptable calls translate to multiple system calls.
To address this moved the iptable setup into a reexec process.
A similar problem could happen potentially from other network plumbing tasks that happen in the container's network namespace. This will be addressed in a different PR for 1.12
Signed-off-by: Santhosh Manohar santhosh@docker.com