Add workaround for spurious retransmits leading to connection resets #1090
Comments
@aaronlehmann I saw the linked issue turned out to be an issue with AWS. Is there still something that needs to be done in libnetwork?
@thaJeztah: This issue is a suggestion to work around problems like this in libnetwork. The problem came from a combination of invalid packets generated somewhere in AWS' infrastructure, and the NAT setup used by libnetwork reacting to those invalid packets by tearing down the connection. This means the invalid packets cause problems for Dockerized applications but they are harmless for most other setups. moby/moby#19532 revealed that this problem was also seen on a residential internet connection. I think there is value in finding a workaround.
I'm being bitten by this in production. What more information could I provide?
Same here. Simple Docker container build on an Arch Linux system on a residential connection. Just trying to do a git clone from an HTTPS git site (Bitbucket).
Just ran into this on my office's internal network. Thankfully I found this page or all my hair would be ripped out by morning. The iptables workaround did the trick for me, thanks very much for providing that. If it helps, I'm running Docker 1.11.2 on Ubuntu 16.04. Let me know if there's any more information I can give that would be useful.
@aaronlehmann It has been detected that this issue has not received any activity in over 6 months. Can you please let us know if it is still relevant:
Thank you!
A fix was implemented in AWS. I don't think a workaround is necessary anymore.
I will comment that this does happen on networks outside of AWS. The iptables fix does fix it; however, you first have to find this issue to learn that. The errors are very generic, so if implementing the fix in Docker is not a big deal, it would probably save some people many hours of research :)
Any solution to this problem on a macOS host?
We have a similar problem: downloading a file to our Docker image from Nexus throws "connection reset by peer". Adding the iptables rules fixes it.
As has been reported multiple times (@middleagedman, @BenSjoberg, @mitchcapper, @p53), the iptables fix resolves the issue ("connection reset by peer", or a RST packet sent at the TCP level). The issue actually occurs in any container running in the default bridge network. Whether it occurs frequently or not depends on a lot of factors (bandwidth, latency, host load), but it will occur at some point. It is probably misunderstood most of the time and incorrectly attributed to a possible transient network partition, but that is not the cause. It is a bug in the NAT setup installed by Docker. We face this issue with a perfectly valid TCP client-server transfer (for instance a The problem, as already mentioned by @aaronlehmann, is that benign "invalid" packets to the SNAT'ed container (caused for instance by TCP window overflow due to high throughput but a slow client) are assigned to the host interface and incorrectly considered martians, which causes a connection reset. This problem is referenced in several places, due to this netfilter/conntrack limitation:
The source NAT rules installed in iptables by Docker for its bridge network support are thus incomplete. I can attempt to make a pull request if it helps, or open a new issue if needed; just tell me. Note that the abandoned pull request attempt #1129 does not fix the issue because the inserted rule does not drop the packets. There should be no filter on the destination because at that point the destination is not yet NAT'ed. Any conntrack INVALID packets in the filter INPUT chain have to be dropped, as in:
Add drop of conntrack INVALID packets in input such that invalid packets due to TCP window overflow do not cause a connection reset. Due to some netfilter/conntrack limitations, invalid packets are never treated as NAT'ed but reassigned to the host and considered martians. This causes a RST response from the host and resets the connection. As soon as NAT is setup, for bridge networks for instance, invalid packets have to be dropped in input. The implementation adds a generic DOCKER-INPUT chain prefilled with a rule for dropping invalid packets and a return rule. As soon as some bridge network is setup, the DOCKER-INPUT chain call is inserted in the filter table INPUT chain. Fixes moby#1090. Signed-off-by: Christophe Guillon <christophe.guillon@st.com>
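The chain layout described in this commit message can be sketched as plain iptables commands. This is only an illustration of the described rules, not the libnetwork implementation itself (which is written in Go and may differ in detail); the function name docker_input_rules is hypothetical, and the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints iptables commands matching the layout described in the
# commit message (a dedicated chain with a drop-INVALID rule and a return
# rule, plus a jump from the filter INPUT chain).
# docker_input_rules is a hypothetical helper, not part of Docker or iptables.

docker_input_rules() {
    # Chain name defaults to the one named in the commit message.
    chain="${1:-DOCKER-INPUT}"
    cat <<EOF
iptables -t filter -N $chain
iptables -t filter -A $chain -m conntrack --ctstate INVALID -j DROP
iptables -t filter -A $chain -j RETURN
iptables -t filter -I INPUT -j $chain
EOF
}
```

Calling docker_input_rules prints the four commands for review; piping that output to sh as root would apply them, assuming the chain does not already exist.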
FYI: Now I think we should use "/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal" instead.
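A minimal sketch of applying this setting, assuming a Linux host with the conntrack module loaded. It tries the path above first and falls back to the legacy ip_conntrack path quoted in the original report. The helper name set_tcp_be_liberal and the PROC_ROOT override (added so the function can be pointed at a scratch directory) are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# Sketch only: enable conntrack's "be liberal" TCP mode through whichever
# /proc entry the running kernel exposes. Needs root on a real system.
# set_tcp_be_liberal and PROC_ROOT are hypothetical names for illustration.

set_tcp_be_liberal() {
    value="${1:-1}"
    root="${PROC_ROOT:-/proc}"
    for f in \
        "$root/sys/net/netfilter/nf_conntrack_tcp_be_liberal" \
        "$root/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal"
    do
        if [ -e "$f" ]; then
            if echo "$value" > "$f"; then
                # Report which entry was written.
                echo "$f"
                return 0
            fi
            echo "cannot write $f (root required?)" >&2
            return 1
        fi
    done
    echo "no conntrack be_liberal entry found (is nf_conntrack loaded?)" >&2
    return 1
}
```

On kernels that expose the newer path, `sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1` should have the same effect. Either way the setting is system-wide and is not persisted across reboots unless it is also added to the sysctl configuration.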
FYI: This is also an issue for Kubernetes, which they are trying to solve with similar strategies:
Hi @aaronlehmann,
I'm using neither AWS nor Kubernetes, and I see the issue too, between our office network (which our CI runners use) and external resources at DigitalOcean or maxmind.com. It generally manifests itself as
With tcpdumps I see lost but then reappearing packets (they reappeared after about 90ms or 200KB of data) triggering a RST. I'm not sure where the actual problem is; I'm assuming our ISP is doing something funky or a link aggregation is messing up packets. It happens mostly during quiet hours, and the actual network issue is something we probably have to live with, but a 90ms packet delay shouldn't terminate connections. The liberal sysctl fixes our issue (and firewalling RST probably would too), but as the issue is not AWS-specific (or even K8s-specific), I too think this issue should be reopened.
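For anyone trying to confirm the same symptom, here is a rough sketch of two diagnostic commands: a tcpdump filter that shows only TCP RST segments, and an iptables LOG rule that records packets conntrack marks INVALID. The function name diag_commands, the default interface "any", and the log prefix are illustrative assumptions; the function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: prints commands that may help confirm RSTs and
# conntrack-INVALID packets on the Docker host.
# diag_commands, the interface default, and the log prefix are hypothetical.

diag_commands() {
    iface="${1:-any}"
    cat <<EOF
tcpdump -ni $iface 'tcp[tcpflags] & tcp-rst != 0'
iptables -I INPUT -m conntrack --ctstate INVALID -j LOG --log-prefix 'ct-invalid: '
EOF
}
```

If the LOG rule fires shortly before each RST seen in the capture, that would be consistent with the explanation given earlier in this thread. The LOG rule should be removed once the check is done, since it logs system-wide.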
Drop invalid packets to deal with moby/libnetwork#1090
This is impacting us as well just using Docker. Should this issue be reopened?
Had the same issue on GCP when downloading a large file from inside a container using curl. The iptables rule solves the problem for me. Another workaround was to use
Hello. To solve this problem, I developed a Kubernetes controller called network-node-manager. By simply deploying and configuring network-node-manager, you can set
This issue very much exists in the non-AWS, non-GCP world as well. We run our clusters on-prem and were able to reproduce this issue, especially with outbound requests carrying larger payloads. Getting into details... Problem: An app team complained about an issue with their app's behavior. This app reaches an outbound external service with payloads of certain sizes. In literal cURL world, it's nothing but passing JSON payloads in Steps we took to narrow down:
Since this is a system-level setting that impacts more than just Docker traffic, we are still looking for the best action for our environment. I am not inclined to give a resolution step; I just wanted to share my experience with this issue, and how it took several man-hours of effort to identify the root cause. Reading through the responses above, I'm curious to know how this was fixed in AWS, and whether a fix exists in any of the Docker releases (considering this issue showed up 4 years ago). If this is not yet fixed, what's the best way forward to reopen this issue?
The fact that AWS implemented a fix doesn't mean this issue disappeared. As mentioned by users above, this can still happen in some cases. I'll reopen it and I'll backport the PR submitted by @guillon into github.com/moby/moby in the upcoming weeks.
There is a longstanding issue over at distribution/distribution#785 where users reported connection resets trying to push to an AWS-hosted registry from inside the AWS network. After months, we've finally narrowed this down to a bad interaction between spurious TCP retransmits and the NAT rules that Docker sets up for bridge networking.
Here is a summary of what happens:
I think it would be hugely helpful for libnetwork to include a workaround for this. It has affected a lot of users trying to use the registry in AWS, and it presumably affects other Dockerized applications as well. While I'll reach out to AWS to point out the spurious retransmits, I don't know if they'll be able to fix them, and there may also be other environments with similar issues.
I've found two possible workarounds:
- echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal. This causes conntrack/NAT to treat packets outside the TCP window as part of the flow being tracked, instead of marking them invalid and causing them to be handled by the host.
- iptables -I INPUT -m conntrack --ctstate INVALID -j DROP
Both of these can potentially affect non-Docker traffic. The former causes NAT to forward packets that it would otherwise err on the side of not forwarding, which seems relatively harmless, but it's a system-level setting, so it's not limited to Docker flows. The latter would drop any packets that conntrack deems invalid, system-wide, unless we added specific destination filters for the addresses/ports that Docker set up NAT rules for, which could add overhead.
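As a rough sketch of the iptables workaround, the rule can be added so that repeated runs do not stack duplicate copies, and removed again if it turns out to affect other traffic. The function names and the DRY_RUN switch are illustrative assumptions; only the rule itself comes from the text above. Real changes need root.

```shell
#!/bin/sh
# Sketch only: add or remove the system-wide INVALID-drop rule.
# ensure_drop_invalid, remove_drop_invalid and DRY_RUN are hypothetical
# names for illustration, not part of Docker or iptables.

RULE="INPUT -m conntrack --ctstate INVALID -j DROP"

ensure_drop_invalid() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command that would be executed.
        echo "iptables -I $RULE"
        return 0
    fi
    # "iptables -C" exits non-zero when the rule is absent.
    # $RULE is left unquoted on purpose so it splits into separate arguments.
    if ! iptables -C $RULE 2>/dev/null; then
        iptables -I $RULE
    fi
}

remove_drop_invalid() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "iptables -D $RULE"
        return 0
    fi
    iptables -D $RULE
}
```

Running `DRY_RUN=1 ensure_drop_invalid` prints the command without touching the firewall. The rule is not persistent: it is lost on reboot or when the firewall is reloaded, so it would need to be reapplied by whatever manages the host's iptables configuration.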
It may be too late to hope for a workaround to be included in Docker 1.11, but anything we can do on this front will really improve the lives of Docker users on AWS.