
Docker in Docker no longer works without docker0 bridge #183

Closed
george-freebirdrides opened this issue Feb 14, 2019 · 13 comments

@george-freebirdrides

george-freebirdrides commented Feb 14, 2019

What happened:

We utilize a docker-in-docker configuration to build our images from CI within our dev cluster. Because docker0 has been disabled, the inner container can no longer reach the outside world. For instance:

W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/InRelease  
W: Failed to fetch http://security.debian.org/dists/jessie/updates/InRelease  
W: Failed to fetch http://deb.debian.org/debian/dists/jessie/Release.gpg  Could not resolve 'deb.debian.org'
W: Failed to fetch http://security.debian.org/dists/jessie/updates/Release.gpg  Could not resolve 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/jessie-updates/Release.gpg  Could not resolve 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
E: Unable to locate package apt-transport-https
 ---> Running in 2bfe21fd1ebc
curl: (6) Could not resolve host: dl.yarnpkg.com
gpg: no valid OpenPGP data found.

What you expected to happen:

Routing to the outside world should continue working. Currently only the outer container can still resolve hosts and reach them.

How to reproduce it (as minimally and precisely as possible):

Launch a container in a pod that builds an inner container with apt-get update

Anything else we need to know?:

This arose during our upgrade for CVE-2019-5736.

Environment:

- AWS Region: us-west-2
- Instance Type(s): m4.large
- EKS Platform version (use `aws eks describe-cluster --name <name> --query cluster.platformVersion`): "eks.3"
- Kubernetes version (use `aws eks describe-cluster --name <name> --query cluster.version`): "1.10"
- AMI Version: ami-0e36fae01a5fa0d76
- Kernel (e.g. `uname -a`):
- Release information (run `cat /tmp/release` on a node):

Was there a security reason for disabling docker0?

@kristofferahl

We have the same issue! Our CI system, running as a docker container, is building docker images (docker in docker). Since upgrading to the latest EKS AMI we are unable to access the network from inside containers being built by our CI container. Also, simply starting a container from inside the CI docker container has the same end result.

@george-freebirdrides Did you find a temporary fix for this?

@andytom

andytom commented Feb 14, 2019

This caused a lot of failed builds in our CI systems too. It looks like this was introduced by #109. @kristofferahl there are some workarounds on that PR.

@kristofferahl

@andytom Thanks! We ended up simply specifying --network=host for our docker build commands. Not something we liked doing, but it worked.

docker build --network=host -t foo/bar:latest .

@yyolk

yyolk commented Feb 14, 2019

Thank you for opening this issue. We were also attempting to update today for the runc CVE-2019-5736, as per AWS' own security bulletin: https://aws.amazon.com/security/security-bulletins/AWS-2019-002/

This validates for us that we needed to roll back, and that perhaps we didn't even get the upgrade because of #181 and #177.

@micahhausler
Member

Hi folks,

As noted earlier, the change was made in #109 to disable the bridge network. This was done for a couple of reasons. Primarily, Kubernetes doesn't need the bridge network, as it uses CNI for network setup. Secondarily, the bridge network docker chooses by default is 172.17.0.0/16 (when it doesn't already overlap with the VPC network). If you choose to peer a VPC using 172.17.0.0/16 and have nodes with a 172.17.0.0/16 docker bridge network, there will be a lack of connectivity. This change went out in the release that also contained a docker patched for CVE-2019-5736, but it is unrelated to the CVE.
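The CIDR clash described above can be illustrated with a small shell sketch (the helper functions are hypothetical, not part of any AWS tooling; pure bash arithmetic, no external tools):

```shell
# Hypothetical helper: convert a dotted-quad address to a 32-bit integer
ip_to_int() {
  local IFS=. ; set -- $1
  echo $(( ($1<<24) + ($2<<16) + ($3<<8) + $4 ))
}

# Hypothetical helper: succeed (exit 0) if two CIDR blocks overlap,
# i.e. the base of one falls inside the other's network
cidr_overlap() {
  local net1=${1%/*} len1=${1#*/} net2=${2%/*} len2=${2#*/}
  local a=$(ip_to_int "$net1") b=$(ip_to_int "$net2")
  local mask1=$(( (0xFFFFFFFF << (32 - len1)) & 0xFFFFFFFF ))
  local mask2=$(( (0xFFFFFFFF << (32 - len2)) & 0xFFFFFFFF ))
  [ $(( a & mask2 )) -eq $(( b & mask2 )) ] || [ $(( b & mask1 )) -eq $(( a & mask1 )) ]
}

# docker0's default network vs. a subnet in a peered 172.17.0.0/16 VPC
cidr_overlap 172.17.0.0/16 172.17.5.0/24 && echo "overlap: peered traffic would be shadowed"
# A non-conflicting VPC CIDR
cidr_overlap 172.17.0.0/16 10.0.0.0/16 || echo "disjoint: no conflict"
```

When the node's docker0 bridge claims 172.17.0.0/16, any route to a peered VPC in that same range is shadowed by the local bridge, which is the connectivity loss described above.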

This use case of using docker outside of Kubernetes' management bypasses the security restrictions Kubernetes puts in place, and as a default we wrongly assumed that users wouldn't be managing docker containers outside of Kubernetes on a host. We understand that this is a trade-off that some customers choose to make, so we've added an argument to the bootstrap script enabling the previous behavior in #187.

Please let us know if the change in #187 will accommodate your use cases, and we'll get an AMI out in the coming days to fix this for you.
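For reference, #187 exposes this as a bootstrap flag. Assuming the flag name `--enable-docker-bridge` from that PR, worker node user data might look something like this (cluster name is illustrative; this is a sketch, not runnable outside an EKS node):

```shell
#!/bin/bash
# EC2 user-data sketch for an EKS worker node: restore the docker0 bridge
# so docker-in-docker builds regain outbound connectivity.
# Flag name per PR #187; "my-eks-cluster" is a placeholder.
set -o errexit
/etc/eks/bootstrap.sh --enable-docker-bridge true my-eks-cluster
```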

micahhausler self-assigned this Feb 15, 2019
@DaedalusX

DaedalusX commented Feb 15, 2019

Thanks for the explanation @micahhausler - I believe an argument to the bootstrap script to enable this behavior works for our use case. Appreciate the fast response! (edit: commented from personal - my work github is @george-freebirdrides)

@xiaodong-xie

I managed to work around this issue by using --network host for both docker build and docker run.

@kristofferahl

@micahhausler Any news on when the new AMI will be available?

@geerlingguy

@kristofferahl it looks like https://aws.amazon.com/marketplace/pp/B07GRMYQR5?qid=1551211195678&sr=0-1&ref_=srh_res_product_title (AMI ami-0eeeef929db40543c) was created 6 days ago... @micahhausler can you confirm if it has this new flag available?

@JessieAMorris

The flag is available but it still has bugs in it. From my testing it has not been completely fixed yet.

@JessieAMorris

Specifically amazon-eks-node-(1.11,1.10)-v20190220 does not have commit 613fece in it.

@andytom

andytom commented Mar 4, 2019

The flag is available but it still has bugs in it. From my testing it has not been completely fixed yet.

Specifically amazon-eks-node-(1.11,1.10)-v20190220 does not have commit 613fece in it.

@micahhausler is it possible to reopen this issue or build a new AMI with the missing commit?

johncblandii added a commit to johncblandii/terraform-aws-eks-workers that referenced this issue May 4, 2019
See awslabs/amazon-eks-ami#183. According to AWS support, adding default bridge support is needed in order for docker-in-docker (or docker-on-docker) to build images inside of a pod.
@abdennour

Thanks a lot @kristofferahl. Your comment saved my day... no, saved my week.

 docker build --network=host -t foo/bar:latest .

Hence, in my Jenkinsfile, I added:

sed -i 's@docker build@docker build --network=host@g' ./build.sh

Now the StackExchange issues are resolved.
