
[BUG] Overlay network not found on worker node #11894

Open
thormme opened this issue Jun 7, 2024 · 8 comments

thormme commented Jun 7, 2024

Description

Issue:
Swarm worker hosts fail to attach to overlay networks created on the manager node unless a container has first been started manually and attached to the network using docker run --network swarm-overlay.

Expected Behavior:
The worker's container should attach to the overlay network automatically, and the network should be visible in the docker network ls output on the worker:

$> docker network ls
8e3c351af333   bridge             bridge    local
0cbc0420c111   docker_gwbridge    bridge    local
x8gb7mz6s222   swarm-overlay      overlay   swarm
c09ad17a7321   host               host      local
keth4xuub123   ingress            overlay   swarm
d8baa27f3654   none               null      local

Workaround:
The only solution I have found is to downgrade to an earlier version (2.21.0-1) of docker-compose-plugin

sudo apt list -a docker-compose-plugin
sudo apt install docker-compose-plugin=2.21.0-1~debian.11~bullseye
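
The version pin above looks like it is built from fields in /etc/os-release; the two pins quoted in this thread (Debian 11 and Ubuntu 22.04) both fit the pattern VERSION~ID.VERSION_ID~CODENAME. As a minimal sketch under that assumption, the pin for the current host could be assembled like this. The function name compose_plugin_pin is made up for illustration, and the pattern is inferred from those two examples only, not from any packaging documentation:

```shell
# compose_plugin_pin VERSION [OS_RELEASE_FILE]
# Prints an apt "package=version" pin for docker-compose-plugin.
# ASSUMPTION: the suffix follows VERSION~ID.VERSION_ID~VERSION_CODENAME,
# as inferred from the two pins quoted in this thread.
compose_plugin_pin() {
    pin_version="$1"
    pin_file="${2:-/etc/os-release}"
    # Source the file in a subshell so ID, VERSION_ID and
    # VERSION_CODENAME do not leak into the caller's environment.
    (
        . "$pin_file"
        printf 'docker-compose-plugin=%s~%s.%s~%s\n' \
            "$pin_version" "$ID" "$VERSION_ID" "$VERSION_CODENAME"
    )
}
```

It could then be used as sudo apt install "$(compose_plugin_pin 2.21.0-1)". Running sudo apt-mark hold docker-compose-plugin afterwards should keep a later apt upgrade from undoing the downgrade.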

I believe this is the same issue as #11387, but I couldn't find any open bugs for it.

Thanks for any help with this!

Steps To Reproduce

I created a custom overlay network on the swarm manager node.

...
  service:
    image: service-image
    container_name: service
    networks:
      - swarm-overlay
    restart: unless-stopped
...
networks:
  swarm-overlay:
    attachable: true
    driver: overlay

This correctly created the network and attached the relevant container to it.

I then joined a worker host to the swarm and attempted to connect a container to the overlay network.

...
  worker-service:
    image: worker-image
    container_name: worker-service
    networks:
      swarm-overlay:
        aliases:
          - host1-worker-service
    restart: unless-stopped
...
networks:
  swarm-overlay:
    external: true
    driver: overlay

docker compose up -d worker-service
This errors with:

Error response from daemon: network swarm-overlay not found

Compose Version

docker-compose-plugin/bullseye 2.27.1-1~debian.11~bullseye
Docker Compose version v2.27.1

Docker Environment

Client: Docker Engine - Community
 Version:    26.1.4
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 12
  Running: 5
  Paused: 0
  Stopped: 7
 Images: 31
 Server Version: 26.1.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: 2brhg9vzj8m47oyo40ie5yj0u
  Is Manager: false
  Node Address: 1.2.3.4
  Manager Addresses:
   4.3.2.1:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d2d58213f83a351ca8f528a95fbd145f5654e957
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.10.0-28-cloud-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 13.42GiB
 Name: cloud-machine
 ID: 6c0ae974-1ba3-450a-ab03-d31b31c6097f
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

No response

Contributor

ndeloof commented Jun 10, 2024

This isn't the same issue as #11387. Here it is the Docker engine that reports the error: Error response from daemon: network swarm-overlay not found

Can you please confirm that you can use docker run --network swarm-overlay ... to run an equivalent container on a worker node with this swarm setup?

@jsunstrom

I'm running into this exact same issue using Docker Compose 2.27.0. I can confirm that docker run -it --name alpine1 --network test-net alpine, taken from the official documentation, works. I walked through the entire "Use an overlay network for standalone containers" walkthrough and it worked as expected.

However, with docker compose files, I get the error message Error response from daemon: network <my network name here> not found when running docker compose up -d.


ambretanmay commented Jun 11, 2024

I am having the exact same issue.
Docker Compose version v2.27.1
@ndeloof docker run --network swarm-overlay works and compose doesn't


inql commented Jun 27, 2024

By the way, is the downgrade workaround needed on both the leader and the worker nodes?

@ambretanmay

@inql I have not tested this as our scripts set versions for all nodes.


michaelmcandrew commented Jul 3, 2024

Hey there, also affected by this bug.

If you don't want to downgrade, another workaround is to create a container and attach it to the network. The network then appears in the list and docker compose no longer complains:

docker run -dit --name keep-alive --network <network_name> --restart=always alpine

Adding --restart=always will ensure that it survives restarts of the docker daemon, etc.
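
For provisioning scripts that may run more than once, the keep-alive command above can be wrapped so it is only issued when no container of that name exists yet. This is a minimal sketch, not something proposed in the thread: the function name ensure_keepalive and the DOCKER override variable are made up for illustration, and it assumes an existing container with that name is already attached to the right network.

```shell
# DOCKER can be overridden (for example with a stub) for dry runs.
DOCKER="${DOCKER:-docker}"

# ensure_keepalive NETWORK [NAME]
# Starts a placeholder container on NETWORK unless a container called
# NAME (default: keep-alive) already exists, running or stopped.
ensure_keepalive() {
    ka_network="$1"
    ka_name="${2:-keep-alive}"
    # "docker ps -a" lists stopped containers too; match the full name.
    if "$DOCKER" ps -a --format '{{.Names}}' | grep -Fxq -- "$ka_name"; then
        return 0
    fi
    "$DOCKER" run -dit --name "$ka_name" \
        --network "$ka_network" --restart=always alpine
}
```

Calling ensure_keepalive <network_name> before docker compose up -d on each worker would then be safe to repeat.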

My versions in case it is useful:

docker version

Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:50 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:50 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker compose version

Docker Compose version v2.28.1


kulpsin commented Jul 4, 2024

Sorry, I did not realise that @michaelmcandrew had already mentioned this above, but at least this comment confirms his findings: #11894 (comment)

I tested this issue and noticed that if a running container is already connected to the external overlay network (started with docker run ... so that the network is visible in docker network ls), then Compose is able to connect to the external overlay network.

So, without knowing anything about the internals, the problem might be that Compose does not check for available external overlay networks and instead checks only the networks known locally (those visible with docker network ls).

So, as an additional workaround, it is possible to first start a "dummy" container on the workers, for example:

$ docker compose up -d
Error response from daemon: network <overlay-network> not found
$ docker run -dit --rm --name dummy-network-container --network <overlay-network> alpine
43924b1b25ac73373aac9120b55ac46fc1de3435ce26485682e11d6c06671936
$ docker compose up -d
[+] Running 1/0
 ✔ Container worker-service  Started
$ _
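
The session above can be sketched as a small wrapper that only falls back to the dummy container when the first attempt fails with the "network ... not found" message quoted in this thread. The function name up_with_placeholder and the DOCKER override variable are made up for illustration. Matching on the error text is an assumption about how stable that message is, so treat this as a starting point rather than a robust implementation.

```shell
# DOCKER can be overridden (for example with a stub) for dry runs.
DOCKER="${DOCKER:-docker}"

# up_with_placeholder NETWORK [compose up args...]
up_with_placeholder() {
    uwp_network="$1"
    shift
    # First attempt; capture stdout and stderr so the error can be inspected.
    if uwp_out="$("$DOCKER" compose up -d "$@" 2>&1)"; then
        printf '%s\n' "$uwp_out"
        return 0
    fi
    case "$uwp_out" in
        *"network $uwp_network not found"*) ;;  # the error from this issue
        *) printf '%s\n' "$uwp_out" >&2; return 1 ;;  # anything else: give up
    esac
    # Attach a throwaway container so the engine registers the overlay
    # network on this node, then retry once.
    "$DOCKER" run -dit --rm --name dummy-network-container \
        --network "$uwp_network" alpine || return 1
    "$DOCKER" compose up -d "$@"
}
```

For the reproduction in this issue, the call would be up_with_placeholder swarm-overlay worker-service.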

I also tested downgrading, and on Ubuntu 22.04 it worked, so I think I will use the downgraded version myself for now:
sudo apt-get remove docker-compose-plugin && sudo apt-get install docker-compose-plugin=2.21.0-1~ubuntu.22.04~jammy

$ docker version
Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:33 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker compose version
Docker Compose version v2.28.1

Contributor

ndeloof commented Jul 4, 2024

@kulpsin docker network ls indeed does not list overlay networks created on another swarm node until they are used by some container (I'm not sure about the reason, but that's what we get from the engine API). So Docker Compose can't check that the network exists; instead it should detect that swarm is enabled and ignore the error, on the assumption that container creation will fail if the network is actually missing. See

compose/pkg/compose/create.go

Lines 1334 to 1340 in 11d5ecd

if enabled {
	// Swarm nodes do not register overlay networks that were
	// created on a different node unless they're in use.
	// So we can't preemptively check network exists, but
	// networkAttach will later fail anyway if network actually doesn't exists
	return nil
}

I'm not sure why this doesn't work as expected. I need to set up a test environment and try to reproduce this bug.
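
While the bug is being reproduced, the decision described above can be mirrored from the command line to see which case a given node is in. This is only a diagnostic sketch in shell, not Compose's implementation: the function name overlay_status, its output labels, and the DOCKER override variable are made up for illustration. It relies on docker network ls --format '{{.Name}}' and docker info --format '{{.Swarm.LocalNodeState}}'.

```shell
# DOCKER can be overridden (for example with a stub) for dry runs.
DOCKER="${DOCKER:-docker}"

# overlay_status NETWORK
# Prints one of:
#   visible           the network is listed on this node
#   unknown-on-swarm  not listed, but swarm is active, so it may exist
#                     on another node and simply be unused here
#   missing           not listed and this node is not part of a swarm
overlay_status() {
    if "$DOCKER" network ls --format '{{.Name}}' | grep -Fxq -- "$1"; then
        echo visible
    elif [ "$("$DOCKER" info --format '{{.Swarm.LocalNodeState}}')" = "active" ]; then
        echo unknown-on-swarm
    else
        echo missing
    fi
}
```

On an affected worker, overlay_status swarm-overlay would be expected to print unknown-on-swarm before any container has been attached to the network, and visible afterwards.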
