Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Fail to start Kafka container (KRaft) with basic setup #2543

Closed
ronshemerws opened this issue May 19, 2024 · 12 comments
Closed

[Bug]: Fail to start Kafka container (KRaft) with basic setup #2543

ronshemerws opened this issue May 19, 2024 · 12 comments
Labels
bug An issue with the library

Comments

@ronshemerws
Copy link

Testcontainers version

0.31.0

Using the latest Testcontainers version?

Yes

Host OS

Mac

Host arch

ARM

Go version

1.22.3

Docker version

Client: Docker Engine - Community
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.21.0
 Git commit:        ed223bc820
 Built:             Thu Aug 31 17:24:32 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.30.0 (149282)
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:04 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Docker info

Client: Docker Engine - Community
 Version:    24.0.6
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-debug
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     /Users/ronshemer/.docker/cli-plugins/docker-scout

Server:
 Containers: 6
  Running: 1
  Paused: 0
  Stopped: 5
 Images: 38
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.6.26-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 10
 Total Memory: 15.6GiB
 Name: docker-desktop
 ID: 4c6eebbc-c190-4257-96b7-65293c5d4add
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/ronshemer/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

What happened?

I followed the documentation on how to use kafka testcontainers and at first it worked randomly (got the container running), and now it does not work constantly ending up with the "port not found" error after waiting the 1m timeout.

Relevant log output

⚡ go run .
2024/05/19 16:40:21 github.com/testcontainers/testcontainers-go - Connected to docker:
  Server Version: 26.1.1
  API Version: 1.44
  Operating System: Docker Desktop
  Total Memory: 15976 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 46c175eaf04deba9d5b7bdc51177a827b389a4d7cec93f329a3bef4e700b2154
  Test ProcessID: c8a51e52-8aa0-4514-aa6d-d487aff9d1be
2024/05/19 16:40:21 🐳 Creating container for image testcontainers/ryuk:0.7.0
2024/05/19 16:40:21 ✅ Container created: cad4a0e8592a
2024/05/19 16:40:21 🐳 Starting container: cad4a0e8592a
2024/05/19 16:40:21 ✅ Container started: cad4a0e8592a
2024/05/19 16:40:21 🚧 Waiting for container id cad4a0e8592a image: testcontainers/ryuk:0.7.0. Waiting for: &{Port:8080/tcp timeout:<nil> PollInterval:100ms}
2024/05/19 16:40:22 🔔 Container is ready: cad4a0e8592a
2024/05/19 16:40:22 🐳 Creating container for image confluentinc/confluent-local:7.5.0
2024/05/19 16:40:22 ✅ Container created: e1bc25805a7b
2024/05/19 16:40:22 🐳 Starting container: e1bc25805a7b
2024/05/19 16:41:22 ✅ Container started: e1bc25805a7b
2024/05/19 16:41:22 container logs (port not found
context deadline exceeded):

2024/05/19 16:41:22 failed to start container: failed to start container: port not found
context deadline exceeded
exit status 1

Additional information

This is the complete program I am running to reproduce (started an empty project):

package main

import (
	"context"
	"log"

	"github.com/testcontainers/testcontainers-go/modules/kafka"
)

func main() {
	ctx := context.Background()

	kafkaContainer, err := kafka.RunContainer(ctx,
		kafka.WithClusterID("test-cluster"),
	)
	if err != nil {
		log.Fatalf("failed to start container: %s", err)
	}

	// Clean up the container after
	defer func() {
		if err := kafkaContainer.Terminate(ctx); err != nil {
			log.Fatalf("failed to terminate container: %s", err)
		}
	}()
}

Here is the mod file

module kafka-test

go 1.22.3

require (
	dario.cat/mergo v1.0.0 // indirect
	github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
	github.com/Microsoft/go-winio v0.6.1 // indirect
	github.com/Microsoft/hcsshim v0.11.4 // indirect
	github.com/cenkalti/backoff/v4 v4.2.1 // indirect
	github.com/containerd/containerd v1.7.15 // indirect
	github.com/containerd/log v0.1.0 // indirect
	github.com/cpuguy83/dockercfg v0.3.1 // indirect
	github.com/distribution/reference v0.5.0 // indirect
	github.com/docker/docker v25.0.5+incompatible // indirect
	github.com/docker/go-connections v0.5.0 // indirect
	github.com/docker/go-units v0.5.0 // indirect
	github.com/felixge/httpsnoop v1.0.4 // indirect
	github.com/go-logr/logr v1.4.1 // indirect
	github.com/go-logr/stdr v1.2.2 // indirect
	github.com/go-ole/go-ole v1.2.6 // indirect
	github.com/gogo/protobuf v1.3.2 // indirect
	github.com/golang/protobuf v1.5.4 // indirect
	github.com/google/uuid v1.6.0 // indirect
	github.com/klauspost/compress v1.16.7 // indirect
	github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
	github.com/magiconair/properties v1.8.7 // indirect
	github.com/moby/patternmatcher v0.6.0 // indirect
	github.com/moby/sys/sequential v0.5.0 // indirect
	github.com/moby/sys/user v0.1.0 // indirect
	github.com/moby/term v0.5.0 // indirect
	github.com/morikuni/aec v1.0.0 // indirect
	github.com/opencontainers/go-digest v1.0.0 // indirect
	github.com/opencontainers/image-spec v1.1.0 // indirect
	github.com/pkg/errors v0.9.1 // indirect
	github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
	github.com/shirou/gopsutil/v3 v3.23.12 // indirect
	github.com/shoenig/go-m1cpu v0.1.6 // indirect
	github.com/sirupsen/logrus v1.9.3 // indirect
	github.com/testcontainers/testcontainers-go v0.31.0 // indirect
	github.com/testcontainers/testcontainers-go/modules/kafka v0.31.0 // indirect
	github.com/tklauser/go-sysconf v0.3.12 // indirect
	github.com/tklauser/numcpus v0.6.1 // indirect
	github.com/yusufpapurcu/wmi v1.2.3 // indirect
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
	go.opentelemetry.io/otel v1.24.0 // indirect
	go.opentelemetry.io/otel/metric v1.24.0 // indirect
	go.opentelemetry.io/otel/trace v1.24.0 // indirect
	golang.org/x/crypto v0.22.0 // indirect
	golang.org/x/mod v0.16.0 // indirect
	golang.org/x/sys v0.19.0 // indirect
	golang.org/x/tools v0.13.0 // indirect
	google.golang.org/genproto/googleapis/rpc v0.0.0-20230731190214-cbb8c96f2d6d // indirect
	google.golang.org/grpc v1.58.3 // indirect
	google.golang.org/protobuf v1.33.0 // indirect
)

@ronshemerws ronshemerws added the bug An issue with the library label May 19, 2024
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue May 22, 2024
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue May 22, 2024
@wilsouza
Copy link
Contributor

I am also experiencing this issue and have been able to reproduce it. It appears to be related to the PostStartLifecycleHook. The Kafka module uses the PostStart LifecycleHook to get the mapped port, which is used to create the Kafka startup script. However, when PostStart is called, the Docker container has not yet resolved the port mapping, causing the port not found error.

While I am not certain if there is a straightforward solution, I have found a less elegant solution to resolve this issue. Simply adding a sleep at the beginning of the Kafka PostStart hook works fine in my tests, and we can make it configurable as needed.

I will suggest a PR for review.

@mdelapenya, do you know of a way to wait until Docker resolves the port mapping before the PostStart hook?

wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue May 23, 2024
This reverts commit 225f6dc.
It was done just to undertand the cause and now we can revert it.
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue May 25, 2024
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue May 25, 2024
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue May 25, 2024
This reverts commit 225f6dc.
It was done just to undertand the cause and now we can revert it.
@mdonkers
Copy link
Contributor

@wilsouza , with adding a delay, did the problem go away completely for you?

My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.

So we are now wondering if its really the same issue, or slightly different.
If we look into the response from the MappedPort request it's sometimes correct, but other times not having any data in the NetworkSettings.Ports section of the Docker inspect command.

Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).

@thmshmm
Copy link

thmshmm commented May 29, 2024

I experience the same behaviour with the Redpanda module. Also tried older versions of testcontainers with the same result. Recently updated to the latest Docker Desktop.

@wilsouza
Copy link
Contributor

Thank you for sharing the details of the problems you're encountering.

@wilsouza , with adding a delay, did the problem go away completely for you?

Yes, adding a delay has completely resolved the issue for me. I'm running an internal version with this solution and haven't experienced any problems. I’ve also tested it on my Mac with an M1 processor without issues.

My colleague @grevend is also running into this issue, on a Mac (for Linux I cannot reproduce it). And we both tried adding delays, but also retries, and neither worked consistently.

So we are now wondering if its really the same issue, or slightly different. If we look into the response from the MappedPort request it's sometimes correct, but other times not having any data in the NetworkSettings.Ports section of the Docker inspect command.

Note we're still debugging further, seeing if we can find something to make it work more consistently, but so far it either works, or not (delays etc having no positive effect).

Have you tried running version #2552 with a custom wait time? Could you try running it with a custom timeout to see if it resolves the issue on your end?

@mdonkers
Copy link
Contributor

@wilsouza we tried with waits, unfortunately though, the wait needs to be pretty long to be stable (3-5 seconds). What makes that unacceptable from my perspective, is that the wait isn't needed on Linux based systems. Even if for a second, it would add a second to every test run, for every time the Kafka container needs to be started.

We also further looked into why some retry mechanism doesn't work and found the cause. MappedPort calls the Inspect function, which always caches the first result it got:

inspect, err := c.Inspect(ctx)

return c.raw, nil

Which is then the reason why a retry doesn't work (if it once gets a wrong response, it always gets the wrong response), and a initial delay somehow works as it gives Docker a chance to set up the port mapping.

I'm happy to provide assistance to any solution, but would really like to hear @mdelapenya 's perspective. Because I'm wondering if caching that Inspect response is the right thing to do. As I guess most fields would be stable, but next to the network there might be some other fields that change such as the State ones? Always caching would give a high potential of returning wrong info.

And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.

@Eyal-Shalev
Copy link

And of course the question remains why the port mapping has some delay in the first place for MacOS, but maybe that's due to the nature of how Docker works there.

Maybe it relates to the fact MacOS uses a Virtual Machine to host docker containers 🤔

Maybe the port is mapped in the VM, but not in the MacOS host machine.

@mgilbir
Copy link

mgilbir commented Jun 14, 2024

For the record, I have this behaviour also on a linux machine running ubuntu:

  Server Version: 26.1.4
  API Version: 1.44
  Operating System: Ubuntu 22.04.4 LTS
  Total Memory: 7771 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 109ffb6f0068199f8347d3954fd3f1fb31f80a5ffc016cd4ea231e7a3193510d
  Test ProcessID: 5dcbcb80-8058-4b5a-a12b-37d4d47fd291
  [...]
  kafka_test.go:13: 
        	Error Trace:	[...]/kafka_test.go:13
        	Error:      	Received unexpected error:
        	            	failed to start container: context deadline exceeded

wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue Jun 15, 2024
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue Jun 15, 2024
wilsouza pushed a commit to wilsouza/testcontainers-go that referenced this issue Jun 15, 2024
This reverts commit 225f6dc.
It was done just to undertand the cause and now we can revert it.
@kishansairam9
Copy link

any update on this?

@dramosOptiply
Copy link

Got hit with this bug today on my Mac. Happens randomly. I'm using the latest testcontainers version.

@wilsouza
Copy link
Contributor

I think that this issue can be closed now that it has been addressed in @mdelapenya PR, #2606. What do you think @mdelapenya?
I’m unable to validate if it’s still occurring at this time.

@ronshemerws
Copy link
Author

This issue seem to be resolved for me when upgraded to v0.32.0. I will close this now, thanks to everyone that was involved, this is much appreciated!

@mdelapenya
Copy link
Collaborator

mdelapenya commented Jul 11, 2024

thanks folks for your support! I'm currently at Gophercon, and I will go back to regular work next Monday 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue with the library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants