-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add HEALTHCHECK to Linux Docker image. #128
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that it the ryuk container can be exposed by the Docker API before the container is ready to accept connections. A testcontainer client can attempt to connect to the container before it is accepting connections.
Out of curiosity and to understand and compare the behavior (issue) with the .NET implementation: Does Go create a single Ryuk instance for all packages? Does one package connect successfully and the others fail? Does it retry to connect if it fails?
windows/Dockerfile
Outdated
@@ -25,3 +25,4 @@ FROM ${BASE_IMAGE} | |||
COPY --from=build /bin/ryuk /bin/ryuk | |||
CMD ["/bin/ryuk"] | |||
LABEL org.testcontainers.ryuk=true | |||
HEALTHCHECK --start-period=5s --interval=2s --timeout=2s --retries=10 CMD netstat -ltn | grep -q ":8080" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think the CMD
will work on Windows (nanoserver).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, yes it won't. Sorry- I was a bit sleep deprived making this PR. I don't have access to a windows system but I can do some research to find a simple command that works in nanoserver!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I'm not on windows, and am not familiar with nanoserver, I thought it best to revert the changes on windows/Dockerfile
. It would be best for someone with Windows experience to implement and test a Windows health check.
Yes, exactly.
One package typically connects successfully. With the other packages, it's a race: if they get the container before it's ready to accept connections, they fail. In my testing, all remaining packages fail. But I can imagine in a larger codebase, some packages would succeed. How does .NET work: is it a single ryuk instance? For example, can you launch two tests in parallel and re-create the behavior (issue)? |
With respect to the languages/platforms supported by If other platforms are affected by this same issue, then they'll need a patch similar to testcontainers/testcontainers-go#2508, which actually uses the results of the healthcheck before returning a (healthy) ryuk container. |
Thanks for the quick explanation. .NET behaves different here. It creates a Ryuk instance for each test assembly (test project). Test assemblies are kind of isolated from each other. We do not need to deal with synchronization here. |
I did encounter some things that I can't explain during my investigation, so I wanted to add them here for full transparency. TL;DR: I failed to identify which component was denying the connection. I have an overly-simple mental model of how Docker API+Docker Engine work together with their containers, so these issues could just be my ignorance. The error message indicates the connection was denied, not a timeout.So who denied the connection? The Docker engine? It wasn't the I did not observe the denied TCP connectionWhen I was troubleshooting, I used |
hey @HofmeisterAn, yes, Go creates one Ryuk container per test session, which implies some synchronisation logic to check if the ryuk container is in the running state before creating a new one. @emetsger for the repro repository, I cloned it and it works for me with both values of Using all processors
|
@mdelapenya I have the same result with testcontainers-go and Docker Desktop, but when I run Docker with Colima I randomly have the issue when TestContainers wants to connect to Ryuk (95% of the time it triggers the issue) |
Pinging @eddumelendez and @kiview as this is a cross-lang feature. |
Can we decouple the switch to using a @HofmeisterAn I guess we didn't fully sync up on the current approach to test session semantics (and technical measures to achieve them) across languages, back when we introduced this in Go (and AFAIK in Node as well). From a technical PoV, Go and .NET should be in a similar situation. |
Closing as per testcontainers/testcontainers-go#2508 (comment) Thanks! |
Summary
I ran across an issue using testcontainers in golang, specifically when testcontainers are used by multiple test packages in a single project. Go builds a test binary for each package and runs them simultaneously.
Example project tree, where test binaries for packages
a
,b
,c
, andd
will run simultaneously, attempting to use a testcontainer:One of the test binaries will win the race to create the ryuk container, and it becomes present in the list of docker containers returned by the Docker API.
The problem is that it the ryuk container can be exposed by the Docker API before the container is ready to accept connections. A testcontainer client can attempt to connect to the container before it is accepting connections.
This results in error messages tests like:
Clone this test case repo and run
go test -count=1 -v ./...
and observe the errors.Run
go test -count=1 -p 1 -v ./...
and tests should pass.Prior issues
Race conditions (and the
connection refused
error message) have been the topic of previous issues, c.f.But this PR addresses the situation described here by adding a simple healthcheck using
netstat
(already present in the alpine image).Future Work
Once healthcheck information is exposed by the
ryuk
container, the lookupReaperContainer func can retry until the status is healthy.I'm not on a Windows platform, and I'm not familiar with nanoserver and whether it comes with Powershell, and whether a command like
netstat -an | Select-String 8080
will work. This PR only adds aHEALTHCHECK
for the linux platform.Unintended Consequences
Adding a healthcheck to ryuk may increase its perceived startup time.
Example Healthcheck Output
Before healthcheck (current state, i.e.
main
):After healthcheck (this PR):