[Bug]: Sporadically fails to create reaper in GitHub Actions #2172

Open
abemedia opened this issue Jan 28, 2024 · 46 comments · Fixed by #2648
Labels
bug An issue with the library

Comments

@abemedia
Contributor

Testcontainers version

v0.27.0

Using the latest Testcontainers version?

Yes

Host OS

Ubuntu 22.04.3 LTS

Host arch

amd64

Go version

1.20

Docker version

GitHub Actions 🤷🏼‍♂️

Docker info

GitHub Actions 🤷🏼‍♂️

What happened?

This does not seem to happen when running locally but occasionally my tests fail in GitHub Actions with the error creating reaper failed.

Relevant log output

2024/01/28 17:56:02 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 24.0.7
  API Version: 1.43
  Operating System: Ubuntu 22.04.3 LTS
  Total Memory: 6921 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: 1db7667978fa78cee3873c39e54eaa6e9e4e8bd9b8a5025ceb0b2b7c20578725
  Test ProcessID: c0f2e63c-5ad4-4ae2-a60d-de0250ff089f
2024/01/28 17:56:02 🔥 Reaper obtained from Docker for this test session 24e370b447d41033419fea8aefd579d45338a16882557d81a9ee8e0df38cfaad
2024/01/28 17:56:02 port not found: creating reaper failed: failed to create container

Additional information

No response

@abemedia abemedia added the bug An issue with the library label Jan 28, 2024
@Bablzz
Contributor

Bablzz commented Feb 1, 2024

Hi @abemedia! I hope you are doing well. Can you share a link to the job? I checked the actions in this repo and the reaper works fine, for example: https://github.com/testcontainers/testcontainers-go/actions/runs/7730045562/job/21075148438#step:10:41

@kostyay

kostyay commented Feb 13, 2024

Did you manage to solve this issue? I'm facing an issue with reaper failing with the following error.. it only happens on GHA:

2024/02/13 14:47:24 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 25.0.3
  API Version: 1.43
  Operating System: Alpine Linux v3.19 (containerized)
  Total Memory: 31291 MB
  Resolved Docker Host: unix:///run/docker.sock
  Resolved Docker Socket Path: /run/docker.sock
  Test SessionID: c00c76c83c7180c78797a94d2146b5216f91200d033f9eb87efd620b53b27115
  Test ProcessID: e60189dd-ddfc-4b4c-98ab-17099ea7a937
2024/02/13 14:47:24 🐳 Creating container for image testcontainers/ryuk:0.6.0
    internal_handler_test.go:283: failed to start container: look up reaper container returned nil although creation failed due to name conflict: creating reaper failed: failed to create container

@pablocalvo-zh

Did you manage to solve this issue? I'm facing an issue with reaper failing with the following error.. it only happens on GHA:

2024/02/13 14:47:24 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 25.0.3
  API Version: 1.43
  Operating System: Alpine Linux v3.19 (containerized)
  Total Memory: 31291 MB
  Resolved Docker Host: unix:///run/docker.sock
  Resolved Docker Socket Path: /run/docker.sock
  Test SessionID: c00c76c83c7180c78797a94d2146b5216f91200d033f9eb87efd620b53b27115
  Test ProcessID: e60189dd-ddfc-4b4c-98ab-17099ea7a937
2024/02/13 14:47:24 🐳 Creating container for image testcontainers/ryuk:0.6.0
    internal_handler_test.go:283: failed to start container: look up reaper container returned nil although creation failed due to name conflict: creating reaper failed: failed to create container

I am getting the same error apparently.

@pablocalvo-zh

pablocalvo-zh commented Feb 19, 2024

@kostyay I was able to bypass this error by cleaning up containers on the tests and then using TESTCONTAINERS_RYUK_DISABLED: true. FYI I realize this happens mostly when there are multiple tests spawning containers.

https://java.testcontainers.org/features/configuration/

@webstradev

I am experiencing this exact same issue on GitLab runners when running a few different jobs that all use testcontainers.
4 of my 5 jobs always pass, and one job (which runs tests in parallel) always fails while the other 4 are running, but passes when I retry it and it runs by itself.

@abdulkk49

I am getting a similar error in GitHub Actions:

Server Version: 25.0.2
  API Version: 1.44
  Operating System: Ubuntu 20.04.6 LTS
  Total Memory: 7751 MB
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
2024/03/19 13:49:59 🐳 Creating container for image testcontainers/ryuk:0.6.0
        	Error:      	Received unexpected error:
        	            	look up reaper container returned nil although creation failed due to name conflict: creating reaper failed: failed to create container

@shepherdjerred

I've seen this quite frequently in Jenkins when running a large number of tests. Disabling the reaper with TESTCONTAINERS_RYUK_DISABLED fixed the issue for me.

If you're in an ephemeral environment enabling TESTCONTAINERS_RYUK_DISABLED should be a fine workaround without any negative consequences.

@tuxxi

tuxxi commented Mar 19, 2024

Adding a data point here: we never hit this until upgrading testcontainers-go from 0.19 to 0.28.
Soon after, we started hitting this issue fairly regularly. I would estimate about 1 out of 4 jobs fails, due to either

look up reaper container returned nil although creation failed due to name conflict

or

connecting to reaper failed: failed to create container

@bhlox

bhlox commented May 11, 2024

Still having this issue. I notice it when multiple containers are spawned; when the tests run individually, no reaper or port errors occur. When the result is served from the test cache, marked (cached), the test passes:

--- PASS: TestProducts (3.18s)
    --- PASS: TestProducts/getting_product_of_23 (0.00s)
    --- PASS: TestProducts/getting_product_of_14 (0.00s)
    --- PASS: TestProducts/getting_product_of_4fh555sd (0.00s)
    --- PASS: TestProducts/getting_product_of_-23.42 (0.00s)
    --- PASS: TestProducts/getting_product_of_234234234 (0.00s)
    --- PASS: TestProducts/creating_product_case_index:_0 (0.36s)
    --- PASS: TestProducts/creating_product_case_index:_1 (0.25s)
    --- PASS: TestProducts/creating_product_case_index:_2 (0.25s)
PASS
ok      github.com/bhlox/ecom/internal/services/product (cached)

@kdescoteaux-uptycs

To narrow down @tuxxi's data point: I didn't see this until we upgraded from 0.26 to 0.27,
which indirectly changes the ryuk image version as well.
Downgrading is not an option, because we upgraded in order to use TESTCONTAINERS_HUB_IMAGE_NAME_PREFIX.

@mdelapenya
Member

mdelapenya commented Jun 12, 2024

Hi folks, I think in the most recent versions we have introduced some fixes to the synchronisation of the reaper over a single test session. Issues/PRs of interest could be:

I'd appreciate if you can check them out so we can close this one.

Thanks in advance for your support!

@MetalRex101

@mdelapenya Just checked on Windows 11 with WSL2 docker.
Testcontainers version: v0.31.0.
Had one error: "port not found: creating reaper failed: failed to create container".

@tbrown1979

tbrown1979 commented Jun 23, 2024

TL;DR: You can terminate the containers directly and no longer need the reaper.

Hey, so I was having this issue. I'm using an old version of the project but I think the solution should work for newer versions.

On my ContainerRequest I added SkipReaper: true so that the ryuk reaper does not start up. Then I have some code that gets the created Container passed to it and uses it for my tests and that looks like this:

func TestMain(m *testing.M) {
	container, testDB, err := tools.CreateContainer("test-db")
	if err != nil {
		log.Fatal(err)
	}
	TestSqlDB = testDB

	result := m.Run()

	// Deferred functions aren't run when os.Exit is called..
	testDB.Close()
	container.Terminate(context.Background()) // nolint

	os.Exit(result)
}

My original issue was that container.Terminate was not running: it was deferred, and os.Exit prevents deferred functions from executing. This meant I had to depend on the reaper to clear the created containers. I only realized this once GitHub Actions started failing my builds. After skipping the reaper and fixing the deferred termination of the container, everything runs better than it did before. Containers start and die super fast.
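An alternative to calling the cleanup by hand before os.Exit is to move the test run into a helper function, so that a defer inside the helper fires before the process exits. The following is a minimal sketch of that pattern using only the standard library; runWithCleanup is a hypothetical helper name, and the two closures stand in for m.Run() and container.Terminate(ctx).

```go
package main

import (
	"fmt"
	"os"
)

// runWithCleanup (hypothetical helper) runs the tests and returns their exit code.
// The defer lives in this function, not next to os.Exit, so the cleanup
// runs when runWithCleanup returns, before the caller exits the process.
func runWithCleanup(runTests func() int, cleanup func()) int {
	defer cleanup()
	return runTests()
}

func main() {
	code := runWithCleanup(
		// Stands in for m.Run() inside TestMain.
		func() int { fmt.Println("running tests"); return 0 },
		// Stands in for container.Terminate(ctx).
		func() { fmt.Println("terminating container") },
	)
	// os.Exit skips deferred functions in main, but the cleanup has already run.
	os.Exit(code)
}
```

In a real TestMain the body would shrink to os.Exit(runWithCleanup(m.Run, cleanup)), which keeps the exit code and still runs the cleanup.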

@mdelapenya
Member

I added SkipReaper: true so that the ryuk reaper does not start up

For the record, SkipReaper is deprecated in newer versions and will eventually disappear from the Request struct (see v1 branch: https://github.com/testcontainers/testcontainers-go/blob/v1/request.go#L30)

If you want to disable Ryuk, please use the properties file here: https://golang.testcontainers.org/features/configuration/#customizing-ryuk-the-resource-reaper

@demeralde

demeralde commented Jul 6, 2024

I'm also experiencing this in a regular dev environment (it's not a GitHub Action).

It would be great if it just worked with Ryuk without needing to turn it off and manually manage the container state. I don't think this is a good solution. Ryuk should just work out of the box.

This is especially important because it's challenging to manage container state across many tests run in parallel if you want to reuse containers (instead of creating a new container for each test). I'd much rather have Ryuk handle this itself, so I can run my tests in parallel without having to manage logic for terminating the containers once all tests have finished.

Hopefully this bug gets fixed soon because it's been a PITA.

fmoura added a commit to cartesi/rollups-node that referenced this issue Jul 11, 2024
Due to the need to update testcontainers lib also
we needed to disable testcontainers reaper at
Github actions as well as it seems related to
testcontainers/testcontainers-go#2172
@stevenh
Collaborator

stevenh commented Jul 13, 2024

Possibly related to this fix testcontainers/moby-ryuk#121 which needs a release and then testcontainers-go updating to use the new image version

Try cloning the moby-ryuk repo and running the following in it to replace the image that testcontainers-go uses to see if it does fix:

docker build -f linux/Dockerfile -t testcontainers/ryuk:0.7.0 .

@mdelapenya
Member

You can terminate the containers directly and no longer need the reaper.

@tbrown1979 please remember that the reaper also removes built images, volumes and networks, so if you disable it please carefully remove all those Docker resources.

In any case, we have released a new version of Ryuk. As @stevenh pointed out in #2172 (comment), you could give it a try like that.

@sanatik

sanatik commented Aug 2, 2024

@mdelapenya
I still see this issue on the latest v0.32.0 version.

2024/08/02 07:17:05 github.com/testcontainers/testcontainers-go - Connected to docker: 
  Server Version: 27.1.1
  API Version: 1.46
  Operating System: Alpine Linux v3.20 (containerized)
  Total Memory: 15633 MB
  Testcontainers for Go Version: v0.32.0
  Resolved Docker Host: unix:///var/run/docker.sock
  Resolved Docker Socket Path: /var/run/docker.sock
  Test SessionID: a5b705ed65f028356e9eea0744e7f92be030ccf0cb0b4a92ed3454a01776d488
  Test ProcessID: fe16248a-b0f0-4768-8477-ee0505fe69d8
{"level":"debug","message":"Failed to get image auth for https://index.docker.io/v1/. Setting empty credentials for the image: testcontainers/ryuk:0.7.0. Error is:open /home/runner/.docker/config.json: no such file or directory"}
{"level":"debug","message":"🐳 Creating container for image testcontainers/ryuk:0.7.0"}
------------------------------
[BeforeSuite] [PANICKED] [5.698 seconds]
[BeforeSuite] 
/home/runner/_work/***/internal/app/app_test.go:56

  [PANICKED] Test Panicked
  In [BeforeSuite] at: /home/runner/_work/***/internal/testcontainers/db.go:44 @ 08/02/24 07:17:11.371

  look up reaper container returned nil although creation failed due to name conflict: creating reaper failed: failed to create container

I have 2 tests running in parallel and one of them failing in 90% of the cases.

@stevenh
Collaborator

stevenh commented Aug 2, 2024

The creation of reapers in the current version is racy. Could you test from the branch in #2664, which should fix this issue?

@strowk

strowk commented Aug 12, 2024

@stevenh, the PR is a draft and has "DO NOT MERGE!" on it. Does that mean this issue is not actually fixed and should be reopened?

@mdelapenya
Member

@strowk thanks for checking. I asked Steven to split that massive PR into smaller chunks, which lets me review it more efficiently. The parent issue is #2685; it contains all the fixes, one by one, as subtasks. Thanks to @stevenh's work, it's now much easier to see what's been fixed/done.

@stevenh
Collaborator

stevenh commented Aug 12, 2024

To build on @mdelapenya's response: the reaper fix extraction is still pending, as there are some dependencies in the stack.

If your issue is fixed by using the branch in the big PR, then once the individual fixes are all merged we should be good, so testing with that for now is useful.

@erez-rabih

Not sure why this was closed; I still experience it with the latest 0.33 release.
Is there any way to solve this?

@stevenh
Collaborator

stevenh commented Sep 2, 2024

Yep, seeing it here too. @mdelapenya can you reopen this? It requires the reaper fixes.

@mdelapenya
Member

Thanks for checking, reopening.

@mdelapenya mdelapenya reopened this Sep 2, 2024
@mdelapenya
Member

Ah it was autoclosed by merging #2648

@erez-rabih

thanks for re-opening this 🙏
is there a known workaround for this issue?

@stevenh
Collaborator

stevenh commented Sep 3, 2024

Unfortunately not; it requires a significant rewrite of both the reaper itself and the reaper integration.

@webstradev

For us this happens about daily. It seems to be slightly less frequent after upgrading to 0.33 but not by much.

@sanatik

sanatik commented Sep 3, 2024

We solved the problem by putting all the tests into the same package. In this case, a container spins up only once. It is far from ideal, but it works for small to medium projects.
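As a rough illustration of the "spin up only once" idea, the sketch below shares one lazily created resource between concurrent callers with sync.Once. It uses only the standard library; the shared type and its start field are hypothetical, and start stands in for whatever code would actually launch the container and return its address.

```go
package main

import (
	"fmt"
	"sync"
)

// shared (hypothetical) lazily creates one value and hands the same
// value to every caller, however many goroutines ask for it.
type shared struct {
	once  sync.Once
	start func() (string, error) // stand-in: would start the container
	addr  string
	err   error
}

// get runs start exactly once and returns the cached result afterwards.
func (s *shared) get() (string, error) {
	s.once.Do(func() { s.addr, s.err = s.start() })
	return s.addr, s.err
}

func main() {
	starts := 0
	db := &shared{start: func() (string, error) {
		starts++ // only ever executed inside once.Do
		return "localhost:5432", nil
	}}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := db.get(); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("starts:", starts) // prints "starts: 1"
}
```

This only helps within one test binary. Go builds a separate binary per package, which is why tests spread across packages still start their own containers.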

@stevenh
Collaborator

stevenh commented Sep 3, 2024

You could test the branch for this PR, which includes the reaper fixes, to see if that fixes the problem.

@calebmcelroy

@stevenh Just wanted to chime in and say that #2664 fixed the issue for me. Thanks!

For anyone else who wants to use it as a patch until it’s merged, you can include it in go.mod like this:

// testcontainers-go v0.32.0 has bug with reaper when running tests in parallel.
// See Issue: https://github.com/testcontainers/testcontainers-go/issues/2172#issuecomment-2265169851
require github.com/testcontainers/testcontainers-go v0.32.0
replace github.com/testcontainers/testcontainers-go => github.com/stevenh/testcontainers-go v0.0.0-20240719184830-7292a5b57918

And if you're using any modules you can also replace those...

require github.com/testcontainers/testcontainers-go/modules/redis v0.32.0
replace github.com/testcontainers/testcontainers-go/modules/redis => github.com/stevenh/testcontainers-go/modules/redis v0.0.0-20240719184830-7292a5b57918

@mdelapenya
Member

@stevenh I think the only PR that is missing from the bug bash was #2728, right? I'll go deep into it today.

@stevenh
Collaborator

stevenh commented Sep 7, 2024

#2728 requires #2738 first, then there are a few other open PRs, and then some more to raise once those are merged; see #2685.

@TimJung

TimJung commented Sep 16, 2024

replace github.com/testcontainers/testcontainers-go => github.com/stevenh/testcontainers-go v0.0.0-20240719184830-7292a5b57918

I've tried going this route to test if the patch would work for my project but unfortunately it did not work. I still get many failures with the following details:

create container: look up reaper container returned nil although creation failed due to name conflict: creating reaper failed

@borod108
Contributor

replace github.com/testcontainers/testcontainers-go => github.com/stevenh/testcontainers-go v0.0.0-20240719184830-7292a5b57918

I've tried going this route to test if the patch would work for my project but unfortunately it did not work. I still get many failures with the following details:

create container: look up reaper container returned nil although creation failed due to name conflict: creating reaper failed

Same here!

@stevenh
Collaborator

stevenh commented Sep 24, 2024

We're working our way through the full set of PRs; the reaper one is next on the list. However, some cases might need the new ryuk version too.

@borod108 could you clarify the error you're seeing?

@axilis-marko

@shepherdjerred

Disabling the reaper with TESTCONTAINERS_RYUK_DISABLED fixed the issue for me.

How did you do this? I did

import "github.com/testcontainers/testcontainers-go" (v0.33.0)
...
testContainers := map[string]string{
	"TESTCONTAINERS_RYUK_DISABLED": "true",
}
testcontainers.WithEnv(testContainers)

before running any containers, but I'm still getting this issue.

  Couldn't setup testcontainers:
  Error: failed to start kafka container: create container: look up reaper container returned nil although creation failed due to name conflict: creating reaper failed
  Kafka Broker: 
  Redis Address: 

@mdelapenya
Member

@axilis-marko the env variable must be set at the terminal/shell you use to run the tests, as documented here: https://golang.testcontainers.org/features/garbage_collector/#ryuk

@axilis-marko

adding ryuk.disabled=true to the .testcontainers.properties file did not work for me.

TESTCONTAINERS_RYUK_DISABLED=true go test ./... worked!

I don't have control over env variables in my CI pipeline, so setting it programmatically did the trick:

func init() {
	err := os.Setenv("TESTCONTAINERS_RYUK_DISABLED", "true")
	if err != nil {
		panic(err)
	}
}

Thanks @mdelapenya!

@mdelapenya
Member

I don't have control over env variables in my CI pipeline,

Is this because the CI is owned by other team? If so, you could ask them about that need and request it. Of course, I don't know the internal procedures you have for that.

so programmatically this did the trick

The drawback here is that it will apply to everybody cloning the repo, and because env vars override the properties setting, everybody will have Ryuk disabled by default, which we usually discourage.

@axilis-marko

Is this because the CI is owned by other team?

Yes. We'll probably ask them to expose a variable so we know we're in CI, so we can add this, and any future "hack" without pinging them constantly.

everybody will have Ryuk disabled by default

Currently, we prefer manually deleting our local containers to having sporadic failures in the pipeline. As our CI/CD creates ephemeral pods for tests, they're gonna be destroyed either way.

@mdelapenya
Member

Sounds good to me :) In that case, make sure you call testcontainers.TerminateContainer(ctr) for every container you spawn. In the library we ensure that while building it, and we have two pipelines, with and without ryuk, to verify that all the tests clean up what they create.

@stevenh
Collaborator

stevenh commented Oct 28, 2024

@shepherdjerred

before running any containers, but I'm still getting this issue.

  Couldn't setup testcontainers:
  Error: failed to start kafka container: create container: look up reaper container returned nil although creation failed due to name conflict: creating reaper failed
  Kafka Broker: 
  Redis Address: 

The message you list above isn't present in the current version of the code. Could you make sure you're using the latest version, or provide details of the test that produces these messages, please @shepherdjerred?

@axilis-marko

I was using v0.33.0. I've since updated to v0.34.0 for the testcontainers.TerminateContainer(ctr) function, so it's possible that it's not a problem anymore. We've still left Ryuk disabled in the CI, as it's an ephemeral environment, and added manual termination throughout our code. If we get the time, we can test it with Ryuk again just for the purpose of closing this issue.

@stevenh
Collaborator

stevenh commented Oct 28, 2024

Thanks @axilis-marko. 0.34.0 has a significant refactor of the reaper in it, so it would be good to know if that addresses this issue.
