fix(reaper): refactor to allow retries and fix races #2728

Merged
mdelapenya merged 1 commit into testcontainers:main from fix/reaper-retries-race
Oct 18, 2024

Conversation

stevenh
Collaborator

@stevenh stevenh commented Aug 9, 2024

Refactor how the reaper is created to allow for proper retries of temporary errors such as container not found issues during startup or shutdown races.

This eliminates the use of sync.Once which wasn't solving the problem at hand and replaces it with a singleton spawner with locked access.
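To illustrate the difference, here is a minimal sketch of a mutex-guarded singleton spawner. It is not the code from this PR: the `reaper` type, the `spawner` struct, and the `create` factory are hypothetical names chosen for the example. The point it tries to show is that, unlike `sync.Once`, a failed attempt is not remembered, so a later caller can retry.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reaper is a hypothetical stand-in for the real reaper handle.
type reaper struct{ id int }

// spawner hands out one shared reaper, guarded by a mutex.
// A failed creation is not cached, so the next call tries again.
type spawner struct {
	mu       sync.Mutex
	instance *reaper
	create   func() (*reaper, error) // hypothetical factory
}

func (s *spawner) get() (*reaper, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.instance != nil {
		return s.instance, nil
	}

	r, err := s.create()
	if err != nil {
		// Nothing is stored on failure, which is what makes a retry possible.
		return nil, fmt.Errorf("create reaper: %w", err)
	}
	s.instance = r
	return r, nil
}

// demo simulates a temporary failure on the first attempt only.
func demo() (attempts int, firstErr error, r1, r2 *reaper) {
	s := &spawner{}
	s.create = func() (*reaper, error) {
		attempts++
		if attempts == 1 {
			return nil, errors.New("container not found")
		}
		return &reaper{id: attempts}, nil
	}
	_, firstErr = s.get() // fails, nothing cached
	r1, _ = s.get()       // retries and succeeds
	r2, _ = s.get()       // returns the cached instance
	return
}

func main() {
	attempts, firstErr, r1, r2 := demo()
	fmt.Println(attempts, firstErr, r1 == r2)
}
```

With `sync.Once`, the first failure would have been the only attempt; here the factory runs again until it succeeds, and only then is the result shared.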

Wrap reaper errors so we can determine the cause of failures more easily.
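As a rough sketch of what wrapping buys us, the example below wraps a sentinel error with `%w` and checks for it with `errors.Is`. The sentinel `errNotFound` and the functions `lookupReaper`, `connectReaper`, and `isRetryable` are hypothetical and only illustrate the pattern, not the actual reaper code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel standing in for a
// "container not found" error returned during a startup race.
var errNotFound = errors.New("container not found")

// lookupReaper is a hypothetical lookup that always fails here.
func lookupReaper() error { return errNotFound }

// connectReaper adds context while keeping the cause inspectable.
func connectReaper() error {
	if err := lookupReaper(); err != nil {
		return fmt.Errorf("lookup reaper container: %w", err)
	}
	return nil
}

// isRetryable reports whether the wrapped cause is the temporary error.
func isRetryable(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	err := connectReaper()
	fmt.Println(err)
	fmt.Println(isRetryable(err))
}
```

Because the cause survives the wrapping, a caller can decide whether to retry without parsing error strings.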

Fix race condition in port wait when loading from container by always waiting for the port first.

Remove unnecessary use of buffering and invalid retry logic in reaper connection handling which could never recover correctly from a partial read.
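The sketch below shows one way to read a fixed-size reply without buffering or a retry loop. It assumes a line-based exchange where the client sends a filter line and expects an `ACK\n` reply; that wire format, the `register` function, and the in-memory server are assumptions made for illustration, not the implementation in this PR. `io.ReadFull` either returns the whole reply or an error, so there is no half-consumed state to recover from.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// ack is the reply assumed for this sketch.
const ack = "ACK\n"

// register sends one filter line and waits for the full acknowledgment.
func register(conn io.ReadWriter, filter string) error {
	if _, err := io.WriteString(conn, filter+"\n"); err != nil {
		return fmt.Errorf("write filter: %w", err)
	}

	buf := make([]byte, len(ack))
	// ReadFull keeps reading until buf is full or an error occurs,
	// so a reply split across several reads is handled in one call.
	if _, err := io.ReadFull(conn, buf); err != nil {
		return fmt.Errorf("read ack: %w", err)
	}
	if string(buf) != ack {
		return fmt.Errorf("unexpected reply %q", buf)
	}
	return nil
}

// demo runs register against an in-memory server that deliberately
// splits its reply into two writes.
func demo() error {
	client, server := net.Pipe()
	defer client.Close()

	go func() {
		defer server.Close()
		b := make([]byte, 1)
		for { // consume the filter line byte by byte
			if _, err := server.Read(b); err != nil {
				return
			}
			if b[0] == '\n' {
				break
			}
		}
		_, _ = server.Write([]byte("AC"))
		_, _ = server.Write([]byte("K\n"))
	}()

	return register(client, "label=example=true")
}

func main() {
	fmt.Println(demo())
}
```

If the read fails partway through, the error is returned to the caller, which can then start over with a fresh connection rather than retrying on a stream whose position is unknown.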

Move the reaper creation to just before connections are established in compose, to ensure it's still running when the Connect calls are made.

Previously the reaper was requested in NewDockerComposeWith, which meant it could have already shut down before connections were made during the later sections of Up if the startup took over 1 minute.

This was causing consistent failures for:
TestDockerComposeAPIWithWaitLogStrategy

Ensure that resource labels are correct by excluding the session ID when the reaper is disabled, so that resources aren't reaped in that case.

Error when creating a reaper when the config says it's disabled so that we avoid hard to debug issues because a reaper is running when it shouldn't be.
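The two changes above can be sketched roughly as follows. This is an illustration under assumptions rather than the library's code: the `config` struct, `resourceLabels`, `newReaper`, and `errReaperDisabled` are hypothetical, and the `org.testcontainers` and `org.testcontainers.sessionId` keys are assumed label names. The `org.testcontainers.reap` key is the one named in the commit message for this PR.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	labelBase      = "org.testcontainers"           // assumed base label key
	labelSessionID = "org.testcontainers.sessionId" // assumed session label key
	labelReap      = "org.testcontainers.reap"      // named in the commit message
)

// errReaperDisabled is a hypothetical sentinel error.
var errReaperDisabled = errors.New("reaper disabled")

// config is a hypothetical stand-in for the library configuration.
type config struct{ reaperDisabled bool }

// resourceLabels only adds the labels the reaper filters on
// when the reaper is actually enabled.
func resourceLabels(cfg config, sessionID string) map[string]string {
	labels := map[string]string{labelBase: "true"}
	if cfg.reaperDisabled {
		return labels
	}
	labels[labelSessionID] = sessionID
	labels[labelReap] = "true"
	return labels
}

// newReaper refuses to start a reaper that the config has disabled.
func newReaper(cfg config) error {
	if cfg.reaperDisabled {
		return errReaperDisabled
	}
	return nil // the real creation logic would go here
}

func main() {
	fmt.Println(resourceLabels(config{reaperDisabled: true}, "abc"))
	fmt.Println(resourceLabels(config{}, "abc"))
	fmt.Println(newReaper(config{reaperDisabled: true}))
}
```

Failing loudly in `newReaper` turns a misconfiguration into an immediate error instead of a reaper quietly running when it shouldn't be.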

@stevenh stevenh requested a review from a team as a code owner August 9, 2024 15:48

netlify bot commented Aug 9, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 8bfe616
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/671230eb36030b0008fff0d2
😎 Deploy Preview https://deploy-preview-2728--testcontainers-go.netlify.app

@stevenh stevenh marked this pull request as draft August 9, 2024 15:55
@stevenh
Collaborator Author

stevenh commented Aug 9, 2024

Depends on #2738

Member

@mdelapenya mdelapenya left a comment

I added a commit on top to satisfy the usage of several missing functions (see comments).

Other than that, I merged the branch with current main and resolved conflicts.

Resolved review threads: docker.go (2), reaper.go (2)
@stevenh
Collaborator Author

stevenh commented Sep 8, 2024

@mdelapenya reminder: this is dependent on #2738, which is why it still has missing components.

I'll rebase once #2738 is merged.

@stevenh stevenh force-pushed the fix/reaper-retries-race branch from e7b948d to 5de803c Compare September 12, 2024 14:01
@stevenh
Collaborator Author

stevenh commented Sep 12, 2024

Rebased and picking up again, hopefully won't require many fixes. Will convert from draft when ready.

@stevenh stevenh force-pushed the fix/reaper-retries-race branch 3 times, most recently from bf961f0 to 7649e0c Compare September 13, 2024 08:34
@stevenh stevenh marked this pull request as ready for review September 13, 2024 13:12
@stevenh stevenh requested a review from mdelapenya September 13, 2024 13:12
Member

@mdelapenya mdelapenya left a comment

I only added a few super minor comments. Other than that, my only concern is about the implications of these changes for the reuse mode we are proposing in #2768.

I'm fine merging this, although I'd like to discuss reuse with you in more depth, as it could apply to the reaper code as well.

Resolved review threads: reaper.go (2, one outdated), reaper_test.go
@stevenh stevenh force-pushed the fix/reaper-retries-race branch from 7649e0c to 4182991 Compare September 13, 2024 19:38
@stevenh stevenh requested a review from mdelapenya September 13, 2024 19:39
@stevenh stevenh marked this pull request as draft September 13, 2024 23:16
@mdelapenya
Member

@stevenh is there anything I can do to move forward with this PR?

@stevenh stevenh force-pushed the fix/reaper-retries-race branch from 4182991 to 7bc2a87 Compare September 19, 2024 13:23
@stevenh stevenh marked this pull request as ready for review September 19, 2024 13:24
@stevenh stevenh marked this pull request as draft September 19, 2024 14:17
@stevenh
Collaborator Author

stevenh commented Sep 19, 2024

Blocked on #2786

@stevenh stevenh force-pushed the fix/reaper-retries-race branch from 7bc2a87 to 12c4483 Compare September 20, 2024 00:49
Resolved review threads: generic.go, internal/core/labels.go (outdated), reaper.go (outdated)
@stevenh stevenh force-pushed the fix/reaper-retries-race branch 2 times, most recently from a304b09 to 1f4141c Compare September 21, 2024 12:21
@stevenh stevenh marked this pull request as ready for review September 21, 2024 17:50
@stevenh stevenh requested a review from mdelapenya September 21, 2024 17:51
@stevenh stevenh force-pushed the fix/reaper-retries-race branch from 1f4141c to 0e34b71 Compare September 24, 2024 14:12
@mdelapenya
Member

There are failing tests for the reaper. Could you take a look?

@stevenh
Collaborator Author

stevenh commented Oct 2, 2024

There are failing tests for the reaper. Could you take a look?

Thanks for the poke. I was expecting some issues here, but hadn't got round to checking yet.

@erikmansson

Any progress on this? We're having issues with flaky tests and would love for this to get merged.

@stevenh
Collaborator Author

stevenh commented Oct 14, 2024

The ryuk side was merged last week. I'm just fixing a few compatibility issues with tc-java, so once that's done, this is next on the list @erikmansson

@mdelapenya mdelapenya self-assigned this Oct 17, 2024
@mdelapenya mdelapenya added the bug An issue with the library label Oct 17, 2024
Member

@mdelapenya mdelapenya left a comment

Just a few comments, but LGTM.

Once those are addressed, especially the ones regarding the t.Skip when the reaper is disabled on GH actions, we can move on and merge it.

Thanks as always for your hard work with the library 🙇

Resolved review threads: modules/compose/compose_api.go, reaper.go (outdated), reaper_test.go
@stevenh stevenh force-pushed the fix/reaper-retries-race branch from 4218dcc to 6a96b24 Compare October 18, 2024 09:22
Refactor how the reaper is created to allow for proper retries of
temporary errors such as container not found issues during startup or
shutdown races.

This eliminates the use of sync.Once which wasn't solving the problem at
hand and replaces it with a singleton spawner with locked access.

Wrap reaper errors so we can determine the cause of failures more
easily.

Fix race condition in port wait when loading from container by always
waiting for the port first.

Remove unnecessary use of buffering and invalid retry logic in reaper
connection handling which could never recover correctly from a partial
read.

Move the reaper creation to just before connections are established in
compose, to ensure it's still running when the Connect calls are made.

Previously the reaper was requested in NewDockerComposeWith, which
meant it could have already shut down before connections were made
during the later sections of Up if the startup took over 1 minute.

This was causing consistent failures for:
TestDockerComposeAPIWithWaitLogStrategy

Ensure that resource labels are correct by excluding the session ID
when the reaper is disabled, so that resources aren't reaped in that
case.

Error when creating a reaper when the config says it's disabled so that
we avoid hard to debug issues because a reaper is running when it
shouldn't be.

Set org.testcontainers.reap label for containers which should be reaped
by the reaper, to prevent containers which disable the reaper from being
incorrectly reaped.
@stevenh stevenh force-pushed the fix/reaper-retries-race branch from 6a96b24 to 8bfe616 Compare October 18, 2024 09:56
Member

@mdelapenya mdelapenya left a comment

LGTM, thanks!

I'd do the refactor of the compose tests in a follow-up, so we get this PR in, and we can start a release.

@mdelapenya mdelapenya merged commit 5e988ff into testcontainers:main Oct 18, 2024
120 checks passed
mdelapenya added a commit to mdelapenya/testcontainers-go that referenced this pull request Oct 18, 2024
* main:
  fix(reaper): refactor to allow retries and fix races (testcontainers#2728)
  chore: update ryuk to 0.10.2 (testcontainers#2833)
@stevenh stevenh deleted the fix/reaper-retries-race branch October 18, 2024 12:26
mdelapenya added a commit to mdelapenya/testcontainers-go that referenced this pull request Oct 18, 2024
* main:
  fix(reaper): refactor to allow retries and fix races (testcontainers#2728)
  chore: update ryuk to 0.10.2 (testcontainers#2833)
  feat: add yugabytedb module (testcontainers#2825)
  fix: update module container struct name and missing imports (testcontainers#2831)
  chore: replace 'assert' with 'require' (testcontainers#2827)
  chore: replace 'assert' with 'require' for critical checks (testcontainers#2824)
  chore: bump ryuk to latest release (testcontainers#2818)
  feat: add require for critical checks (testcontainers#2812)
Labels
bug An issue with the library