[Bug]: StartLogProducer does not drain channel when EOF is received #1407

Closed
jrfeenst opened this issue Aug 1, 2023 · 7 comments
Labels
bug An issue with the library

jrfeenst commented Aug 1, 2023

Testcontainers version

0.21.0

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host arch

x86

Go version

1.20.6

Docker version

Client:
 Version:           20.10.5+dfsg1
 API version:       1.41
 Go version:        go1.15.15
 Git commit:        55c4c88
 Built:             Mon May 30 18:34:49 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.5+dfsg1
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.15
  Git commit:       363e9a8
  Built:            Mon May 30 18:34:49 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.13~ds1
  GitCommit:        1.4.13~ds1-1~deb11u4
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d1
 docker-init:
  Version:          0.19.0
  GitCommit:

Docker info

Client:                                                                                                                                                                                                              
 Context:    default                                                                                                                                                                                                 
 Debug Mode: false                                                                                                                                                                                                   
                                                                                                                                                                                                                     
Server:
 Containers: 38
  Running: 1
  Paused: 0
  Stopped: 37
 Images: 682
 Server Version: 20.10.5+dfsg1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1.4.13~ds1-1~deb11u4
 runc version: v1.1.4-0-g5fd4c4d1
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-23-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 62.4GiB
 Name: ah-jfeenstr2-l
 ID: UZUR:45Z4:3MFL:6P5B:HGFF:7IQC:FJD2:XJ2J:K72Q:MCP2:4DI2:IS7S
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: jrfeenst
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 192.168.0.0/16, Size: 24

WARNING: Support for cgroup v2 is experimental

What happened?

I have a container which exits on its own. I use a log consumer/producer to copy its output to stdout. When the container finishes, it looks like the producer code fails to handle the EOF properly, and this then hangs the test -- I think because the Docker daemon still has an active log reader. If I don't start the log producer, the test runs and passes.
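A minimal sketch of the kind of setup that triggers this (not my exact test; it assumes the v0.21.0 API, and the alpine image, command, and timings are just placeholders):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

// stdoutConsumer copies container log lines to stdout.
type stdoutConsumer struct{}

func (c stdoutConsumer) Accept(l testcontainers.Log) {
	fmt.Print(string(l.Content))
}

func main() {
	ctx := context.Background()

	// A container that exits on its own shortly after starting.
	req := testcontainers.ContainerRequest{
		Image:      "alpine:3.18",
		Cmd:        []string{"sh", "-c", "echo hello; sleep 1"},
		WaitingFor: wait.ForLog("hello"),
	}
	ctr, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	if err != nil {
		panic(err)
	}

	ctr.FollowOutput(stdoutConsumer{})
	if err := ctr.StartLogProducer(ctx); err != nil {
		panic(err)
	}

	// Let the container exit by itself, so the producer hits EOF.
	time.Sleep(3 * time.Second)

	// This call can hang forever: the producer goroutine has already
	// returned after the EOF, so nothing receives on the stop channel.
	_ = ctr.StopLogProducer()

	_ = ctr.Terminate(ctx)
}
```

When the container exits on its own, the producer goroutine hits this error path in the library and returns: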

```go
_, _ = fmt.Fprintf(os.Stderr, "container log error: %+v. %s", err, logStoppedForOutOfSyncMessage)
// if we would continue here, the next header-read will result into random data...
return
```

I think the issue is that an unbuffered Go channel is used to stop the producer, and in this error case the call to StopLogProducer hangs indefinitely because the producer goroutine has already exited, so nothing ever receives from the channel. Making it a buffered channel fixes the hang but makes StopLogProducer non-blocking, which may break some log consumers. Alternatively, a secondary goroutine could drain the stopProducer channel when this error is encountered. I've tested this out and it resolves the issue for me:

```go
_, _ = fmt.Fprintf(os.Stderr, "container log error: %+v. %s", err, logStoppedForOutOfSyncMessage)
// if we would continue here, the next header-read will result into random data...
go func() {
	<-stop // drain the channel to unblock StopLogProducer
}()
```
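To make the trade-off concrete, here is a standalone Go illustration (not testcontainers code) of the two channel behaviours: an unbuffered send with no receiver blocks forever, which is exactly what StopLogProducer runs into, while a buffered send returns immediately.

```go
package main

import "fmt"

func main() {
	// Unbuffered, like the current stop channel: if the receiver (the log
	// producer goroutine) has already returned, this send blocks forever.
	// Uncomment to see the deadlock:
	//
	//   stop := make(chan bool)
	//   stop <- true // fatal error: all goroutines are asleep - deadlock!

	// Buffered with capacity 1: the send succeeds even with no receiver, so
	// StopLogProducer would no longer hang -- but it also no longer waits
	// for the producer to acknowledge the stop, i.e. it becomes non-blocking.
	stop := make(chan bool, 1)
	stop <- true
	fmt.Println("buffered send succeeded without a receiver")
}
```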

Relevant log output

container log error: EOF. Stopping log consumer: Headers out of sync2023/08/01 11:58:44 ✋ Container stopped: 2da69cdc0555
2023/08/01 11:58:44 🐳 Terminating container: 4e9083e61ffd

Additional information

No response

@jrfeenst jrfeenst added the bug An issue with the library label Aug 1, 2023
@jrfeenst jrfeenst changed the title [Bug]: StartLogProducer does not close reader when EOF is received [Bug]: StartLogProducer does not drain channel when EOF is received Aug 1, 2023
@mdelapenya (Member)

Thanks for the report @jrfeenst, would you consider contributing a fix, if you have time? In that case we could also add a test to demonstrate the behaviour.

lefinal pushed a commit to lefinal/testcontainers-go that referenced this issue Oct 19, 2023
Fixes an issue where stopping the log producer would block indefinitely if the log producer has exited before.

Closes testcontainers#1407, testcontainers#1669

lefinal pushed a commit to lefinal/testcontainers-go that referenced this issue Nov 7, 2023
Fixes an issue where stopping the log producer would block indefinitely if the log producer has exited before.

Closes testcontainers#1407, testcontainers#1669
@mdelapenya (Member)

@jrfeenst we recently merged #1971. Could you take a look and verify whether this bug is still present for you? 🙏

jrfeenst (Author) commented Feb 16, 2024

@jrfeenst we recently merged #1971. Could you take a look and verify whether this bug is still present for you? 🙏

The hang is fixed -- there is still a warning message about the container logs being out of sync but it's not breaking anything.

@mdelapenya (Member)

Closing as per the previous comment. Please reopen if you think it's not fully fixed.

Thanks!

nmoroze commented Mar 1, 2024

The hang is fixed -- there is still a warning message about the container logs being out of sync but it's not breaking anything

Do you have any insight into what this error means? I'm seeing it consistently on certain tests but as far as I can tell the containers are still running until terminated by testcontainers, at which point I'd expect the log consumer to be stopped gracefully.

@mdelapenya (Member)

@nmoroze could you please help out with a repro snippet? 🙏 Then we can work on it with more confidence.

nmoroze commented Mar 6, 2024

@mdelapenya sure, I've opened a new issue describing what I'm seeing with a minimal example: #2328
