Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LogMessageWaitStrategy for PostgreSQLContainer times out #455

Closed
gbtec-markusmeier opened this issue Sep 14, 2017 · 33 comments
Closed

LogMessageWaitStrategy for PostgreSQLContainer times out #455

gbtec-markusmeier opened this issue Sep 14, 2017 · 33 comments

Comments

@gbtec-markusmeier
Copy link

When waiting for Postgres to start up, PostgreSQLContainer looks for the String ".*database system is ready to accept connections.*\\s".

The database though has already started when PostgreSQLContainer is ready for waiting and thus will never find the String.

This issues seems to appear only on Windows machines - on Linux it was not observable.

The new PostgreSQLContainer wait behaviour was introduced in #327.

@kiview
Copy link
Member

kiview commented Sep 14, 2017

Is this String not logged on Windows machines? This String should normally be logged two times before the database is ready.

@gbtec-markusmeier
Copy link
Author

Yes, it is logged. It already came twice when PostgreSQLContainer starts waiting.

@kiview
Copy link
Member

kiview commented Sep 14, 2017

Ok, so that sounds like some race condition regarding the LogMessageWaitStrategy and might not be a PostgreSQLContainer (or Windows) specific problem.

@ingogriebsch
Copy link

@kiview Would like to ask if one can expect a fix for this bug in the near future? We are developing mostly on Windows machines and heavily using testcontainers in our services. Even if we have a little workaround it would be nice to get this issue solved. Don't know if it is also possible to support you in some way but if, we at least need to get some details explained. :)

@kiview
Copy link
Member

kiview commented Sep 30, 2017

@gbtec-ingogriebsch Since it might involve a race condition (which should be platform independent?), I don't know when it will be fixed. It would help to find this bug on Linux as well.

I'll be home next week (and I have a Windows 10 box at home) and I'll try to look into the problem.

@kiview
Copy link
Member

kiview commented Oct 2, 2017

Can someone share a project demonstrating this problem?

I just ran SimplePostgreSQLTest on my Windows 10 Pro 1703 with Docker 17.09.0 CE and starting the postgres container seems to work fine.

I see the following error in my console, the test succeeds however:

Okt 02, 2017 4:17:22 PM org.postgresql.Driver connect
SCHWERWIEGEND: Connection error: 
org.postgresql.util.PSQLException: Der Verbindungsversuch schlug fehl.
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:275)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:194)
	at org.postgresql.Driver.makeConnection(Driver.java:431)
	at org.postgresql.Driver.connect(Driver.java:247)
	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:92)
	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:98)
	at com.zaxxer.hikari.pool.BaseHikariPool.addConnection(BaseHikariPool.java:438)
	at com.zaxxer.hikari.pool.BaseHikariPool$1.run(BaseHikariPool.java:413)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
	at org.postgresql.core.PGStream.receiveChar(PGStream.java:284)
	at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:418)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:222)
	... 13 more

I think this happens when the container is shutting down, so nothing too bad.

Edit:
Indeed, happens when closing. Adding a ds.close() statement to the end of the test circumvents the error,

@ingogriebsch
Copy link

ingogriebsch commented Oct 6, 2017

@kiview I'm using Docker for Windows, Version 17.06.1-ce-win24 (13025). I have not yet updated to Version 17.09.0-ce-win32 (13529). We will check if the problem is maybe solved if using the latest docker version.

And we can provide a simple test project for the problem we are having at the moment. But it will include nothing else as a Spring Boot app which has configured the postgres testcontainer triggered through the 'special' jdbc url.

@gbtec-markusmeier
Copy link
Author

gbtec-markusmeier commented Oct 10, 2017

The result of https://github.com/testcontainers/testcontainers-java/blob/master/modules/jdbc-test/src/test/java/org/testcontainers/junit/SimplePostgreSQLTest.java on my machine:

13:21:08.851 DEBUG 🐳 [postgres:latest] - Starting container: postgres:latest
13:21:08.852 DEBUG 🐳 [postgres:latest] - Trying to start container: postgres:latest
13:21:08.852 DEBUG 🐳 [postgres:latest] - Trying to start container: postgres:latest (attempt 1/1)
13:21:08.852 DEBUG 🐳 [postgres:latest] - Starting container: postgres:latest
13:21:08.852 INFO  🐳 [postgres:latest] - Creating container for image: postgres:latest
13:21:09.037 INFO  🐳 [postgres:latest] - Starting container with ID: 3bed081e31e630abe1e97560afb3510013eaf84603541e58c4a6d36d0f7f3494
13:21:09.375 INFO  🐳 [postgres:latest] - Container postgres:latest is starting: 3bed081e31e630abe1e97560afb3510013eaf84603541e58c4a6d36d0f7f3494
13:22:09.467 ERROR 🐳 [postgres:latest] - Could not start container
org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*database system is ready to accept connections.*\s'
	at org.testcontainers.containers.wait.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:31)
	at org.testcontainers.containers.GenericContainer$AbstractWaitStrategy.waitUntilReady(GenericContainer.java:943)
	at org.testcontainers.containers.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:92)
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:235)
	at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:184)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:182)
	at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:544)
	at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
13:22:09.467 ERROR 🐳 [postgres:latest] - Container log output (if any) will follow:
13:22:09.489 INFO  🐳 [postgres:latest] - STDOUT: The files belonging to this database system will be owned by user "postgres".
13:22:09.489 INFO  🐳 [postgres:latest] - STDOUT: This user must also own the server process.
13:22:09.489 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.489 INFO  🐳 [postgres:latest] - STDOUT: The database cluster will be initialized with locale "en_US.utf8".
13:22:09.489 INFO  🐳 [postgres:latest] - STDOUT: The default database encoding has accordingly been set to "UTF8".
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: The default text search configuration will be set to "english".
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: Data page checksums are disabled.
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: fixing permissions on existing directory /var/lib/postgresql/data ... ok
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: creating subdirectories ... ok
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: selecting default max_connections ... 100
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: selecting default shared_buffers ... 128MB
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: selecting dynamic shared memory implementation ... posix
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: creating configuration files ... ok
13:22:09.490 INFO  🐳 [postgres:latest] - STDOUT: running bootstrap script ... ok
13:22:09.491 INFO  🐳 [postgres:latest] - STDOUT: performing post-bootstrap initialization ... ok
13:22:09.491 INFO  🐳 [postgres:latest] - STDERR: 
13:22:09.491 INFO  🐳 [postgres:latest] - STDERR: WARNING: enabling "trust" authentication for local connections
13:22:09.491 INFO  🐳 [postgres:latest] - STDERR: You can change this by editing pg_hba.conf or using the option -A, or
13:22:09.491 INFO  🐳 [postgres:latest] - STDERR: --auth-local and --auth-host, the next time you run initdb.
13:22:09.491 INFO  🐳 [postgres:latest] - STDOUT: syncing data to disk ... ok
13:22:09.491 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: Success. You can now start the database server using:
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: pg_ctl -D /var/lib/postgresql/data -l logfile start
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: waiting for server to start....2017-10-10 11:21:10.459 UTC [39] LOG:  listening on IPv4 address "127.0.0.1", port 5432
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.459 UTC [39] LOG:  could not bind IPv6 address "::1": Cannot assign requested address
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.459 UTC [39] HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
13:22:09.492 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.465 UTC [39] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
13:22:09.493 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.479 UTC [40] LOG:  database system was shut down at 2017-10-10 11:21:10 UTC
13:22:09.493 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.485 UTC [39] LOG:  database system is ready to accept connections
13:22:09.493 INFO  🐳 [postgres:latest] - STDOUT: done
13:22:09.493 INFO  🐳 [postgres:latest] - STDOUT: server started
13:22:09.494 INFO  🐳 [postgres:latest] - STDOUT: CREATE DATABASE
13:22:09.494 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.494 INFO  🐳 [postgres:latest] - STDOUT: CREATE ROLE
13:22:09.494 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.494 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.494 INFO  🐳 [postgres:latest] - STDOUT: /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: waiting for server to shut down....2017-10-10 11:21:10.823 UTC [39] LOG:  received fast shutdown request
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.827 UTC [39] LOG:  aborting any active transactions
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.829 UTC [39] LOG:  worker process: logical replication launcher (PID 46) exited with exit code 1
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.829 UTC [41] LOG:  shutting down
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: 2017-10-10 11:21:10.851 UTC [39] LOG:  database system is shut down
13:22:09.495 INFO  🐳 [postgres:latest] - STDOUT: done
13:22:09.496 INFO  🐳 [postgres:latest] - STDOUT: server stopped
13:22:09.496 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.496 INFO  🐳 [postgres:latest] - STDOUT: PostgreSQL init process complete; ready for start up.
13:22:09.496 INFO  🐳 [postgres:latest] - STDOUT: 
13:22:09.496 INFO  🐳 [postgres:latest] - STDERR: 2017-10-10 11:21:10.948 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
13:22:09.496 INFO  🐳 [postgres:latest] - STDERR: 2017-10-10 11:21:10.948 UTC [1] LOG:  listening on IPv6 address "::", port 5432
13:22:09.497 INFO  🐳 [postgres:latest] - STDERR: 2017-10-10 11:21:10.956 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
13:22:09.497 INFO  🐳 [postgres:latest] - STDERR: 2017-10-10 11:21:10.972 UTC [52] LOG:  database system was shut down at 2017-10-10 11:21:10 UTC
13:22:09.497 INFO  🐳 [postgres:latest] - STDERR: 2017-10-10 11:21:10.977 UTC [1] LOG:  database system is ready to accept connections
13:22:10.173 DEBUG org.testcontainers.utility.ResourceReaper - Removed container and associated volume(s): postgres:latest

org.testcontainers.containers.ContainerLaunchException: Container startup failed

	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:189)
	at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:544)
	at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:83)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:182)
	... 17 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:256)
	at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:184)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
	... 18 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*database system is ready to accept connections.*\s'
	at org.testcontainers.containers.wait.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:31)
	at org.testcontainers.containers.GenericContainer$AbstractWaitStrategy.waitUntilReady(GenericContainer.java:943)
	at org.testcontainers.containers.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:92)
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:235)
	... 20 more

@gbtec-markusmeier gbtec-markusmeier changed the title LogMessageWaitStrategy for PostgreSQLContainer times out on Windows machines LogMessageWaitStrategy for PostgreSQLContainer times Oct 14, 2017
@gbtec-markusmeier
Copy link
Author

It is somewhat environment specific, but seems not to depend on the OS itself. Further investigating.

@gbtec-markusmeier gbtec-markusmeier changed the title LogMessageWaitStrategy for PostgreSQLContainer times LogMessageWaitStrategy for PostgreSQLContainer times out Oct 14, 2017
@kiview
Copy link
Member

kiview commented Oct 14, 2017

After additional research, it seems it is not a race condition, since followOutput() will also retrieve old output. However, in the failing environments, followOutput() doesn't seem to return any frames in the callback.

It might be somehow related to docker-java/docker-java#878, not sure though.

@rnorth
Copy link
Member

rnorth commented Oct 16, 2017

That's not good. Do you think we should revert use of the log-based wait strategy until we can get to the bottom of this?

@gbtec-markusmeier
Copy link
Author

I would appreciate that - we need to upgrade to testcontainers 1.4.x to make use of the latest java docker api and the TC_DAEMON parameter. Reverting it to use SELECT 1 would remedy this. But the revert would also re-introduce #317, wouldn't it?

@kiview
Copy link
Member

kiview commented Oct 17, 2017

Yes it would, but that's not really an error or a problem, more like verbose logging stuff happening. Other LogMessageWaitStrategy based containers wouldn't work either in this setup.

@rnorth
TBH I'm not sure what to do here right now. I was with @gbtec-markusmeier at the weekend, trying to fix the problem, pinpoint it, work around it, but the behavior regarding the log output was actually happening as soon as we've used logContainerCmd.withFollowStream(true).
So I think the problem might be somewhere in docker-java. I think it's interesting, this problem occurs specifically in the work environment of @gbtec-markusmeier and @gbtec-ingogriebsch, we weren't really able to spot the difference in the setup from which this problem might arise. My last idea was, that it might happen somewhere one the network layer, docker-java not being able to stream the logs.

@gbtec-markusmeier
Maybe we can raise an issue regarding the logContainerCmd.withFollowStream(true) problem in docker-java?

@ingogriebsch
Copy link

ingogriebsch commented Nov 7, 2017

@kiview Do you have some news for us? Can we provide 'some' information to help you understanding the problem? Or is the next step to follow your suggestion and raise an issue in docker-java?

@kiview
Copy link
Member

kiview commented Nov 7, 2017

@gbtec-ingogriebsch Do you think you can raise a useful issue with enough details in docker-java? If yes this would be great.

As I said before, I wasn't really able to reproduce this problem on my machine and pinpoint the differences in the environment, so sadly I'm currently unable to get additional insight into this problem 😞

As a work around you could always reside to extending your own custom JdbcContainer I suppose.

@kiview
Copy link
Member

kiview commented Nov 22, 2017

Just a little heads up:

We've currently the problem, that LogMessageWaitStrategy doesn't receive the first OutputFrame in one of our tests (reproducable).

I wonder why this is happening, since we do withSince(0) on following output.

@kiview kiview mentioned this issue Nov 29, 2017
@louisburton
Copy link

Just noting that I've had a similar issue with 1.5.1 and the org.testcontainers.containers.localstack.LocalStackContainer on a Windows OS. Nothing to do with the RegExp format, and the line does get logged as I can see from docker logs on the container started, but it times out with the same time out error.

Looking under the hood I can see a netty request of this format: /containers/20bc52c3e633815dd14beeeb0f4ddbb41c3acb5908cee3dc7d230ea57e3e0f67/logs?since=0&stdout=true&stderr=true&follow=true
If I changed it to org.testcontainers.containers.wait.HostPortWaitStrategy it starts up fine.

@kiview
Copy link
Member

kiview commented Jan 29, 2018

Yes, this bug seems a bit more prominent on Windows, but we've observed it on different Linux environments as well. Sadly we couldn't figure out the root cause yet, it might be in the docker-java implementation, or it could be strange behaviour of the docker logs.

@kiview
Copy link
Member

kiview commented May 14, 2018

@gbtec-ingogriebsch @gbtec-markusmeier
Are you interested in checking out, if the last version with the included #643 solves your problem on Windows?

@ingogriebsch
Copy link

@kiview Nice stuff! We will try it for sure! Give us a moment, we will come back to you. /CC @gbtec-markusmeier

@gbtec-markusmeier
Copy link
Author

gbtec-markusmeier commented May 17, 2018

Tested with testcontainers:1.7.3:

  • Problem still exists.
  • I see that the WaitingConsumer being used is using the new BaseConsumer.
  • On the other hand, WaitingConsumer does not have changes similar to Slf4jLogConsumer or ToStringConsumer done in More accurate log processing. #643.
  • Also, the changed FrameConsumerResultCallback.onNext() is called only after the waiting period of 60 seconds, so it has no effect within the scope of this ticket. When it is called, processRawFrame() is never touched - so it seems to be another problem.

@kiview
Copy link
Member

kiview commented May 18, 2018

@gbtec-markusmeier Thanks for testing and the feedback!
Okay, maybe we can finde something in WaitingConsumer.

@binakot
Copy link

binakot commented Sep 20, 2018

Today I got this error too...

org.testcontainers.containers.ContainerLaunchException: Container startup failed

	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:214)
	at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:638)
	at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:83)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:207)
	... 9 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:279)
	at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:209)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
	... 10 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Timed out waiting for log output matching '.*database system is ready to accept connections.*\s'
	at org.testcontainers.containers.wait.strategy.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:30)
	at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:32)
	at org.testcontainers.containers.wait.LogMessageWaitStrategy.waitUntilReady(LogMessageWaitStrategy.java:17)
	at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:32)
	at org.testcontainers.containers.PostgreSQLContainer.waitUntilContainerStarted(PostgreSQLContainer.java:103)
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:258)
	... 12 more

Probably it helps to localize the problem... I'm not using default Postgres image, but Postgres with PostGIS and Timescale https://hub.docker.com/r/timescale/timescaledb-postgis/ or my own https://hub.docker.com/r/binakot/postgresql-postgis-timescaledb/

For example, when I run the image timescale/timescaledb-postgis:latest-pg9.6, I got this:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
performing post-bootstrap initialization ... No usable system locales were found.
Use the option "--debug" to see details.
ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start


WARNING: enabling "trust" authentication for local connections

For timescale/timescaledb-postgis:latest-pg10:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2018-09-21 05:58:43.959 UTC [26] WARNING:  no usable system locales were found
performing post-bootstrap initialization ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start....2018-09-21 05:58:44.462 UTC [30] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-21 05:58:44.474 UTC [31] LOG:  database system was shut down at 2018-09-21 05:58:44 UTC
2018-09-21 05:58:44.483 UTC [30] LOG:  database system is ready to accept connections
2018-09-21 05:58:44.485 UTC [37] LOG:  TimescaleDB background worker launcher connected to shared catalogs
 done
server started
CREATE DATABASE


/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/install_timescaledb.sh
2018-09-21 05:58:44.851 UTC [43] FATAL:  role "postgres" does not exist
psql: FATAL:  role "postgres" does not exist

And for binakot/postgresql-postgis-timescaledb:latest:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
sh: locale: not found
2018-09-21 06:00:10.260 UTC [27] WARNING:  no usable system locales were found
performing post-bootstrap initialization ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start....2018-09-21 06:00:10.712 UTC [31] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-21 06:00:10.723 UTC [32] LOG:  database system was shut down at 2018-09-21 06:00:10 UTC
2018-09-21 06:00:10.725 UTC [31] LOG:  database system is ready to accept connections
 done
server started
CREATE DATABASE

CREATE ROLE


/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/1.postgis.sh
CREATE DATABASE
UPDATE 1
Loading PostGIS extensions into template_postgis
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
Loading PostGIS extensions into test
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION

/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/2.timescaledb.sh
CREATE DATABASE
UPDATE 1
Loading TimescaleDB extensions into template_timescaledb
CREATE EXTENSION
Loading TimescaleDB extensions into test
CREATE EXTENSION

/usr/local/bin/docker-entrypoint.sh: sourcing /docker-entrypoint-initdb.d/3.postgres.sh
                                        version                                        
---------------------------------------------------------------------------------------
 PostgreSQL 10.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 6.4.0) 6.4.0, 64-bit
(1 row)

        extname         | extversion 
------------------------+------------
 plpgsql                | 1.0
 postgis                | 2.4.4
 postgis_topology       | 2.4.4
 fuzzystrmatch          | 1.1
 postgis_tiger_geocoder | 2.4.4
 timescaledb            | 0.10.1
(6 rows)


waiting for server to shut down....2018-09-21 06:00:15.387 UTC [31] LOG:  received fast shutdown request
2018-09-21 06:00:15.397 UTC [31] LOG:  aborting any active transactions
2018-09-21 06:00:15.398 UTC [31] LOG:  worker process: logical replication launcher (PID 38) exited with exit code 1
2018-09-21 06:00:15.398 UTC [33] LOG:  shutting down
2018-09-21 06:00:15.437 UTC [31] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

2018-09-21 06:00:15.498 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2018-09-21 06:00:15.498 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2018-09-21 06:00:15.501 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2018-09-21 06:00:15.514 UTC [59] LOG:  database system was shut down at 2018-09-21 06:00:15 UTC
2018-09-21 06:00:15.518 UTC [1] LOG:  database system is ready to accept connections

The most interesting thing that everything worked like a charm before I re-install Windows 10 on my PC 😢

But now It works only with my own image, because:

  1. There is no log database system is ready to accept connections in timescale/timescaledb-postgis:latest-pg9.6 at all. But we have it in timescale/timescaledb-postgis:latest-pg10.
  2. In my image I got database system is ready to accept connections 2 times, because I restart the Postgres after extension installations. I think the test-container catches exactly the last one.

But how did it work before? I didn't change versions of libraries or images.

@binakot
Copy link

binakot commented Sep 21, 2018

I just tried every version of timescale docker images one by one.
And I found the working image: timescale/timescaledb-postgis:0.10.1-pg10.

@kiview
Copy link
Member

kiview commented Sep 21, 2018

Maybe we should roll back to the Jdbc based wait strategy for the sake of stability, WDYT @rnorth ?

@binakot
Copy link

binakot commented Sep 22, 2018

Or even better to let to choose the strategy via argument or smth else?
I'm using rule's annotation for my junit tests.

@kiview
Copy link
Member

kiview commented Sep 22, 2018

I've just checked the code and I see that we don't have a JdbcWaitStrategy (I somehow thought we once had?). The actual wait implementation is in theJdbcDatabaseContainer (see

).

So one solution would be, to extract a real JdbcWaitStrategy, which would allow for easier composition and configurability, simply use it in waitingFor() method on container instances. We could still use the log based waiting as the faster default, but user in environments which observe this behavior (which seems strangely related to Docker for Windows?), can opt-in to use the JdbcWaitStrategy.

@gbtec-markusmeier What is your take on this? I know this bug blocks you from upgrading to recent Testcontainers version, so I'd be happy to find a solution.

Since I'm a bit uncomfortable with the current state of PostgreSQLContainer, which is unreliable for some user, and I'm the one responsible for changing it to LogMessageWaitStrategy in the first place, I'd be happy to take care of it.

@rnorth @bsideup WDYT?

@gbtec-markusmeier
Copy link
Author

@kiview: In any case we will be happy to be able to have a query based db readiness check again - as this works reliable in our environment.
It would be a possibility (as discussed in person last year) to OR combine that with the current log based strategy - i.e. whichever comes first will result in a successful wait. But if it is easier/faster to just go back to the former, only query based strategy, why not go back to that one? There weren't functional problems in any scenarios, so it should be fine.

@kiview
Copy link
Member

kiview commented Oct 4, 2018

After some geek-at-keyboard tests by @gbtec-markusmeier and me, it seems this problem was solved in 1.9.1, probably because of the switch to okhttp and npipe support.

@gbtec-markusmeier will test a bit more and then we might be able to close this ticket 🙂.

@kiview
Copy link
Member

kiview commented Oct 9, 2018

@gbtec-markusmeier For getting some more confirmation of the underlying bug, could you please try to run 1.9.1 with Netty and check if the problem reappears?

Just set

transport.type=netty

in ~/.testcontainers.properties. This would be very helpful. (You would also need to switch back to TCP port on Docker and probably delete the discovered strategy line from ~/.testcontainers.properties)

@gbtec-markusmeier
Copy link
Author

gbtec-markusmeier commented Oct 9, 2018

@gbtec-markusmeier For getting some more confirmation of the underlying bug, could you please try to run 1.9.1 with Netty and check if the problem reappears?

Just set

transport.type=netty

in ~/.testcontainers.properties. This would be very helpful. (You would also need to switch back to TCP port on Docker and probably delete the discovered strategy line from ~/.testcontainers.properties)

The problem does not occur using netty:

o.t.d.DockerClientProviderStrategy       : Loaded org.testcontainers.dockerclient.WindowsClientProviderStrategy from ~/.testcontainers.properties, will try it first
o.t.d.DockerClientProviderStrategy       : Will use 'netty' transport
o.t.d.DockerClientProviderStrategy       : Found Docker environment with Docker for Windows (via TCP port 2375)
org.testcontainers.DockerClientFactory   : Docker host IP address is localhost
org.testcontainers.DockerClientFactory   : Connected to docker: 
  Server Version: 18.06.1-ce
  API Version: 1.38
  Operating System: Docker for Windows
  Total Memory: 16006 MB
org.testcontainers.DockerClientFactory   : Ryuk started - will monitor and terminate Testcontainers containers on JVM exit
        ?? Checking the system...
        ? Docker version should be at least 1.6.0
        ? Docker environment should have more than 2GB free disk space
docker[postgres:9.5]                     : Creating container for image: postgres:9.5
docker[postgres:9.5]                     : Starting container with ID: 5597e1cdf786e868c55f733dfa1abe11d4625ff9a76c3210a74914bae0706f05
docker[postgres:9.5]                     : Container postgres:9.5 is starting: 5597e1cdf786e868c55f733dfa1abe11d4625ff9a76c3210a74914bae0706f05
docker[postgres:9.5]                     : Container postgres:9.5 started

@kiview
Copy link
Member

kiview commented Oct 9, 2018

Thanks, so probably solved by something else, but solved never the less! 👹
I'll close the issue for now, @gbtec-markusmeier please re-open if the problem persists in some combination.

@kiview kiview closed this as completed Oct 9, 2018
danoost pushed a commit to danoost/streamevmon that referenced this issue Mar 3, 2021
testcontainers/testcontainers-java#455
discusses this issue, but they believe the problem was fixed by using
netty transport. We're gonna just try one of their other solutions,
changing to a HostPortWaitStrategy instead of the default log parsing
one.
@SgtSilvio
Copy link
Contributor

This problem is very likely caused by #5843

SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Sep 18, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Sep 18, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Sep 19, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Oct 4, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Oct 10, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Oct 12, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Oct 17, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Oct 20, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Oct 22, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Nov 2, 2022
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Jan 16, 2023
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
SgtSilvio added a commit to SgtSilvio/testcontainers-java that referenced this issue Jan 18, 2023
…one complete log line

Motivation: Docker logs provide a stream, the frames provided are not necessarily split by newline characters. So it might happen that a frame contains partial log lines.
Changes:
- Line splitting, partial log buffering and line merging independently of the stream type (RAW, STDOUT, STDERR)
- OutputFrame does consistently not contain newline characters (independent of TTY)
- ToStringConsumer now adds newlines
- Slf4jLogConsumer does not need to remove any newlines

Also fixes testcontainers#4110, testcontainers#455
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants