Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Server container startup fails intermittently #715

Closed
vpavic opened this issue May 29, 2018 · 10 comments
Closed

SQL Server container startup fails intermittently #715

vpavic opened this issue May 29, 2018 · 10 comments

Comments

@vpavic
Copy link
Contributor

vpavic commented May 29, 2018

This was originally reported as testcontainers/testcontainers-java-module-mssqlserver#11.

We are experiencing an intermittent failures of our SQL Server integration tests that are based on org.testcontainers:mssqlserver. The error occurs in roughly 30-40% of the builds. This, however, heppens only on our Jenkins CI infrastructure, and not on Travis CI builds or local machine.

I should also note that we use TestContainers for integration of other relational databases (PostgreSQL, MySQL, MariaDB) as well, but with no such issues.

Error Message

org.testcontainers.containers.ContainerLaunchException: Container startup failed

Stacktrace

org.testcontainers.containers.ContainerLaunchException: Container startup failed
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:199)
	at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:628)
	at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
	at com.sun.proxy.$Proxy1.processTestClass(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:108)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:146)
	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:128)
	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:83)
	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:192)
	... 34 more
Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:264)
	at org.testcontainers.containers.GenericContainer.lambda$start$0(GenericContainer.java:194)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:76)
	... 35 more
Caused by: org.rnorth.ducttape.TimeoutException: org.rnorth.ducttape.TimeoutException: java.util.concurrent.TimeoutException
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:53)
	at org.testcontainers.containers.JdbcDatabaseContainer.waitUntilContainerStarted(JdbcDatabaseContainer.java:88)
	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:243)
	... 37 more
Caused by: org.rnorth.ducttape.TimeoutException: java.util.concurrent.TimeoutException
	at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:59)
	at org.rnorth.ducttape.timeouts.Timeouts.getWithTimeout(Timeouts.java:32)
	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:38)
	... 39 more
Caused by: java.util.concurrent.TimeoutException
	at java.util.concurrent.FutureTask.get(FutureTask.java:205)
	at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:54)
	... 41 more

Standard Error

Feb 12, 2018 7:09:36 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 6534f393-9bf8-4278-9eae-d3a9aaaeb4e1 Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:36 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 2a9df843-bd21-431f-8e04-5f295ee14db4 Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:36 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 994e2acb-4024-44ba-8b23-cd39f4c03841 Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:36 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 997b401c-97a2-4ace-aa9d-ca0477cacc73 Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:37 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: c697678a-ad5d-4991-8e9d-8e7c20cb2f5d Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:38 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: acdebecf-da3d-42cc-9d3c-e8c7f610b0ea Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:39 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 4b57c876-48c6-431c-b958-da86ef5b2e4c Prelogin error: host localhost port 32929 Error reading prelogin response: Connection reset ClientConnectionId:4b57c876-48c6-431c-b958-da86ef5b2e4c
Feb 12, 2018 7:09:40 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 68e7f99e-2d70-4bcc-9fe4-654a875e56fd Prelogin error: host localhost port 32929 Error reading prelogin response: Connection reset ClientConnectionId:68e7f99e-2d70-4bcc-9fe4-654a875e56fd
Feb 12, 2018 7:09:41 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 2aabc0ea-6c43-4d94-a894-9b2c42f72f05 Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read
Feb 12, 2018 7:09:42 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: 268d348e-4694-4d8e-bae9-98071985e5bf Prelogin error: host localhost port 32929 Error reading prelogin response: Connection reset ClientConnectionId:268d348e-4694-4d8e-bae9-98071985e5bf
Feb 12, 2018 7:09:43 PM com.microsoft.sqlserver.jdbc.SQLServerConnection Prelogin
WARNING: ConnectionID:1 ClientConnectionId: b03b7801-6721-47bf-87fc-237e3a6a4bd6 Prelogin error: host localhost port 32929 Unexpected end of prelogin response after 0 bytes read

The integration test in subject is available here.

Should we permanently resort to a custom WaitStrategy or could MSSQLServerContainer get a bit more leeway out of the box?

@kiview
Copy link
Member

kiview commented Jun 12, 2018

Just pinging @StefanHufschmidt, maybe he has an idea.

@StefanHufschmidt
Copy link
Contributor

We got some strange behavior with the MS SQL Server image for linux too.

It was sometimes the case that the image could not be pulled within the timeout time. - But you can see your image starting, so this should not be your problem.

Sometimes the image was strangely broken and we had to pull it again afterwards everything worked back again. - That should occur on every run afterwards until the image has been deleted and re-pulled back again. Since the MS SQL Server image is very large you might don't want to pull it every time.

I've noticed the warnings from the MS SQL Server container some time ago. This does not look very well but should not affect the whole start up process.

Since the MS SQL Server image is 1.43GB large and we have a default timeout of 120 seconds it could be that the download of the image is not fast enough. I saw that you've increased the timeout to 240 seconds. @vpavic have you tried it with an even higher value?

What's about the connection timeout?
JavaDoc says: * @return time to allow for the database to start and establish an initial connection, in seconds
Give it a try to increase it as well:

		@Override
		protected int getConnectTimeoutSeconds() {
			return 240;
		}

Sometimes the docker image seems to start very slow. So this might be a solution.

@vpavic
Copy link
Contributor Author

vpavic commented Jun 13, 2018

Thanks for the feedback @kiview and @StefanHufschmidt.

I saw that you've increased the timeout to 240 seconds. @vpavic have you tried it with an even higher value?

We've had this test marked with @Ignore for some time now so it didn't run on the CI env where it caused problems. Thanks for the #getConnectTimeoutSeconds tip, we'll try with that as well.

Just to note, providing custom timeout value(s) isn't a problem by itself, however when you do that in multiple different projects it becomes a bit annoying, so my hope was this issue would result in a more sane default in Testcontainers.

StefanHufschmidt added a commit to StefanHufschmidt/testcontainers-java that referenced this issue Jun 13, 2018
StefanHufschmidt added a commit to StefanHufschmidt/testcontainers-java that referenced this issue Jun 13, 2018
bsideup pushed a commit that referenced this issue Jun 13, 2018
* Added builder for timeouts in JdbcDatabaseContainer (#715)

* Adjusted Oracle and SQL Server to use other default timeouts (#715)
@vpavic
Copy link
Contributor Author

vpavic commented Jul 31, 2018

We have had success with our integration tests that use MSSQLServerContainer for some time now with startup and connection timeouts of 240 seconds.

@kiview @bsideup Would you accept a PR that sets these timeout values as defaults for MSSQLServerContainer? That would effectively resolve this issue.

@kiview
Copy link
Member

kiview commented Oct 1, 2018

@vpavic This was solved in #748 for you, wasn't it?
I'll close for now.

@kiview kiview closed this as completed Oct 1, 2018
@vpavic
Copy link
Contributor Author

vpavic commented Oct 1, 2018

@kiview See the comment just above yours - I was offering to submit a PR that would update the defaults, which is the thing that addressed the problem for us.

#748 just provided a nicer way for users to update the defaults.

@kiview
Copy link
Member

kiview commented Oct 1, 2018

Oh sorry, I see.
I think a PR would be a good idea, thanks.

@vpavic
Copy link
Contributor Author

vpavic commented Oct 1, 2018

Thanks - I'll try to put together a PR over the next day or two.

@vpavic
Copy link
Contributor Author

vpavic commented Oct 4, 2018

Looking at this, I realized that #748 actually did update the default timeouts to 240 seconds. Not sure how I managed to missed that.

Anyway, that makes this issues indeed resolved by #748. Thanks for the support!

@vpavic vpavic closed this as completed Oct 4, 2018
@ftardif
Copy link
Contributor

ftardif commented Mar 11, 2019

I am experience similar random problems with DockerComposeContainer based tests. Is there a way to override the connect timeout in this case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants