Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🐛 Bug]: Client times out creating a session after running a number of successful sessions #13718

Open
Vlatombe opened this issue Mar 21, 2024 · 5 comments

Comments

@Vlatombe
Copy link

Vlatombe commented Mar 21, 2024

What happened?

We run the Jenkins Acceptance Test harness. We recently migrated from Selenium 4.11.0 to 4.18.1 (jenkinsci/acceptance-test-harness#1499) but got stuck because of execution issues that were not occurring before.

Our environment runs subsets of the tests suite inside a docker environment:

  • a container running the container image selenium/standalone-firefox:4.18.1
  • another container running the test harness using maven and targetting the selenium server.

All tests run sequentially, so it is expected that only one selenium session is active at a given time.

Obviously both client and server are aligned.

We managed to narrow down the issue to the 4.13.0 -> 4.14.1 selenium upgrade.

The symptoms are the following:

  • Between 34 and 36 selenium sessions run successfully
  • A new session is requested by the client
  • Server http logging indicates the session request is processed, a session is created and returned (HTTP 200 on POST /session)
  • However the client never receives the response, and ends up timing out after 3 minutes
  • Since the session is never used on server side, it ends up timing out after 5 minutes.
  • Sometimes, after a longer while (~1 hour) some tests start working again, but then some later tests go back to the same problem and time out trying to grab a new session.

All the tests are launched using the initialization sequence (link), so the problem is clearly not tied to one test content.

I managed to find a workaround for this problem: I added a proxy server (caddy reverse-proxy --from :4444 --to firefox:4444 --insecure) between the client and selenium server and the problem went away.

How can we reproduce the issue?

I never managed to reproduce the problem locally on a Mac, running the same tests against a local selenium server.

Even though I never tracked down the root cause for this, it is probably network stack related. The fact it started to appear just after migrating to 4.14, matching with the JDK http client switch, makes me think it could be related to the problem, even though it seems to affect the server side (a new JVM is forked for each new test class).

I'm hoping this eventually helps to diagnose a possible issue, and leave it for posterity in case anyone faces a similar problem.

Relevant log output

Client side exception

Caused by: org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure. 
Host info: host: '739ec46b65a0', ip: '172.18.0.3'
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:537)
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:233)
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:162)
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:142)
	at org.jenkinsci.test.acceptance.FallbackConfig.buildRemoteWebDriver(FallbackConfig.java:162)
	at org.jenkinsci.test.acceptance.FallbackConfig.createWebDriver(FallbackConfig.java:149)
	at org.jenkinsci.test.acceptance.FallbackConfig.createWebDriver(FallbackConfig.java:315)
	at org.jenkinsci.test.acceptance.FallbackConfig$$FastClassByGuice$$65dd6.GUICE$TRAMPOLINE(<generated>)
	at org.jenkinsci.test.acceptance.FallbackConfig$$FastClassByGuice$$65dd6.apply(<generated>)
	at com.google.inject.internal.ProviderMethod$FastClassProviderMethod.doProvision(ProviderMethod.java:260)
	at com.google.inject.internal.ProviderMethod.doProvision(ProviderMethod.java:171)
	at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.provision(InternalProviderInstanceBindingImpl.java:185)
	at com.google.inject.internal.InternalProviderInstanceBindingImpl$CyclicFactory.get(InternalProviderInstanceBindingImpl.java:162)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at org.jenkinsci.test.acceptance.guice.TestLifecycle.lambda$scope$0(TestLifecycle.java:58)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
	at com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:50)
	at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:146)
	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:124)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:91)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:300)
	at com.google.inject.internal.InjectorImpl$1.get(InjectorImpl.java:1148)
	... 25 more
Caused by: org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.18.1', revision: 'b1d3319b48'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.15.146+', java.version: '17.0.10'
Driver info: driver.version: RemoteWebDriver
	at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:403)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
	at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:359)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:114)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:95)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:67)
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:162)
	at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:519)
	... 46 more
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1960)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2095)
	at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:386)
	... 56 more

Server-side logging

 13:38:07.643 INFO [LocalDistributor.newSession] - Session request received by the Distributor: 
  [Capabilities {acceptInsecureCerts: true, browserName: firefox, moz:debuggerAddress: true, moz:firefoxOptions: {prefs: {dom.disable_beforeunload: false, dom.max_chrome_script_run_time: 1500000, dom.max_script_run_time: 1500000, intl.accept_languages: en}}}]
 13:38:09.783 INFO [LocalNode.newSession] - Session created by the Node. Id: 62f46cbb-85a4-4e39-9959-0747e54be196, Caps: Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 123.0, moz:accessibilityChecks: false, moz:buildID: 20240213221259, moz:debuggerAddress: 127.0.0.1:12793, moz:firefoxOptions: {prefs: {dom.disable_beforeunload: false, dom.max_chrome_script_run_time: 1500000, dom.max_script_run_time: 1500000, intl.accept_languages: en}}, moz:geckodriverVersion: 0.34.0, moz:headless: false, moz:platformVersion: 5.15.146+, moz:processID: 13765, moz:profile: /tmp/rust_mozprofile9K3PEI, moz:shutdownTimeout: 60000, moz:webdriverClick: true, moz:windowless: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://172.18.0.2:4444/sessio..., se:cdpVersion: 85.0, se:noVncPort: 7900, se:vnc: ws://172.18.0.2:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.18.0.2:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
 13:38:09.785 INFO [LocalDistributor.newSession] - Session created by the Distributor. Id: 62f46cbb-85a4-4e39-9959-0747e54be196 
  Caps: Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 123.0, moz:accessibilityChecks: false, moz:buildID: 20240213221259, moz:debuggerAddress: 127.0.0.1:12793, moz:firefoxOptions: {prefs: {dom.disable_beforeunload: false, dom.max_chrome_script_run_time: 1500000, dom.max_script_run_time: 1500000, intl.accept_languages: en}}, moz:geckodriverVersion: 0.34.0, moz:headless: false, moz:platformVersion: 5.15.146+, moz:processID: 13765, moz:profile: /tmp/rust_mozprofile9K3PEI, moz:shutdownTimeout: 60000, moz:webdriverClick: true, moz:windowless: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://172.18.0.2:4444/sessio..., se:cdpVersion: 85.0, se:noVncPort: 7900, se:vnc: ws://172.18.0.2:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.18.0.2:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
 13:38:09.786 INFO [SeleniumSpanExporter$1.lambda$export$3] - {"traceId": "069d010f096ae47834e48fd852c8ba33","eventTime": 1710941889785258977,"eventName": "HTTP request execution complete","attributes": {"http.flavor": 1,"http.handler_class": "org.openqa.selenium.grid.sessionqueue.local.LocalNewSessionQueue","http.host": "firefox:4444","http.method": "POST","http.request_content_length": "444","http.scheme": "HTTP","http.status_code": 200,"http.target": "\u002fsession","http.user_agent": "selenium\u002f4.18.1 (java unix)"}}
 
 13:43:18.869 INFO [LocalNode.stopTimedOutSession] - Session id 62f46cbb-85a4-4e39-9959-0747e54be196 timed out, stopping...
 13:43:19.185 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local Session Map, Id: 62f46cbb-85a4-4e39-9959-0747e54be196
 13:43:19.185 INFO [GridModel.release] - Releasing slot for session id 62f46cbb-85a4-4e39-9959-0747e54be196
 13:43:19.185 INFO [SessionSlot.stop] - Stopping session 62f46cbb-85a4-4e39-9959-0747e54be196


### Operating System

Ubuntu

### Selenium version

4.18.1

### What are the browser(s) and version(s) where you see this issue?

Firefox 123

### What are the browser driver(s) and version(s) where you see this issue?

GeckoDriver 0.34

### Are you using Selenium Grid?

_No response_
Copy link

@Vlatombe, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@diemol
Copy link
Member

diemol commented Mar 21, 2024

Do you see the issue if you use Chrome or Edge? Just trying to locate where the problem might be.

@Vlatombe
Copy link
Author

I'll check with Chrome just in case, however given the reverse proxy workaround, I'm doubting this has anything to do with the browser used.

@Vlatombe
Copy link
Author

FWIW I ran the same suite on chrome, exact same result

@joerg1985
Copy link
Member

I had a look at the code and did find an field read with missing synchronize statement, will fix this the next days. But i don't think this is the root cause here.

Increasing the client timeout might produce another error message (6 minutes should be fine) and might help to find the root cause. Some of the default grid timeouts are higher than the default client timeout, this is allready tracked in other issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants