Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[java] Use Process class to execute SM in Java (fix #13091) #13123

Closed
wants to merge 2 commits into from

Conversation

bonigarcia
Copy link
Member

@bonigarcia bonigarcia commented Nov 9, 2023

Description

This PR changes the use of the org.openqa.selenium.build.ExternalProcess Selenium class to the standard java.lang.Process (through java.lang.ProcessBuilder) to execute Selenium Manager in Java.

UPDATE: The second commit in this PR includes some logic for retrying the command execution, as reported in #13091.

Motivation and Context

It has been reported by some users (see #13091) that sometimes, the SeleniumManager Java logic fails due to an empty output by Selenium Manager. After troubleshooting it, it seems the cause is the use of ExternalProcess, and using Process fixes it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (eb82278) 57.70% compared to head (d0b9bdb) 57.70%.
Report is 3 commits behind head on trunk.

❗ Current head d0b9bdb differs from pull request most recent head 59cb63d. Consider uploading reports for the commit 59cb63d to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##            trunk   #13123   +/-   ##
=======================================
  Coverage   57.70%   57.70%           
=======================================
  Files          86       86           
  Lines        5289     5289           
  Branches      208      208           
=======================================
  Hits         3052     3052           
  Misses       2029     2029           
  Partials      208      208           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@joerg1985
Copy link
Member

I think it might be better to fix the root cause in the ExternalProcess class.

One possible cause could be reading the outputstream asynchronous in a different thread.
If the thread is not started yet / has not finished to copy the output the getOutput might be incomplete.

A possible fix to this would be to call Thread.join for the copy thread when ExternalProcess.waitFor has been called.

I could create a PR for this.

@bonigarcia
Copy link
Member Author

I agree that if the ExternalProcess is used in another place of the Selenium code base, it'd be better to be fixed.

But in the particular case of the SeleniumManager class. What is the benefit of using ExternalProcess vs Process? I suppose I don't see the whole picture. But if both implementations work the same way, wouldn't it be better to rely on the standard Java library than on our proper implementation?

@joerg1985
Copy link
Member

joerg1985 commented Nov 9, 2023

The ExternalProcess does ensure you don't hit common issues with the Process class and should not add new ones ;)

One example, if the SM would send a huge output to stdout the process would lock as soon as the outputbuffer is full.
The output must be read to ensure the process can further write to stdout.

In the PR the java code will wait for the process to exit, before reading the stdout.
As soon as the outputbuffer is full, you have a kind of deadlock situation.

@bonigarcia
Copy link
Member Author

Just for the record, the SM output should not be long. It should be a JSON file with around 55 lines (i.e., less than 2K). In case of an unexpected error, the output (panic description) should be even less.

@joerg1985
Copy link
Member

I just pushed d1787a9 to ensure processing the output has been completed before returning from a .shutdown or .waitFor call.

@dummyuser992
Copy link

dummyuser992 commented Nov 21, 2023

are there any updates here?

@diemol
Copy link
Member

diemol commented Nov 21, 2023

@bonigarcia @joerg1985 did you folks agree on something? Are we using the ExternalProcess class or not?

@joerg1985
Copy link
Member

It is safe to use the ExternalProcess now. The PR does contain a retry on Exception mechanism as a workaround to the error=26, Text file busy., this should be rebased in my mind.

@joerg1985
Copy link
Member

PS: SeleniumHQ/docker-selenium#2008 should make the retry obsolete and it might be a good idea to remove the retry again, to see if the root issue happens outside docker too.

@diemol
Copy link
Member

diemol commented Nov 21, 2023

PS: SeleniumHQ/docker-selenium#2008 should make the retry obsolete and it might be a good idea to remove the retry again, to see if the root issue happens outside docker too.

I don't remember where, but someone mentioned this issue outside Docker.

@bonigarcia
Copy link
Member Author

I don't have a strong opinion on this. I created this PR for the first time because there was a problem related to the execution of Selenium Manager in Java, in which the Selenium Manager was sometimes empty. My intuition was the problem in which the SM process was executed in Java. For that reason, I changed using the org.openqa.selenium.build.ExternalProcess Selenium class to the standard java.lang.Process in the first commit of this PR. Thanks to this change, we ensured the problem was not on the Rust side but on the Java side.

After solving the first issue, a second problem occurs (error=26, Text file busy). Since there was no solution for that problem, I added a second commit based on a retries mechanism. That workaround seems to work nicely; at least, users have reported that it works in #13091. If you prefer to solve this problem another way, I'm okay with that. But If you discard this PR, it would be good to test the new approach properly.

Regarding the use of Process vs. ExternalProcess, in my experience, the standard Java class Process works nicely. I used it for years in WebDriverManager and never had problems with it. The problem with a long output by Selenium Manager should not happen, so I don't see any particular value in using ExternalProcess in the SeleniumManager Java class. But if you are 100% sure that ExternalProcess is correct now, go ahead and use it.

@krmahadevan
Copy link
Contributor

@bonigarcia - ExternalProcess I think is just an abstraction of Process. I see that it internally leverages the Process class to get the work done.

@diemol
Copy link
Member

diemol commented Nov 22, 2023

I released a new set of Docker images (https://github.com/SeleniumHQ/docker-selenium/releases/tag/4.15.0-20231122), now the browser driver path is set via properties and with that, Selenium Manager is avoided.

I would leave this PR open for a while in case people still report this issue in non Docker environments.

@ctunkl
Copy link

ctunkl commented Nov 23, 2023

Hi there!

PS: SeleniumHQ/docker-selenium#2008 should make the retry obsolete and it might be a good idea to remove the retry again, to see if the root issue happens outside docker too.

I don't remember where, but someone mentioned this issue outside Docker.

That wasn't me, but we do have this problem outside of Docker. It's the same "error failed to parse json output" error mentioned in #13091:

net.thucydides.core.webdriver.DriverConfigurationError: WebDriver was unable to create a new instance of type class org.openqa.selenium.chrome.ChromeDriver WebDriver reported the following message: Unable to obtain: Capabilities {acceptInsecureCerts: false, browserName: chrome, goog:chromeOptions: {args: [remote-allow-origins=*, --headless=new, --window-size=1200,1600], extensions: []}}, error Failed to parse json output, executed: [--browser, chrome, --output, json]

@diemol
Copy link
Member

diemol commented Nov 23, 2023

Hi there!

PS: SeleniumHQ/docker-selenium#2008 should make the retry obsolete and it might be a good idea to remove the retry again, to see if the root issue happens outside docker too.

I don't remember where, but someone mentioned this issue outside Docker.

That wasn't me, but we do have this problem outside of Docker. It's the same "error failed to parse json output" error mentioned in #13091:

net.thucydides.core.webdriver.DriverConfigurationError: WebDriver was unable to create a new instance of type class org.openqa.selenium.chrome.ChromeDriver WebDriver reported the following message: Unable to obtain: Capabilities {acceptInsecureCerts: false, browserName: chrome, goog:chromeOptions: {args: [remote-allow-origins=*, --headless=new, --window-size=1200,1600], extensions: []}}, error Failed to parse json output, executed: [--browser, chrome, --output, json]

Have you tried again with the Nightly version? https://www.selenium.dev/downloads/#nightly

@ctunkl
Copy link

ctunkl commented Nov 23, 2023

Hi there!

PS: SeleniumHQ/docker-selenium#2008 should make the retry obsolete and it might be a good idea to remove the retry again, to see if the root issue happens outside docker too.

I don't remember where, but someone mentioned this issue outside Docker.

That wasn't me, but we do have this problem outside of Docker. It's the same "error failed to parse json output" error mentioned in #13091:
net.thucydides.core.webdriver.DriverConfigurationError: WebDriver was unable to create a new instance of type class org.openqa.selenium.chrome.ChromeDriver WebDriver reported the following message: Unable to obtain: Capabilities {acceptInsecureCerts: false, browserName: chrome, goog:chromeOptions: {args: [remote-allow-origins=*, --headless=new, --window-size=1200,1600], extensions: []}}, error Failed to parse json output, executed: [--browser, chrome, --output, json]

Have you tried again with the Nightly version? https://www.selenium.dev/downloads/#nightly

I have done so now (using 4.16.0-SNAPSHOT), and the problem seems to be fixed! 🎉

@diemol
Copy link
Member

diemol commented Nov 23, 2023

Hi there!

PS: SeleniumHQ/docker-selenium#2008 should make the retry obsolete and it might be a good idea to remove the retry again, to see if the root issue happens outside docker too.

I don't remember where, but someone mentioned this issue outside Docker.

That wasn't me, but we do have this problem outside of Docker. It's the same "error failed to parse json output" error mentioned in #13091:
net.thucydides.core.webdriver.DriverConfigurationError: WebDriver was unable to create a new instance of type class org.openqa.selenium.chrome.ChromeDriver WebDriver reported the following message: Unable to obtain: Capabilities {acceptInsecureCerts: false, browserName: chrome, goog:chromeOptions: {args: [remote-allow-origins=*, --headless=new, --window-size=1200,1600], extensions: []}}, error Failed to parse json output, executed: [--browser, chrome, --output, json]

Have you tried again with the Nightly version? https://www.selenium.dev/downloads/#nightly

I have done so now (using 4.16.0-SNAPSHOT), and the problem seems to be fixed! 🎉

Great. Then it seems this PR won't be needed. Let's wait until we get more feedback.

@giuliohome
Copy link

giuliohome commented Nov 24, 2023

But If you discard this PR, it would be good to test the new approach properly.

@bonigarcia 💯 agree with you. I'd especially appreciate it if someone could include this test as well among the others. Well, in the meantime, I've already confirmed that the download test is okay in Docker. I'll now try it in a real Kubernetes cluster. Tested also in AKS, it seems ok.

@giuliohome

This comment was marked as off-topic.

@giuliohome

This comment was marked as off-topic.

@dummyuser992
Copy link

if current issue seems like solved, can we release this fix?

@dummyuser992
Copy link

guys, who can make go/no go release this fix, since our team is experiencing this issue and we need to fix it?

@diemol
Copy link
Member

diemol commented Dec 1, 2023

guys, who can make go/no go release this fix, since our team is experiencing this issue and we need to fix it?

If you need this now, you can use Nightly: https://www.selenium.dev/downloads/#nightly

We will try to do a release next week if time and availability permits.

@diemol
Copy link
Member

diemol commented Dec 1, 2023

@giuliohome those errors are not related to the issue, you see this error in the Node. I will mark those comments as off-topic.

@giuliohome
Copy link

@giuliohome those errors are not related to the issue, you see this error in the Node. I will mark those comments as off-topic.

I have forwarded your response to my colleague. From my perspective, I have communicated my stance on the matter - that this doesn't align with an open-source approach. I strongly advise against using such software in our company.

@titusfortner
Copy link
Member

what doesn't align with an open source approach?

It appears that your recent comments do not apply to the solution provided by this PR. Please open a new issue if there is a problem that needs to be addressed.

@giuliohome
Copy link

what doesn't align with an open source approach?

It appears that your recent comments do not apply to the solution provided by this PR. Please open a new issue if there is a problem that needs to be addressed.

The way you assess that my comments do not apply and mark them as off-topic raises a concern. The point I'm making is precisely that my comments are closely tied to the ongoing discussion.

Specifically, regarding the use of Process vs. ExternalProcess, and the feedback requested just before them.

More broadly, the issue extends to the lack of clarification in your statement:

those errors are not related to the issue, you see this error in the Node.

This statement is rather brief and could be interpreted as somewhat dismissive. To provide a clearer context, I'd like to emphasize that I observed the error in the router log, seemingly linked to an exception when saving the process output - a matter discussed previously. Even if there's a misunderstanding on my part, the communication doesn't seem conducive to a collaborative approach that values understanding users' perspectives (from a devrel communication perspective, it's crucial to ensure a more open and receptive tone, even when closing a comment).

@diemol
Copy link
Member

diemol commented Dec 2, 2023

what doesn't align with an open source approach?
It appears that your recent comments do not apply to the solution provided by this PR. Please open a new issue if there is a problem that needs to be addressed.

The way you assess that my comments do not apply and mark them as off-topic raises a concern. The point I'm making is precisely that my comments are closely tied to the ongoing discussion.

Specifically, regarding the use of Process vs. ExternalProcess, and the feedback requested just before them.

More broadly, the issue extends to the lack of clarification in your statement:

those errors are not related to the issue, you see this error in the Node.

This statement is rather brief and could be interpreted as somewhat dismissive. To provide a clearer context, I'd like to emphasize that I observed the error in the router log, seemingly linked to an exception when saving the process output - a matter discussed previously. Even if there's a misunderstanding on my part, the communication doesn't seem conducive to a collaborative approach that values understanding users' perspectives (from a devrel communication perspective, it's crucial to ensure a more open and receptive tone, even when closing a comment).

I replied that because Selenium Manager is being used only in the Node, and the issue is about that, not the Router, where no external processes are executed, everything is already in the JAR file. If you are seeing something similar that requires a separate issue with enough context, let us know how to troubleshoot it. That is why I marked them as off-topic.

@giuliohome
Copy link

The situation is much clearer now. It turns out I was initially selecting the wrong pod in Grafana. For instance, the error I observed from the Node pod is as follows:

2023-11-29 12:34:50.418 | at java.base/java.lang.Thread.run(Thread.java:829) |  
-- | -- | --
  |   | 2023-11-29 12:34:50.418 | at org.openqa.selenium.os.ExternalProcess$Builder.lambda$start$0(ExternalProcess.java:209) |  
  |   | 2023-11-29 12:34:50.418 | at java.base/java.io.InputStream.transferTo(InputStream.java:704) |  
  |   | 2023-11-29 12:34:50.418 | at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:342) |  
  |   | 2023-11-29 12:34:50.418 | at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176) |  
  |   | 2023-11-29 12:34:50.418 | java.io.IOException: Stream closed |  
  |   | 2023-11-29 12:34:50.418 | 11:34:50.416 WARN [ExternalProcess$Builder.lambda$start$0] - failed to copy the output of process 15191

This error seems somehow related to the ongoing discussion.

Additionally, during the same timeframe, the Kubernetes Ingress Controller reported lines referring to both the Router and the download part of our test:

2023-11-29 12:34:50.108 | {   "namespace": "test-automation",   "ingress_name": "selenium-ingress",   "service_name": "selenium-router",   "service_port": "4444",   "request_id": "a1321224cf10b6004290e9bd2c2a7645",   "request_length": "471",   "remote_addr": "10.65.131.136",   "remote_port": "45687",   "time_iso8601": "2023-11-29T11:34:50+00:00",   "request_uri": "/wd/hub/session/7a6183da89044bb24d29a58fe1df7fc5/se/files"
-- | --

@dummyuser992
Copy link

can we merge it and release?

@diemol
Copy link
Member

diemol commented Dec 7, 2023

4.16.1 is out, are you still seeing the issue with this version?

@titusfortner
Copy link
Member

This statement is rather brief and could be interpreted as somewhat dismissive

Please be careful about reading into things in responses here. The Selenium team is composed of people from many countries and cultures and languages. You will see a variety of tones and terseness in the conversations in our Slack and in our GitHub issues.

We get a lot of new issues every day and most of the work is the thankless job of trying to differentiate between things that need to be fixed and things that don't need to be fixed, and things that help us determine between those conditions and things that do not. If something can get put in a "not helpful" bucket, we're going to quickly put it there so we can sift through the next pile of information.

@giuliohome
Copy link

We get a lot of new issues every day and most of the work is the thankless job of trying to differentiate between things that need to be fixed and things that don't need to be fixed

That's also my job, though in a different context. Meanwhile I understood what was the cause of the message that has been put off-topic here and indeed it has been thanks to a colleague of mine that pointed me to the problem by asking me to solve an issue seen by a dev team working for us. So again, my conclusion is that before differentiating things must be well understood and often the communication among different teams is key.

Back to the main topic, here now I don't understand why someone wants to merge this if the underlying issue seems to be already fixed in a quite different way:

now the browser driver path is set via properties and with that, Selenium Manager is avoided

@titusfortner
Copy link
Member

don't understand why someone wants to merge this if the underlying issue seems to be already fixed in a quite different way.

The issue is that while Selenium Manager should work on Docker, it shouldn't get used on Docker. So we added the code to keep it from getting used. But the same thing that caused the problem we saw on Docker may exist in non-docker environments, and we may need this PR to address it, so we're leaving it open.

@bonigarcia
Copy link
Member Author

I believe this PR is no longer needed. I'm closing it.

@bonigarcia bonigarcia closed this Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

9 participants