Performance issues #116

ghost · 2017-05-18T13:42:31Z

Hi all,

While using Zalenium, I've had some performance issues with the Docker containers. It is very slow to start and launch containers, to the point where all the tests start timing out. Even when I use a fixed number of containers, some of them time out and are shut down, and waiting for the others to start ends up being very slow. For reference, I'm usually running about 10 to 20 tests in parallel from a test run of over 150 tests; the machine slows down considerably, and all the tests past the first one usually fail.
Is there any way to speed up this process? Either a feature that is planned or already available, or a hack.

Thank you in advance.

diemol · 2017-05-18T17:51:30Z

Hi @joao-valente,

It is possible that some performance issues arise under high concurrency, with many tests running in parallel. Our normal scenarios don't run more than 4 tests in parallel, which is why we have not seen anything relevant yet.

We have seen some things that we can improve:

Scale horizontally (we plan to work on it during the summer), see kubernetes support #103.
We found an improvement in the container creation: in some cases it was failing, and then some tests were failing randomly (we want to release this improvement next week).
We are also thinking about reusing containers for more than one test, to avoid the continuous creation. This is just an idea and we need to see if it makes sense. See Restart node after run #135.
We create the new containers in a very conservative way, one by one, because we have seen that the Grid has problems when many nodes come to register at the same time. We'll try to improve that so the container creation is more fluid. See Too many containers are created #143.
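
As a rough illustration of the last point, the trade-off between starting nodes one by one and starting several at once can be simulated. This is a minimal sketch and not Zalenium's actual code: `start_container` is a hypothetical stand-in that only sleeps, and `max_concurrent_starts` is an assumed knob rather than a real Zalenium option.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class StartTracker:
    """Counts how many simulated container starts overlap at any moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def start_container(self, name, startup_s=0.05):
        # Hypothetical stand-in for "create container and register node".
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(startup_s)
        with self._lock:
            self.active -= 1
        return name


def start_all(count, max_concurrent_starts):
    """Start `count` simulated containers, at most `max_concurrent_starts` at once."""
    tracker = StartTracker()
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_concurrent_starts) as pool:
        names = list(pool.map(tracker.start_container,
                              (f"node-{i}" for i in range(count))))
    elapsed = time.monotonic() - started
    return names, tracker.peak, elapsed


if __name__ == "__main__":
    for limit in (1, 4):
        names, peak, elapsed = start_all(8, limit)
        print(f"limit={limit} started={len(names)} peak={peak} elapsed={elapsed:.2f}s")
```

With a limit of 1 the starts are strictly sequential, which is the conservative behaviour described above. A higher limit shortens the total start time but lets more nodes register at the same moment, which is where the Grid was seen to struggle.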

Maybe you can help us with some more information, perhaps a timeline of how things happen. From the beginning where everything is running fine, and then what happens afterwards to make the performance go down. Also, what hardware specs are you using? How many containers are you starting at the beginning?

With more info from your side we could come up with more ideas.

woza2000 · 2017-05-23T03:39:44Z

Hi @diemol

Reusing containers does make sense.

In my case, I need static nodes whose internal IP I can get, so that I can allocate a user account per IP. When nodes are dynamic, I have to use an extra database to manage these accounts to make sure each test gets a unique account. Because I can't create thousands of test accounts, I have to lock and unlock them in the DB during every test run.
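
The lock/unlock bookkeeping described above could look roughly like the following. It is only a sketch under assumptions: an in-memory queue stands in for the extra database, and the `AccountPool` class and the account names are hypothetical.

```python
import queue
from contextlib import contextmanager


class AccountPool:
    """In-memory stand-in for locking and unlocking test accounts."""

    def __init__(self, accounts):
        self._free = queue.Queue()
        for account in accounts:
            self._free.put(account)

    @contextmanager
    def lease(self, timeout_s=30):
        # Blocks until an account is free; raises queue.Empty on timeout.
        account = self._free.get(timeout=timeout_s)
        try:
            yield account
        finally:
            # Always hand the account back, even if the test fails.
            self._free.put(account)

    def free_count(self):
        return self._free.qsize()


if __name__ == "__main__":
    pool = AccountPool(["user-a", "user-b"])  # hypothetical account names
    with pool.lease() as first, pool.lease() as second:
        print(first, second, pool.free_count())
    print(pool.free_count())
```

Each test wraps its body in `with pool.lease() as account:`, so no two tests running in parallel hold the same account, and the account is released when the test finishes.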

Your improvement plan is highly appreciated.

SrinivasanTarget · 2017-05-25T15:51:15Z

@diemol Executing simultaneously on 10 to 20 containers doesn't yield stable results. Tests hang sometimes, and I see interrupt and null pointer exceptions in the logs. I can share the logs tomorrow if required.

diemol · 2017-05-25T16:55:15Z

@SrinivasanTarget could you please also share your HW specifications? Logs are helpful as well.
So far, we have seen that Zalenium performs well depending on the available RAM and processor power.

SrinivasanTarget · 2017-05-25T16:59:59Z

@diemol Yup, I was running in a 16 GB VM with Ubuntu 16.x. I don't have the logs at hand now, but I will surely share them tomorrow. I was trying to execute around 200 tests with 20 containers spun up. The same execution on elgalu/docker-selenium was fine.

diemol · 2017-05-25T17:42:17Z

Thanks @SrinivasanTarget, logs will be useful. Perhaps you can also share with us:

How you start Zalenium
how many threads you configure in your tests
how you start elgalu/docker-selenium when the execution goes well

All this info will be very helpful for us :)

SrinivasanTarget · 2017-05-26T11:43:32Z

@diemol Please find the zalenium logs here: https://gist.github.com/SrinivasanTarget/a88aa39274717d31af46d01056408175

> How you start Zalenium

docker run --rm -ti --name zalenium -p 4444:4444 -p 5555:5555 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp/videos:/home/seluser/videos \
  dosel/zalenium start --maxDockerSeleniumContainers 20

The results were the same even when executed via docker-compose.

> how many threads you configure in your tests

data-provider-thread-count="15", but the results are the same even when the count is reduced to 4 or 8.

> how you start elgalu/docker-selenium when the execution goes well

Through docker-compose: https://github.com/elgalu/docker-selenium/blob/master/docker-compose.yml.

SrinivasanTarget · 2017-05-29T07:05:07Z

@diemol Do you have any updates on this? Do you need any other information?

diemol · 2017-05-29T11:07:23Z

Hi @SrinivasanTarget,

We need more time to check it. We ran 16 parallel tests on a Linux machine with 16 GB and it worked OK; the same number of threads on a Mac with 16 GB didn't work so well.

We'll check if something can be improved or if it is just a matter of hardware.

saikrishna321 · 2017-05-29T11:45:29Z

@diemol We also have the same issue when we bring up more than 15 containers.

tacf · 2017-05-30T06:52:42Z

Hi, regarding performance issues, we've found that with the vanilla containers (from Selenium) RAM was not an issue (14 GB is more than sufficient for 20 containers). To stabilize test runs we needed to upgrade from 2 to 4 cores (on Azure), and we even moved to 4 cores on an improved processor family (30% more processing capacity). We're working over a swarm network, and scaling from nothing to 60 containers on 3 machines takes less than a minute, including node registration (I would risk saying about 30 seconds for all nodes to register).

Another side note: when testing heavy load on the same setup, it is easy to pass the point where you overload the machine with containers and tests start failing because the grid doesn't respond in time. The same setup we use now to run 60 browsers will run 200 without complaints, but the test results will be flaky. Another point to note is the overhead of requesting browser instances from the Selenium hub: scaling from, say, 15 to 20 browsers may not really be worth it when running 100 to 200 tests in the same run (assuming parallelism), because going from 0 to X browsers up and running takes too long. We gained 2 minutes out of 15 going from 15 to 20 browsers. Using 60 parallel browsers for the same run brought it down to only 9 minutes.
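
To put those timings in perspective, here is a small calculation of speedup and parallel efficiency. It assumes the figures read as 15 minutes with 15 browsers, 13 minutes with 20, and 9 minutes with 60; that reading is an interpretation of the numbers above, not additional measurements, and the function name is made up for the example.

```python
def scaling_report(baseline, other):
    """Compare two (browsers, minutes) measurements of the same test run.

    Returns (speedup, efficiency), where efficiency is the speedup divided
    by the factor by which the number of browsers grew.
    """
    base_browsers, base_minutes = baseline
    browsers, minutes = other
    speedup = base_minutes / minutes
    resource_ratio = browsers / base_browsers
    efficiency = speedup / resource_ratio
    return speedup, efficiency


if __name__ == "__main__":
    baseline = (15, 15.0)  # assumed reading: 15 browsers, 15 minutes
    for run in ((20, 13.0), (60, 9.0)):
        speedup, efficiency = scaling_report(baseline, run)
        print(f"{run[0]} browsers: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

Read this way, going from 15 to 20 browsers gives about a 1.15x speedup at roughly 0.87 efficiency, while 60 browsers give about 1.67x at roughly 0.42. That matches the observation that adding browsers pays off less and less.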

My point is that you can surely work out the performance issues, as they must have something to do with your own customizations. I would love to see these features in a more stable state. Nonetheless, you should take note of these facts, which I worked out, in order to differentiate between the performance issues that you can address and the nature of the Selenium grid.

These are my 2 cents, hope it will be helpful.

katryo · 2017-05-30T08:34:51Z

Hi, I also ran into (probably) the same problem.

Start Zalenium with docker run --rm -ti --name zalenium -p 4446:4444 -p 5555:5555 -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/videos:/home/seluser/videos dosel/zalenium start --timeZone "Asia/Tokyo" --videoRecordingEnabled true
Run six tests in parallel using Zalenium.
Containers were created and browsers were opened, but they suddenly stopped working while I was watching the browsers at HOST:4446/grid/admin/live.
Several containers became unhealthy while others stayed healthy.
After I stopped the tests, browser sessions and containers remained.

I reproduced this problem on Docker for Mac too.

I have not experienced this kind of problem when running eight tests in parallel with the official Selenium Docker image ( https://github.com/SeleniumHQ/docker-selenium ).

In addition, I hit some other problems, such as the elgalu/selenium container not starting selenium-node-chrome because of a java.lang.RuntimeException: java.net.BindException: Address already in use error, but I cannot reproduce that problem.

OS: CentOS 7.2.1511
CPU: Intel Core Processor (Haswell, no TSX), 2.4GHz, 10 cores
Memory: 10GB

docker's log: https://gist.github.com/katryo/d2c588554d1ace8583ccaa3e755bfb98
other data: https://gist.github.com/katryo/9180919444544db12d3bd1677ca8f6eb

I hope this helps you.

diemol · 2017-05-31T09:23:27Z

Thank you @SrinivasanTarget, @tacf, @katryo, @saikrishna321 for all the info and detailed logs.

Right now we are spending time reading the logs you submitted and also running Zalenium in debug mode to spot where the main bottleneck happens when many tests are executed at the same time.

What we plan to achieve is:

After understanding the logs better, make some changes to improve the behaviour when running several tests in parallel.
Come up with some "good usage" guidelines, which should give a hint on how many tests you could run in parallel given some hardware specifications. Also some setup tips, like starting the containers beforehand and things like that.

I am not sure how long this will take, but we are investing time in this because we think that if we are able to fix these performance issues (and add the Kubernetes feature), Zalenium could become very successful.

SrinivasanTarget · 2017-05-31T16:45:35Z

@diemol Thanks for your response :) Thanks for this wonderful project 👍

I would like to share a few observations from my end.

I hope you are aware of https://github.com/aerokube/selenoid. I have been trying all the available Docker Selenium solutions on the market. In my attempts, I was able to execute up to ~200 tests in 13 to 15 containers using the Zalenium, docker-selenium, and elgalu's docker-selenium images on a 16 GB Ubuntu machine. I executed the same ~200 scripts on a 16 GB Ubuntu machine with 30 containers (CPU usage was 85-90%) using Selenoid successfully, and I got stable results from Selenoid in each execution. Though I love the idea of on-demand containers in Zalenium, I see that Selenoid spins up somewhat fewer containers and seems to reuse containers to an extent. I think it would be great if Zalenium also reused containers instead of killing, relaunching, and registering nodes for each test. I accept that Kubernetes, Docker Swarm, or powerful AWS instances might be a long-term solution.
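
A back-of-the-envelope sketch of why container reuse could matter: it compares how many container starts a run needs with and without reuse. The 10-second startup time in the example is a purely hypothetical figure, not a measurement from Zalenium or Selenoid, and the function names are made up for illustration.

```python
import math


def startups_needed(tests, containers, reuse):
    """Number of container starts for a whole run."""
    if reuse:
        # Each container is started once and then serves many tests.
        return min(tests, containers)
    # A fresh container is created for every test.
    return tests


def startup_seconds_per_slot(tests, containers, startup_s, reuse):
    """Time one parallel slot spends waiting on container starts."""
    if reuse:
        return startup_s
    rounds = math.ceil(tests / containers)
    return rounds * startup_s


if __name__ == "__main__":
    tests, containers, startup_s = 200, 20, 10  # startup_s is hypothetical
    for reuse in (False, True):
        starts = startups_needed(tests, containers, reuse)
        waiting = startup_seconds_per_slot(tests, containers, startup_s, reuse)
        print(f"reuse={reuse}: {starts} starts, {waiting}s of startup per slot")
```

For 200 tests on 20 containers, creating a container per test means 200 starts and node registrations, while reuse needs only 20. The real saving depends on the actual startup time and on whatever cleanup a reused container needs between tests.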

> we think that if we are able to fix this performance issues

Looking forward to it :)

> (and adding the Kubernetes feature)

Are you planning to support Docker Swarm as well? Both Kubernetes and Docker Swarm support self-healing now.

elgalu · 2017-05-31T17:03:12Z

@SrinivasanTarget Thanks for this info!! It is really helpful.

> I was trying all available docker selenium solutions in market

Do you think you could send us a PR to add an "Alternatives" section to the README.md listing all available working alternatives (with links)? I think this would be very useful to us and to our users. Ideally we would differentiate each project per use case, so people reading it can decide what fits them best without having to try them all.

It should be something short and concise, if it's too long then a blog post might be a better place though.

SrinivasanTarget · 2017-05-31T17:11:13Z

@elgalu Sure, I still have a couple of solutions left to try. I will raise a PR after those attempts.

manoj9788 · 2017-06-01T01:02:18Z

@SrinivasanTarget That's a good piece of work on researching in terms of stability.

diemol · 2017-06-01T07:14:14Z

Thanks for the comments @SrinivasanTarget
I was aware of Selenoid, and we were trying it yesterday and it looks awesome! How come they are not more known?

Continuing with the topic: right now we are in the process of breaking Zalenium apart into pieces to detect where it gets slow when adding many tests in parallel. During this we found a few things yesterday that may lead us to improvements. We'll work on changing the network mode and also on changing the way containers are created, until we reach the point where the only limit is the grid itself.

We'll keep you posted.

elgalu · 2017-06-01T09:22:15Z

@SrinivasanTarget you may also want to check https://github.com/seleniumkit/gridrouter as someone pointed out in another issue

SrinivasanTarget · 2017-06-01T09:40:30Z

Yes it is in my list @elgalu :)

elgalu · 2017-06-01T14:40:07Z

As Diego mentioned, we tested Selenoid yesterday with great results! I was able to run, without VNC enabled, 50 tests in parallel within 1 minute on my laptop! (8 cores, 16GB)
Great job! @aandryashin @vania-pooh !!!

Diego is looking into Zalenium performance issues as we speak:)

vania-pooh · 2017-06-01T15:21:38Z

@SrinivasanTarget Regarding GridRouter, please try the newer implementation: http://github.com/aerokube/ggr It is also a Go project, tested enough in production.

vania-pooh · 2017-06-01T15:28:52Z

Just to put all eggs in one basket :) here are some recently posted articles about ggr and Selenoid:

SrinivasanTarget · 2017-06-01T16:33:13Z

@vania-pooh I did read all the histories today. Interesting and a long journey. Great Stuff 👍

manoj9788 · 2017-06-02T01:01:05Z

@vania-pooh Do you want to submit a paper on this for the upcoming Selenium Conference in Berlin ?

vania-pooh · 2017-06-02T03:26:02Z

@manoj9788: already submitted a talk about scalable Selenium.

manoj9788 · 2017-06-02T04:18:04Z

Oh! yeah! I see that. Thanks.

vania-pooh · 2017-06-02T08:02:55Z

Btw, regarding Selenium server performance I found several places in code that could be optimized:

Jetty 9 is a monster. Too much functionality for Selenium's purposes. I would replace it with something lightweight, e.g. Undertow. It supports both the Servlet API and JAX-RS.
Even if we keep Jetty, I would use its built-in proxying capabilities instead of doing this manually with the Apache HTTP client. Take a look at how it's done in the original Java-based GridRouter: https://github.com/seleniumkit/gridrouter/blob/master/proxy/src/main/java/ru/qatools/gridrouter/ProxyServlet.java
I think there are some problems in the Apache client settings. To reproduce the slowdown, just connect approximately 20 nodes to the hub and request all available browsers. If you then try to open the Grid console, you will notice that it opens slowly. My hypothesis is that something locks in the Apache HTTP client connection pool. However, I checked, and the pool size (2000) is enough, so this needs further investigation.

diemol · 2017-06-02T10:02:38Z

Thanks for the comments @vania-pooh, and hopefully we meet in SeleniumConf!

I mostly agree with the three points you mention. The thing is that we are using the grid as it is; we are not compiling our own grid (yet, though I don't rule out doing it in the future). I'll look into them, so maybe we find a way to improve the grid.

We already found ways to improve Zalenium's performance by tuning some of the parameters passed to the grid and also by changing the way we create the containers on the fly. We are still testing those changes, but it looks promising.

It won't be as fast as Selenoid :), but at least it runs several threads in parallel in a stable way and in a decent time. More details to come soon.

diemol · 2017-06-08T13:31:38Z

Hi all,

We just released version 3.3.1i, where we have improved a few things. Taking the list of improvements that I mentioned in a previous comment, I can give you an update:

Scale horizontally (thanks to @pearj, we are getting closer), see kubernetes support #103 and First cut of kubernetes support #103 #138.
We found an improvement in the container creation: in some cases it was failing, and then some tests were failing randomly (released a few weeks ago).
Reusing containers for more than one test, to avoid the continuous creation. An issue was created for this, Restart node after run #135, and we'll work on that to keep improving performance.
We now create containers more aggressively. This improved performance, but it sometimes causes a few extra containers to be created; this will be handled in Too many containers are created #143.

We have worked to improve Zalenium and also created a basic document with our findings.

In addition, there are separate issues for the pending tasks.

Please check the document and try the new version we have released. Thank you very much for all the input you gave us.

For now, I would like to close this issue since there are too many things in it. In case of finding new bugs or performance problems, please create a new issue and we will work on it. We invite you to contribute to the linked document with your own performance data, so more people can benefit from it.

felippenardi · 2017-06-20T18:38:53Z

@diemol Can you add the tag for 3.3.1l?

diemol · 2017-06-20T19:10:43Z

Hi @felippenardi,

This was released with tag 3.3.1i, but more improvements were made in subsequent releases; the current release is 3.3.1k.

3.3.1l is still under development.

felippenardi · 2017-06-21T13:58:26Z

Oh got you! Thanks :)

diemol mentioned this issue Jun 1, 2017

kubernetes support #103

Closed

elgalu added the enhancement label Jun 1, 2017

diemol mentioned this issue Jun 6, 2017

Restart node after run #135

Closed

katryo mentioned this issue Jun 7, 2017

"Address already in use" error when elgalu/selenium container tries to start selenium-node-chrome #140

Closed

SrinivasanTarget mentioned this issue Jun 7, 2017

Advanced concept document: running android tests with docker appium/appium#8613

Closed

diemol closed this as completed Jun 8, 2017

diemol mentioned this issue Jun 8, 2017

Session timeout error #145

Closed

snyk-bot mentioned this issue Apr 16, 2021

[Snyk] Security upgrade npm from 6.1.0 to 6.6.0 barahate90/zalenium#39

Open

snyk-bot mentioned this issue Aug 17, 2021

[Snyk] Security upgrade npm from 6.1.0 to 6.6.0 barahate90/zalenium#44

Open
