
Better-scheduled parallel tests #18462

Merged: weswigham merged 5 commits into master from workstealing-parallel-tests on Sep 14, 2017
Conversation

@weswigham (Member) commented Sep 14, 2017

@rbuckton mentioned that if he were to rewrite the parallel test runner, he would make it use a work-stealing system rather than partitioning work in advance, to improve thread utilization near the end of a test run. This is effectively that.

This pulls the important parts of our parallel test script into the test runner itself, and then reworks the parallel runner to use a central queue for distributing work to the worker threads, sorting the work by file size (an approximation of how long each test will take to execute: the largest tests run first). The improved utilization and load balancing takes the end-to-end time of gulp runtests-parallel --tests=rwc (now that it is functional) down from 13 minutes to 8 minutes on my workstation. I can only assume the effect is even more pronounced with more cores at the test runner's disposal.
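In sketch form, the scheduling idea looks something like this (TypeScript with illustrative names and shapes, not the runner's actual code):

```ts
// Hypothetical sketch of largest-first, pull-based scheduling.
interface TestTask {
    file: string;
    size: number; // file size in bytes, used as a proxy for runtime
}

function createQueue(tasks: TestTask[]): TestTask[] {
    // Largest files first: long-running tests start early, and small
    // ones backfill idle workers near the end of the run.
    return tasks.slice().sort((a, b) => b.size - a.size);
}

function nextTask(queue: TestTask[]): TestTask | undefined {
    // Workers pull from the shared queue whenever they go idle,
    // instead of receiving a fixed partition up front.
    return queue.shift();
}
```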

The primary downside to this system is that the tests aren't actually run under mocha, but rather under a shim that polyfills the behaviors we use. The shim executes each test immediately on discovery (since we run one file at a time on demand); however, it doesn't currently support timeouts, so a malfunctioning test can simply hang.
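Roughly, the shim amounts to something like the following (a simplified, synchronous-only sketch, not the actual shim):

```ts
// Illustrative only: `describe`/`it` run their bodies immediately on
// discovery instead of registering them for a later mocha pass.
const failures: { name: string; error: Error }[] = [];
const names: string[] = [];

(global as any).describe = (name: string, body: () => void) => {
    names.push(name);
    body(); // nested describes/its execute right away
    names.pop();
};

(global as any).it = (name: string, body: () => void) => {
    try {
        body(); // no timeout support: a hung test stalls the whole worker
    }
    catch (error) {
        failures.push({ name: [...names, name].join(" > "), error: error as Error });
    }
};
```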

@sandersn (Member)

A couple of notes without looking at the code yet:

  1. The travis failure is probably due to stale cache of some kind; I had to wipe my node_modules folder to get a build.
  2. runtests-parallel is much slower for our test suite. It goes from about a minute on my machine to over 2 minutes.

Now that you have my old 24-thread machine up and running again, it would be interesting to compare results from it as well.

@sandersn (Member)

Also, I don't care about how many tests have passed so far, just how many have failed.

@sandersn (Member)

With the message-batching change, the PR is as fast as (or maybe a little faster than) the current runtests-parallel for the test suite.
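One way to read the message-batching change (a hypothetical sketch, assuming a made-up batchSize parameter): instead of sending one test file per IPC message, the orchestrator groups several into each message, cutting round trips between it and the workers.

```ts
// Hypothetical sketch: send tests to a worker in batches rather than
// one file per IPC message.
import { ChildProcess } from "child_process";

interface TestTask { file: string; size: number; }

function sendBatch(worker: ChildProcess, queue: TestTask[], batchSize = 10): void {
    const batch = queue.splice(0, batchSize); // take the next chunk of work
    if (batch.length) {
        worker.send({ type: "batch", payload: batch });
    }
}
```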

@rbuckton (Member) left a comment

Seems OK to me. My only comment is that we are stuffing a lot into runner.ts; it might make sense to split the worker and orchestrator pieces into separate files so they're easier to maintain.

@weswigham (Member, Author)

@rbuckton done.

weswigham merged commit d1c4754 into master on Sep 14, 2017
weswigham deleted the workstealing-parallel-tests branch on Sep 14, 2017 at 22:42
@ghost commented Sep 19, 2017

This actually makes tests slower on my desktop. From the prior commit (c522f37), a run took 2 minutes 24 seconds; with this one (d1c4754), it takes 2 minutes 48 seconds.

@ghost commented Sep 19, 2017

On my laptop, the time went up from 2m 41s to 3m 10s. My laptop has 8 cores like my desktop, but only 8 GB of memory, while my desktop has 16 GB.

@weswigham (Member, Author)

@andy-ms How many workers does your machine use? It's possible that the batch ratio is a bit suboptimal for a given thread count (i.e., pre-batching too many or too few tests, resulting in idle threads).

@ghost commented Sep 19, 2017

On both the desktop and the laptop, it says it's using 8 threads/groups.

@weswigham (Member, Author)

Hmmm... I'll make the batch parameter tunable. The worst case (pre-batching all tests) should be approximately as fast as the old parallel runner (since there will be many idle threads), and right now it is hardcoded to batch 90% of the available file size into the first messages. Of course, since file size isn't a perfect proxy for test time, this isn't perfect... I have some ideas on how to improve it a bit more, though.
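To illustrate the knob, a hypothetical sketch in which batchRatio (the value hardcoded to 0.9 above) controls how much of the total file size is pre-assigned to workers, with the remainder left in the shared queue for on-demand distribution:

```ts
// Sketch only: `batchRatio` is the assumed tunable being discussed.
interface TestTask { file: string; size: number; }

function partition(tasks: TestTask[], workerCount: number, batchRatio = 0.9) {
    const total = tasks.reduce((sum, t) => sum + t.size, 0);
    const prebatched: TestTask[][] = Array.from({ length: workerCount }, () => []);
    const queue: TestTask[] = [];
    let assigned = 0;
    let next = 0;
    for (const task of tasks) { // tasks assumed sorted largest-first
        if (assigned / total < batchRatio) {
            prebatched[next++ % workerCount].push(task); // sent in the first messages
            assigned += task.size;
        }
        else {
            queue.push(task); // distributed on demand as workers go idle
        }
    }
    return { prebatched, queue };
}
```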

microsoft locked and limited the conversation to collaborators on Jun 14, 2018