
Better-scheduled parallel tests #18462

Merged: weswigham merged 5 commits into master from workstealing-parallel-tests on Sep 14, 2017
Conversation

@weswigham (Member) commented Sep 14, 2017

@rbuckton mentioned that if he were to rewrite the parallel test runner, he would make it use a work-stealing system rather than partitioning work in advance, to improve thread utilization near the end of a test run. This is effectively that.

This pulls the important parts of our parallel test script into the test runner itself, and then reworks the parallel runner to use a central queue for distributing work to the worker threads, sorting the work by file size (an approximation of how long each test will take to execute: the largest tests run first). The improved utilization and load balancing takes the end-to-end time of gulp runtests-parallel --tests=rwc (now that it is functional) down from 13 minutes to 8 minutes on my workstation. I can only assume the effect is even more pronounced with more cores at the test runner's disposal.
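In sketch form, the scheduling idea looks something like this (TypeScript with illustrative names and shapes, not the runner's actual code):

```ts
// Hypothetical sketch of largest-first, pull-based scheduling.
interface TestTask {
    file: string;
    size: number; // file size in bytes, used as a proxy for runtime
}

function createQueue(tasks: TestTask[]): TestTask[] {
    // Largest files first: long-running tests start early, and small
    // ones backfill idle workers near the end of the run.
    return tasks.slice().sort((a, b) => b.size - a.size);
}

function nextTask(queue: TestTask[]): TestTask | undefined {
    // Workers pull from the shared queue whenever they go idle,
    // instead of receiving a fixed partition up front.
    return queue.shift();
}
```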

The primary downside to this system is that the tests aren't actually run under mocha, but rather under a shim that polyfills the behaviors we use. The shim executes each test immediately on discovery (since we run one file at a time on demand); however, it doesn't currently support timeouts, so a malfunctioning test can simply hang.
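Roughly, the shim amounts to something like the following (a simplified, synchronous-only sketch, not the actual shim):

```ts
// Illustrative only: `describe`/`it` run their bodies immediately on
// discovery instead of registering them for a later mocha pass.
const failures: { name: string; error: Error }[] = [];
const names: string[] = [];

(global as any).describe = (name: string, body: () => void) => {
    names.push(name);
    body(); // nested describes/its execute right away
    names.pop();
};

(global as any).it = (name: string, body: () => void) => {
    try {
        body(); // no timeout support: a hung test stalls the whole worker
    }
    catch (error) {
        failures.push({ name: [...names, name].join(" > "), error: error as Error });
    }
};
```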

@sandersn (Member)

A couple of notes without looking at the code yet:

  1. The travis failure is probably due to stale cache of some kind; I had to wipe my node_modules folder to get a build.
  2. runtests-parallel is much slower for our test suite. It goes from about a minute on my machine to over 2 minutes.

Now that you have my old 24-thread machine up and running again, it would be interesting to compare results from it as well.

@sandersn (Member)

Also, I don't care about how many tests have passed so far, just how many have failed.

@sandersn (Member)

With the message-batching change, the PR is as fast as (or maybe a little faster than) the current runtests-parallel for the test suite.
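One way to read the message-batching change (a hypothetical sketch, assuming a made-up batchSize parameter): instead of sending one test file per IPC message, the orchestrator groups several into each message, cutting round trips between it and the workers.

```ts
// Hypothetical sketch: send tests to a worker in batches rather than
// one file per IPC message.
import { ChildProcess } from "child_process";

interface TestTask { file: string; size: number; }

function sendBatch(worker: ChildProcess, queue: TestTask[], batchSize = 10): void {
    const batch = queue.splice(0, batchSize); // take the next chunk of work
    if (batch.length) {
        worker.send({ type: "batch", payload: batch });
    }
}
```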

@rbuckton (Member) left a comment

Seems OK to me. My only comment is that we are stuffing a lot into runner.ts; it might make sense to split the worker and orchestrator pieces into separate files so they're easier to maintain.

@weswigham (Member, Author)

@rbuckton done.

weswigham merged commit d1c4754 into master on Sep 14, 2017
weswigham deleted the workstealing-parallel-tests branch on Sep 14, 2017 at 22:42
@ghost commented Sep 19, 2017

This actually makes tests slower on my desktop. From the prior commit (c522f37), a run took 2 minutes 24 seconds; with this one (d1c4754), it takes 2 minutes 48 seconds.

@ghost commented Sep 19, 2017

On my laptop, the time went up from 2m 41s to 3m 10s. My laptop has 8 cores like my desktop, but only 8 GB of memory, while my desktop has 16 GB.

@weswigham (Member, Author)

@andy-ms How many workers does your machine use? It's possible that the batch ratio is a bit suboptimal for a given thread count (i.e., pre-batching too many or too few tests, resulting in idle threads).

@ghost commented Sep 19, 2017

On both the desktop and the laptop, it says it's using 8 threads/groups.

@weswigham (Member, Author)

Hmmm... I'll make the batch parameter tunable. The worst case (pre-batching all tests) should be approximately as fast as the old parallel runner (since there will be many idle threads), and right now it is hardcoded to batch 90% of the available file size into the first messages. Of course, since file size isn't a perfect proxy for test time, this isn't perfect... I have some ideas on how to improve it a bit more, though.
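To illustrate the knob, a hypothetical sketch in which batchRatio (the value hardcoded to 0.9 above) controls how much of the total file size is pre-assigned to workers, with the remainder left in the shared queue for on-demand distribution:

```ts
// Sketch only: `batchRatio` is the assumed tunable being discussed.
interface TestTask { file: string; size: number; }

function partition(tasks: TestTask[], workerCount: number, batchRatio = 0.9) {
    const total = tasks.reduce((sum, t) => sum + t.size, 0);
    const prebatched: TestTask[][] = Array.from({ length: workerCount }, () => []);
    const queue: TestTask[] = [];
    let assigned = 0;
    let next = 0;
    for (const task of tasks) { // tasks assumed sorted largest-first
        if (assigned / total < batchRatio) {
            prebatched[next++ % workerCount].push(task); // sent in the first messages
            assigned += task.size;
        }
        else {
            queue.push(task); // distributed on demand as workers go idle
        }
    }
    return { prebatched, queue };
}
```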

microsoft locked and limited the conversation to collaborators on Jun 14, 2018