
Run CI unittests in parallel #3445

Merged: 15 commits merged into pytorch:master on Mar 1, 2021

Conversation

@pmeier (Collaborator) commented Feb 24, 2021

Since our CI unittest machines are pretty beefy, we might be able to reduce the wall time significantly by running the tests in parallel.

Note that while this uses the pytest-xdist plugin, it does not make our tests dependent on pytest. They can still be run with unittest.
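
For illustration, a minimal sketch of how a plain unittest-style test stays compatible with both runners; the module name and the exact pytest invocation below are assumptions, not taken from this PR:

    # test_example.py -- hypothetical module
    import unittest

    import torch


    class ExampleTester(unittest.TestCase):
        def test_add(self):
            # plain unittest assertion, no pytest-specific APIs
            self.assertTrue(torch.equal(torch.ones(2) + 1, torch.full((2,), 2.0)))


    if __name__ == "__main__":
        unittest.main()  # still runnable with the stdlib unittest runner

    # ... or distributed across CPU cores with pytest-xdist:
    #   pytest -n auto test_example.py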

@pmeier (Collaborator, Author) commented Feb 24, 2021

This achieves the following speed-ups

OS        Python   sequential   parallel   speedup
Linux     3.6      9m 24s       21m 38s    -57%
Linux     3.7      9m 37s       21m 1s     -54%
Linux     3.8      9m 49s       21m 56s    -55%
Linux     3.9      10m 1s       3m 56s     138%
macOS     3.6      19m 51s      9m 10s     117%
macOS     3.7      19m 53s      8m 56s     123%
macOS     3.8      21m 31s      8m 59s     140%
macOS     3.9      16m 33s      7m 43s     114%
Windows   3.6      15m 25s      6m 17s     145%
Windows   3.7      17m 37s      6m 0s      194%
Windows   3.8      18m 5s       5m 59s     202%
Windows   3.9      18m 32s      6m 21s     192%

While the macOS and Windows tests run much faster in parallel, the Linux tests for Python 3.[6-8] are much slower. Since the Linux tests for Python 3.9 are a lot faster, I suspect that run is skipping some tests that slow down the overall execution significantly. I'll investigate.

@pmeier (Collaborator, Author) commented Feb 24, 2021

By fixing --num-processes=2 for Linux we get the same speedup for Python 3.[6-8]. I suspect the slowdown with more processes happens because they run out of memory. I can reproduce something similar locally.

codecov bot commented Feb 24, 2021

Codecov Report

Merging #3445 (bc05f7c) into master (fc33c46) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #3445   +/-   ##
=======================================
  Coverage   76.00%   76.00%           
=======================================
  Files         105      105           
  Lines        9697     9697           
  Branches     1556     1556           
=======================================
  Hits         7370     7370           
  Misses       1841     1841           
  Partials      486      486           


@pmeier (Collaborator, Author) commented Feb 24, 2021

I suspect the slowdown with more processes happens because they run out of memory.

That was not the problem. The slowdown occurred only in tests that made use of torch.jit. --num-processes=auto tells pytest-xdist to spawn N workers to run the tests, where N is the number of CPU cores. Internally, torch.jit also spawns N threads per worker, so we end up with N ** 2 threads.

To fix this we can simply set OMP_NUM_THREADS=1, which limits the threads spawned internally by each worker. With this, the speedup table now looks like this:

OS        Python   sequential   parallel   speedup
Linux     3.6      9m 24s       2m 7s      344%
Linux     3.7      9m 37s       2m 13s     334%
Linux     3.8      9m 49s       1m 30s     554%
Linux     3.9      10m 1s       1m 28s     583%
macOS     3.6      19m 51s      9m 13s     115%
macOS     3.7      19m 53s      8m 45s     127%
macOS     3.8      21m 31s      7m 49s     175%
macOS     3.9      16m 33s      9m 15s     79%
Windows   3.6      15m 25s      3m 51s     300%
Windows   3.7      17m 37s      3m 26s     413%
Windows   3.8      18m 5s       3m 45s     382%
Windows   3.9      18m 32s      3m 42s     401%
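
For reference, a minimal sketch of the OMP_NUM_THREADS fix described above; where exactly the variable gets set (CI config vs. a conftest.py) is not spelled out in this thread, so the placement below is illustrative only:

    # conftest.py (illustrative placement) -- limit intra-op parallelism per worker
    import os

    # Must happen before torch creates its thread pools to take full effect.
    os.environ.setdefault("OMP_NUM_THREADS", "1")

    import torch  # noqa: E402

    # Also cap PyTorch's own intra-op thread count, just to be explicit.
    torch.set_num_threads(1)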

@pmeier pmeier requested review from datumbox and fmassa February 24, 2021 11:35
@pmeier pmeier changed the title from "[DO NOT MERGE] Run tests in parallel" to "Run CI unittests in parallel" on Feb 24, 2021
@fmassa (Member) left a comment

Changes look great to me, and the CI speedups are amazing!

I'd love to get @seemethere's eyes on this as well in case I'm missing something.

@mthrok (Contributor) commented Feb 24, 2021

I used to do this in torchaudio, but removed it. The reason is that when one of the processes behaved abnormally (segfault, hanging, etc.) there was no log that I could look at from the browser, and I had to disable xdist first to debug what was going on, which was more time consuming. So, be prepared if you proceed with this.

@NicolasHug (Member)

The thread over-subscription issue observed in #3445 (comment) is quite typical and is likely to happen in other places. FWIW, we use xdist in scikit-learn, but issues come up once in a while and we do have to come up with workarounds like
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/conftest.py#L114

@pmeier (Collaborator, Author) commented Feb 24, 2021

@mthrok

The reason is that when one of the processes behaved abnormally (segfault, hanging, etc.) there was no log that I could look at from the browser, and I had to disable xdist first to debug what was going on, which was more time consuming.

Can you find an old CI run where this came up? I currently can't really picture the problem.

@mthrok (Contributor) commented Feb 24, 2021

@mthrok

The reason is that when one of the processes behaved abnormally (segfault, hanging, etc.) there was no log that I could look at from the browser, and I had to disable xdist first to debug what was going on, which was more time consuming.

Can you find an old CI run where this came up? I currently can't really picture the problem.

It's been months so I cannot find the log, but here is the question: while the tests are being executed, can you see which tests are currently running? I see that in the CI logs of this PR we can see which tests have passed/skipped/failed, but the question is what is shown while they are being run.

Even if the log is updated as the test runner moves on, if it only shows completed tests, then you do not know which test is being executed right now. In that case, if any test hangs, you cannot see which one from the log. Eventually the CI system will time out and kill the job, but the resulting log won't show which test caused the timeout, and whoever debugs it has to start by disabling xdist.

If the log shows which test exhibited abnormal behavior/termination, that's good, but if not, it will be hard for other maintainers to look into the cause.

@seemethere (Member)

We may need to increase the no-output-timeout for conda builds since the conda dependency resolver is extremely slow

@pmeier (Collaborator, Author) commented Feb 25, 2021

@mthrok There is pytest-timeout, which might help here. It simply terminates the test and fails it after a given time. If we set this to a lower value than our CI timeout, we should be fine.

See this for a sample output with tests that time out. CircleCI also seems to recognize pytest and shows all failing tests in a separate tab.

@fmassa Given the valid concerns of @mthrok, I would additionally add pytest-timeout. @seemethere after how many seconds does the CI currently time out?
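
A minimal sketch of what pytest-timeout provides, assuming the plugin is installed; the 300-second value mirrors the "set timeout to 5 minutes" commit later in this thread and is not necessarily the final CI setting:

    import time

    import pytest


    @pytest.mark.timeout(300)  # fail (rather than hang) after 5 minutes
    def test_finishes_quickly():
        time.sleep(0.1)

    # A suite-wide default can be passed on the command line instead:
    #   pytest --timeout=300 -n auto test/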

@datumbox (Contributor) left a comment

LGTM, awesome changes @pmeier.

Only a couple of notes from me for future reference:

  • Running tests in parallel will cause them to execute in unpredictable order. Given we don't set the seed at the beginning of every test/class, we might see some flakiness in the future. We've observed similar issues in the past, but I don't think that's a reason not to merge this.
  • It might be worth adding, in follow-up PRs, more control over which classes/tests are parallelized and which are not.

@rgommers

We may need to increase the no-output-timeout for conda builds since the conda dependency resolver is extremely slow

You may consider switching to mamba if it's getting to the "this takes minutes" level? Can be many times faster in the resolve phase.

@pmeier (Collaborator, Author) commented Mar 1, 2021

@datumbox

Running tests in parallel will cause them to execute in unpredictable order. Given we don't set the seed at the beginning of every test/class, we might see some flakiness in the future.

So you are saying that some tests are not independent of the others? If that is the case we should fix this ASAP.

Since you mentioned seeding, do we have tests that rely on a specific random seed? If so, doesn't this mean that either our method of testing is not adequate or our code actually contains bugs that happen for some inputs?

It might be worth adding, in follow-up PRs, more control over which classes/tests are parallelized and which are not.

What would be a reason not to run a test in parallel? The only thing I can think of is GPU tests that overflow the GPU memory, which is why I spared the GPU tests from parallelization for now. Other than that, I can't think of another reason.
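
For context, a hedged sketch of one way such a split could look; the RUN_PARALLEL variable and the gpu marker are hypothetical, and the PR's actual mechanism lives in the CI scripts:

    # conftest.py (illustrative) -- keep `gpu`-marked tests out of the parallel pass
    import os


    def pytest_collection_modifyitems(config, items):
        # When RUN_PARALLEL=1 (set by the job that runs `pytest -n auto`),
        # drop tests marked `gpu`; a separate sequential job runs them
        # afterwards with `pytest -m gpu`.
        if os.environ.get("RUN_PARALLEL") == "1":
            items[:] = [item for item in items if "gpu" not in item.keywords]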

@pmeier (Collaborator, Author) commented Mar 1, 2021

To follow up on what @rgommers said:

Mamba is a reimplementation of the conda package manager in C++.

  • parallel downloading of repository data and package files using multi-threading
  • libsolv for much faster dependency solving, a state of the art library used in the RPM package manager of Red Hat, Fedora and OpenSUSE
  • core parts of mamba are implemented in C++ for maximum efficiency

At the same time, mamba utilizes the same command line parser, package installation and deinstallation code, and transaction verification routines as conda to stay as compatible as possible.

I'll work on that when this is merged.

@pmeier pmeier requested a review from fmassa March 1, 2021 10:01
@fmassa (Member) left a comment

Let's give this a try

@pmeier (Collaborator, Author) commented Mar 1, 2021

There is one failing test. This is most likely due to too tight tolerances. I'll fix this in a follow-up PR. @datumbox is this one of the flaky tests you mentioned before?

@fmassa fmassa merged commit 4fcaee0 into pytorch:master Mar 1, 2021
@pmeier pmeier deleted the pytest-xdist branch March 1, 2021 13:30
@fmassa (Member) commented Mar 1, 2021

Reverting this as some tests are now broken

fmassa added a commit that referenced this pull request Mar 1, 2021
fmassa added a commit that referenced this pull request Mar 1, 2021
@datumbox (Contributor) commented Mar 1, 2021

So you are saying that some tests are not independent of the others? If that is the case we should fix this ASAP.

The tests are in principle independent, but some of them exhibit some flakiness, and running the tests in a different order can make us hit a "bad seed". See this old example: #3032 (comment)
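
One possible mitigation, sketched here only for illustration (it is not part of this PR): an autouse fixture that re-seeds the RNGs before every test, so the outcome no longer depends on the order in which workers pick up tests.

    # conftest.py (hypothetical) -- start every test from a known seed
    import random

    import pytest
    import torch


    @pytest.fixture(autouse=True)
    def fixed_seed():
        random.seed(0)        # Python RNG
        torch.manual_seed(0)  # PyTorch RNG
        yield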

What would be a reason not to run a test in parallel? The only thing I can think of is GPU tests that overflow the GPU memory, which is why I spared the GPU tests from parallelization for now. Other than that, I can't think of another reason.

Yes, that's what I had in mind as well. Parallelizing GPU tests for models should probably be avoided to prevent memory issues. On the other hand, running GPU tests related to transformations in parallel should be OK. Hence, having some control over what gets parallelized would be useful.

facebook-github-bot pushed a commit that referenced this pull request Mar 4, 2021
Summary:
* enable parallel tests

* disable parallelism for GPU tests

* [test] limit maximum processes on linux

* [debug] limit max processes even further

* [test] use subprocesses over threads

* [test] limit intra-op threads

* only limit intra op threads for CPU tests

* [poc] use low timeout for showcasing

* [poc] fix syntax

* set timeout to 5 minutes

* fix timeout on windows

Reviewed By: fmassa

Differential Revision: D26756257

fbshipit-source-id: f2fc4753a67a1505f01116119926eec365693ab9

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
facebook-github-bot pushed a commit that referenced this pull request Mar 4, 2021
Summary: This reverts commit 4fcaee0.

Reviewed By: fmassa

Differential Revision: D26756268

fbshipit-source-id: ff1c180d64dede17412787e9edd4fc525c4aecb9
@NicolasHug NicolasHug assigned NicolasHug and unassigned NicolasHug Mar 9, 2021
@NicolasHug NicolasHug added improvement module: ci revert(ed) For reverted PRs, and PRs that revert other PRs module: tests labels Mar 9, 2021
pmeier added a commit to pmeier/vision that referenced this pull request Mar 30, 2021
* enable parallel tests

* disable parallelism for GPU tests

* [test] limit maximum processes on linux

* [debug] limit max processes even further

* [test] use subprocesses over threads

* [test] limit intra-op threads

* only limit intra op threads for CPU tests

* [poc] use low timeout for showcasing

* [poc] fix syntax

* set timeout to 5 minutes

* fix timeout on windows

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Labels
cla signed enhancement module: ci module: tests revert(ed) For reverted PRs, and PRs that revert other PRs
8 participants