
Run CI unittests in parallel #3445

Merged: 15 commits merged into pytorch:master on Mar 1, 2021

Conversation

@pmeier (Collaborator) commented Feb 24, 2021

Since our CI unittest machines are pretty beefy, we might be able to reduce the wall time significantly by running the tests in parallel.

Note that while this uses the pytest-xdist plugin, it does not make our tests dependent on pytest. They can still be run with unittest.
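
For illustration, a minimal sketch of how a plain unittest-style test stays compatible with both runners; the module name and the exact pytest invocation below are assumptions, not taken from this PR:

    # test_example.py -- hypothetical module
    import unittest

    import torch


    class ExampleTester(unittest.TestCase):
        def test_add(self):
            # plain unittest assertion, no pytest-specific APIs
            self.assertTrue(torch.equal(torch.ones(2) + 1, torch.full((2,), 2.0)))


    if __name__ == "__main__":
        unittest.main()  # still runnable with the stdlib unittest runner

    # ... or distributed across CPU cores with pytest-xdist:
    #   pytest -n auto test_example.py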

@pmeier (Collaborator, Author) commented Feb 24, 2021

This achieves the following speed-ups

OS        Python   sequential   parallel   speedup
Linux     3.6      9m 24s       21m 38s    -57%
Linux     3.7      9m 37s       21m 1s     -54%
Linux     3.8      9m 49s       21m 56s    -55%
Linux     3.9      10m 1s       3m 56s     138%
macOS     3.6      19m 51s      9m 10s     117%
macOS     3.7      19m 53s      8m 56s     123%
macOS     3.8      21m 31s      8m 59s     140%
macOS     3.9      16m 33s      7m 43s     114%
Windows   3.6      15m 25s      6m 17s     145%
Windows   3.7      17m 37s      6m 0s      194%
Windows   3.8      18m 5s       5m 59s     202%
Windows   3.9      18m 32s      6m 21s     192%

While the macOS and Windows tests run much faster in parallel, the Linux tests for Python 3.[6-8] are much slower. Since the Linux tests for Python 3.9 are a lot faster, I suspect that run is skipping some tests that slow down the overall execution significantly. I'll investigate.

@pmeier (Collaborator, Author) commented Feb 24, 2021

By fixing --num-processes=2 for Linux we get the same speedup for Python 3.[6-8]. I suspect the slowdown with more processes happens because they run out of memory. I can reproduce something similar locally.

codecov bot commented Feb 24, 2021

Codecov Report

Merging #3445 (bc05f7c) into master (fc33c46) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #3445   +/-   ##
=======================================
  Coverage   76.00%   76.00%           
=======================================
  Files         105      105           
  Lines        9697     9697           
  Branches     1556     1556           
=======================================
  Hits         7370     7370           
  Misses       1841     1841           
  Partials      486      486           


@pmeier (Collaborator, Author) commented Feb 24, 2021

I suspect the slowdown with more processes happens because they run out of memory.

That was not the problem. The slowdown occurred only in tests that made use of torch.jit. --num-processes=auto tells pytest-xdist to spawn N workers to run the tests, where N is the number of CPU cores. Internally, torch.jit also spawns N threads per worker, so we end up with N ** 2 threads.

To fix this we can simply set OMP_NUM_THREADS=1, which limits the threads spawned internally by each worker. With this, the speedup table now looks like this:

OS        Python   sequential   parallel   speedup
Linux     3.6      9m 24s       2m 7s      344%
Linux     3.7      9m 37s       2m 13s     334%
Linux     3.8      9m 49s       1m 30s     554%
Linux     3.9      10m 1s       1m 28s     583%
macOS     3.6      19m 51s      9m 13s     115%
macOS     3.7      19m 53s      8m 45s     127%
macOS     3.8      21m 31s      7m 49s     175%
macOS     3.9      16m 33s      9m 15s     79%
Windows   3.6      15m 25s      3m 51s     300%
Windows   3.7      17m 37s      3m 26s     413%
Windows   3.8      18m 5s       3m 45s     382%
Windows   3.9      18m 32s      3m 42s     401%
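
For reference, a minimal sketch of the OMP_NUM_THREADS fix described above; where exactly the variable gets set (CI config vs. a conftest.py) is not spelled out in this thread, so the placement below is illustrative only:

    # conftest.py (illustrative placement) -- limit intra-op parallelism per worker
    import os

    # Must happen before torch creates its thread pools to take full effect.
    os.environ.setdefault("OMP_NUM_THREADS", "1")

    import torch  # noqa: E402

    # Also cap PyTorch's own intra-op thread count, just to be explicit.
    torch.set_num_threads(1)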

@pmeier pmeier requested review from datumbox and fmassa February 24, 2021 11:35
@pmeier pmeier changed the title from "[DO NOT MERGE] Run tests in parallel" to "Run CI unittests in parallel" on Feb 24, 2021
@fmassa (Member) left a comment

Changes look great to me, and the CI speedups are amazing!

I'd love to get @seemethere's eyes on this as well in case I'm missing something.

@mthrok (Contributor) commented Feb 24, 2021

I used to do this in torchaudio, but removed it. The reason is that when one of the processes behaved abnormally (segfault, hanging, etc.) there was no log that I could look at from the browser, and I had to disable xdist first to debug what was going on, which was more time consuming. So, be prepared if you proceed with this.

@NicolasHug (Member)

The thread over-subscription issue observed in #3445 (comment) is quite typical and is likely to happen in other places. FWIW, we use xdist in scikit-learn, but issues come up once in a while and we do have to come up with workarounds like
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/conftest.py#L114

@pmeier (Collaborator, Author) commented Feb 24, 2021

@mthrok

The reason is that when one of the processes behaved abnormally (segfault, hanging, etc.) there was no log that I could look at from the browser, and I had to disable xdist first to debug what was going on, which was more time consuming.

Can you find an old CI run where this came up? I currently can't really picture the problem.

@mthrok (Contributor) commented Feb 24, 2021

@mthrok

The reason is that when one of the processes behaved abnormally (segfault, hanging, etc.) there was no log that I could look at from the browser, and I had to disable xdist first to debug what was going on, which was more time consuming.

Can you find an old CI run where this came up? I currently can't really picture the problem.

It's been months so I cannot find the log, but here is the question: while the tests are being executed, can you see which tests are currently running? I see that in the CI logs of this PR we can see which tests have passed/skipped/failed, but the question is what is shown while they are being run.

Even if the log is updated as the test runner moves on, if it only shows completed tests, then you do not know which test is being executed right now. In that case, if any test hangs, you cannot see which one from the log. Eventually the CI system will time out and kill the job, but the resulting log won't show which test caused the timeout, and whoever debugs it has to start by disabling xdist.

If the log shows which test exhibited abnormal behavior/termination, that's good, but if not, it will be hard for other maintainers to look into the cause.

@seemethere (Member)

We may need to increase the no-output-timeout for conda builds since the conda dependency resolver is extremely slow

@pmeier (Collaborator, Author) commented Feb 25, 2021

@mthrok There is pytest-timeout, which might help here. It simply terminates the test and fails it after a given time. If we set this to a lower value than our CI timeout, we should be fine.

See this for a sample output with tests that time out. CircleCI also seems to recognize pytest and shows all failing tests in a separate tab.

@fmassa Given the valid concerns of @mthrok, I would additionally add pytest-timeout. @seemethere after how many seconds does the CI currently time out?
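
A minimal sketch of what pytest-timeout provides, assuming the plugin is installed; the 300-second value mirrors the "set timeout to 5 minutes" commit later in this thread and is not necessarily the final CI setting:

    import time

    import pytest


    @pytest.mark.timeout(300)  # fail (rather than hang) after 5 minutes
    def test_finishes_quickly():
        time.sleep(0.1)

    # A suite-wide default can be passed on the command line instead:
    #   pytest --timeout=300 -n auto test/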

@datumbox (Contributor) left a comment

LGTM, awesome changes @pmeier.

Only a couple of notes from me for future reference:

  • Running tests in parallel will cause them to execute in unpredictable order. Given we don't set the seed at the beginning of every test/class, we might see some flakiness in the future. We've observed similar issues in the past, but I don't think that's a reason not to merge this.
  • It might be worth adding, in follow-up PRs, more control over which classes/tests are parallelized and which are not.

@rgommers

We may need to increase the no-output-timeout for conda builds since the conda dependency resolver is extremely slow

You may consider switching to mamba if it's getting to the "this takes minutes" level? Can be many times faster in the resolve phase.

@pmeier (Collaborator, Author) commented Mar 1, 2021

@datumbox

Running tests in parallel will cause them to execute in unpredictable order. Given we don't set the seed at the beginning of every test/class, we might see some flakiness in the future.

So you are saying that some tests are not independent of the others? If that is the case we should fix this ASAP.

Since you mentioned seeding, do we have tests that rely on a specific random seed? If so, doesn't this mean that either our method of testing is not adequate or our code actually contains bugs that happen for some inputs?

It might be worth adding, in follow-up PRs, more control over which classes/tests are parallelized and which are not.

What would be a reason not to run a test in parallel? The only thing I can think of is GPU tests that overflow the GPU memory, which is why I spared the GPU tests from parallelization for now. Other than that, I can't think of another reason.
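
For context, a hedged sketch of one way such a split could look; the RUN_PARALLEL variable and the gpu marker are hypothetical, and the PR's actual mechanism lives in the CI scripts:

    # conftest.py (illustrative) -- keep `gpu`-marked tests out of the parallel pass
    import os


    def pytest_collection_modifyitems(config, items):
        # When RUN_PARALLEL=1 (set by the job that runs `pytest -n auto`),
        # drop tests marked `gpu`; a separate sequential job runs them
        # afterwards with `pytest -m gpu`.
        if os.environ.get("RUN_PARALLEL") == "1":
            items[:] = [item for item in items if "gpu" not in item.keywords]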

@pmeier (Collaborator, Author) commented Mar 1, 2021

To follow up on what @rgommers said:

Mamba is a reimplementation of the conda package manager in C++.

  • parallel downloading of repository data and package files using multi-threading
  • libsolv for much faster dependency solving, a state of the art library used in the RPM package manager of Red Hat, Fedora and OpenSUSE
  • core parts of mamba are implemented in C++ for maximum efficiency

At the same time, mamba utilizes the same command line parser, package installation and deinstallation code, and transaction verification routines as conda to stay as compatible as possible.

I'll work on that when this is merged.

@pmeier pmeier requested a review from fmassa March 1, 2021 10:01
@fmassa (Member) left a comment

Let's give this a try

@pmeier (Collaborator, Author) commented Mar 1, 2021

There is one failing test. This is most likely due to too tight tolerances. I'll fix this in a follow-up PR. @datumbox is this one of the flaky tests you mentioned before?

@fmassa fmassa merged commit 4fcaee0 into pytorch:master Mar 1, 2021
@pmeier pmeier deleted the pytest-xdist branch March 1, 2021 13:30
@fmassa (Member) commented Mar 1, 2021

Reverting this as some tests are now broken

fmassa added a commit that referenced this pull request Mar 1, 2021
fmassa added a commit that referenced this pull request Mar 1, 2021
@datumbox (Contributor) commented Mar 1, 2021

So you are saying that some tests are not independent of the others? If that is the case we should fix this ASAP.

The tests are in principle independent, but some of them exhibit some flakiness, and running the tests in a different order can make us hit a "bad seed". See this old example: #3032 (comment)
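
One possible mitigation, sketched here only for illustration (it is not part of this PR): an autouse fixture that re-seeds the RNGs before every test, so the outcome no longer depends on the order in which workers pick up tests.

    # conftest.py (hypothetical) -- start every test from a known seed
    import random

    import pytest
    import torch


    @pytest.fixture(autouse=True)
    def fixed_seed():
        random.seed(0)        # Python RNG
        torch.manual_seed(0)  # PyTorch RNG
        yield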

What would be a reason not to run a test in parallel? The only thing I can think of is GPU tests that overflow the GPU memory, which is why I spared the GPU tests from parallelization for now. Other than that, I can't think of another reason.

Yes, that's what I had in mind as well. Parallelizing GPU tests for models should probably be avoided to prevent memory issues. On the other hand, running GPU tests related to transformations in parallel should be OK. Hence, having some control over what gets parallelized would be useful.

facebook-github-bot pushed a commit that referenced this pull request Mar 4, 2021
Summary:
* enable parallel tests

* disable parallelism for GPU tests

* [test] limit maximum processes on linux

* [debug] limit max processes even further

* [test] use subprocesses over threads

* [test] limit intra-op threads

* only limit intra op threads for CPU tests

* [poc] use low timeout for showcasing

* [poc] fix syntax

* set timeout to 5 minutes

* fix timeout on windows

Reviewed By: fmassa

Differential Revision: D26756257

fbshipit-source-id: f2fc4753a67a1505f01116119926eec365693ab9

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
facebook-github-bot pushed a commit that referenced this pull request Mar 4, 2021
Summary: This reverts commit 4fcaee0.

Reviewed By: fmassa

Differential Revision: D26756268

fbshipit-source-id: ff1c180d64dede17412787e9edd4fc525c4aecb9
@NicolasHug NicolasHug assigned NicolasHug and unassigned NicolasHug Mar 9, 2021
@NicolasHug NicolasHug added improvement module: ci revert(ed) For reverted PRs, and PRs that revert other PRs module: tests labels Mar 9, 2021
pmeier added a commit to pmeier/vision that referenced this pull request Mar 30, 2021
* enable parallel tests

* disable parallelism for GPU tests

* [test] limit maximum processes on linux

* [debug] limit max processes even further

* [test] use subprocesses over threads

* [test] limit intra-op threads

* only limit intra op threads for CPU tests

* [poc] use low timeout for showcasing

* [poc] fix syntax

* set timeout to 5 minutes

* fix timeout on windows

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Labels
cla signed enhancement module: ci module: tests revert(ed) For reverted PRs, and PRs that revert other PRs
8 participants