Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] 1/3 Moving the CI to conda, picking a more modern cuda + pytorch combo #271

Merged
merged 3 commits into from
Apr 21, 2022

Conversation

blefaudeux
Copy link
Contributor

What does this PR do?

Not super pretty, suggestions welcome. The crux of the matter was that circleCI supports images with cuda 11.1 (soon to be deprecated) and 11.4, while pytorch nightlies are built for everything but cuda 11.4..

a (note the a, unicity is not proven) solution is to rely on conda instead, which handles both pytorch and the matching cuda in a pinch. The handling of shells in circleci is a semi-mistery to me, but in short most of the conda mechanics do not work (the shells have no .bashrc for instance), and I changed the scripts to point to the conda installed python instead (after many more elegant tests which did not work)

A Triton unit test segfaults now, but this is with an old version of triton, I think that spending time on that is ill-advised if we can agree on the next-next-PR (triton2) and land all of them instead

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • Did you update the changelog? (if needed)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 13, 2022
@blefaudeux blefaudeux marked this pull request as draft April 13, 2022 05:00
@blefaudeux blefaudeux changed the title [WIP] Moving the CI to conda, picking a more modern cuda + pytorch combo [WIP][RFC] Moving the CI to conda, picking a more modern cuda + pytorch combo Apr 13, 2022
@blefaudeux blefaudeux changed the title [WIP][RFC] Moving the CI to conda, picking a more modern cuda + pytorch combo [WIP][RFC] 1/3 Moving the CI to conda, picking a more modern cuda + pytorch combo Apr 13, 2022
@blefaudeux blefaudeux marked this pull request as ready for review April 19, 2022 21:24
@blefaudeux
Copy link
Contributor Author

blefaudeux commented Apr 19, 2022

@fmassa @dianaml0 this goes with the triton2 PR, the plan was to land all three in one go

Copy link
Contributor

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot!

Another approach we could follow in the future would be to use the docker images from PyTorch, which comes with a number of things already packed in

That being said, the PyTorch team will at some point in the future provide a set of helper functions so that dealing with all of this would be simpler, so we will probably just use that when they get available.

@blefaudeux
Copy link
Contributor Author

blefaudeux commented Apr 20, 2022

LGTM, thanks a lot!

Another approach we could follow in the future would be to use the docker images from PyTorch, which comes with a number of things already packed in

That being said, the PyTorch team will at some point in the future provide a set of helper functions so that dealing with all of this would be simpler, so we will probably just use that when they get available.

Oh that would be great for the docker image ! I had a quick look when penning this one and it seemed that we had to setup the actual image hosting on top, so I skipped that for now, but if there's an existing image hosted by pytorch that would be perfect 😍

@blefaudeux blefaudeux changed the title [WIP][RFC] 1/3 Moving the CI to conda, picking a more modern cuda + pytorch combo [RFC] 1/3 Moving the CI to conda, picking a more modern cuda + pytorch combo Apr 20, 2022
Copy link
Contributor

@dianaml0 dianaml0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks! :)

command: |
source $BASH_ENV
cd docs
python3 -m pip install -r requirements.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be $CONDA_PYTHON also?

blefaudeux and others added 2 commits April 20, 2022 23:23
* parent be72b26
author Kashif Rasul <kashif.rasul@gmail.com> 1648069860 +0100
committer Benjamin Lefaudeux <benjamin.lefaudeux@pm.me> 1650256563 -0700

Move to Triton 2

Author:    Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@pm.me>

Tentatively fixing layernorm

- faster all around
- bugfix

better take on sparse tensors, put layout on the correct device
update the pip packages, minor cleanup

* catering for triton blocksparse being probably more reliable in fp16

* faster layernorm

* Minor blocksparse refactoring, update block size restrictions, relax power of two constraint (#277)

* Relax device size restrictions

* Refactor device creation and run all tests

* linting

Co-authored-by: Cole Hawkins <colehawk@amazon.com>

* code review, thanks @fmassa !

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: colepshawkins <31542048+colehawkins@users.noreply.github.com>
Co-authored-by: Cole Hawkins <colehawk@amazon.com>
@codecov-commenter
Copy link

Codecov Report

Merging #271 (4ecbec1) into main (549bd42) will decrease coverage by 0.11%.
The diff coverage is 91.30%.

@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
- Coverage   92.81%   92.69%   -0.12%     
==========================================
  Files          61       61              
  Lines        3368     3397      +29     
==========================================
+ Hits         3126     3149      +23     
- Misses        242      248       +6     
Flag Coverage Δ
Python 92.69% <91.30%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
xformers/triton/softmax.py 92.00% <ø> (-0.60%) ⬇️
xformers/utils.py 78.43% <ø> (ø)
xformers/components/multi_head_dispatch.py 97.87% <77.77%> (-2.13%) ⬇️
xformers/components/attention/blocksparse.py 92.18% <80.00%> (-2.02%) ⬇️
xformers/triton/layer_norm.py 87.71% <80.00%> (-0.42%) ⬇️
xformers/components/attention/base.py 97.36% <100.00%> (+0.14%) ⬆️
xformers/components/attention/compositional.py 100.00% <100.00%> (ø)
xformers/components/attention/favor.py 100.00% <100.00%> (ø)
xformers/components/attention/fourier_mix.py 100.00% <100.00%> (ø)
xformers/components/attention/global_tokens.py 92.50% <100.00%> (-7.50%) ⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 549bd42...4ecbec1. Read the comment docs.

@blefaudeux blefaudeux merged commit 498e009 into main Apr 21, 2022
@blefaudeux blefaudeux deleted the conda_ci branch April 21, 2022 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants