[RFC] 1/3 Moving the CI to conda, picking a more modern CUDA + PyTorch combo (#271)

* Test using conda to fetch the PyTorch nightlies and a matching CUDA toolkit

* [fix] Making it explicit whether the attention mechanism supports an attention mask or not (#266)

check the assert

* [backend] 3/3 Triton 2 update (#272)

* parent be72b26
author Kashif Rasul <kashif.rasul@gmail.com> 1648069860 +0100
committer Benjamin Lefaudeux <benjamin.lefaudeux@pm.me> 1650256563 -0700

Move to Triton 2

Author:    Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@pm.me>

Tentatively fixing layernorm

- faster all around
- bugfix

better take on sparse tensors, put layout on the correct device
update the pip packages, minor cleanup

* Cater for Triton blocksparse likely being more reliable in fp16

* faster layernorm

* Minor blocksparse refactoring, update block size restrictions, relax power of two constraint (#277)

* Relax device size restrictions

* Refactor device creation and run all tests

* linting

Co-authored-by: Cole Hawkins <colehawk@amazon.com>

* Code review, thanks @fmassa!

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: colepshawkins <31542048+colehawkins@users.noreply.github.com>
Co-authored-by: Cole Hawkins <colehawk@amazon.com>

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: colepshawkins <31542048+colehawkins@users.noreply.github.com>
Co-authored-by: Cole Hawkins <colehawk@amazon.com>
4 people authored Apr 21, 2022
1 parent 549bd42 commit 498e009
Showing 85 changed files with 650 additions and 579 deletions.
180 changes: 106 additions & 74 deletions .circleci/config.yml
@@ -15,12 +15,12 @@ cpu_py38: &cpu_py38
- image: cimg/python:3.8
resource_class: large

gpu_cu111: &gpu_cu111
gpu_cu114: &gpu_cu114
environment:
CUDA_VERSION: "11.1"
CUDA_HOME: /usr/local/cuda-11.1
CUDA_VERSION: "11.4"
CUDA_HOME: /usr/local/cuda-11.4
machine:
image: ubuntu-1604-cuda-11.1:202012-01
image: ubuntu-2004-cuda-11.4:202110-01
resource_class: gpu.nvidia.medium


@@ -51,81 +51,105 @@ binary_common: &binary_common
# -------------------------------------------------------------------------------------
# Re-usable commands
# -------------------------------------------------------------------------------------
setup_venv: &setup_venv
setup_conda: &setup_conda
- run:
name: Setup Virtual Env
name: Setup Conda
working_directory: ~/
command: |
python -m venv ~/venv
echo ". ~/venv/bin/activate" >> $BASH_ENV
. ~/venv/bin/activate
python --version
which python
which pip
pip install --upgrade pip
cd /home/circleci
echo 'export MINICONDA=$HOME/miniconda' >> $BASH_ENV
echo 'export PATH="$MINICONDA/bin:$PATH"' >> $BASH_ENV
echo 'export CONDA_PYTHON=/home/circleci/venv/bin/python' >> $BASH_ENV
source $BASH_ENV
# check if we have restored venv cache (/home/circleci/venv) correctly, if so, just skip
if [ -f /home/circleci/venv/check_version.py ]; then python3 /home/circleci/venv/check_version.py torch gt 1.12 && exit 0; fi
hash -r
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -f -p $MINICONDA
conda config --set always_yes yes
conda update conda
conda info -a
conda create -p /home/circleci/venv python=3.8.0 pip # pip is required here, else the system pip will be used
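The `$BASH_ENV` idiom used throughout `setup_conda` is how CircleCI persists environment variables between steps: each step runs in a fresh shell, so a step appends `export` lines to the `$BASH_ENV` file and later steps `source` it. A minimal sketch of the mechanism, using a temp file in place of CircleCI's real `$BASH_ENV`:

```shell
# Stand-in for CircleCI's $BASH_ENV: a file that accumulates exports.
BASH_ENV=$(mktemp)

# "Step 1" appends exports instead of setting variables directly,
# because the shell it runs in is gone by the next step.
echo 'export MINICONDA=$HOME/miniconda' >> "$BASH_ENV"
echo 'export PATH="$MINICONDA/bin:$PATH"' >> "$BASH_ENV"

# "Step 2" recovers the environment by sourcing the file.
source "$BASH_ENV"
echo "$MINICONDA"
```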
install_dep: &install_dep
- run:
name: Install Dependencies with torch nightly
command: |
source $BASH_ENV
# check if we have restored venv cache (/home/circleci/venv) correctly, if so, just skip
if [ -f /home/circleci/venv/check_version.py ]; then python /home/circleci/venv/check_version.py torch gt 1.11 && exit 0; fi
if [ -f /home/circleci/venv/check_version.py ]; then $CONDA_PYTHON /home/circleci/venv/check_version.py torch gt 1.12 && exit 0; fi
# start installing
pip install --progress-bar off --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html
pip install --progress-bar off -r requirements-benchmark.txt
pip install pytorch-lightning
source activate /home/circleci/venv
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch-nightly -q
$CONDA_PYTHON -m pip install -r requirements-benchmark.txt --progress-bar off
# Mark install as complete
touch /home/circleci/miniconda/.finished
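The cache short-circuit above hinges on a small `check_version.py` script (fetched later from the `min-xu-ai/check_verion` repo) that exits 0 when the cached interpreter already has a recent-enough torch. The real script's internals aren't shown here, but a hypothetical sketch of such a gate — names and parsing are assumptions, not the actual implementation — might look like:

```python
# Hypothetical sketch of a check_version.py-style gate: exit 0 when the
# installed package already satisfies "<module> gt <version>", so the CI
# step can `&& exit 0` and skip reinstalling dependencies.
import sys


def parse(version):
    """Turn '1.13.0.dev20220421+cu113' into a comparable int tuple."""
    core = version.split("+")[0]        # drop local build tag (+cu113)
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():          # stop at 'dev20220421' etc.
            break
        parts.append(int(piece))
    return tuple(parts)


def satisfies(installed, op, wanted):
    a, b = parse(installed), parse(wanted)
    return a > b if op == "gt" else a >= b


if __name__ == "__main__" and len(sys.argv) >= 4:
    module_name, op, wanted = sys.argv[1:4]
    module = __import__(module_name)
    sys.exit(0 if satisfies(module.__version__, op, wanted) else 1)
```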
install_dep_exp: &install_dep_exp
- run:
name: Install Dependencies for experimental tests
command: |
source $BASH_ENV
# check if we have restored venv cache (/home/circleci/venv) correctly, if so, just skip
if [ -f /home/circleci/venv/check_version.py ]; then python /home/circleci/venv/check_version.py torch gt 1.11 && exit 0; fi
if [ -f /home/circleci/venv/check_version.py ]; then $CONDA_PYTHON /home/circleci/venv/check_version.py torch gt 1.12 && exit 0; fi
# start installing
pip install --progress-bar off --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html
cd experimental
pip install --progress-bar off -r requirements.txt
pip install pytest
source activate /home/circleci/venv
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch-nightly -q
$CONDA_PYTHON -m pip install -r experimental/requirements.txt --progress-bar off
install_repo: &install_repo
- run:
name: Install Repository
command: |
python3 -m pip install -e .
$CONDA_PYTHON -m pip install -e .
# Test import.
python -c 'import sys; sys.path = sys.path[1:]; import xformers'
$CONDA_PYTHON -c 'import sys; sys.path = sys.path[1:]; import xformers'
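The import smoke test drops the first `sys.path` entry before importing, so the freshly installed wheel — not the `xformers/` source tree sitting in the working directory — is what actually gets imported. The same trick in isolation, demonstrated with a stdlib module rather than xformers:

```python
import importlib
import sys


def import_without_cwd(name):
    """Import `name` with the first sys.path entry removed.

    sys.path[0] is the script directory / current working directory, so
    dropping it keeps a local source checkout from shadowing the
    installed package of the same name.
    """
    saved = sys.path[:]
    try:
        sys.path = sys.path[1:]
        return importlib.import_module(name)
    finally:
        sys.path = saved   # restore for the rest of the process
```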
install_experimental_repo: &install_experimental_repo
- run:
name: Install Repository
command: |
source $BASH_ENV
cd experimental
python3 -m pip install -e .
$CONDA_PYTHON -m pip install -e .
run_isort: &run_isort
- run:
name: Run Linter (isort)
command: |
isort . --check --profile black
name: Run Linter (isort)
command: |
source $BASH_ENV
$CONDA_PYTHON -m isort . --check --profile black
run_black: &run_black
- run:
name: Run Linter (black)
command: |
black --check .
name: Run Linter (black)
command: |
source $BASH_ENV
$CONDA_PYTHON -m black --check .
run_mypy: &run_mypy
- run:
name: Run type-checking (mypy)
command: |
mypy --ignore-missing-imports --scripts-are-modules --pretty --exclude build/ --exclude stubs/ .
source $BASH_ENV
$CONDA_PYTHON -m mypy --ignore-missing-imports --scripts-are-modules --pretty --exclude build/ --exclude stubs/ .
run_flake8: &run_flake8
- run:
name: Run Linter (flake8)
command: |
flake8 --config .flake8 --show-source --statistics
source $BASH_ENV
$CONDA_PYTHON -m flake8 --config .flake8 --show-source --statistics
run_clang_format: &run_clang_format
- run:
@@ -142,51 +166,58 @@ run_coverage: &run_coverage
- run:
name: Run Unit Tests With Coverage
command: |
pytest --junitxml=test-results/junit.xml --verbose --timeout 600 --cov-report=xml --cov=./ tests
source $BASH_ENV
$CONDA_PYTHON -m pytest --junitxml=test-results/junit.xml --verbose --timeout 600 --cov-report=xml --cov=./ tests
# Uploading test coverage for Python code
bash <(curl -s https://codecov.io/bash) -f coverage.xml -cF Python
run_unittests: &run_unittests
- run:
name: Run Unit Tests
command: |
pytest --junitxml=test-results/junit.xml --verbose --timeout 600 tests
source $BASH_ENV
$CONDA_PYTHON -m pytest --junitxml=test-results/junit.xml --verbose --timeout 600 tests
run_experimental_unittests: &run_experimental_unittests
- run:
name: Run Unit Tests
command: |
pytest experimental/tests
source $BASH_ENV
$CONDA_PYTHON -m pytest experimental/tests
run_benchmarks: &run_benchmarks
- run:
name: Run Benchmarks
command: |
CUDA_LAUNCH_BLOCKING=1 python3 xformers/benchmarks/benchmark_encoder.py --activations gelu --plot -emb 128 -bs 16 -heads 4
source $BASH_ENV
$CONDA_PYTHON xformers/benchmarks/benchmark_encoder.py --activations gelu --plot -emb 128 -bs 16 -heads 4
run_pytorch_benchmark: &run_pytorch_benchmark
- run:
name: Run Pytorch benchmark
command: |
python3 xformers/benchmarks/benchmark_pytorch_transformer.py
source $BASH_ENV
$CONDA_PYTHON xformers/benchmarks/benchmark_pytorch_transformer.py
run_vit_benchmark: &run_vit_benchmark
- run:
name: Run ViT Timm benchmark
command: |
python3 xformers/benchmarks/benchmark_vit_timm.py
python3 xformers/benchmarks/benchmark_vit_timm.py --timm
source $BASH_ENV
$CONDA_PYTHON xformers/benchmarks/benchmark_vit_timm.py
$CONDA_PYTHON xformers/benchmarks/benchmark_vit_timm.py --timm
run_doc_build: &run_doc_build
- run:
name: Testing doc build
command: |
cd docs
pip install --progress-bar off -r requirements.txt
make help
make singlehtml | tee make.out
! tail make.out | grep -q warning
name: Testing doc build
command: |
source $BASH_ENV
cd docs
python3 -m pip install -r requirements.txt
make help
make singlehtml | tee make.out
! tail make.out | grep -q warning
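The `! tail make.out | grep -q warning` idiom turns "a warning appeared in the last lines of the build log" into a failing exit status: `grep -q` exits 0 when it finds a match, and `!` inverts that, so the CI step fails exactly when a warning shows up. In isolation (with an assumed stand-in log file):

```shell
# Simulate a clean Sphinx build log, then apply the warning gate.
log=$(mktemp)
printf 'reading sources...\nbuild succeeded.\n' > "$log"

# grep finds nothing -> exits 1 -> `!` flips it to 0 -> step passes.
if ! tail "$log" | grep -q warning; then
  echo "clean build"
fi
```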
commands:
setup_pyenv:
@@ -200,7 +231,7 @@ commands:
git clone -b master https://github.com/pyenv/pyenv-update.git $(pyenv root)/plugins/pyenv-update
cd $(pyenv root); git checkout master; cd /home/circleci
pyenv update
# figure out the latest python version given a subversion, like 3.8
# figure out the latest python3 version given a subversion, like 3.8
LATEST_PY_VERSION=$(pyenv install --list | sed 's/^ //' | grep -E '^[0-9].[0-9].[0-9]' | grep <<parameters.version>> | tail -1)
pyenv install -f $LATEST_PY_VERSION
pyenv global $LATEST_PY_VERSION
@@ -216,10 +247,13 @@ commands:
- run:
name: Check the installed PyTorch version
command: |
python -c 'import torch; print("Torch version:", torch.__version__)'
python -c 'import torch; assert torch.__version__ > ( <<parameters.major>>, <<parameters.minor>>), "wrong torch version"'
python -m torch.utils.collect_env
wget -O /home/circleci/venv/check_version.py https://raw.githubusercontent.com/min-xu-ai/check_verion/main/check_version.py
source $BASH_ENV
which python
$CONDA_PYTHON -c 'import torch; print("Torch version:", torch.__version__)'
$CONDA_PYTHON -c 'import torch; assert torch.__version__ > ( <<parameters.major>>, <<parameters.minor>>), "wrong torch version"'
$CONDA_PYTHON -m torch.utils.collect_env
wget -O ~/venv/check_version.py https://raw.githubusercontent.com/min-xu-ai/check_verion/main/check_version.py
# -------------------------------------------------------------------------------------
@@ -235,13 +269,13 @@ jobs:
steps:
- checkout

- <<: *setup_venv

# Cache the venv directory that contains dependencies
- restore_cache:
keys:
- cache-key-cpu-py38-{{ checksum "requirements-test.txt" }}-{{ checksum ".circleci/config.yml" }}

- <<: *setup_conda

- <<: *install_dep

- check_torch:
@@ -250,7 +284,9 @@ jobs:

- save_cache:
paths:
- ~/miniconda
- ~/venv

key: cache-key-cpu-py38-{{ checksum "requirements-test.txt" }}-{{ checksum ".circleci/config.yml" }}
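The cache key embeds `{{ checksum … }}` of both the requirements file and the CI config, so editing either one produces a different key and invalidates the cache. A sketch of the mechanism using `sha256sum` (the actual hash CircleCI uses internally may differ):

```shell
# The key changes as soon as the file's content changes, which is what
# invalidates a stale dependency cache.
tmp=$(mktemp -d)
echo "torch" > "$tmp/requirements-test.txt"
key1="cache-key-cpu-py38-$(sha256sum "$tmp/requirements-test.txt" | cut -d' ' -f1)"

echo "torch==1.13" > "$tmp/requirements-test.txt"
key2="cache-key-cpu-py38-$(sha256sum "$tmp/requirements-test.txt" | cut -d' ' -f1)"

if [ "$key1" != "$key2" ]; then
  echo "cache invalidated"
fi
```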

- <<: *install_repo
@@ -268,7 +304,7 @@ jobs:


gpu_tests:
<<: *gpu_cu111
<<: *gpu_cu114

working_directory: ~/xformers

@@ -277,16 +313,12 @@

- run: nvidia-smi

- setup_pyenv:
version: 3.9.4

- <<: *setup_venv

# Cache the venv directory that contains dependencies
- restore_cache:
keys:
- cache-key-gpu-111-{{ checksum "requirements-test.txt" }}-{{ checksum ".circleci/config.yml" }}
- cache-key-gpu-114-{{ checksum "requirements-test.txt" }}-{{ checksum ".circleci/config.yml" }}

- <<: *setup_conda
- <<: *install_dep

- check_torch:
@@ -295,8 +327,10 @@

- save_cache:
paths:
- ~/miniconda
- ~/venv
key: cache-key-gpu-111-{{ checksum "requirements-test.txt"}}-{{ checksum ".circleci/config.yml"}}

key: cache-key-gpu-114-{{ checksum "requirements-test.txt"}}-{{ checksum ".circleci/config.yml"}}

- <<: *install_repo

@@ -312,7 +346,7 @@ jobs:
path: test-results

gpu_experimental_tests:
<<: *gpu_cu111
<<: *gpu_cu114

working_directory: ~/xformers

@@ -321,16 +355,12 @@

- run: nvidia-smi

- setup_pyenv:
version: 3.9.4

- <<: *setup_venv

# Cache the venv directory that contains dependencies
- restore_cache:
keys:
- cache-key-gpu-exp-111-{{ checksum "experimental/requirements.txt" }}-{{ checksum ".circleci/config.yml" }}
- cache-key-gpu-exp-114-{{ checksum "experimental/requirements.txt" }}-{{ checksum ".circleci/config.yml" }}

- <<: *setup_conda
- <<: *install_dep_exp

- check_torch:
@@ -340,8 +370,10 @@

- save_cache:
paths:
- ~/miniconda
- ~/venv
key: cache-key-gpu-exp-111-{{ checksum "experimental/requirements.txt"}}-{{ checksum ".circleci/config.yml"}}

key: cache-key-gpu-exp-114-{{ checksum "experimental/requirements.txt"}}-{{ checksum ".circleci/config.yml"}}

- <<: *install_experimental_repo
- <<: *run_experimental_unittests
@@ -372,17 +404,17 @@ jobs:
- setup_pyenv:
version: << parameters.python_version >>

- <<: *setup_venv
- <<: *setup_conda
- run:
name: Install dependencies + xformers from binary
command: |
set -ex
echo "torch==${PYTORCH_VERSION}+${CU_VERSION}"
export PYTORCH_CONSTRAINT="torch==${PYTORCH_VERSION}+${CU_VERSION}"
pip install --progress-bar off "${PYTORCH_CONSTRAINT}" -f https://download.pytorch.org/whl/torch_stable.html
pip install --progress-bar off numpy pytest
python3 -m pip install --progress-bar off "${PYTORCH_CONSTRAINT}" -f https://download.pytorch.org/whl/torch_stable.html
python3 -m pip install --progress-bar off numpy pytest
echo $(ls ~/workspace)
pip install --progress-bar off $(ls -d ~/workspace/*)
python3 -m pip install --progress-bar off $(ls -d ~/workspace/*)
- checkout

