Python 3.11 + FA 2.5.0 + Torch 2.3.0 #2898

Merged: 77 commits, Jan 26, 2024
Changes from 66 commits
Commits
a708103
image version update
KuuCi Jan 17, 2024
3c516d2
update builder
KuuCi Jan 17, 2024
21751de
remove torch 1.13
KuuCi Jan 17, 2024
2a1dbf6
possible snappy fix
KuuCi Jan 17, 2024
456c82b
moved changes to generate_build_matrix
KuuCi Jan 19, 2024
7d6d8dd
3.11 support
KuuCi Jan 19, 2024
2ba8b3a
test
KuuCi Jan 19, 2024
d89890a
version test
KuuCi Jan 19, 2024
4d82663
remove snappy test
KuuCi Jan 19, 2024
7f26e18
add 3.10 + 3.11
KuuCi Jan 19, 2024
f9c83a1
potential snappy fix
KuuCi Jan 19, 2024
775cb45
nightly patch
KuuCi Jan 19, 2024
1824dd3
debug
KuuCi Jan 19, 2024
e491a2a
debug
KuuCi Jan 19, 2024
ecce586
extrapolated pytorch to depend on python version
KuuCi Jan 20, 2024
965694e
removed python 3.11 pytorch 2.0.1 support and merge conflict
KuuCi Jan 22, 2024
851bc40
python 3.8 deprecation assertion
KuuCi Jan 22, 2024
4d01cab
removed deprecation
KuuCi Jan 22, 2024
7be59d4
removing import for test
KuuCi Jan 22, 2024
9e31fa6
lint
KuuCi Jan 22, 2024
34cd00a
lint
KuuCi Jan 22, 2024
4e744f8
Merge branch 'dev' into version-upgrade-vincent
KuuCi Jan 22, 2024
245107d
pr review changes
KuuCi Jan 23, 2024
fadfbbf
Merge branch 'version-upgrade-vincent' of https://github.com/mosaicml…
KuuCi Jan 23, 2024
1760a6a
apt install snappy before pip install
KuuCi Jan 23, 2024
d7e8956
lint
KuuCi Jan 23, 2024
a5fdcf7
disk usage print logs
KuuCi Jan 23, 2024
9911d98
Merge branch 'dev' of https://github.com/mosaicml/composer into space…
KuuCi Jan 23, 2024
7425871
du depth 3
KuuCi Jan 23, 2024
c98196c
syntax
KuuCi Jan 23, 2024
8854fa6
more syntax
KuuCi Jan 23, 2024
37d3b67
syntax
KuuCi Jan 23, 2024
3515141
syntax
KuuCi Jan 23, 2024
573ed70
inspect root
KuuCi Jan 23, 2024
234bed0
depth 1
KuuCi Jan 23, 2024
c5dcfa5
debug
KuuCi Jan 23, 2024
b154676
remove sys and proc from du
KuuCi Jan 23, 2024
8b91b3c
install fa2 through pip
KuuCi Jan 23, 2024
228c6df
install dependancy
KuuCi Jan 23, 2024
cb121c2
no build isolation
KuuCi Jan 23, 2024
c9946ed
setuptools
KuuCi Jan 23, 2024
4d94725
downgrade to 2.3.6
KuuCi Jan 23, 2024
308d5a8
2 workers
KuuCi Jan 23, 2024
c9c84aa
revert
KuuCi Jan 23, 2024
e11718e
flash 1.0.9
KuuCi Jan 23, 2024
a6887f4
flash 2.3.6
KuuCi Jan 23, 2024
5a27e42
lint
KuuCi Jan 23, 2024
38dd794
fa 2.5.0
KuuCi Jan 24, 2024
1b16ddd
nightly 3.11
KuuCi Jan 24, 2024
eba1c94
type
KuuCi Jan 24, 2024
bda8245
Merge branch 'dev' of https://github.com/mosaicml/composer into space…
KuuCi Jan 24, 2024
c402a68
remove python 3.11 and torch 2.1.2
KuuCi Jan 24, 2024
cedb433
remove timeout
KuuCi Jan 24, 2024
5964c19
reset latest version
KuuCi Jan 24, 2024
438cfb8
smoke test update
KuuCi Jan 24, 2024
5b880ad
lint
KuuCi Jan 24, 2024
d9c3550
update yaml
KuuCi Jan 24, 2024
773f3b5
2.3.6 test
KuuCi Jan 24, 2024
f1ee751
revert test
KuuCi Jan 24, 2024
1969995
reversion continued
KuuCi Jan 24, 2024
3810253
restoring from before reversion
KuuCi Jan 24, 2024
ecc2a60
max jobs
KuuCi Jan 24, 2024
680a702
increase timeout
KuuCi Jan 25, 2024
bc80230
increase timeout
KuuCi Jan 25, 2024
5bab582
revert to only include nightly change
KuuCi Jan 25, 2024
5f71bfb
reset to default build time
KuuCi Jan 25, 2024
a80bbcd
merge
KuuCi Jan 25, 2024
7fb711f
update docker yaml
KuuCi Jan 25, 2024
78c5ba8
new names
KuuCi Jan 25, 2024
84222da
merge
KuuCi Jan 26, 2024
d6062f0
merge
KuuCi Jan 26, 2024
3027d27
fix merge
KuuCi Jan 26, 2024
8b0504a
lint
KuuCi Jan 26, 2024
1d928a3
cpu-3.11-nightly test
KuuCi Jan 26, 2024
4d888fd
temp rm test
KuuCi Jan 26, 2024
1b117d1
cpu unit tst
KuuCi Jan 26, 2024
1b574be
rm test
KuuCi Jan 26, 2024
1 change: 1 addition & 0 deletions .github/workflows/smoketest.yaml
@@ -25,6 +25,7 @@ jobs:
           - "3.8"
           - "3.9"
           - "3.10"
+          - "3.11"
     steps:
       - uses: actions/checkout@v3
       - uses: actions/setup-python@v4
4 changes: 2 additions & 2 deletions composer/datasets/utils.py
@@ -179,7 +179,7 @@ class MultiTokenEOSCriteria(transformers.StoppingCriteria):
     def __init__(
         self,
         stop_sequence: str,
-        tokenizer: transformers.PreTrainedTokenizer,
+        tokenizer: Union[transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast],
         batch_size: int,
     ) -> None:
         self.done_tracker = [False] * batch_size
@@ -213,7 +213,7 @@ def __call__(self, input_ids, scores: Optional[torch.FloatTensor] = None, **kwar
         return False not in self.done_tracker

 def stop_sequences_criteria(
-    tokenizer: transformers.PreTrainedTokenizer,
+    tokenizer: Union[transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast],
     stop_sequences: List[str],
     batch_size: int,
 ) -> transformers.StoppingCriteriaList:
10 changes: 5 additions & 5 deletions docker/Dockerfile
@@ -118,6 +118,7 @@ RUN apt-get update && \
         tcl \
         libjpeg8-dev \
         less \
+        libsnappy-dev \
         # For AWS EFA:
         autoconf \
         autotools-dev \
@@ -269,6 +270,7 @@ RUN if [ -n "$MOFED_VERSION" ] ; then \
         rm -rf /tmp/mofed ; \
     fi

+
 #####################
 # Install NVIDIA Apex
 #####################
@@ -294,10 +296,7 @@ RUN if [[ -n "$CUDA_VERSION" ]] && [[ -z "${PYTORCH_NIGHTLY_URL}" ]]; then \
 RUN if [ -n "$CUDA_VERSION" ] ; then \
         pip${PYTHON_VERSION} install --upgrade --no-cache-dir ninja==1.11.1 && \
         pip${PYTHON_VERSION} install --upgrade --no-cache-dir --force-reinstall packaging==22.0 && \
-        git clone --branch v2.4.2 https://github.com/Dao-AILab/flash-attention.git && \
-        cd flash-attention && \
-        MAX_JOBS=1 python${PYTHON_VERSION} setup.py install && \
-        cd .. ; \
+        MAX_JOBS=1 pip${PYTHON_VERSION} install --no-cache-dir flash-attn==2.5.0; \
     fi

 ###############
@@ -356,7 +355,8 @@ RUN apt-get update && \
 RUN pip install --no-cache-dir --upgrade \
         certifi${CERTIFI_VERSION} \
         ipython${IPYTHON_VERSION} \
-        urllib3${URLLIB3_VERSION}
+        urllib3${URLLIB3_VERSION} \
+        python-snappy

 ##################################################
 # Override NVIDIA mistaken env var for 11.8 images
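A minimal sketch, not part of this PR, of how one might confirm inside a built image that the pip-installed `flash-attn` matches the pin in the Dockerfile hunk above. The helper names `installed_version` and `matches_pin` and the constant `EXPECTED_FLASH_ATTN` are hypothetical; the only API assumed is the standard-library `importlib.metadata`.

```python
from importlib import metadata
from typing import Optional

# Pin taken from the Dockerfile hunk above; the constant name is hypothetical.
EXPECTED_FLASH_ATTN = '2.5.0'


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name: str, expected: str) -> bool:
    """True only when the distribution is installed at exactly `expected`."""
    return installed_version(dist_name) == expected


if __name__ == '__main__':
    found = installed_version('flash-attn')
    if found is None:
        # The Dockerfile only installs flash-attn when CUDA_VERSION is set.
        print('flash-attn is not installed')
    else:
        print(f'flash-attn {found}; pin satisfied: {found == EXPECTED_FLASH_ATTN}')
```

Because the Dockerfile skips the install when `CUDA_VERSION` is empty, a `None` result on a CPU image is expected rather than an error.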
1 change: 1 addition & 0 deletions docker/README.md
@@ -31,6 +31,7 @@ To install composer, once inside the image, run `pip install mosaicml`.
 <!-- BEGIN_PYTORCH_BUILD_MATRIX -->
 | Linux Distro   | Flavor   | PyTorch Version   | CUDA Version        | Python Version   | Docker Tags                                                                              |
 |----------------|----------|-------------------|---------------------|------------------|------------------------------------------------------------------------------------------|
+| Ubuntu 20.04   | Base     | 2.3.0             | 12.1.0 (Infiniband) | 3.11             | `mosaicml/pytorch:2.3.0_cu121-nightly20240110-python3.11-ubuntu20.04`                    |
 | Ubuntu 20.04   | Base     | 2.3.0             | 12.1.0 (Infiniband) | 3.10             | `mosaicml/pytorch:2.3.0_cu121-nightly20240110-python3.10-ubuntu20.04`                    |
 | Ubuntu 20.04   | Base     | 2.1.2             | 12.1.0 (Infiniband) | 3.10             | `mosaicml/pytorch:latest`, `mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04`         |
 | Ubuntu 20.04   | Base     | 2.1.2             | 12.1.0 (EFA)        | 3.10             | `mosaicml/pytorch:latest-aws`, `mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04-aws` |
27 changes: 27 additions & 0 deletions docker/build_matrix.yaml
@@ -191,6 +191,33 @@
   - mosaicml/pytorch:2.3.0_cu121-nightly20240110-python3.10-ubuntu20.04
   TARGET: pytorch_stage
   TORCHVISION_VERSION: 0.18.0
+- AWS_OFI_NCCL_VERSION: ''
+  BASE_IMAGE: nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
+  CUDA_VERSION: 12.1.0
+  IMAGE_NAME: torch-nightly-2-3-0-20240110-cu121
+  MOFED_VERSION: 5.5-1.0.3.2
+  NVIDIA_REQUIRE_CUDA_OVERRIDE: cuda>=12.1 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471
+    brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471
+    brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471
+    brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471
+    brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511
+    brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511
+    brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511
+    brand=titanrtx,driver>=510,driver<511 brand=tesla,driver>=515,driver<516 brand=unknown,driver>=515,driver<516
+    brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516
+    brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516
+    brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516 brand=tesla,driver>=525,driver<526
+    brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526
+    brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526
+    brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526
+  PYTHON_VERSION: '3.11'
+  PYTORCH_NIGHTLY_URL: https://download.pytorch.org/whl/nightly/cu121
+  PYTORCH_NIGHTLY_VERSION: dev20240110+cu121
+  PYTORCH_VERSION: 2.3.0
+  TAGS:
+  - mosaicml/pytorch:2.3.0_cu121-nightly20240110-python3.11-ubuntu20.04
+  TARGET: pytorch_stage
+  TORCHVISION_VERSION: 0.18.0
 - AWS_OFI_NCCL_VERSION: ''
   BASE_IMAGE: nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
   COMPOSER_INSTALL_COMMAND: mosaicml[all]==0.18.0
22 changes: 20 additions & 2 deletions docker/generate_build_matrix.py
@@ -227,7 +227,7 @@ def _main():
             entry['AWS_OFI_NCCL_VERSION'] = 'v1.7.4-aws'

         pytorch_entries.append(entry)
-    nightly_entry = {
+    nightly_entry_310 = {
         'AWS_OFI_NCCL_VERSION': '',
         'BASE_IMAGE': 'nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04',
         'CUDA_VERSION': '12.1.0',
@@ -242,7 +242,25 @@ def _main():
         'TARGET': 'pytorch_stage',
         'TORCHVISION_VERSION': '0.18.0'
     }
-    pytorch_entries.append(nightly_entry)
+    pytorch_entries.append(nightly_entry_310)
+
+    nightly_entry_311 = {
+        'AWS_OFI_NCCL_VERSION': '',
+        'BASE_IMAGE': 'nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04',
+        'CUDA_VERSION': '12.1.0',
+        'IMAGE_NAME': 'torch-nightly-2-3-0-20240110-cu121',
+        'MOFED_VERSION': '5.5-1.0.3.2',
+        'NVIDIA_REQUIRE_CUDA_OVERRIDE': _get_cuda_override('12.1.0'),
+        'PYTHON_VERSION': '3.11',
+        'PYTORCH_VERSION': '2.3.0',
+        'PYTORCH_NIGHTLY_URL': 'https://download.pytorch.org/whl/nightly/cu121',
+        'PYTORCH_NIGHTLY_VERSION': 'dev20240110+cu121',
+        'TAGS': ['mosaicml/pytorch:2.3.0_cu121-nightly20240110-python3.11-ubuntu20.04'],
+        'TARGET': 'pytorch_stage',
+        'TORCHVISION_VERSION': '0.18.0'
+    }
+    pytorch_entries.append(nightly_entry_311)
+
     composer_entries = []

     # The `GIT_COMMIT` is a placeholder and Jenkins will substitute it with the actual git commit for the `composer_staging` images
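From the lines visible in this diff, the two nightly dictionaries appear to differ only in `PYTHON_VERSION` and the tag. A hedged sketch of how that duplication could be folded into a loop follows; this is not what the PR does. `make_nightly_entry` is a hypothetical helper, and `_get_cuda_override` is stubbed because its real implementation is not shown here.

```python
from typing import Dict, List, Union

EntryValue = Union[str, List[str]]


def _get_cuda_override(cuda_version: str) -> str:
    # Stub only: the real function lives in docker/generate_build_matrix.py
    # and is not shown in this diff, so a placeholder string is returned.
    return f'<override for cuda {cuda_version}>'


def make_nightly_entry(python_version: str) -> Dict[str, EntryValue]:
    """Build one nightly entry; only the Python version and tag vary (hypothetical helper)."""
    tag = f'mosaicml/pytorch:2.3.0_cu121-nightly20240110-python{python_version}-ubuntu20.04'
    return {
        'AWS_OFI_NCCL_VERSION': '',
        'BASE_IMAGE': 'nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04',
        'CUDA_VERSION': '12.1.0',
        'IMAGE_NAME': 'torch-nightly-2-3-0-20240110-cu121',
        'MOFED_VERSION': '5.5-1.0.3.2',
        'NVIDIA_REQUIRE_CUDA_OVERRIDE': _get_cuda_override('12.1.0'),
        'PYTHON_VERSION': python_version,
        'PYTORCH_VERSION': '2.3.0',
        'PYTORCH_NIGHTLY_URL': 'https://download.pytorch.org/whl/nightly/cu121',
        'PYTORCH_NIGHTLY_VERSION': 'dev20240110+cu121',
        'TAGS': [tag],
        'TARGET': 'pytorch_stage',
        'TORCHVISION_VERSION': '0.18.0',
    }


# One entry per supported nightly Python version, in the order used by the PR.
pytorch_entries: List[Dict[str, EntryValue]] = []
for python_version in ('3.10', '3.11'):
    pytorch_entries.append(make_nightly_entry(python_version))
```

Adding another Python version would then be a one-element change to the tuple, at the cost of hiding the literal entries that the PR keeps explicit.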
1 change: 1 addition & 0 deletions setup.py
@@ -265,6 +265,7 @@ def package_files(prefix: str, directory: str, extension: str):
         'Programming Language :: Python :: 3.8',
         'Programming Language :: Python :: 3.9',
         'Programming Language :: Python :: 3.10',
+        'Programming Language :: Python :: 3.11',
     ],
     install_requires=install_requires,
     entry_points={
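The supported Python versions are listed by hand both in the smoke-test matrix and in the `setup.py` classifiers. As an illustration only, and not something this PR introduces, the classifier strings could be derived from a single tuple. `SUPPORTED_PYTHONS` and `python_classifiers` are hypothetical names.

```python
from typing import List, Sequence

# Hypothetical single source of truth for supported minor versions.
SUPPORTED_PYTHONS = ('3.8', '3.9', '3.10', '3.11')


def python_classifiers(versions: Sequence[str]) -> List[str]:
    """Return one trove classifier string per supported Python version."""
    return [f'Programming Language :: Python :: {version}' for version in versions]


classifiers = python_classifiers(SUPPORTED_PYTHONS)
```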
4 changes: 2 additions & 2 deletions tests/datasets/test_in_context_learning_datasets.py
@@ -73,13 +73,13 @@ def test_stop_sequences_criteria(tiny_gpt2_tokenizer):
     seq1 = tiny_gpt2_tokenizer('Dogs are furry')['input_ids']
     seq2 = tiny_gpt2_tokenizer('Dogs are furry\n\n')['input_ids']
     seq1 = [50257] * (len(seq2) - len(seq1)) + seq1
-    input_ids = torch.tensor([seq1, seq2])
+    input_ids = torch.LongTensor([seq1, seq2])
     assert not eos_criteria(input_ids, None)

     eos_criteria = MultiTokenEOSCriteria('\n\n', tiny_gpt2_tokenizer, 2)
     seq1 = tiny_gpt2_tokenizer('Dogs are furry\n\n')['input_ids']
     seq2 = tiny_gpt2_tokenizer('Dogs are furry\n\n')['input_ids']
-    input_ids = torch.tensor([seq1, seq2])
+    input_ids = torch.LongTensor([seq1, seq2])
     assert eos_criteria(input_ids, None)