Feature/pytorch tpu container #14

Merged
merged 22 commits into from
Mar 26, 2024
d20a488
PyTorch TPU Dockerfile
shub-kris Feb 15, 2024
84ca2b2
PyTorch TPU Dockerfile
shub-kris Feb 15, 2024
42bbd21
Add Notebook
shub-kris Feb 16, 2024
e679d99
remove: unused dockerfile, sentence-transformers
shub-kris Feb 16, 2024
5447403
Add example with PyTorch_XLA TPU DLC
shub-kris Feb 21, 2024
9b2fb84
Update README to add Single-Host
shub-kris Feb 21, 2024
de6c186
Add more info about different hosts
shub-kris Feb 21, 2024
4680bcd
Replace manual training script with trainer
shub-kris Feb 22, 2024
0901c59
Add Dolly, use TRL and OPT-350M
shub-kris Feb 23, 2024
c2e8a3b
Push local changes
shub-kris Feb 23, 2024
815b0df
Merge branch 'feature/pytorch-tpu-transformers-example' into feature/…
shub-kris Feb 26, 2024
e935b17
Change name, add transformers trainer example
shub-kris Feb 26, 2024
6747611
Update dockerfile and example according to nightly version
shub-kris Feb 28, 2024
68f94b4
Add info to add token
shub-kris Feb 28, 2024
79872fd
checkpointing is faster with this base image
shub-kris Feb 29, 2024
2a80fe2
Build transformers and trl from main for checkpointing and sfttrainer…
shub-kris Mar 14, 2024
b53b200
Update README
shub-kris Mar 14, 2024
e75a3dd
Change parameters
shub-kris Mar 14, 2024
e3041ad
Change paramters, add example for LLama
shub-kris Mar 15, 2024
7f2e378
Rename folder
shub-kris Mar 15, 2024
6bb2197
llama-example: name of function from train_gemma to train_llama
shub-kris Mar 15, 2024
b119483
update(tpu-examples): change model to gemma-7b, add inference at the …
shub-kris Mar 25, 2024
@@ -0,0 +1 @@
FROM us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm
Review comment (Collaborator):

As discussed, maybe delete this file.

@@ -0,0 +1,70 @@
FROM us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm
# Base image with PyTorch 2.1 and Python 3.10
tengomucho marked this conversation as resolved.
# Read more about it here: https://github.com/pytorch/xla?tab=readme-ov-file#docker

LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive

# Versions
ARG TRANSFORMERS='4.37.2'
ARG DIFFUSERS='0.26.1'
ARG PEFT='0.8.2'
ARG TRL='0.7.10'
Review comment (Member):

Not sure if sentence-transformers supports TPU.

Reply (Contributor Author):

You are right, it doesn't.

ARG DATASETS='2.16.1'
ARG ACCELERATE='0.27.0'
ARG EVALUATE='0.4.1'
ARG SENTENCE_TRANSFORMERS='2.3.1'

RUN apt-get update \
    && apt-get install -y \
        bzip2 \
        curl \
        git \
        git-lfs \
        tar \
        gcc \
        g++ \
        libaio-dev \
        # audio
        libsndfile1-dev \
        ffmpeg \
        apt-transport-https \
        gnupg \
        ca-certificates \
    # apt-get takes one command per invocation, so run autoremove and clean separately
    && apt-get autoremove --yes \
    && apt-get clean

# Update pip
RUN pip install --upgrade pip


# Install Hugging Face libraries
# The extras spec is quoted so the shell never treats the brackets as a glob pattern
RUN pip install --upgrade --no-cache-dir \
    "transformers[sklearn,sentencepiece,vision]==${TRANSFORMERS}" \
    diffusers==${DIFFUSERS} \
    datasets==${DATASETS} \
    accelerate==${ACCELERATE} \
    evaluate==${EVALUATE} \
    peft==${PEFT} \
    trl==${TRL} \
    sentence-transformers==${SENTENCE_TRANSFORMERS}

# Install Google Cloud dependencies
# apt-key is deprecated; write the dearmored key to the path named in signed-by instead
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" \
        | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \
    && curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg \
        | gpg --dearmor > /usr/share/keyrings/cloud.google.gpg \
    && apt-get update -y \
    && apt-get install -y google-cloud-sdk

RUN pip install --upgrade --no-cache-dir \
    google-cloud-storage \
    google-cloud-bigquery \
    google-cloud-aiplatform \
    google-cloud-pubsub \
    google-cloud-logging

# Check if correct versions are installed
RUN python -c "import transformers, diffusers, datasets, accelerate, evaluate, peft, trl, sentence_transformers, torch; \
assert all([mod.__version__ == version for mod, version in [(transformers, '${TRANSFORMERS}'), (diffusers, '${DIFFUSERS}'), \
(datasets, '${DATASETS}'), (accelerate, '${ACCELERATE}'), (evaluate, '${EVALUATE}'), (peft, '${PEFT}'), (trl, '${TRL}'), \
(sentence_transformers, '${SENTENCE_TRANSFORMERS}'), (torch, '2.1.0')]])"
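
The one-line check above fails with a bare `AssertionError` and does not say which package is off. A minimal sketch of a more talkative check, assuming it would be copied into the image as a small script. The function name `find_version_mismatches` and the `PINS` dictionary are hypothetical and not part of this PR; the pins are copied from the `ARG` lines above. `torch` is left out on purpose, since the version string reported for a torch build may carry a local suffix that an exact comparison would reject.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Hypothetical pin table, copied from the ARG lines of the Dockerfile above.
PINS: Dict[str, str] = {
    "transformers": "4.37.2",
    "diffusers": "0.26.1",
    "peft": "0.8.2",
    "trl": "0.7.10",
    "datasets": "2.16.1",
    "accelerate": "0.27.0",
    "evaluate": "0.4.1",
    "sentence-transformers": "2.3.1",
}


def find_version_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {distribution: (expected, installed)} for every unmet pin.

    `installed` is None when the distribution is not installed at all.
    `get_version` is injectable so the function can be exercised without
    the real packages being present.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


def format_report(mismatches: Dict[str, Tuple[str, Optional[str]]]) -> str:
    """Render one line per mismatch, suitable for a build log."""
    return "\n".join(
        f"{name}: expected {wanted}, found {installed or 'not installed'}"
        for name, (wanted, installed) in sorted(mismatches.items())
    )
```

Inside the container, `find_version_mismatches(PINS)` would return an empty dictionary when every pin is met. A build step could print `format_report(...)` and exit non-zero otherwise.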