📝 overhaul of the documentation, now 4.5x bigger (better?) #144
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
BTW just for reference. We also now link to the optimum-tpu docs from:
The goal is to increase the visibility of the docs.
Thanks for the huge work! Some general comments:
- I would prefer to avoid repetition: having information repeated in several places can be confusing, and it is harder to maintain. E.g.: Docker arguments, TGI args.
- You specify version numbers; I think it would be best if we could generate those, otherwise they will be a burden to maintain (see the sketch after this list).
- Try to keep the titles and the toctree in sync.
- There is a bit of repetition between the tutorials and the how-tos. Maybe you can rationalize that.
- The conceptual guides should be more focused on optimum-tpu IMO, what do you think?
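On the version-number point, a minimal sketch of how generation could work, assuming a hypothetical `{{VERSION}}` placeholder and an `.mdx.in` template file (neither exists in this PR):

```bash
# Hypothetical doc-build step: read the installed optimum-tpu version and
# substitute it into a template so the docs never hard-code versions.
VERSION=$(python -c "from importlib.metadata import version; print(version('optimum-tpu'))")
sed "s/{{VERSION}}/${VERSION}/g" docs/source/installation.mdx.in > docs/source/installation.mdx
```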
@@ -0,0 +1,17 @@
# Differences between JetStream and PyTorch XLA
What about mentioning that you are talking about TGI? Also, "Jetstream Pytorch" might be more precise, as Jetstream has two implementations.
Also, I find this page a little bit confusing. We use Pytorch XLA everywhere; even Jetstream uses Pytorch XLA. Optimum TPU's TGI implementation can use Jetstream Pytorch or Pytorch XLA, but keep in mind the latter should be considered deprecated, as we will probably remove it in the future.
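For readers, a hedged sketch of how the two TGI backends are picked at serve time; the `JETSTREAM_PT_DISABLE` toggle is my recollection of the optimum-tpu behavior and the image name is a placeholder, so verify both against the current docs:

```bash
# By default the TGI server uses the Jetstream Pytorch backend on TPU.
# Setting JETSTREAM_PT_DISABLE=1 (assumed toggle) falls back to the plain
# Pytorch XLA backend, which this comment notes may be removed in the future.
docker run --rm --net host --privileged \
  -v ~/hf_data:/data \
  -e HF_TOKEN=${HF_TOKEN} \
  -e JETSTREAM_PT_DISABLE=1 \
  <optimum-tpu-tgi-image> \
  --model-id google/gemma-2b-it
```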
You can find more information about:
- PyTorch XLA: https://pytorch.org/xla/ and https://github.com/pytorch/xla
- JetStream: https://github.com/google/jaxon/tree/main/jetstream
you should rather point to https://github.com/AI-Hypercomputer/JetStream or to https://github.com/AI-Hypercomputer/jetstream-pytorch
@@ -0,0 +1,54 @@
# TPU hardware support
Optimum-TPU supports and is optimized for V5e, V5p, and V6e TPUs.
I think the V in V5e etc. should be lowercase (that is how they write it). Also, remove v5p, we have never tested it.
## TPU naming convention
The TPU naming follows this format: `<tpu_version>-<number_of_tpus>`
TPU versions available:
shouldn't this be "TPU available versions"?
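As an aside, the naming convention quoted above can be cross-checked against what gcloud reports; a minimal sketch, assuming an authenticated gcloud CLI (the zone is an arbitrary example):

```bash
# Lists accelerator types (e.g. v5litepod-8) available in a zone; the names
# follow the <tpu_version>-<number_of_tpus> pattern described above.
gcloud compute tpus accelerator-types list --zone=us-west4-a
```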
# TPU hardware support
Optimum-TPU supports and is optimized for V5e, V5p, and V6e TPUs.
## When to use TPU
What about renaming this to "Why choose TPUs"?
1. Select TPU type:
   - We'll use a TPU `v5e-8` (corresponds to a v5litepod-8). This is a TPU node containing 8 v5e TPU chips.
   - For detailed specifications about TPU types, refer to our TPU types documentation.
provide a link
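For context on this step, a hedged sketch of creating the `v5e-8` node; the VM name, zone, and runtime version are assumptions to check against current GCP documentation:

```bash
# Create a TPU VM with 8 v5e chips (accelerator type v5litepod-8).
# Zone and runtime version vary by availability -- examples only.
gcloud compute tpus tpu-vm create optimum-tpu-demo \
  --zone=us-west4-a \
  --accelerator-type=v5litepod-8 \
  --version=v2-alpha-tpuv5-lite
```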
- For deploying existing models, start with Model Serving
- For training new models, begin with Model Training
provide links
## 1. Start the Jupyter Container
Launch the container with the following command:
You need to clone the optimum-tpu git repo and install the Jupyter notebook, and then you can run it, but you will need to mount the notebook too.
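A minimal sketch of that workflow, assuming the notebooks live under `examples/` and using a placeholder image tag (both assumptions; adjust to the actual repo layout and published image):

```bash
# Clone the repo so the notebooks can be mounted into the container.
git clone https://github.com/huggingface/optimum-tpu.git
cd optimum-tpu

# Mount the notebooks, install Jupyter inside the container, and serve it
# on the host network so it is reachable from outside.
docker run --rm --net host --privileged \
  -v $(pwd)/examples:/notebooks \
  -e HF_TOKEN=${HF_TOKEN} \
  <training-image> \
  bash -c "pip install jupyter && jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/notebooks"
```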
docker run --rm --net host --privileged \
  -v$(pwd)/artifacts:/tmp/output \
  -e HF_TOKEN=${HF_TOKEN} \
  us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-tpu.2.5.1.transformers.4.46.3.py310 \
I do not think we should provide a link to an image url that does not exist yet 😢
- `--privileged`: Required for TPU access
- `--net host`: Uses host network mode
- `-v ~/hf_data:/data`: Volume mount for model storage
- `-e SKIP_WARMUP=1`: Disables warmup for quick testing (not recommended for production)
remove this
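To show the flags above in context, a hedged sketch of a complete serving command; the image name and model id are placeholder assumptions, not values from this PR:

```bash
# Example invocation combining the flags documented above; SKIP_WARMUP=1
# is for quick testing only and should not be used in production.
docker run --rm --net host --privileged \
  -v ~/hf_data:/data \
  -e HF_TOKEN=${HF_TOKEN} \
  -e SKIP_WARMUP=1 \
  <optimum-tpu-tgi-image> \
  --model-id google/gemma-2b-it
```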
What does this PR do?
This is a complete overhaul of the documentation:
What is missing (could be added):
New Files Added
docs/scripts/auto-generate-examples.py
docs/scripts/examples_list.yml
docs/source/conceptual_guides/difference_between_jetstream_and_xla.mdx
docs/source/conceptual_guides/tpu_hardware_support.mdx
docs/source/contributing.mdx
docs/source/howto/advanced-tgi-serving.mdx
docs/source/howto/deploy_instance_on_ie.mdx
docs/source/howto/installation_inside_a_container.mdx
docs/source/installation.mdx
docs/source/optimum_container.mdx
docs/source/reference/fsdp_v2.mdx
docs/source/reference/tgi_advanced_options.mdx
docs/source/tutorials/inference_on_tpu.mdx
docs/source/tutorials/tpu_setup.mdx
docs/source/tutorials/training_on_tpu.mdx
Modified Files
docs/source/howto/training.mdx
docs/source/index.mdx
docs/source/supported-architectures.mdx