
📝 overhaul of the documentation, now 4.5x bigger (better?) #144

Open · wants to merge 2 commits into main from improve-documentation
Conversation

@baptistecolle (Collaborator) commented on Jan 15, 2025

What does this PR do?

This is a complete overhaul of the documentation:

  • We went from 1,686 to 7,565 words (4.5x bigger)
  • We auto-generate documentation for our examples (see the sketch after this list)
  • New formatting and organization of the docs to make them easier to follow
  • Added new tutorials, how-to guides, conceptual guides, and references following the Diátaxis method

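For the curious, the example auto-generation could work roughly as follows. This is a minimal sketch, not the actual contents of `docs/scripts/auto-generate-examples.py`, and the `path`/`title` schema assumed for `examples_list.yml` is an illustration:

```python
# Hypothetical sketch of docs/scripts/auto-generate-examples.py:
# read the list of examples and emit one .mdx page per entry.
import pathlib

import yaml  # third-party: pyyaml

examples = yaml.safe_load(pathlib.Path("docs/scripts/examples_list.yml").read_text())
out_dir = pathlib.Path("docs/source/howto")

# Build the code fence programmatically to avoid embedding a literal fence here.
fence = "`" * 3

for example in examples:
    source = pathlib.Path(example["path"])  # "path" key is a hypothetical schema
    title = example["title"]                # "title" key is a hypothetical schema
    page = f"# {title}\n\n{fence}python\n{source.read_text()}\n{fence}\n"
    (out_dir / f"{source.stem}.mdx").write_text(page)
```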
What is missing (could be added):

  • I think more examples would be nice, showing more diverse use cases
  • I believe a FAQ and a glossary would be nice to add, but this PR is big enough already
  • A guide and examples for Google Colab Pro, since you can launch a v5e-1 TPU from there; a one-click example would be nice
  • An example using a GCE VM on Colab via the GCP Marketplace
  • More diagrams and figures of the internal workings of optimum-tpu, to give some details, would be interesting
  • A how-to guide on adding new models, for new contributors
  • Docs for GKE are in the works but not published yet, as there are some blockers: https://github.com/huggingface/optimum-tpu/blob/doc-deploy-gke/docs/source/howto/deploy-gke.md
  • The current preview docs for GKE are CLI-only; a GUI guide would be interesting too

New Files Added

  • docs/scripts/auto-generate-examples.py
  • docs/scripts/examples_list.yml
  • docs/source/conceptual_guides/difference_between_jetstream_and_xla.mdx
  • docs/source/conceptual_guides/tpu_hardware_support.mdx
  • docs/source/contributing.mdx
  • docs/source/howto/advanced-tgi-serving.mdx
  • docs/source/howto/deploy_instance_on_ie.mdx
  • docs/source/howto/installation_inside_a_container.mdx
  • docs/source/installation.mdx
  • docs/source/optimum_container.mdx
  • docs/source/reference/fsdp_v2.mdx
  • docs/source/reference/tgi_advanced_options.mdx
  • docs/source/tutorials/inference_on_tpu.mdx
  • docs/source/tutorials/tpu_setup.mdx
  • docs/source/tutorials/training_on_tpu.mdx

Modified Files

  • docs/source/howto/training.mdx
  • docs/source/index.mdx
  • docs/source/supported-architectures.mdx

@baptistecolle changed the title from "📝 overhaul of the documentation, now 4.5 bigger (better?)" to "📝 overhaul of the documentation, now 4.5x bigger (better?)" on Jan 15, 2025
@baptistecolle force-pushed the improve-documentation branch 2 times, most recently from 2ea95f2 to 1be1304, on January 15, 2025 13:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@baptistecolle marked this pull request as ready for review on January 15, 2025 13:50
@baptistecolle (Collaborator, Author) commented on Jan 15, 2025

BTW just for reference. We also now link to the optimum-tpu docs from:

The goal is to increase the visibility of the docs.

@pagezyhf self-requested a review on January 15, 2025 15:00
@tengomucho (Collaborator) left a comment


Thanks for the huge work! Some general comments

  • I would prefer to avoid repetition: having information repeated in several places can be confusing, and it is harder to maintain. E.g.: Docker arguments, TGI args
  • You specify version numbers; I think it would be best if we could generate those, otherwise they will be a burden to maintain
  • Try to keep titles and the toctree in sync
  • There is a bit of repetition between the tutorials and how-tos. Maybe you can rationalize that.
  • The conceptual guides should be more focused on optimum-tpu IMO, what do you think?

@@ -0,0 +1,17 @@
# Differences between JetStream and PyTorch XLA

What about mentioning that you are talking about TGI? Also, "JetStream PyTorch" might be more precise, as JetStream has 2 implementations.
Also, I find this page a little bit confusing. We use PyTorch XLA everywhere; even JetStream uses PyTorch XLA. Optimum TPU's TGI implementation can use JetStream PyTorch or PyTorch XLA, but keep in mind the latter should be deprecated, as we will probably remove it in the future.


You can find more information about:
- PyTorch XLA: https://pytorch.org/xla/ and https://github.com/pytorch/xla
- JetStream: https://github.com/google/jaxon/tree/main/jetstream

@@ -0,0 +1,54 @@
# TPU hardware support
Optimum-TPU support and is optimized for V5e, V5p, and V6e TPUs.

I think the V in V5e etc. should be lowercase (that is how they write it). Also, remove v5p; we have never tested it.

## TPU naming convention
The TPU naming follows this format: `<tpu_version>-<number_of_tpus>`

TPU versions available:

shouldn't this be "TPU available versions"?
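As context for the convention under discussion: a name like `v5e-8` encodes the TPU version and the chip count. A minimal illustration, not part of the PR:

```python
def parse_tpu_name(name: str) -> tuple[str, int]:
    """Split a TPU name like 'v5e-8' into (version, chip_count)."""
    version, count = name.rsplit("-", 1)
    return version, int(count)

# v5e-8 is a v5e node with 8 chips (a v5litepod-8 in GCP terms).
assert parse_tpu_name("v5e-8") == ("v5e", 8)
```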

# TPU hardware support
Optimum-TPU support and is optimized for V5e, V5p, and V6e TPUs.

## When to use TPU

What about renaming this "Why choose TPUs"?


1. Select TPU type:
- We'll use a TPU `v5e-8` (corresponds to a v5litepod8). This is a TPU node containing 8 v5e TPU chips
- For detailed specifications about TPU types, refer to our TPU types documentation

provide a link

Comment on lines +75 to +76
- For deploying existing models, start with Model Serving
- For training new models, begin with Model Training

provide links


## 1. Start the Jupyter Container

Launch the container with the following command:

You need to clone the optimum-tpu repo and install Jupyter Notebook before you can run it, and you will need to mount the notebook into the container too.

docker run --rm --net host --privileged \
-v$(pwd)/artifacts:/tmp/output \
-e HF_TOKEN=${HF_TOKEN} \
us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-tpu.2.5.1.transformers.4.46.3.py310 \

I do not think we should provide a link to an image URL that does not exist yet 😢

- `--privileged`: Required for TPU access
- `--net host`: Uses host network mode
- `-v ~/hf_data:/data`: Volume mount for model storage
- `-e SKIP_WARMUP=1`: Disables warmup for quick testing (not recommended for production)

remove this
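Related aside: once a TGI container launched with these flags is up, a minimal smoke test against its REST API could look like this. The URL, port, and generation parameters are assumptions (with `--net host`, TGI's default port 80 is exposed directly on the host), not from the PR:

```python
import requests  # third-party: requests

# POST a prompt to TGI's /generate endpoint and print the completion.
resp = requests.post(
    "http://localhost:80/generate",  # adjust to match your container's configuration
    json={"inputs": "What are TPUs good at?", "parameters": {"max_new_tokens": 64}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```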
