🧹 Cleanup and fixes for TGI #96

tengomucho · 2024-09-25T14:00:44Z

What does this PR do?

Many changes to prepare a proper TGI image compatible with Jetstream Pytorch.
In particular:

Set the torch default dtype to bfloat16 (that is what is mostly used on TPUs anyway).
Make warmup prepare the smallest size from the bucket (it was previously skipped).
Ignore common warnings in pytest to avoid cluttering test output.
Add a Jetstream installation step to the TGI image.
Update the TGI image (refer to the commit message for details).
Install torch for CPU to avoid installing the default GPU version when possible.
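
For the pytest warnings item, here is a minimal sketch of how an "ignore" filter suppresses a warning category, using only the standard-library `warnings` module. pytest exposes the same filter mechanism through its `filterwarnings` ini option. The helper names and the choice of `DeprecationWarning` are assumptions for illustration; the warnings actually ignored in this PR are not listed here.

```python
import warnings


def noisy_helper() -> int:
    # Hypothetical stand-in for a library call that warns on every use.
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


def count_warnings(ignore: bool) -> int:
    """Run noisy_helper once and return how many warnings were recorded."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "record everything"
        if ignore:
            # Same idea as a pytest entry `filterwarnings = ignore::DeprecationWarning`.
            warnings.filterwarnings("ignore", category=DeprecationWarning)
        noisy_helper()
    return len(caught)


print(count_warnings(ignore=False))
print(count_warnings(ignore=True))
```

With the filter in place the warning is dropped before it reaches the test output, which is the decluttering effect described in the list.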

bfloat16 is what should be used on TPUs.

Also, add timing checks in warmup test.
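
As a rough illustration of the warmup fix and the timing checks, the sketch below runs a prefill for every bucket size, smallest included, and records how long each one takes. The bucket sizes, `fake_prefill`, and the `warmup` signature are hypothetical and are not taken from the optimum-tpu code.

```python
import time
from typing import Callable, Dict, List


def warmup(bucket_sizes: List[int], prefill: Callable[[int], None]) -> Dict[int, float]:
    """Run `prefill` once per bucket size (smallest included) and time each run."""
    timings: Dict[int, float] = {}
    for size in sorted(bucket_sizes):
        start = time.perf_counter()
        prefill(size)
        timings[size] = time.perf_counter() - start
    return timings


def fake_prefill(size: int) -> None:
    # Hypothetical stand-in for compiling and running a prefill of `size` tokens.
    sum(range(size * 1000))


BUCKETS = [16, 32, 64]  # hypothetical prefill bucket sizes
timings = warmup(BUCKETS, fake_prefill)
for size, seconds in timings.items():
    print(f"prefill size {size}: {seconds:.6f}s")
```

A test built on this could assert that the smallest bucket appears in the timings and that a request issued after warmup stays under some time budget, although the exact checks added in this PR may differ.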

- The TGI version was updated from 2.0.3 to 0ff6ff60ada291840beed63d8bf458d6f9606f7f, which is essentially 2.3.0 plus a few fixes to get the v2 proto interface working again.
- This update was done because debug logs were otherwise not working. That is a problem when we need to debug something in TGI, and so far the only solution was a hack that forced re-adding the debug logs in the server. This is now fixed, and logs work fine with the Jetstream Pt generator.
- Obviously there was a drawback 😖: logs on threads spawned by the Pytorch/XLA generator now looked weird and always appeared, even when debug was off. This has been fixed, but the workaround is not very nice (I set an env var). I think the multithread generator is going to go away soon anyway, so this should not be a big deal.
- The new TGI version is built using Python 3.11, while so far optimum-tpu has worked on Python 3.10, because that is the version the Pytorch/XLA front page says should be used. So the image has been updated with Python 3.11 support and the required transformers installation for now, which is easier since they run in separate processes.
- The TGI build process has changed a bit, so the Dockerfile has been changed accordingly. text-generation-router-v2 is renamed to text-generation-router because that is what the launcher expects.

Installing torch for CPU is an extra step that makes images leaner, as it avoids pulling in the GPU dependencies.
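
One common way to get a CPU-only torch is to point pip at PyTorch's CPU wheel index. The sketch below only builds that command; whether this PR uses exactly this invocation is an assumption, and the function name is hypothetical.

```python
import sys
from typing import List

# PyTorch's CPU-only wheel index.
CPU_INDEX_URL = "https://download.pytorch.org/whl/cpu"


def cpu_torch_install_cmd(python: str = sys.executable) -> List[str]:
    """Build a pip command that installs torch from the CPU-only index."""
    return [python, "-m", "pip", "install", "torch", "--index-url", CPU_INDEX_URL]


# Print the command rather than running it, so the sketch has no side effects.
print(" ".join(cpu_torch_install_cmd("python3")))
```

Installing from this index before other packages that depend on torch keeps pip from resolving the default build, which brings in the GPU dependencies.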

mfuntowicz

LFG! Let's go!

mfuntowicz · 2024-09-27T08:03:39Z

text-generation-inference/docker/Dockerfile

@@ -8,7 +8,7 @@ RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1

 # Build cargo components (adapted from TGI original Dockerfile)
 # Note that the build image is aligned on the same Linux version as the base image (Debian bookworm/ Ubuntu 22.04)
-FROM lukemathwalker/cargo-chef:latest-rust-1.77-bookworm AS chef
+FROM lukemathwalker/cargo-chef:latest-rust-1.79-bookworm AS chef


What about using 1.81 (latest)?

Hmm, I used the latest version that was in the TGI Docker image.

mfuntowicz · 2024-09-27T08:04:22Z

text-generation-inference/docker/Dockerfile

 COPY --from=tgi /tgi/launcher launcher
-RUN cargo build --release --workspace --exclude benchmark
+RUN cargo build --profile release-opt

 # Python base image
 FROM ubuntu:22.04 AS base


What about ubuntu:24.04, or would that break things?

All TPU packages and images are so far tuned for this version. I might update that at some point if I have time, but I do not see it as a priority for now.

tengomucho added 6 commits September 25, 2024 09:59

feat(Jetstream Pt): set torch default dtype to bfloat16

ec02ad3

This is what should be used on TPUs.

fix(warmup): make warmup work for smallest prefill size

8b35945

Also, add timing checks in warmup test.

chore(pytest): ignore common warnings

0249f09

chore(docker): add jetstream installation step to TGI image

a90d91b

chore(install): install torch for cpu

fae7a04

This extra step will make images leaner as it will avoid having the gpu dependencies.

tengomucho changed the title ~~Quick fixes for jetstream~~ 🧹 Cleanup and fixes for TGI Sep 25, 2024

tengomucho requested a review from mfuntowicz September 25, 2024 14:18

tengomucho marked this pull request as ready for review September 25, 2024 14:18

mfuntowicz approved these changes Sep 27, 2024

tengomucho merged commit f5ad698 into main Sep 27, 2024
3 checks passed

tengomucho deleted the quick-fixes-for-jetstream branch September 27, 2024 08:37
