🧹 Cleanup and fixes for TGI #96
This is what should be used on TPUs.
Also, add timing checks in warmup test.
- TGI version updated from 2.0.3 to 0ff6ff60ada291840beed63d8bf458d6f9606f7f, which is essentially 2.3.0 plus a few fixes to get the v2 proto interface working again.
- This update was done because debug logs were otherwise not working. That makes it complicated to debug anything in TGI, and so far the only solution was a hack that forced re-adding the debug logs in the server. This is now fixed, and logs are fine with the Jetstream Pt generator.
- Obviously there was a drawback 😖 Logs on threads spawned by the Pytorch/XLA generator were now all weird and always appeared, even when debug was off. This has been fixed, but the workaround is not very nice (I set an env var). I think the multithread generator is going to go away soon anyway, so this should not be a big deal.
- The new TGI version is built using Python 3.11, while so far optimum-tpu has worked on Python 3.10, because that is what the Pytorch/XLA front page says should be used. So the image has been updated with Python 3.11 support and the required transformers installation for now, because it's easier since they run in separate processes.
- The TGI build process has changed a bit, so the Dockerfile has been changed accordingly. text-generation-router-v2 is renamed to text-generation-router because that is what the launcher expects.
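The env var workaround for the Pytorch/XLA thread logs is not spelled out in this description. As a rough illustration of the general idea only, here is a minimal sketch that picks a logger level from an environment variable, so worker threads and child processes stay quiet unless debug is explicitly requested. The variable name `EXAMPLE_DEBUG_LOGS`, the logger name, and the function `configure_worker_logging` are hypothetical and are not the names used in this PR.

```python
import logging
import os

# Hypothetical variable name, chosen for illustration only (not the one set in this PR).
DEBUG_ENV_VAR = "EXAMPLE_DEBUG_LOGS"


def configure_worker_logging(logger_name: str = "example.worker") -> logging.Logger:
    """Return a logger whose level is taken from an environment variable.

    The environment is shared by threads and inherited by child processes,
    so every worker that calls this function sees the same setting.
    """
    debug_on = os.environ.get(DEBUG_ENV_VAR, "0").lower() in ("1", "true")
    logger = logging.getLogger(logger_name)
    # DEBUG only when explicitly requested, INFO otherwise.
    logger.setLevel(logging.DEBUG if debug_on else logging.INFO)
    return logger
```

Under this assumption, the parent would export the variable before spawning workers when debug is on, and each worker would call `configure_worker_logging()` at startup.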
This extra step will make images leaner as it will avoid having the gpu dependencies.
LFG! Let's go!
@@ -8,7 +8,7 @@ RUN tar -C /tgi -xf /tgi/sources.tar.gz --strip-components=1

 # Build cargo components (adapted from TGI original Dockerfile)
 # Note that the build image is aligned on the same Linux version as the base image (Debian bookworm/ Ubuntu 22.04)
-FROM lukemathwalker/cargo-chef:latest-rust-1.77-bookworm AS chef
+FROM lukemathwalker/cargo-chef:latest-rust-1.79-bookworm AS chef
What about using 1.81 (latest)?
Umh I used the latest version that was on the TGI docker image.
COPY --from=tgi /tgi/launcher launcher
RUN cargo build --release --workspace --exclude benchmark
RUN cargo build --profile release-opt

# Python base image
FROM ubuntu:22.04 AS base
ubuntu:24.04
or is it breaking stuff?
All TPU packages and images are so far tuned for this version. I might update that at some point if I have time, but I do not see it as a priority for now.
What does this PR do?
Many changes to prepare a proper TGI image compatible with Jetstream Pytorch.
In particular: