adding the triton docker build minimal example #242
Conversation
Thanks for this example! Would it be possible to run the server from inside the model, i.e. something like:

```python
runtime = sgl.Runtime(model_path="mistralai/Mistral-7B-Instruct-v0.2")
sgl.set_default_backend(runtime)
```
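For reference, a minimal sketch of what that suggestion might look like inside a Triton Python backend `model.py`. This is not the code from the PR: the tensor names (`prompt`, `response`), the model path, and the use of `sgl.gen` are illustrative assumptions.

```python
# Sketch only: tensor names and model path are assumptions, not the PR's actual code.
import numpy as np
import sglang as sgl
import triton_python_backend_utils as pb_utils


@sgl.function
def generate(s, question):
    # Append the prompt and ask the backend to generate a completion.
    s += question
    s += sgl.gen("answer", max_tokens=256)


class TritonPythonModel:
    def initialize(self, args):
        # Launch the SGLang runtime when Triton loads the model.
        self.runtime = sgl.Runtime(model_path="mistralai/Mistral-7B-Instruct-v0.2")
        sgl.set_default_backend(self.runtime)

    def execute(self, requests):
        responses = []
        for request in requests:
            prompt = (
                pb_utils.get_input_tensor_by_name(request, "prompt")
                .as_numpy()[0]
                .decode("utf-8")
            )
            state = generate.run(question=prompt)
            out = pb_utils.Tensor(
                "response",
                np.array([state["answer"].encode("utf-8")], dtype=object),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        # Shut down the SGLang runtime when Triton unloads the model.
        self.runtime.shutdown()
```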
@amirarsalan90 Thanks for contributing to this! Could you document the files better?
@merrymercy this implementation follows Triton's convention for model registry. See here for more details. The schema of the API inputs / outputs is specified in the model's config.pbtxt. A different approach to this could be to run as a Triton backend, similar to what vLLM does here. I suspect this would be a bit more involved given how SGLang creates the backend processes as part of the server, but I haven't looked into it too closely.
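For context, the Triton model registry convention referenced above expects a layout roughly like the following; the model name `sglang_model` is only a placeholder, not necessarily what this example uses:

```
model_repository/
└── sglang_model/
    ├── config.pbtxt      # declares the backend and the input/output tensor schema
    └── 1/
        └── model.py      # Triton Python backend entry point
```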
@merrymercy as @isaac-vidas explained, that is the directory convention for Triton Inference Server's model registry. I removed the inference.ipynb notebook and added a curl request to the readme file for querying the Triton server. As far as I understand, the vLLM backend for Triton Inference Server also disables some Triton features (like batching) and has some limitations: https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md But I agree this is a very minimal and basic way to set up Triton for SGLang. I needed it for a project of mine and thought it might be helpful for others too.
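As a rough illustration of such a query (the Python equivalent of a curl request against Triton's KServe-style HTTP endpoint), the request could look like the sketch below; the model name and tensor names are placeholders that would need to match the example's config.pbtxt:

```python
import requests

# Placeholder model/tensor names; adjust to match the config.pbtxt in the example.
payload = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is the capital of France?"],
        }
    ]
}
resp = requests.post(
    "http://localhost:8000/v2/models/sglang_model/infer", json=payload
)
print(resp.json()["outputs"][0]["data"])
```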
@amirarsalan90 It is merged. Thanks!
Adding a minimal example that builds a Docker container to serve SGLang with Triton Inference Server using the Python backend.
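A hypothetical sketch of what such a Docker build could look like, assuming an NGC Triton base image and a pip-installable SGLang; the image tag, install extras, and paths are assumptions, not necessarily what this PR uses:

```dockerfile
# Sketch only: base image tag and install command are assumptions.
FROM nvcr.io/nvidia/tritonserver:24.01-py3

# Install SGLang into the Python backend's environment.
RUN pip install "sglang[all]"

# Copy the Triton model repository (config.pbtxt + model.py) into the image.
COPY model_repository /models

CMD ["tritonserver", "--model-repository=/models"]
```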