[ci][test] exclude model download time in server start time #7834

Merged
5 changes: 5 additions & 0 deletions tests/utils.py
@@ -11,6 +11,7 @@
 import openai
 import requests
+from huggingface_hub import snapshot_download
 from transformers import AutoTokenizer
 from typing_extensions import ParamSpec

@@ -64,6 +65,10 @@ def __init__(self,
                  env_dict: Optional[Dict[str, str]] = None,
                  auto_port: bool = True,
                  max_wait_seconds: Optional[float] = None) -> None:
+        if not model.startswith("/"):
+            # download the model if it's not a local path
+            # to exclude the model download time from the server start time
+            model = snapshot_download(model)

Member:
snapshot_download can download many more files than we need for inference, such as duplicate .pt files when we prefer to use safetensors. We should try to share a common function with the way vLLM usually pulls down model files.
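
As an aside, one narrower interim option (a sketch only, not what this PR does) would be to pass huggingface_hub's ignore_patterns to snapshot_download; the pattern list below is an assumption and is only safe for repos that actually ship safetensors:

```python
from huggingface_hub import snapshot_download

# Hypothetical narrowing of the snapshot: skip duplicate .pt/.bin weights
# and rely on the repo shipping *.safetensors. Repos without safetensors
# would need a fallback, so this is only a sketch.
model = snapshot_download(model, ignore_patterns=["*.pt", "*.bin"])
```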

Member Author:

Can you come up with a fix for it?

I don't know if safetensors alone would be enough, but we can have a try.

Member:
Basically, I am proposing to use DefaultModelLoader._prepare_weights:

def _prepare_weights(self, model_name_or_path: str,

I can make a PR for this later today.
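
A rough sketch of how the test helper might call it, assuming vLLM's internal API at the time (a revision argument, a fall_back_to_pt flag, and the local folder returned first); as a private method, its signature may change without notice:

```python
from vllm.config import LoadConfig
from vllm.model_executor.model_loader.loader import DefaultModelLoader

# Resolve the model to a local folder the way vLLM's default loader does,
# downloading only the weight files it would actually load (preferring
# safetensors over duplicate .pt files).
loader = DefaultModelLoader(LoadConfig())
model, _, _ = loader._prepare_weights(model,
                                      revision=None,
                                      fall_back_to_pt=True)
```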

Member Author:
Please go ahead. I do see more files being downloaded in https://buildkite.com/vllm/fastcheck/builds/3094#01918525-98da-4596-8d31-f4e2c1172455

         if auto_port:
             if "-p" in cli_args or "--port" in cli_args:
                 raise ValueError("You have manually specified the port"