
Adding a server component for running multiple workers #1838

Closed · wants to merge 24 commits

Conversation

@fozziethebeat (Collaborator) commented Jul 3, 2023

Why are these changes needed?

This adds a new server component that lets clients run multiple models on the same worker instance. With the new PeftModelAdapter and an eventual fix for huggingface/peft#430, this server component lets clients run multiple adapters that share the same base model weights, loading those weights only once.

As of right now this is not fully optimized: it loads the base model weights once per configured model. Deduplicating them is blocked on the Peft issue.
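For context, this is roughly the loading pattern the component is building toward; a minimal sketch assuming the peft multi-adapter API, with a base model and adapter paths that are purely illustrative:

# Sketch only: two PEFT adapters served over one shared base model.
# The base model and adapter paths are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1b")

# The first adapter wraps the base model; the second reuses its weights,
# so the base weights only need to be loaded once.
model = PeftModel.from_pretrained(base, "my-org/pythia-adapter-a", adapter_name="a")
model.load_adapter("my-org/pythia-adapter-b", adapter_name="b")

# Per request, activate the adapter that matches the requested model name.
model.set_adapter("a")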

Related issue number (if applicable)

Implements #1805 (maybe fixes?)

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

@@ -135,15 +135,15 @@ async def check_length(request, prompt, max_tokens):
     response = await client.post(
         worker_addr + "/model_details",
         headers=headers,
-        json={},
+        json={"model": request.model},
@fozziethebeat (Collaborator, Author):

Note: I found all the spots where the OpenAI server calls the worker and made sure each request includes model. Are there other places that need this fix?

@Ying1123 (Member) Jul 5, 2023:

It seems you fixed all places. #1858 also applied some of your changes. Please rebase.
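To illustrate why the payload needs the model field: a multi-model worker has to pick one of its models per request. A hypothetical endpoint showing the idea (a sketch, not this PR's actual code; app and workers are assumed names):

# Sketch of a multi-model worker endpoint dispatching on the "model" field.
from fastapi import FastAPI, Request

app = FastAPI()
workers = {}  # model name -> worker instance, populated at startup


@app.post("/model_details")
async def model_details(request: Request):
    params = await request.json()
    # Without "model" in the payload, the worker cannot tell which of its
    # models the OpenAI server is asking about.
    worker = workers[params["model"]]
    return {"context_length": worker.context_len}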

return background_tasks


# Note: for all the calls below, we make a hard assumption that the caller
@fozziethebeat (Collaborator, Author):

I confirmed this works with two Pythia Peft models. The changes feel a little wonky and maybe fragile. Suggestions for an alternative strategy?

@Ying1123 (Member):

I think it is okay for now.

@Ying1123 Ying1123 self-assigned this Jul 3, 2023
@Ying1123 Ying1123 self-requested a review July 3, 2023 01:46
@Ying1123 Ying1123 added the enhancement New feature or request label Jul 3, 2023
@fozziethebeat (Collaborator, Author):

Please leave any review comments, especially on the broad strategy for doing this; I took the simplest strategy I could see.

I only marked this as a draft to ensure it doesn't get merged too quickly, since I'm pretty sure there are hidden bugs.

@@ -714,7 +714,7 @@ async def create_chat_completion(request: APIChatCompletionRequest):
     if error_check_ret is not None:
         return error_check_ret
 
-    gen_params = get_gen_params(
+    gen_params = await get_gen_params(
@fozziethebeat (Collaborator, Author):

This tripped me up when manually testing. Seems this API route doesn't get used by many?

@Ying1123 (Member):

Yes. The default path does not use this API route; some contributors added it for their own special use cases.
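For anyone else who hits this: once get_gen_params is a coroutine, calling it without await returns a coroutine object instead of the parameter dict, and the body never runs. A minimal illustration with a stand-in body (not the real function):

import asyncio


async def get_gen_params(model: str) -> dict:
    # Stand-in for the real function, which now awaits a call to the worker.
    await asyncio.sleep(0)
    return {"model": model}


async def create_chat_completion():
    bad = get_gen_params("pythia")         # a coroutine object, never executed
    good = await get_gen_params("pythia")  # the actual dict, as in the diff above
    return good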

Resolved review threads on docs/openai_api.md and fastchat/serve/model_multi_worker.py (several marked outdated).
@fozziethebeat (Collaborator, Author):

Since there are no major redesign comments, I'm marking this as ready for review. I'll be testing this in a Docker setup tomorrow to fully verify it works.

@fozziethebeat fozziethebeat marked this pull request as ready for review July 5, 2023 10:07
@Ying1123 (Member) left a comment:

Thanks for the contribution. I think the overall design looks good and we can use this as a starting point.

Could you follow our recent refactor in #1858 and try to reuse as much code as possible? For example, could you import ModelWorker or BaseModelWorker from model_worker.py?

We are still iterating on the interface design of the model worker. The current design may not work for your Peft models.
Feel free to propose any changes that can make it better for Peft models and reduce the redundant code.
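One way to read that suggestion: hold one instance of the refactored worker class per model and route by name, instead of re-implementing the worker logic. A sketch under that assumption; the constructor arguments here are illustrative, not the real signature:

# Sketch: wrap one ModelWorker (from the #1858 refactor) per served model.
from fastchat.serve.model_worker import ModelWorker


class MultiModelWorker:
    def __init__(self, model_paths, model_names, **worker_kwargs):
        # Constructor arguments are illustrative placeholders.
        self.workers = {
            name: ModelWorker(model_path=path, model_names=[name], **worker_kwargs)
            for name, path in zip(model_names, model_paths)
        }

    def generate_stream_gate(self, params):
        # Route the request to the worker that owns the requested model.
        yield from self.workers[params["model"]].generate_stream_gate(params)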



# Note: For now the semaphore locks access to all models managed by the worker.
# This makes sense when all models are Peft models sharing the same underlying
# base model weights. It probably doesn't make sense in other scenarios.
@Ying1123 (Member):

I think it also makes sense in other scenarios. This semaphore is used to limit concurrency and prevent OOM. In other scenarios, if multiple workers share the same GPU, they can be put under the same semaphore.

@fozziethebeat (Collaborator, Author):

Great, that was also my thinking. These ultimately all share the same queue.
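A minimal sketch of that shared queue, assuming asyncio and an illustrative workers registry:

import asyncio

# One semaphore gating every model on this worker: concurrency is limited
# across all of them, which is what prevents GPU OOM.
limit_worker_concurrency = 2
global_semaphore = asyncio.Semaphore(limit_worker_concurrency)

workers: dict = {}  # model name -> model runner, populated at startup


async def generate(model_name: str, params: dict):
    async with global_semaphore:  # all models wait in the same queue
        return await workers[model_name].generate(params)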

fozziethebeat and others added 5 commits July 5, 2023 20:01
…ing multiple models on the same machine process
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
@fozziethebeat (Collaborator, Author):

Rebase done. I will need to manually test tomorrow and verify nothing breaks.

@merrymercy (Member):

Sorry that we do not have CI set up yet.
If you want to test, you can contribute some unit tests under https://github.com/lm-sys/FastChat/tree/main/tests.
For now, we use these commands to test the OpenAI API server: https://github.com/lm-sys/FastChat/blob/main/docs/commands/test_process.md#test-openai-api-server

@fozziethebeat (Collaborator, Author):

With the fix I just added, tests pass!

Ying1123 previously approved these changes Jul 6, 2023
@Ying1123 (Member) commented Jul 6, 2023:

@fozziethebeat Could you use a GitHub rebase rather than a merge? The history is messed up on GitHub: https://github.com/lm-sys/FastChat/pull/1838/files

@fozziethebeat (Collaborator, Author):

Yeah, I tried, and I think I screwed things up when I merged some changes made via the GitHub UI. I'm actually really bad at rebasing; what commands do you suggest?

Something like

git pull --rebase upstream main
git checkout multi_modal_worker
git rebase main

?

@Ying1123 (Member) commented Jul 6, 2023:

For the current case, I suggest copying the files you changed (only 3) and creating a new branch from main. Then you have two options:

  1. Delete your local branch multi_model_worker, copy the new branch over to multi_model_worker, then force-push.
  2. Create a new pull request and link back to this one.

For your reference, before things got messed up, I would normally use the following commands to rebase:

git fetch upstream main
git rebase upstream/main

Then follow the instructions displayed by git status to resolve conflicts. After fixing all conflicts, run:

git push --force

@fozziethebeat (Collaborator, Author):

Yeah, looking at the git history, fixing this is too messy.

Here's the replacement: #1866

Labels: enhancement (New feature or request)
Projects: None yet
7 participants