feat: restart model service process #3282

Open
wants to merge 12 commits into
base: main
from

Conversation

kyujin-cho
Member

@kyujin-cho kyujin-cho commented Dec 20, 2024

This PR improves the model service reloading experience by letting users restart only the model service process instead of the whole container. Because this only stops and relaunches the process itself, changes to the container spec (e.g. image, resource requests, environment variables) are not reflected.

What's changed

Backend.AI Agent

  • Added a new restart_model_service() RPC function
  • Moved the model definition dictionary builder (AbstractAgent.load_model_definition()) out of AbstractAgent into a new class, ModelServiceManager
  • Updated AbstractKernel to retain its model service information (model_service_info), which was previously treated as volatile
  • Updated Agent to restart the model service process while keeping the kernel runner running (AbstractAgent.restart_model_service())
    • Uses the newly introduced model_service_info and ModelServiceManager to shut down the existing model service process and repeat the creation steps, which until now ran only at kernel start
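
To make the restart flow above easier to review, here is a minimal, self-contained sketch of the sequence it describes: read the remembered service info, shut down the running process, rebuild the model definition, and start the process again. It is not the code in this PR. The names model_service_info, ModelServiceManager, load_model_definition(), shutdown_model_service() and restart_model_service() come from the description, but their signatures here are assumptions, and ModelServiceInfo, FakeKernel and start_model_service() are hypothetical stand-ins introduced only for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ModelServiceInfo:
    # Hypothetical shape of what the kernel remembers from its first start.
    model_name: str
    start_command: list[str]
    port: int


class ModelServiceManager:
    # Stand-in for the class named in this PR; the method signature is assumed.
    def load_model_definition(self, info: ModelServiceInfo) -> dict[str, Any]:
        return {
            "name": info.model_name,
            "service": {"start_command": info.start_command, "port": info.port},
        }


@dataclass
class FakeKernel:
    # Hypothetical in-memory kernel that only records what was asked of it.
    model_service_info: Optional[ModelServiceInfo]
    running: bool = True
    events: list[str] = field(default_factory=list)

    async def shutdown_model_service(self) -> None:
        self.running = False
        self.events.append("shutdown")

    async def start_model_service(self, definition: dict[str, Any]) -> None:
        self.running = True
        self.events.append(f"start:{definition['name']}")


async def restart_model_service(kernel: FakeKernel, manager: ModelServiceManager) -> None:
    info = kernel.model_service_info
    if info is None:
        # Without remembered info there is nothing to recreate the process from.
        raise RuntimeError("kernel has no model service to restart")
    # 1. Stop only the model service process; the container is left untouched.
    await kernel.shutdown_model_service()
    # 2. Rebuild the model definition from the remembered info.
    definition = manager.load_model_definition(info)
    # 3. Repeat the step that previously ran only at kernel start.
    await kernel.start_model_service(definition)


if __name__ == "__main__":
    demo = FakeKernel(ModelServiceInfo("demo", ["python", "serve.py"], 8000))
    asyncio.run(restart_model_service(demo, ModelServiceManager()))
    print(demo.events)
```

In this sketch the restart reuses the info stored at first start, so container-level settings never enter the picture, which matches the limitation noted in the description.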

Backend.AI Kernel

  • Added a new shutdown_model_service() function

Backend.AI Manager

  • Added a restart_model_service REST API and a restart_model_service registry function

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version

📚 Documentation preview 📚: https://sorna--3282.org.readthedocs.build/en/3282/


📚 Documentation preview 📚: https://sorna-ko--3282.org.readthedocs.build/ko/3282/

@kyujin-cho kyujin-cho added the type:feature Add new features label Dec 20, 2024
@kyujin-cho kyujin-cho added this to the 24.12 milestone Dec 20, 2024
@kyujin-cho kyujin-cho self-assigned this Dec 20, 2024
@github-actions github-actions bot added area:docs Documentations comp:manager Related to Manager component comp:agent Related to Agent component size:L 100~500 LoC labels Dec 20, 2024
@kyujin-cho kyujin-cho marked this pull request as draft December 20, 2024 16:48
@github-actions github-actions bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Dec 22, 2024
@kyujin-cho kyujin-cho changed the title feature: restart model service process feat: restart model service process Dec 22, 2024
@kyujin-cho kyujin-cho marked this pull request as ready for review December 22, 2024 07:52
@kyujin-cho kyujin-cho added the urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE! label Dec 23, 2024
Labels
area:docs Documentations comp:agent Related to Agent component comp:manager Related to Manager component size:XL 500~ LoC type:feature Add new features urgency:blocker IT SHOULD BE RESOLVED BEFORE NEXT RELEASE!
2 participants