Set model status using torchserve api #1878
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1878      +/-   ##
==========================================
+ Coverage   41.67%   44.95%   +3.28%
==========================================
  Files          55       63       +8
  Lines        2282     2609     +327
  Branches        1       56      +55
==========================================
+ Hits          951     1173     +222
- Misses       1331     1436     +105
```
@jagadeeshi2i @byeongjokim @maaquib Do you know if this PR will be brought to a complete state? We have also hit a scenario that needs a readiness check from the model side, and this PR implements it.
@gavrishp @jagadeeshi2i @maaquib
Logs with llama model
The logic works as shown in the logs attached to the PR. Approving for now.
Head branch was pushed to by a user without write access
@agunapal Thank you for processing the workflows.
Description
Fixes #1773, kserve/kserve#1915
Issue:
#1773 is the PR that tries to set the model status after the TorchServe (TS) InferenceService (ISVC) is deployed. However, it causes problems.
In the initialization function of a TS handler, you can do many things to prepare the model, such as loading the model weights (e.g., downloading tokenizer weights from online storage).
The current code may cause an error if someone calls the KServe API before the TS workers are initialized, in particular when the pod is scaled out with a readinessProbe (/v1/model/{model_name} or /v2/model/{model_name}/status) set.
However, there is also a problem with the previous code: it calls model.load() on the first request. In that case, an infinite cycle occurs between the ready status and the requests, since the model only becomes ready after serving a request, yet the readiness probe holds back requests until the model is ready.
Therefore, to solve this problem, the model's ready status should be determined by communicating with the TorchServe Describe Model API.
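For illustration, here is a minimal Python sketch of such a readiness check (not the PR's actual code). The helper name and default URL are assumptions; the endpoint (GET /models/{model_name} on the management port, 8081 by default) and the response shape — a list of model descriptions, each carrying a "workers" list with per-worker "status" fields — follow TorchServe's documented Describe Model management API.

```python
import requests

def is_model_ready(model_name: str,
                   management_url: str = "http://localhost:8081") -> bool:
    """Decide readiness by querying TorchServe's Describe Model API
    instead of lazily calling model.load() on the first request."""
    try:
        resp = requests.get(f"{management_url}/models/{model_name}", timeout=5)
        resp.raise_for_status()
    except requests.RequestException:
        return False
    # The API returns a list of model (version) descriptions; each entry
    # has a "workers" list whose items carry a "status" field.
    for description in resp.json():
        workers = description.get("workers", [])
        if workers and all(w.get("status") == "READY" for w in workers):
            return True
    return False
```

A readiness endpoint backed by a check like this reports ready only after the workers finish initialization, so a scaled-out pod receives no traffic until its model is actually loaded.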
Two types of API are used in the `load` function.
One is for checking only the worker's status (when MODEL_LOAD_CUSTOMIZED is 'false'). It is used when the initialization of the handler does not take a long time.
The other is for checking the status of both the handler and the worker (when MODEL_LOAD_CUSTOMIZED is 'true'). It is used when the initialization of the handler takes a long time; in this case, the `describe_handle` function should be implemented in the custom handler (see the related issue). A sketch of such a handler follows.
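As a minimal sketch of the customized path (MODEL_LOAD_CUSTOMIZED set to 'true'): a handler with a slow initialize() implements describe_handle(), which TorchServe surfaces as customized metadata in the Describe Model API response (GET /models/{model_name}?customized=true). The class name, the slow-step helper, and the returned payload are illustrative assumptions, not the PR's exact contract.

```python
import json

from ts.torch_handler.base_handler import BaseHandler

class SlowInitHandler(BaseHandler):
    """Custom handler whose initialization takes a long time."""

    def initialize(self, context):
        super().initialize(context)        # loads the model weights
        self.initialized = False           # not ready until the slow step is done
        self.download_tokenizer_weights()  # hypothetical slow step
        self.initialized = True

    def download_tokenizer_weights(self):
        # Placeholder for slow preparation work, e.g. downloading
        # tokenizer weights from online storage.
        pass

    def describe_handle(self):
        # The returned string appears as customized metadata in the
        # Describe Model API response, so the KServe wrapper can check
        # handler readiness in addition to worker status.
        return json.dumps({"ready": getattr(self, "initialized", False)})
```

With this in place, the wrapper can treat the model as ready only when the worker status is READY and the customized metadata also reports ready.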
Type of change
Feature/Issue validation/testing
log.log
Tested with MODEL_LOAD_CUSTOMIZED set to True in the kserve wrapper.
Checklist: