-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fixes and E2E tests for PVC storage #340
Conversation
Related: kserve#230, kserve#267 Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
If channel timeout occured before timeToStabilize, then the logic to compute targetStateReached was not run and the method failed despite stable deploy status Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Until allowAnyPVC no longer accepts non-existent PVC we may not get past model "Pending" state Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Until controller fix is in place that prevents non-existent PVCs to be bound to runtime pods Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Pods can die and the connection breaks resulting in flakey tests Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the FVT changes are looking good to me! Just some NITs and questions.
Huge thanks @ckadner for all of the great work on this. And big thanks to @tjohnson31415 too for the thorough review! |
Thank you @tjohnson31415 and @njhill -- I will make the code changes you suggested and tag you for a re-review once the checks completed |
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
@tjohnson31415 -- I think I addressed all of your comments and created new issues to track outstanding followup work items. The review comments which were straight forward to address I collapsed as "Resolved" (Outdated). I left some conversations as unresolved. Those you can mark as resolved if you are okay with my approach. If not let me know and I take another stab at them. Thanks again for your thorough review!!!
New issues: |
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
@tjohnson31415 @njhill -- I updated the "Model storage" text bits and removed the random ready endpoints code. I truly hope this is the last time I am asking you to review this PR :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More tests, less bugs, and better code. LGTM!
Thanks for going through the final iterations to keep the FVT duration down!
Thanks @tjohnson31415 for your review and all the valuable pointers! @njhill I simplified the first PR comment to get a more concise commit message, but feel free to edit it before adding your |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @ckadner @tjohnson31415
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ckadner, njhill, tjohnson31415 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
Motivation Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves kserve#337 Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Motivation Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves kserve#337 Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Motivation Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267 Modifications Code changes: - Sort PVC mounts on serving runtime specs to avoid unstable repeated runtime rollouts as Kubernetes treat two otherwise identical deployment specs as different if the same set of volume mounts are in different order - Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is enabled as this would cause all serving pods for that runtime to stay in - Pending state with unbound (pending) volumes - Tolerate missing storage-config secret when allowAnyPVC is enabled - Lint: fix "io/ioutil" deprecations FVT changes: - Add Storage test suite - Add helper methods to add PVC to storage-config during FVT - Allow for additional time in WaitForReadyDeployStatus but allow early abort on success - Check if pod still running before gRPC/REST requests, reconnect if necessary - Only choose "Ready" runtime pod for port-forwards - Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests Resolves kserve#337 Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Motivation
Address PVC follow-up work items outlined in #337 for PVC storage introduced in #267
Modifications
Code changes:
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
allowAnyPVC
isenabled as this would cause all serving pods for that runtime to stay in
Pending state with unbound (pending) volumes
storage-config
secret whenallowAnyPVC
is enabled"io/ioutil"
deprecationsFVT changes:
Resolves #337