Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes and E2E tests for PVC storage #340

Merged
merged 45 commits into from
Mar 21, 2023
Merged

Bug fixes and E2E tests for PVC storage #340

merged 45 commits into from
Mar 21, 2023

Conversation

ckadner
Copy link
Member

@ckadner ckadner commented Mar 6, 2023

Motivation

Address PVC follow-up work items outlined in #337 for PVC storage introduced in #267

Modifications

Code changes:

  • Sort PVC mounts on serving runtime specs to avoid unstable repeated
    runtime rollouts as Kubernetes treat two otherwise identical deployment
    specs as different if the same set of volume mounts are in different order
  • Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
    enabled as this would cause all serving pods for that runtime to stay in
    Pending state with unbound (pending) volumes
  • Tolerate missing storage-config secret when allowAnyPVC is enabled
  • Lint: fix "io/ioutil" deprecations

FVT changes:

  • Add Storage test suite
  • Add helper methods to add PVC to storage-config during FVT
  • Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
  • Check if pod still running before gRPC/REST requests, reconnect if necessary
  • Only choose "Ready" runtime pod for port-forwards
  • Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves #337

@ckadner ckadner added the test testing related bugs and fixes label Mar 6, 2023
@ckadner ckadner added this to the v0.11.0 milestone Mar 6, 2023
@ckadner ckadner requested review from ScrapCodes, rafvasq and njhill March 6, 2023 22:08
@ckadner ckadner self-assigned this Mar 6, 2023
@ckadner ckadner removed request for tedhtchang and Tomcli March 6, 2023 22:09
Makefile Outdated Show resolved Hide resolved
Related: kserve#230, kserve#267

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
ckadner added 2 commits March 6, 2023 15:50
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
If channel timeout occured before timeToStabilize,
then the logic to compute targetStateReached was not
run and the method failed despite stable deploy status

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
ckadner added 6 commits March 7, 2023 03:57
Until allowAnyPVC no longer accepts non-existent PVC
we may not get past model "Pending" state

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Until controller fix is in place that prevents non-existent
PVCs to be bound to runtime pods

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Pods can die and the connection breaks resulting in
flakey tests

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
@ckadner ckadner requested a review from tjohnson31415 March 9, 2023 19:07
@ckadner ckadner marked this pull request as draft March 9, 2023 19:07
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Copy link
Contributor

@tjohnson31415 tjohnson31415 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the FVT changes are looking good to me! Just some NITs and questions.

apis/serving/v1alpha1/predictor_types.go Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
controllers/servingruntime_controller.go Outdated Show resolved Hide resolved
controllers/servingruntime_controller.go Outdated Show resolved Hide resolved
controllers/servingruntime_controller.go Outdated Show resolved Hide resolved
fvt/storage/storage_test.go Outdated Show resolved Hide resolved
fvt/storage/storage_test.go Outdated Show resolved Hide resolved
fvt/testdata/isvcs/isvc-pvc-3.yaml Outdated Show resolved Hide resolved
fvt/testdata/isvcs/isvc-pvc-4.yaml Outdated Show resolved Hide resolved
fvt/predictor/predictor_test.go Outdated Show resolved Hide resolved
@njhill
Copy link
Member

njhill commented Mar 17, 2023

Huge thanks @ckadner for all of the great work on this. And big thanks to @tjohnson31415 too for the thorough review!

@ckadner
Copy link
Member Author

ckadner commented Mar 18, 2023

Thank you @tjohnson31415 and @njhill -- I will make the code changes you suggested and tag you for a re-review once the checks completed

ckadner added 2 commits March 17, 2023 22:15
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
ckadner added 4 commits March 19, 2023 18:53
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
@ckadner
Copy link
Member Author

ckadner commented Mar 20, 2023

@tjohnson31415 -- I think I addressed all of your comments and created new issues to track outstanding followup work items. The review comments which were straight forward to address I collapsed as "Resolved" (Outdated).

I left some conversations as unresolved. Those you can mark as resolved if you are okay with my approach. If not let me know and I take another stab at them. Thanks again for your thorough review!!!

New issues:

ckadner added 2 commits March 20, 2023 16:14
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
@ckadner
Copy link
Member Author

ckadner commented Mar 21, 2023

@tjohnson31415 @njhill -- I updated the "Model storage" text bits and removed the random ready endpoints code.

I truly hope this is the last time I am asking you to review this PR :-)

Copy link
Contributor

@tjohnson31415 tjohnson31415 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More tests, less bugs, and better code. LGTM!
Thanks for going through the final iterations to keep the FVT duration down!

@ckadner
Copy link
Member Author

ckadner commented Mar 21, 2023

Thanks @tjohnson31415 for your review and all the valuable pointers!

@njhill I simplified the first PR comment to get a more concise commit message, but feel free to edit it before adding your /lgmt + /approve

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @ckadner @tjohnson31415

@kserve-oss-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ckadner, njhill, tjohnson31415

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@njhill
Copy link
Member

njhill commented Mar 21, 2023

/lgtm

@njhill njhill merged commit 3f01857 into kserve:main Mar 21, 2023
VedantMahabaleshwarkar pushed a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Mar 23, 2023
Motivation

Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves kserve#337

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
VedantMahabaleshwarkar pushed a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Mar 23, 2023
Motivation

Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves kserve#337

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
VedantMahabaleshwarkar pushed a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Mar 23, 2023
Motivation

Address PVC follow-up work items outlined in kserve#337 for PVC storage introduced in opendatahub-io#267

Modifications

Code changes:
- Sort PVC mounts on serving runtime specs to avoid unstable repeated
runtime rollouts as Kubernetes treat two otherwise identical deployment
specs as different if the same set of volume mounts are in different order
- Don't add non-existent PVCs from predictor/ISVC when allowAnyPVC is
enabled as this would cause all serving pods for that runtime to stay in
- Pending state with unbound (pending) volumes
- Tolerate missing storage-config secret when allowAnyPVC is enabled
- Lint: fix "io/ioutil" deprecations

FVT changes:
- Add Storage test suite
- Add helper methods to add PVC to storage-config during FVT
- Allow for additional time in WaitForReadyDeployStatus but allow early abort on success
- Check if pod still running before gRPC/REST requests, reconnect if necessary
- Only choose "Ready" runtime pod for port-forwards
- Include ISVC tests in Predictor test suite to ensure "serial" execution of TLS tests

Resolves kserve#337

Signed-off-by: Christian Kadner <ckadner@us.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved bug Something isn't working documentation Improvements or additions to documentation lgtm test testing related bugs and fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PVC support follow up items
5 participants