
Fixed the DSPA status update on reconciliation #651

Merged
4 commits merged into opendatahub-io:main from the RHOAIENG-6287 branch on May 27, 2024

Conversation

hbelmiro (Contributor)

The issue resolved by this Pull Request:

Resolves https://issues.redhat.com/browse/RHOAIENG-6287

Description of your changes:

This PR refactors controllers.DSPAReconciler#Reconcile so it updates the DSPA status correctly.
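
For reviewers less familiar with the condition-handling pattern, the following is a minimal, hypothetical sketch of how Kubernetes status conditions are typically maintained with the upstream k8s.io/apimachinery helpers. It is not the code introduced by this PR (that lives in the reconciler and the controllers/dspastatus package); the condition types and reasons mirror the expected statuses shown in the testing instructions below, and the failure message is a placeholder.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Stand-in for the DSPA .status.conditions slice that the reconciler maintains.
	var conditions []metav1.Condition

	// After a successful connectivity check, record the condition as True.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:    "DatabaseAvailable",
		Status:  metav1.ConditionTrue,
		Reason:  "DatabaseAvailable",
		Message: "Database connectivity successfully verified",
	})

	// When a component fails to deploy, record a False condition carrying the error text.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:    "APIServerReady",
		Status:  metav1.ConditionFalse,
		Reason:  "FailingToDeploy",
		Message: "placeholder: deployment error reported by the cluster",
	})

	// SetStatusCondition only bumps lastTransitionTime when the status value actually changes.
	for _, c := range conditions {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}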

Testing instructions

Scenario 1 - Successful deployment

  • Deploy config/samples/v2/dspa-simple to the kubeflow namespace.

  • Wait until all the pods have started.

oc get pods -n kubeflow
NAME                                                      READY   STATUS    RESTARTS   AGE
ds-pipeline-metadata-envoy-sample-5857b74974-psjcd        2/2     Running   0          49s
ds-pipeline-metadata-grpc-sample-79995d8b76-8rg8c         1/1     Running   0          49s
ds-pipeline-persistenceagent-sample-69c64c4cf-8bgvl       1/1     Running   0          50s
ds-pipeline-sample-7975c7f99-b5fl5                        2/2     Running   0          50s
ds-pipeline-scheduledworkflow-sample-bc6f4c55d-prfbz      1/1     Running   0          50s
ds-pipeline-ui-sample-867bf74ff9-kp8f2                    2/2     Running   0          49s
ds-pipeline-workflow-controller-sample-757696fcff-4864z   1/1     Running   0          49s
mariadb-sample-5455fd4c74-plnl9                           1/1     Running   0          69s
minio-sample-5d58bf78f9-4mcnr                             1/1     Running   0          69s
  • Check the DSPA status:
oc get datasciencepipelinesapplication.datasciencepipelinesapplications.opendatahub.io/sample -o yaml -n kubeflow | yq .status
  • You should see a status similar to the following:
conditions:
  - lastTransitionTime: "2024-05-16T18:22:36Z"
    message: Database connectivity successfully verified
    reason: DatabaseAvailable
    status: "True"
    type: DatabaseAvailable
  - lastTransitionTime: "2024-05-16T18:22:36Z"
    message: Object Store connectivity successfully verified
    reason: ObjectStoreAvailable
    status: "True"
    type: ObjectStoreAvailable
  - lastTransitionTime: "2024-05-16T18:22:48Z"
    message: Component [ds-pipeline-sample] is minimally available.
    reason: MinimumReplicasAvailable
    status: "True"
    type: APIServerReady
  - lastTransitionTime: "2024-05-16T18:22:43Z"
    message: Component [ds-pipeline-persistenceagent-sample] is minimally available.
    reason: MinimumReplicasAvailable
    status: "True"
    type: PersistenceAgentReady
  - lastTransitionTime: "2024-05-16T18:22:41Z"
    message: Component [ds-pipeline-scheduledworkflow-sample] is minimally available.
    reason: MinimumReplicasAvailable
    status: "True"
    type: ScheduledWorkflowReady
  - lastTransitionTime: "2024-05-16T18:22:48Z"
    message: All components are ready.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Ready
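
If you want automation (for example a CI check) to gate on the aggregate Ready condition instead of reading the yq output by hand, a sketch along the following lines works with the dynamic client. This is not part of the PR; the DSPA API version (v1alpha1) and the kubeconfig location are assumptions that may differ in your environment.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the same credentials the oc commands above use (assumes ~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// GroupVersionResource for DSPA objects; the version here is an assumption.
	gvr := schema.GroupVersionResource{
		Group:    "datasciencepipelinesapplications.opendatahub.io",
		Version:  "v1alpha1",
		Resource: "datasciencepipelinesapplications",
	}

	dspa, err := client.Resource(gvr).Namespace("kubeflow").Get(context.TODO(), "sample", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Walk .status.conditions and print the aggregate Ready condition.
	conditions, _, _ := unstructured.NestedSlice(dspa.Object, "status", "conditions")
	for _, c := range conditions {
		if cond, ok := c.(map[string]interface{}); ok && cond["type"] == "Ready" {
			fmt.Printf("Ready=%v (reason=%v)\n", cond["status"], cond["reason"])
		}
	}
}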

Scenario 2 - Deployment failure

  • Deploy config/samples/v2/dspa-simple to the an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project namespace.

  • Wait until the deployment fails with the following message:

ERROR	Reconciler error	{"controller": "datasciencepipelinesapplication", "controllerGroup": "datasciencepipelinesapplications.opendatahub.io", "controllerKind": "DataSciencePipelinesApplication", "DataSciencePipelinesApplication": {"name":"sample","namespace":"an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project"}, "namespace": "an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project", "name": "sample", "reconcileID": "30ac4943-f7ca-464c-9e42-7a729bb89b30", "error": "Route.route.openshift.io \"ds-pipeline-sample\" is invalid: spec.host: Invalid value: \"ds-pipeline-sample-an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project.apps.hbelmiro-10.dev.datahub.redhat.com\": must be no more than 63 characters"}
  • Check the DSPA status:
oc get datasciencepipelinesapplication.datasciencepipelinesapplications.opendatahub.io/sample -o yaml -n an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project | yq .status
  • You should see a status similar to the following:
conditions:
  - lastTransitionTime: "2024-05-16T17:59:15Z"
    message: Database connectivity successfully verified
    reason: DatabaseAvailable
    status: "True"
    type: DatabaseAvailable
  - lastTransitionTime: "2024-05-16T17:59:14Z"
    message: Object Store connectivity successfully verified
    reason: ObjectStoreAvailable
    status: "True"
    type: ObjectStoreAvailable
  - lastTransitionTime: "2024-05-16T17:59:15Z"
    message: 'Route.route.openshift.io "ds-pipeline-sample" is invalid: spec.host: Invalid value: "ds-pipeline-sample-an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project.apps.hbelmiro-10.dev.datahub.redhat.com": must be no more than 63 characters'
    reason: FailingToDeploy
    status: "False"
    type: APIServerReady
  - lastTransitionTime: "2024-05-16T17:58:54Z"
    message: ""
    reason: Unknown
    status: Unknown
    type: PersistenceAgentReady
  - lastTransitionTime: "2024-05-16T17:58:54Z"
    message: ""
    reason: Unknown
    status: Unknown
    type: ScheduledWorkflowReady
  - lastTransitionTime: "2024-05-16T17:58:54Z"
    message: "Route.route.openshift.io \"ds-pipeline-sample\" is invalid: spec.host: Invalid value: \"ds-pipeline-sample-an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project.apps.hbelmiro-10.dev.datahub.redhat.com\": must be no more than 63 characters \n \n \n"
    reason: MinimumReplicasAvailable
    status: "False"
    type: Ready
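
The rejection in this scenario comes from the DNS label length limit: the generated host begins with the label ds-pipeline-sample-<namespace>, and with a namespace name this long that single label exceeds the 63 characters a DNS label may contain, so the Route is refused by the API server. A quick, purely illustrative check of the host string from the error above (not part of the PR):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Host copied from the error message above; only its leading DNS label is measured here.
	host := "ds-pipeline-sample-an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project.apps.hbelmiro-10.dev.datahub.redhat.com"
	label := strings.SplitN(host, ".", 2)[0]
	fmt.Printf("leading label is %d characters; a DNS label may have at most 63\n", len(label))
}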

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

@dsp-developers (Contributor)

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-651
An OCP cluster where you are logged in as cluster admin is required.

To use this image, run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/651/head
git checkout -b pullrequest 07e8e3b476ed8cf6cf2c9b58dcec2b3e799dfb86
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-651"

More instructions on how to deploy and test a Data Science Pipelines Application are available here.

@hbelmiro hbelmiro marked this pull request as ready for review May 16, 2024 18:42
@openshift-ci openshift-ci bot requested review from DharmitD and gmfrasca May 16, 2024 18:42
@HumairAK HumairAK assigned HumairAK and unassigned HumairAK May 22, 2024
@HumairAK HumairAK requested review from HumairAK and removed request for gmfrasca May 22, 2024 18:19
@VaniHaripriya (Contributor) commented May 22, 2024

@hbelmiro Tested both scenarios and they worked as expected.
However, I see that the ds-pipeline-ui pod is crashing, which prevents kfp-ui from retrieving pipelines.

Scenario 1 result:

conditions:
  - lastTransitionTime: "2024-05-22T17:43:33Z"
    message: Database connectivity successfully verified
    reason: DatabaseAvailable
    status: "True"
    type: DatabaseAvailable
  - lastTransitionTime: "2024-05-22T17:43:33Z"
    message: Object Store connectivity successfully verified
    reason: ObjectStoreAvailable
    status: "True"
    type: ObjectStoreAvailable
  - lastTransitionTime: "2024-05-22T17:43:44Z"
    message: Component [ds-pipeline-sample] is minimally available.
    reason: MinimumReplicasAvailable
    status: "True"
    type: APIServerReady
  - lastTransitionTime: "2024-05-22T17:43:39Z"
    message: Component [ds-pipeline-persistenceagent-sample] is minimally available.
    reason: MinimumReplicasAvailable
    status: "True"
    type: PersistenceAgentReady
  - lastTransitionTime: "2024-05-22T17:43:39Z"
    message: Component [ds-pipeline-scheduledworkflow-sample] is minimally available.
    reason: MinimumReplicasAvailable
    status: "True"
    type: ScheduledWorkflowReady
  - lastTransitionTime: "2024-05-22T17:43:44Z"
    message: All components are ready.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Ready

Scenario 2 result:

$ oc get datasciencepipelinesapplication.datasciencepipelinesapplications.opendatahub.io/sample -o yaml -n an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project | yq .status
conditions:
  - lastTransitionTime: "2024-05-22T17:42:03Z"
    message: Database connectivity successfully verified
    reason: DatabaseAvailable
    status: "True"
    type: DatabaseAvailable
  - lastTransitionTime: "2024-05-22T17:42:03Z"
    message: Object Store connectivity successfully verified
    reason: ObjectStoreAvailable
    status: "True"
    type: ObjectStoreAvailable
  - lastTransitionTime: "2024-05-22T17:42:03Z"
    message: 'Route.route.openshift.io "ds-pipeline-sample" is invalid: spec.host: Invalid value: "ds-pipeline-sample-an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project.apps.vmudadla.dev.datahub.redhat.com": must be no more than 63 characters'
    reason: FailingToDeploy
    status: "False"
    type: APIServerReady
  - lastTransitionTime: "2024-05-22T17:42:03Z"
    message: ""
    reason: Unknown
    status: Unknown
    type: PersistenceAgentReady
  - lastTransitionTime: "2024-05-22T17:42:03Z"
    message: ""
    reason: Unknown
    status: Unknown
    type: ScheduledWorkflowReady
  - lastTransitionTime: "2024-05-22T17:42:03Z"
    message: "Route.route.openshift.io \"ds-pipeline-sample\" is invalid: spec.host: Invalid value: \"ds-pipeline-sample-an-even-longer-veeeeeeeeeeeeeeery-long-data-science-project.apps.vmudadla.dev.datahub.redhat.com\": must be no more than 63 characters \n \n \n"
    reason: MinimumReplicasAvailable
    status: "False"
    type: Ready

@VaniHaripriya (Contributor)

I re-created the kubeflow namespace and re-tested scenario 1. I don't see any errors in kfp-ui now. I confirm that both scenarios work as expected.

Resolved review threads: controllers/dspastatus/dspa_status.go, controllers/dspipeline_controller.go
Commits (4), each signed off: Signed-off-by: hbelmiro <helber.belmiro@gmail.com>
@dsp-developers (Contributor)

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-651

@HumairAK (Contributor)

Tested and works as intended:

[screenshot]

Also tested tearing down pods, and the DSPA status continues to update as before.

Great work @hbelmiro!

@HumairAK (Contributor)

/lgtm
/approve


openshift-ci bot commented May 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HumairAK

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

@HumairAK HumairAK merged commit 5eacdd6 into opendatahub-io:main May 27, 2024
5 of 6 checks passed
@hbelmiro hbelmiro deleted the RHOAIENG-6287 branch May 27, 2024 17:21