[DSPv2] Investigate GetContextByTypeAndName failing in kfp-driver #369

Closed
Tracked by #335
gmfrasca opened this issue Oct 5, 2023 · 4 comments

@gmfrasca
Member

gmfrasca commented Oct 5, 2023

When attempting to run a DSPv2 pipeline (from the PoC), the run fails, and kfp-driver reports the following error in its pod logs:

{
  "severity":"error",
  "timestamp":"2023-10-04T23:42:50.987Z",
  "logger":"kfp-driver",
  "caller":"controller/run.go:92",
  "message":"kfp-driver execution failed when reconciling Run gfrasca/iris-training-pipeline-b2rbx-root-system-dag-driver: driver.RootDAG(pipelineName=iris-training-pipeline, runID=b3a3f2e3-abb5-4629-800f-27a25b516f56, runtimeConfig, componentSpec) failed: Failed GetContextByTypeAndName(type=\"system.Pipeline\", name=\"iris-training-pipeline\")",
  "knative.dev/controller":"github.com.kubeflow.pipelines.backend.src.v2.controller.Reconciler",
  "knative.dev/kind":"tekton.dev.CustomRun",
  "knative.dev/traceid":"23c8bb8f-2d62-41ea-aa9b-8d71f06f2682",
  "knative.dev/key":"gfrasca/iris-training-pipeline-b2rbx-root-system-dag-driver",
  "stacktrace":"github.com/kubeflow/pipelines/backend/src/v2/controller.(*Reconciler).ReconcileKind
                    /go/src/github.com/kubeflow/pipelines/backend/src/v2/controller/run.go:92
                github.com/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1beta1/customrun.(*reconcilerImpl).Reconcile
                    /go/src/github.com/kubeflow/pipelines/vendor/github.com/tektoncd/pipeline/pkg/client/injection/reconciler/pipeline/v1beta1/customrun/reconciler.go:240
                knative.dev/pkg/controller.(*Impl).processNextWorkItem
                    /go/src/github.com/kubeflow/pipelines/vendor/knative.dev/pkg/controller/controller.go:542
                knative.dev/pkg/controller.(*Impl).RunContext.func3
                    /go/src/github.com/kubeflow/pipelines/vendor/knative.dev/pkg/controller/controller.go:491"
}

Investigate the cause of this error and determine whether it is the reason the pipeline does not run.

@rimolive
Contributor

rimolive commented Oct 11, 2023

KFP Driver needs a reference to the MLMD gRPC server. Currently, we deploy KFP Driver in the OCP Pipelines namespace because it uses Tekton libraries and reuses the ConfigMap objects from the OCP Pipelines installation.

Given that, we need to revisit our operator design to decide whether MLMD should be a central component or be deployed per namespace. From my understanding, MLMD should be a central piece because it records all artifacts from pipelines, which adds a level of governance for MLOps engineers, and it could serve as a Model Registry solution since we don't have one in the Kubeflow space. However, from what I checked with the KFP team, MLMD does not handle namespace isolation, which might break our stack-per-namespace solution.

For now, what we can do is parameterize the MLMD connection info and then decide what to do next.

@rimolive
Contributor

kubeflow/kfp-tekton#1378

@rimolive
Contributor

Last PR #413

@rimolive
Contributor

PR Link: #422

@HumairAK HumairAK mentioned this issue Oct 27, 2023
8 tasks
This issue was closed.