
[bug] v1 Metrics not working in KFP 2.2.0 #10988

Open
Tracked by #2763
MatthiasCarnein opened this issue Jul 9, 2024 · 7 comments

Comments

@MatthiasCarnein

Environment

  • How do you deploy Kubeflow Pipelines (KFP)? Kubeflow manifests, kustomize
  • KFP version: 2.2.0
  • KFP SDK version: 1.8.22

Steps to reproduce

I ran into a regression with v1 metrics while testing Kubeflow 1.9.0-rc.2 (KFP 2.2.0). When I define v1 metrics, they are no longer picked up: the UI does not show the metric as a run output, and the metrics field of the run object returned by the SDK is None.

The same example works fine after downgrading to KFP 2.1.0.

Take one of the (older) Kubeflow metrics samples from the website:

```python
from typing import NamedTuple

from kfp.components import create_component_from_func


def produce_metrics() -> NamedTuple('Outputs', [('mlpipeline_metrics', 'Metrics')]):
    import json  # imported inside the function so the component is self-contained

    accuracy = 0.9
    metrics = {
        'metrics': [{
            'name': 'accuracy-score',  # shown as the column name in the runs table
            'numberValue': accuracy,   # must be numeric
            'format': 'PERCENTAGE',    # "RAW" or "PERCENTAGE"
        }]
    }
    # The output named mlpipeline_metrics becomes the v1 "mlpipeline-metrics" artifact.
    return [json.dumps(metrics)]


produce_metrics_op = create_component_from_func(produce_metrics, base_image='python:3.10')


def my_pipeline():
    produce_metrics_op()
```
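To rule out a malformed payload as the cause, the JSON can be checked locally against the constraints described in the KFP v1 metrics documentation. Below is a minimal sketch with a hypothetical helper, validate_v1_metrics; the name pattern and the RAW/PERCENTAGE format values are assumed to match those docs.

```python
import json
import re

# Constraints assumed from the KFP v1 "Pipeline Metrics" documentation.
_NAME_RE = re.compile(r"^[a-zA-Z]([-_a-zA-Z0-9]{0,62}[a-zA-Z0-9])?$")
_FORMATS = {"RAW", "PERCENTAGE"}


def validate_v1_metrics(payload: str) -> list:
    """Hypothetical helper: parse an mlpipeline-metrics JSON string and return
    (name, number_value, format) tuples; raise ValueError if anything is malformed."""
    data = json.loads(payload)
    metrics = data.get("metrics") if isinstance(data, dict) else None
    if not isinstance(metrics, list):
        raise ValueError("top-level 'metrics' must be a list")
    result = []
    for entry in metrics:
        name = entry.get("name", "")
        if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
            raise ValueError(f"invalid metric name: {name!r}")
        value = entry.get("numberValue")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"numberValue of {name!r} must be numeric")
        fmt = entry.get("format")  # optional; None means no format given
        if fmt is not None and fmt not in _FORMATS:
            raise ValueError(f"unsupported format for {name!r}: {fmt!r}")
        result.append((name, float(value), fmt))
    return result
```

Calling validate_v1_metrics(produce_metrics()[0]) on the sample returns [('accuracy-score', 0.9, 'PERCENTAGE')], which is consistent with the payload itself being well-formed.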

Expected result

The metric should be picked up and shown in the UI. The run.metrics field returned from the SDK should not be None but contain the metric.

Materials and reference

The metrics JSON is successfully written to S3 and shown as an output artifact in the UI, but it is not recognised as a metric.
Side note: the UI also shows the run as cached, even though the component executes as expected. I'm not sure whether this is related, though; issue #10966 also mentions caching issues in KFP 2.2.0.

Labels

/area backend



@MatthiasCarnein
Author

I was able to test and confirm this issue on a clean system and a fresh install of Kubeflow 1.9.0-rc.2. I am confident that this is indeed a regression.

@rimolive I think this is relevant for the Kubeflow 1.9 release.

To elaborate: I would expect the metrics to show up in the UI as before.

What I am seeing instead is "no metrics found for this run".

In addition, the metrics returned by the SDK should not be None:

```python
import kfp

client = kfp.Client()  # assumes host/auth for the cluster are already configured

run = client.create_run_from_pipeline_func(
    my_pipeline,
    namespace="kubeflow-user-example-com",
    experiment_name="metrics",
    run_name="metrics",
    arguments={},
)

run_completed = client.wait_for_run_completion(run.run_id, timeout=500)

run_completed.run.metrics  # should contain the metric, but is None in KFP 2.2.0
```
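The check can also be made explicit instead of inspecting the printed value. Below is a minimal sketch with a hypothetical helper, find_run_metric. It assumes each entry in run.metrics exposes name and number_value attributes, as the v1beta1 ApiRunMetric objects returned by the KFP 1.8 SDK do.

```python
def find_run_metric(metrics, name):
    """Hypothetical helper: return the number_value of the named run metric,
    or None if the list is None/empty or the metric is absent.

    Assumes entries expose `name` and `number_value` attributes (ApiRunMetric-like)."""
    for metric in metrics or []:  # treat a None metrics field as "no metrics"
        if metric.name == name:
            return metric.number_value
    return None
```

With the pipeline above, find_run_metric(run_completed.run.metrics, "accuracy-score") would be expected to return 0.9 on KFP 2.1.0; per this report it returns None on KFP 2.2.0.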

@MatthiasCarnein
Author

The ml-pipeline pod shows the following error, indicating that the artifact cannot be found:

I0712 07:56:53.332533       6 error.go:278] ResourceNotFoundError: artifact runs/7bdefab1-5dbc-4a2f-a361-60021c1d90e6/nodes/my-pipeline-jr2fl-metrics-587086406/artifacts/mlpipeline-metrics not found
github.com/kubeflow/pipelines/backend/src/common/util.NewResourceNotFoundError
        /go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:170
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).ReadArtifact
        /go/src/github.com/kubeflow/pipelines/backend/src/apiserver/resource/resource_manager.go:1457
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*RunServer).ReadArtifactV1
        /go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/run_server.go:430
github.com/kubeflow/pipelines/backend/api/v1beta1/go_client._RunService_ReadArtifactV1_Handler.func1
        /go/src/github.com/kubeflow/pipelines/backend/api/v1beta1/go_client/run.pb.go:2293
main.apiServerInterceptor
        /go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/v1beta1/go_client._RunService_ReadArtifactV1_Handler
        /go/src/github.com/kubeflow/pipelines/backend/api/v1beta1/go_client/run.pb.go:2295
google.golang.org/grpc.(*Server).processUnaryRPC
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1374
google.golang.org/grpc.(*Server).handleStream
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1751
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:986
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1650
failed to read artifact 'run_id:"7bdefab1-5dbc-4a2f-a361-60021c1d90e6" node_id:"my-pipeline-jr2fl-metrics-587086406" artifact_name:"mlpipeline-metrics"'
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrapf
        /go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:266
github.com/kubeflow/pipelines/backend/src/common/util.Wrapf
        /go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:337
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*RunServer).ReadArtifactV1
        /go/src/github.com/kubeflow/pipelines/backend/src/apiserver/server/run_server.go:433
github.com/kubeflow/pipelines/backend/api/v1beta1/go_client._RunService_ReadArtifactV1_Handler.func1
        /go/src/github.com/kubeflow/pipelines/backend/api/v1beta1/go_client/run.pb.go:2293
main.apiServerInterceptor
        /go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:30
github.com/kubeflow/pipelines/backend/api/v1beta1/go_client._RunService_ReadArtifactV1_Handler
        /go/src/github.com/kubeflow/pipelines/backend/api/v1beta1/go_client/run.pb.go:2295
google.golang.org/grpc.(*Server).processUnaryRPC
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1374
google.golang.org/grpc.(*Server).handleStream
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1751
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:986
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1650
/api.RunService/ReadArtifactV1 call failed
github.com/kubeflow/pipelines/backend/src/common/util.(*UserError).wrapf
        /go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:266
github.com/kubeflow/pipelines/backend/src/common/util.Wrapf
        /go/src/github.com/kubeflow/pipelines/backend/src/common/util/error.go:337
main.apiServerInterceptor
        /go/src/github.com/kubeflow/pipelines/backend/src/apiserver/interceptor.go:32
github.com/kubeflow/pipelines/backend/api/v1beta1/go_client._RunService_ReadArtifactV1_Handler
        /go/src/github.com/kubeflow/pipelines/backend/api/v1beta1/go_client/run.pb.go:2295
google.golang.org/grpc.(*Server).processUnaryRPC
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1374
google.golang.org/grpc.(*Server).handleStream
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1751
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:986
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1650
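When checking other runs for the same symptom, the run ID, node ID and artifact name can be pulled out of such ResourceNotFoundError lines and compared with what is actually stored. Below is a minimal sketch; parse_missing_artifact is a hypothetical helper, and the pattern is written only against the log line quoted above.

```python
import re

# Pattern for the "artifact runs/<run>/nodes/<node>/artifacts/<name> not found" message.
_ARTIFACT_KEY_RE = re.compile(
    r"artifact runs/(?P<run_id>[0-9a-f-]+)/nodes/(?P<node_id>[^/\s]+)"
    r"/artifacts/(?P<artifact_name>\S+) not found"
)


def parse_missing_artifact(log_line: str):
    """Hypothetical helper: return a dict with run_id, node_id and artifact_name
    from an ml-pipeline ResourceNotFoundError line, or None if it does not match."""
    match = _ARTIFACT_KEY_RE.search(log_line)
    return match.groupdict() if match else None
```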

However, the artifact is successfully written to MinIO and is also shown as an output artifact in the UI.
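To confirm the stored content directly, the object can be downloaded from MinIO and unpacked. Below is a minimal standard-library sketch. It assumes the v1 output artifact was archived as a gzipped tarball (Argo's default archive format) and has already been downloaded as bytes; read_metrics_from_tgz is a hypothetical helper.

```python
import io
import json
import tarfile


def read_metrics_from_tgz(blob: bytes) -> dict:
    """Hypothetical helper: parse the first regular file in a gzipped tar archive
    as JSON, e.g. a v1 mlpipeline-metrics artifact downloaded from MinIO."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile():  # skip directory entries
                data = archive.extractfile(member).read()
                return json.loads(data.decode("utf-8"))
    raise ValueError("archive contains no regular file")
```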

@MatthiasCarnein
Author

v1 visualisations (tables, HTML, ...) still work and are displayed in the UI. The problem seems to be specific to metrics.

@MatthiasCarnein
Author

As mentioned on Slack by @juliusvonkohout, KFP 2.0.5 already changed how (v1?) metrics are displayed in the UI. They used to be shown on the Experiments tab.

They have not been shown there since KFP 2.0.5, and I'm not sure whether this was an intentional change. In any case, in KFP 2.0.5 they were still visible as a run output and accessible through the SDK as shown above. In KFP 2.2.0 this appears to be broken, and metrics no longer work at all.

@juliusvonkohout
Member

@rimolive can you pick this up?


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the lifecycle/stale label Sep 21, 2024
@juliusvonkohout
Member

/lifecycle frozen

@google-oss-prow bot added lifecycle/frozen and removed lifecycle/stale labels Sep 27, 2024