
[backend] kfp-kubernetes sets incorrect pod metadata in pipelines with multiple tasks #11077

Open
vanHavel opened this issue Aug 6, 2024 · 5 comments

vanHavel commented Aug 6, 2024

Environment

  • How did you deploy Kubeflow Pipelines (KFP)? From the manifests
  • KFP version: 2.2.0
  • KFP SDK version:
kfp                                2.8.0
kfp-kubernetes                     1.2.0
kfp-pipeline-spec                  0.3.0
kfp-server-api                     2.0.5

Steps to reproduce

Here is a minimal example pipeline.

from kfp import dsl
from kfp.kubernetes import add_pod_label, add_pod_annotation

@dsl.component
def a_op() -> int:
    return 1

@dsl.component
def b_op() -> int:
    return 2

@dsl.pipeline
def pipe() -> None:
    a = a_op()
    add_pod_label(a, "task", "a")
    add_pod_annotation(a, "task", "a")
    b = b_op()
    add_pod_label(b, "task", "b")
    add_pod_annotation(b, "task", "b")

This is the generated pipeline and platform spec, which looks correct.

pipeline_spec:
  components:
    comp-a-op:
      executorLabel: exec-a-op
      outputDefinitions:
        parameters:
          Output:
            parameterType: NUMBER_INTEGER
    comp-b-op:
      executorLabel: exec-b-op
      outputDefinitions:
        parameters:
          Output:
            parameterType: NUMBER_INTEGER
  deploymentSpec:
    executors:
      exec-a-op:
        container:
          args:
            - '--executor_input'
            - '{{$}}'
            - '--function_to_execute'
            - a_op
          command:
            - sh
            - '-c'
            - >

              if ! [ -x "$(command -v pip)" ]; then
                  python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip
              fi


              PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
              --no-warn-script-location 'kfp==2.8.0' '--no-deps'
              'typing-extensions>=3.7.4,<5; python_version<"3.9"' && "$0" "$@"
            - sh
            - '-ec'
            - >
              program_path=$(mktemp -d)


              printf "%s" "$0" > "$program_path/ephemeral_component.py"

              _KFP_RUNTIME=true python3 -m
              kfp.dsl.executor_main                        
              --component_module_path                        
              "$program_path/ephemeral_component.py"                        
              "$@"
            - |+

              import kfp
              from kfp import dsl
              from kfp.dsl import *
              from typing import *

              def a_op() -> int:
                  return 1

          image: 'python:3.8'
      exec-b-op:
        container:
          args:
            - '--executor_input'
            - '{{$}}'
            - '--function_to_execute'
            - b_op
          command:
            - sh
            - '-c'
            - >

              if ! [ -x "$(command -v pip)" ]; then
                  python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip
              fi


              PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet
              --no-warn-script-location 'kfp==2.8.0' '--no-deps'
              'typing-extensions>=3.7.4,<5; python_version<"3.9"' && "$0" "$@"
            - sh
            - '-ec'
            - >
              program_path=$(mktemp -d)


              printf "%s" "$0" > "$program_path/ephemeral_component.py"

              _KFP_RUNTIME=true python3 -m
              kfp.dsl.executor_main                        
              --component_module_path                        
              "$program_path/ephemeral_component.py"                        
              "$@"
            - |+

              import kfp
              from kfp import dsl
              from kfp.dsl import *
              from typing import *

              def b_op() -> int:
                  return 2

          image: 'python:3.8'
  pipelineInfo:
    name: pipe
  root:
    dag:
      tasks:
        a-op:
          cachingOptions:
            enableCache: true
          componentRef:
            name: comp-a-op
          taskInfo:
            name: a-op
        b-op:
          cachingOptions:
            enableCache: true
          componentRef:
            name: comp-b-op
          taskInfo:
            name: b-op
  schemaVersion: 2.1.0
  sdkVersion: kfp-2.8.0
platform_spec:
  platforms:
    kubernetes:
      deploymentSpec:
        executors:
          exec-a-op:
            podMetadata:
              annotations:
                task: a
              labels:
                task: a
          exec-b-op:
            podMetadata:
              annotations:
                task: b
              labels:
                task: b

However, at runtime the pods for both pipeline tasks receive the label and annotation a.

I did some more testing, and it seems that for all tasks, the platform spec of the task that is alphabetically first is applied. If you rename the first task to c_op, both pods get the label / annotation b. This might be because the first platform spec in the YAML is always taken.

Expected result

The pod metadata is set correctly for all tasks. In the above example, task a should have label / annotation a, task b should have label / annotation b.

Materials and Reference

The pull request that introduced the feature to set labels / annotations: #10393

I am not too familiar with the codebase, but it might have something to do with not picking the correct KubernetesExecutorConfig.


Impacted by this bug? Give it a 👍.

@grzegorzluczyna

grzegorzluczyna commented Aug 16, 2024

The problem seems to be in backend/src/v2/compiler/argocompiler/container.go:addContainerExecutorTemplate function (see: https://github.com/kubeflow/pipelines/blob/2.2.0/backend/src/v2/compiler/argocompiler/container.go#L197).

This function is supposed to add the task's pod metadata to the template it creates (see lines 279-285). However, this logic actually runs only during the first execution of the function, due to the caching that takes place in lines 199-204 and 286:

	nameContainerExecutor := "system-container-executor"
	nameContainerImpl := "system-container-impl"
	_, ok := c.templates[nameContainerExecutor]
	if ok {
		return nameContainerExecutor
	}
	...
	c.templates[nameContainerImpl] = executor

For the very first task (which, due to SDK logic, happens to be determined by the alphabetical order of component names), a template is created, the task's metadata is added to it, and the template is cached in the c.templates map. However, since the same key in this map is used for every task (i.e. "system-container-executor"), all subsequent calls to the function return the template created during the first execution. As a result, the pod metadata of every task other than the first is ignored, while all tasks' pods receive the metadata defined for the first task.
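To make the failure mode concrete, here is a minimal, self-contained sketch of that caching pattern. The types and function below are illustrative stand-ins, not the actual backend code: a template is cached under a single fixed key, so the metadata passed on every call after the first is silently dropped.

```go
package main

import "fmt"

// template is a hypothetical stand-in for the Argo workflow template.
type template struct {
	podLabels map[string]string
}

// compiler mimics the template cache used during compilation.
type compiler struct {
	templates map[string]*template
}

// addExecutorTemplate caches the template under a fixed key, so the pod
// metadata of every task after the first is silently ignored.
func (c *compiler) addExecutorTemplate(podLabels map[string]string) string {
	name := "system-container-executor"
	if _, ok := c.templates[name]; ok {
		return name // cached: this task's podLabels are dropped
	}
	c.templates[name] = &template{podLabels: podLabels}
	return name
}

func main() {
	c := &compiler{templates: map[string]*template{}}
	// Tasks are compiled in alphabetical order of component names.
	c.addExecutorTemplate(map[string]string{"task": "a"})
	c.addExecutorTemplate(map[string]string{"task": "b"})
	// Both tasks resolve to the single cached template carrying "a"'s label.
	fmt.Println(c.templates["system-container-executor"].podLabels["task"]) // prints "a"
}
```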

I wonder if a minimal solution for this problem would be as easy as including the task/component name (i.e. refName) in the template name, like so:

	nameContainerExecutor := "system-container-executor-" + refName
	nameContainerImpl := "system-container-impl-" + refName

I understand that such a solution (if it worked) would have the following two consequences:

  1. The pod names would get longer. However, personally I think it might be useful to have the task name included in the pod name, as it is currently always a guessing game when one tries to debug a pipeline with multiple steps.
  2. The workflow definition would get bigger and cluttered with repeated template definitions. For this reason, a better solution might be to apply the metadata not at the template level but later, at runtime, similarly to how it is done for image and command (compare lines 269-270).
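The proposed minimal fix can be sketched in the same simplified model as above. Again, the types and function names are illustrative, not the actual backend code; the only change is that refName becomes part of the cache key, so each component gets its own template and keeps its own metadata.

```go
package main

import "fmt"

// template is a hypothetical stand-in for the Argo workflow template.
type template struct {
	podLabels map[string]string
}

// compiler mimics the template cache used during compilation.
type compiler struct {
	templates map[string]*template
}

// addExecutorTemplate includes refName in the cache key, so each
// component's template is created (and its metadata applied) separately.
func (c *compiler) addExecutorTemplate(refName string, podLabels map[string]string) string {
	name := "system-container-executor-" + refName
	if _, ok := c.templates[name]; ok {
		return name
	}
	c.templates[name] = &template{podLabels: podLabels}
	return name
}

func main() {
	c := &compiler{templates: map[string]*template{}}
	c.addExecutorTemplate("a-op", map[string]string{"task": "a"})
	c.addExecutorTemplate("b-op", map[string]string{"task": "b"})
	// Each task now resolves to its own template with its own metadata.
	fmt.Println(c.templates["system-container-executor-a-op"].podLabels["task"]) // prints "a"
	fmt.Println(c.templates["system-container-executor-b-op"].podLabels["task"]) // prints "b"
}
```

As noted above, this comes at the cost of longer pod names and a larger workflow definition; applying the metadata at runtime instead would avoid the duplication.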

@shaikmoeed

Any workaround for this? We are facing the same issue!

@grzegorzluczyna

I could not find any. And frankly, the current code doesn't offer any hope that a workaround exists.

Unless you are OK with all tasks in your pipeline having the same label(s). In that case, simply use add_pod_label on the task that comes first in alphabetical order of component names.

milosjava added a commit to milosjava/kubeflow_pipelines that referenced this issue Oct 2, 2024
(cherry picked from commit fb7e6b2)

github-actions bot commented Nov 6, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale label Nov 6, 2024
@vanHavel

vanHavel commented Nov 6, 2024

Commenting as this issue is still relevant.

@stale stale bot removed the lifecycle/stale label Nov 6, 2024