v2 compiler has data passing limitations #5711

Closed · jackwhelpton opened this issue May 20, 2021 · 11 comments

@jackwhelpton (Contributor) commented May 20, 2021

Environment

kfp 1.6.2
kfp-pipeline-spec 0.1.7
kfp-server-api 1.5.0

Steps to reproduce

Use the upload_to_explicit_uri component in a pipeline (picking a version that includes the latest v2 updates):

```python
from kfp.components import load_component_from_url

upload_op = load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/961b17fa6844e1d79e5d3686bb557d830d7b5a95/components/google-cloud/storage/upload_to_explicit_uri/component.yaml"
)
# eval_task and results are defined earlier in the pipeline.
upload_task = upload_op(eval_task.output, results)  # pylint: disable=not-callable
```

As the Data input is untyped, it is treated as an artifact, and compiling produces this error:

TypeError: Passing PipelineParam "run-predictions-Results" with type "String" (as "Parameter") to component input "Data" with type "None" (as "Artifact") is incompatible. Please fix the type of the component input.

The fix appears straightforward: include the String type on the Data input, matching the GCS path input and output, which are already String-typed.
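
For reference, that change would look roughly like the following, sketched here with kfp's load_component_from_text (a trimmed, illustrative component text, not the upstream file verbatim):

```python
from kfp.components import load_component_from_text

# Illustrative sketch only; the real component.yaml lives at the URL above.
upload_op = load_component_from_text("""
name: Upload to explicit URI
inputs:
- {name: Data, type: String}       # the proposed fix: previously untyped
- {name: GCS path, type: String}   # already String-typed
outputs:
- {name: GCS path, type: String}
implementation:
  container:
    image: google/cloud-sdk
    command:
    - sh
    - -c
    - gsutil cp "$0" "$1" && mkdir -p "$(dirname "$2")" && echo "$1" > "$2"
    - {inputPath: Data}
    - {inputValue: GCS path}
    - {outputPath: GCS path}
""")
```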

Expected result

Pipeline compiles successfully.

Materials and Reference

https://raw.githubusercontent.com/kubeflow/pipelines/961b17fa6844e1d79e5d3686bb557d830d7b5a95/components/google-cloud/storage/upload_to_explicit_uri/component.yaml

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@Ark-kun self-assigned this May 25, 2021
@Ark-kun changed the title from "GCS pipeline components require further updating for v2 compatibility" to "v2 compiler has data passing limitations" May 25, 2021
@Ark-kun (Contributor) commented May 25, 2021

> The fix appears straightforward: include the String type on the Data input…

This won't work. The v2 compiler imposes severe limitations on the String type: the pipeline will still fail to compile, and you would also lose the ability to upload binary data such as models.

A fix for this v2 compiler issue was proposed and implemented in #5478. Please take a look.

@jackwhelpton (Contributor, Author):

I'll take a look, cheers!

For reference, my version that uses a local component with this change does compile, and works as expected. I imagine you'd run into trouble if you wanted to upload an artifact rather than a string, but as most of my pipelines are being ported from v1 they don't use artifact types (at least not intentionally: a lot of previously untyped parameters started being handled that way).

@Ark-kun (Contributor) commented Jun 9, 2021

With the current state of the "v2 compiler", you'll need two separate uploader components: one for uploading strings (String-typed outputs) and another for uploading non-strings, as sketched below.
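
A minimal sketch of what that split could look like, with hypothetical component texts (names, images, and commands are illustrative):

```python
from kfp.components import load_component_from_text

# String uploader: the input is String-typed, so the v2 compiler treats it
# as a parameter, and it must be consumed as a value (inputValue).
upload_string_op = load_component_from_text("""
name: Upload string to GCS
inputs:
- {name: Data, type: String}
- {name: GCS path, type: String}
implementation:
  container:
    image: google/cloud-sdk
    command:
    - sh
    - -c
    - printf '%s' "$0" | gsutil cp - "$1"
    - {inputValue: Data}
    - {inputValue: GCS path}
""")

# Binary/artifact uploader: the input is untyped, so the v2 compiler treats
# it as an artifact, and it must be consumed as a file (inputPath).
upload_data_op = load_component_from_text("""
name: Upload data to GCS
inputs:
- {name: Data}
- {name: GCS path, type: String}
implementation:
  container:
    image: google/cloud-sdk
    command:
    - sh
    - -c
    - gsutil cp "$0" "$1"
    - {inputPath: Data}
    - {inputValue: GCS path}
""")
```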

@Ark-kun (Contributor) commented Jun 11, 2021

KFP v1

KFP v1 has a simple, easy data-passing model that users like. There are different kinds of arguments and different ways to consume an input argument, and all combinations work.

Any input can be consumed:

  1. As a value (inputValue placeholder)
  2. As a file (inputPath placeholder)

And the argument for any input can come from:

  1. A constant value
  2. A pipeline parameter
  3. An upstream component output

Combining these options, there are six end-to-end data-passing cases, each of which works regardless of types:

  1. Pass a constant value that gets consumed as a value.
  2. Pass a constant value that gets consumed as a file.
  3. Pass a pipeline parameter that gets consumed as a value.
  4. Pass a pipeline parameter that gets consumed as a file.
  5. Pass a task output that gets consumed as a value.
  6. Pass a task output that gets consumed as a file.

The only way types come into play: when both the upstream output and the downstream input have types, the types must match.
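
For example, a minimal v1-style component consuming one input as a value and one as a file (illustrative component text); the argument for either input can equally be a constant, a pipeline parameter, or an upstream output:

```python
from kfp.components import load_component_from_text

consume_op = load_component_from_text("""
name: Consume both ways
inputs:
- {name: As value}
- {name: As file}
implementation:
  container:
    image: alpine
    command:
    - sh
    - -c
    - echo "value argument = $0"; cat "$1"
    - {inputValue: As value}   # injected directly as a string
    - {inputPath: As file}     # materialized as a local file
""")
```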

Current KFP v2 compiler limitations:

The current version of the v2 compiler adds numerous data-passing limitations. Most limitations can easily be lifted (see #5478), but the compiler team wants to hear more user feedback.

  1. Pass a constant value that gets consumed as a value. Limitation: the input must be typed String, Integer, or Float. Cannot pass constant values to inputs of type Date, Uri, JsonObject, GcpProjectId, or any other type.
  2. Pass a constant value that gets consumed as a file. Not supported. Cannot pass small texts, JSON structures, etc. that get consumed as files.
  3. Pass a pipeline parameter that gets consumed as a value. Limitation: both the component input and the pipeline parameter must be typed String, Integer, or Float. Cannot pass pipeline parameters to inputs of type Date, Uri, JsonObject, GcpProjectId, or any other type.
  4. Pass a pipeline parameter that gets consumed as a file. Not supported. Cannot pass small texts, JSON structures, etc. that get consumed as files.
  5. Pass a task output that gets consumed as a value. Limitation: both the upstream output and the downstream input must be typed String, Integer, or Float. Cannot consume inputs of type Date, Uri, JsonObject, GcpProjectId, or most other types by value.
  6. Pass a task output that gets consumed as a file. Limitation: both the upstream output and the downstream input must NOT be typed String, Integer, or Float. Cannot upload strings (this issue; see the sketch below).
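
Case 6 is this issue in miniature. A sketch of the failing shape, with illustrative component texts: a String-typed upstream output feeding an input consumed as a file:

```python
from kfp.components import load_component_from_text

# Upstream: String-typed output, treated as a parameter by the v2 compiler.
produce_op = load_component_from_text("""
name: Produce message
outputs:
- {name: Message, type: String}
implementation:
  container:
    image: alpine
    command:
    - sh
    - -c
    - mkdir -p "$(dirname "$0")" && echo hello > "$0"
    - {outputPath: Message}
""")

# Downstream: untyped input consumed via inputPath, treated as an artifact.
consume_op = load_component_from_text("""
name: Consume file
inputs:
- {name: Data}
implementation:
  container:
    image: alpine
    command: [cat, {inputPath: Data}]
""")

# Wiring these together compiles under the v1 compiler, but the v2 compiler
# rejects it with a TypeError like the one quoted at the top of this issue
# (String parameter -> untyped artifact input). Re-typing Data as String
# runs into limitation 6 above instead.
# consume_op(produce_op().output)
```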

@chensun (Member) commented Jun 19, 2021

> [quotes @Ark-kun's comment above in full: the KFP v1 data-passing model and the six current v2 compiler limitations]

KFP v2 has a different data model from KFP v1: it distinguishes between parameters and artifacts, which behave differently. For instance, parameters are stored in MLMD and can be queried, whereas for artifacts we don't store the file content in MLMD, but artifacts can carry additional metadata. I consider this a feature improvement over the KFP v1 data model.

This doc describes which types are treated as parameters and which as artifacts. We have a simple rule for deciding between parameters and artifacts: the decision should be deterministic, not based on how a component is used. The proposed fix (#5478) you mentioned above would infer the user's intention in specific scenarios. As we discussed offline, that could lead to inconsistent behavior, and the team decided not to move forward with it.

I believe most of the related issues users may hit can be fixed simply by adding or updating types. We also need to update our samples and components to make them compatible (tracked by #5801).
We will also prioritize adding a new doc that better describes the difference between parameters and artifacts, including a detailed migration guide for fixing issues like this one.
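
For illustration, a minimal sketch of the parameter/artifact distinction in the v2 Python SDK. This assumes the kfp.v2 dsl namespace from roughly this era; exact import paths and artifact classes vary between SDK versions:

```python
from kfp.v2.dsl import component, Input, Output, Dataset, Model

@component
def train(
    learning_rate: float,      # str/int/float are parameters: values are
                               # stored in MLMD and can be queried
    examples: Input[Dataset],  # everything else is an artifact: passed by
                               # URI/file, with attachable metadata
    model: Output[Model],
):
    with open(examples.path) as f:
        n_bytes = len(f.read())
    model.metadata["learning_rate"] = learning_rate  # artifact metadata
    with open(model.path, "w") as f:
        f.write(f"trained on {n_bytes} bytes")
```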

@chensun (Member) commented Jun 19, 2021

> [quotes @jackwhelpton's reply above about the locally patched component compiling and working as expected]

Hi @jackwhelpton, as we discussed last time on this issue, I believe your use case is unblocked by adding/updating the type.
We will have a new doc with the detailed migration guide, and we will also update our components and samples to be compatible with v2 (#5801).

If there are no other blockers, do you think we can close this issue? Thanks!

@jackwhelpton (Contributor, Author):

It seems like everything I'd wanted from this ticket is covered by #5801, so I'm fine with closing this one. Thanks!

@chensun (Member) commented Jun 24, 2021

Thanks Jack. Closing this issue.

@Bobgy (Contributor) commented Jul 26, 2021

/reopen
because we plan to support this in #6080

let's keep this open

@google-oss-robot:

@Bobgy: Reopened this issue.

In response to this:

> /reopen
> because we plan to support this in #6080
>
> let's keep this open


@chensun (Member) commented Jul 26, 2021

> [quotes @Bobgy's /reopen comment above]

#6080 was meant to be temporary in the first place (I proposed reverting it by the KFP SDK 2.0 release).
The more we discussed the pros and cons of the change, the more we felt it probably shouldn't be merged, even for a short period.
Closing again per the discussion on #6080.
