-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): support collecting outputs from conditional branches using dsl.OneOf
#10067
Conversation
35df06d
to
e2ecf3e
Compare
e2ecf3e
to
dddfbac
Compare
/assign @chensun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Only a few nitpicks and questions, nothing blocking.
Great work!
if not isinstance(channel, expected_type): | ||
raise TypeError( | ||
f'Task outputs passed to dsl.{OneOf.__name__} must be the same type. Got two channels with different types: {readable_name_map[expected_type]} at index 0 and {readable_name_map[type(channel)]} at index {i}.' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this function reads a bit complex, while the essential check is all channels are of the same type. Is the complexity to allow specific error message? Not sure if user would understand what index # refers to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the logic a bit and the error message. Please feel free to re-open if there are more changes you suggest.
) -> None: | ||
# avoid circular imports | ||
from kfp.dsl import tasks_group | ||
if not isinstance(parent_group.groups[-1], tasks_group.Else): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean users have to have an ending dsl.Else
?
So the following is illegal?
dsl.If(a == True):
f = foo()
dsl.Elif(a == False):
b = bar()
oneof = dsl.OneOf(f.output, b.output)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
For posterity, I'll leave a few notes to document my thinking:
- This is aligned with the current treatment of
value_from_oneof
field inPipelineSpec
on Vertex Pipelines, which is a backend that supports such collection. - If we want different behavior... We can, of course, prescribe new behavior for any backend and drop this validation in the KFP SDK. Since the current implementation is forward compatible with looser validation, my preference is to proceed with the strong validation and listen for user demands to loosen. This helps us get this feature in the hands of users faster with effectively no added cost for users or maintainers.
- If we like this behavior, but want to permit more flexibility... It's technically possible that a user might want to use only
dsl.If
anddsl.Elif
and ensure the conditions are mutually exclusive themselves (nodsl.Else
). Since this comes with the cost of shifting errors right (to runtime, instead of compilation time as in the current implementation) and because this seems like it wouldn't be the typical case, my preference is again to start conservatively with the current approach.
# the combination of OneOf and an f-string is not common, so prefer | ||
# deferring implementation | ||
raise NotImplementedError( | ||
f'dsl.{OneOf.__name__} is not yet supported in f-strings.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Technically f-string
is just one special syntax for string concatenation, while this limitation would apply to all formats, for example, "{} world".format(str1)
, "Hello" + str1
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Updated.
# isinstance(<some channel>, PipelineParameterChannel/PipelineArtifactChannel) | ||
# checks continue to behave in desirable ways | ||
class OneOfParameter(PipelineParameterChannel, OneOfMixin): | ||
"""OneOf that results in an parameter channel for all downstream tasks.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a parameter channel
sdk/python/kfp/dsl/tasks_group.py
Outdated
@@ -150,6 +153,16 @@ def __init__(self) -> None: | |||
is_root=False, | |||
) | |||
|
|||
def _get_oneof_id(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this seems to be an public method, so maybe no underscore prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @chensun. Addressed your comments.
# the combination of OneOf and an f-string is not common, so prefer | ||
# deferring implementation | ||
raise NotImplementedError( | ||
f'dsl.{OneOf.__name__} is not yet supported in f-strings.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Updated.
) -> None: | ||
# avoid circular imports | ||
from kfp.dsl import tasks_group | ||
if not isinstance(parent_group.groups[-1], tasks_group.Else): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
For posterity, I'll leave a few notes to document my thinking:
- This is aligned with the current treatment of
value_from_oneof
field inPipelineSpec
on Vertex Pipelines, which is a backend that supports such collection. - If we want different behavior... We can, of course, prescribe new behavior for any backend and drop this validation in the KFP SDK. Since the current implementation is forward compatible with looser validation, my preference is to proceed with the strong validation and listen for user demands to loosen. This helps us get this feature in the hands of users faster with effectively no added cost for users or maintainers.
- If we like this behavior, but want to permit more flexibility... It's technically possible that a user might want to use only
dsl.If
anddsl.Elif
and ensure the conditions are mutually exclusive themselves (nodsl.Else
). Since this comes with the cost of shifting errors right (to runtime, instead of compilation time as in the current implementation) and because this seems like it wouldn't be the typical case, my preference is again to start conservatively with the current approach.
if not isinstance(channel, expected_type): | ||
raise TypeError( | ||
f'Task outputs passed to dsl.{OneOf.__name__} must be the same type. Got two channels with different types: {readable_name_map[expected_type]} at index 0 and {readable_name_map[type(channel)]} at index {i}.' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the logic a bit and the error message. Please feel free to re-open if there are more changes you suggest.
sdk/python/kfp/dsl/tasks_group.py
Outdated
@@ -150,6 +153,16 @@ def __init__(self) -> None: | |||
is_root=False, | |||
) | |||
|
|||
def _get_oneof_id(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Thanks!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chensun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
… `dsl.OneOf` (kubeflow#10067) * support dsl.OneOf * address review feedback * address review feedback
Description of your changes:
Supports collecting outputs from conditional branches in a pipeline using
dsl.OneOf
. For example:There are three future work items to consider following this PR:
dsl.Collected
implementation to thedsl.OneOf
-style implementation. The abstraction for thedsl.OneOf
is much cleaner and permits more implementation alignment and code reuse betweendsl.OneOf
anddsl.Collected
. I left TODOs for this, opting not to implement in this CL to simplify the already large diff.dsl.Collected
intodsl.OneOf
. This composability increases the complexity of the compiler code + authorable pipelines considerably (high implementation + maintenance cost). We should weigh this against the suspected benefit. Leaving this note here to document the choice and the likely dependency on 1.dsl.OneOf
anddsl.Collected
in f-strings.dsl.Collected
support in f-strings follows naturally from 1 above.dsl.OneOf
support in f-strings requires a new way of injecting a channel into a user string in the compiler code such that all the member channels of adsl.OneOf
are not lost. The approach which I suspect will be most successful (included in a code comment) is to maintain a pipeline-level map of unique key to pipeline channel, then injecting/extracting that key from the user string, then looking up in the map. Similar level of effort consideration to 2 above.Checklist:
repository.