fix(sdk): unblock valid topology. #8416
Conversation
This PR is getting very close. Thanks, @JOCSTAA!
```
# ParralelFor Check
for parent in task_name_to_parent_groups[upstream_task.name]:
    parent = group_name_to_group.get(parent, None)
    if isinstance(
```
I think we need to keep checking deeper into the DAG. Currently, this throws an exception for valid pipelines where the downstream task is not a child of the parent ParallelFor, but instead a grandchild. This should be permitted.
For example, I believe the following is valid:
```python
@dsl.pipeline
def my_pipeline(string: str = 'string'):
    with dsl.ParallelFor([1, 2, 3]):
        op1 = comp()
        with dsl.Condition(string == 'string'):
            grandchild = comp(inp=op1.output)
```
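A rough sketch of what "checking deeper" could look like, reusing the names from this diff (`task_name_to_parent_groups`, `group_name_to_group`, `tasks_group`). This is only an illustration of the idea, under the assumption that a `ParallelFor` ancestor shared by both tasks should be allowed; it is not the actual implementation:

```python
# Sketch only: raise when the upstream task sits under a ParallelFor that is
# NOT also an ancestor of the downstream task; a shared (common) ParallelFor
# ancestor, as in the grandchild example above, is allowed.
downstream_ancestors = set(task_name_to_parent_groups[task.name])
for parent_name in task_name_to_parent_groups[upstream_task.name]:
    parent = group_name_to_group.get(parent_name, None)
    if (isinstance(parent, tasks_group.ParallelFor) and
            parent_name not in downstream_ancestors):
        raise RuntimeError(
            f'Task {task.name} cannot depend on task {upstream_task.name} '
            f'from the ParallelFor group: {parent_name}.')
```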
```
raise RuntimeError(
    f'Task {task.name} cannot dependent on any task inside'
    f' the group: {upstream_groups[0]}.')
f'Task {task.name} cannot dependent on any task inside a Exithandler that is not a common ancestor of both tasks'
```
I'm not sure this is true. For example, I believe the following pipeline is valid:
```python
from kfp import compiler
from kfp import dsl
from kfp.dsl import component


@component
def identity(message: str) -> str:
    return message


@component
def fail_op(message: str):
    import sys
    print(message)
    sys.exit(1)


@dsl.pipeline
def my_pipeline(message: str = 'Hello World!'):
    exit_task = identity(message='Exit handler has worked!')
    with dsl.ExitHandler(exit_task):
        inner_task = identity(message=message)
        fail_op(message='Task failed.')
    task = identity(message=inner_task.output)
```
`task` depends on `inner_task`, which would be invalid according to the rule in this error.
For the purposes of this PR, I don't think we need to give `ExitHandler`s any special consideration. As far as topology is concerned, `ExitHandler`s can be thought of as having no control flow, but with special behavior (an exit task executed) after some subset of tasks run.
Apologies -- just remembered we discussed offline and this preserves existing behavior. Thinking more, maybe it's okay to unblock this.
Hey Connor, so would it be okay to mark this as resolved, since no action is required?
(Sorry for my slow response)
What if `inner_task` failed? Suppose the exit task should handle it gracefully and allow the remaining tasks to continue executing; then `task` would reference a non-existent output. Isn't that just like a conditional case?
We can translate the above example to Python code like:
```python
try:
    inner_task = identity(message=message)
    fail_op(message='Task failed.')
except:
    pass
finally:
    exit_task = identity(message='Exit handler has worked!')
task = identity(message=inner_task.output)
```
Would this make sense in the Python programming context? I understand Python doesn't block such usage, but it feels to me the last line, `task = ...`, could, and probably should, be moved into the `try` block instead. WDYT?
Thanks for mapping this to normal Python -- that's very helpful.
I think the reason for not moving `task` into the `try` block is the same as for normal Python: it might catch exceptions unrelated to the one you're trying to capture. KFP's `ExitHandler` is analogous to a "bare" `except` in normal Python, so all the more reason to have only the minimum required tasks in the `ExitHandler`.
My personal preference is to permit the above syntax. It's valid from a control-flow standpoint and I don't see much ambiguity for the author.
Either way, I think we should leave the behavior as it currently stands for this PR and address later if we want to :)
After some testing, I see that there's actually more to enabling this behavior than just unblocking it.
The reference to `inner_task` in `identity(message=inner_task.output)` fails to compile, since we don't have logic to register the outputs from `inner_task` to the parent exit handler sub-DAG.
All the more reason to leave as is for this PR.
/retest
Thanks, @JOCSTAA! The test cases are very comprehensive and the logic is good. Just a few refactoring suggestions and nitpicks.
```
# Condition check
dependent_group = group_name_to_group.get(upstream_groups[0], None)
if isinstance(dependent_group, tasks_group.Condition):
```
This check is good in the current state of the codebase. Just want to note that I think it is coupled to our checks for `ExitHandler` and `ParallelFor`. I anticipate this check will need to be updated when we implement support for `ParallelFor` fan-in (on the roadmap for this year).
The reason this is coupled to the `ParallelFor` check is that the condition check only goes up one parent from the upstream task.
To demonstrate with examples...
The following pipeline causes the `Condition` exception to be raised:
```python
@dsl.pipeline
def my_pipeline(string: str = 'string'):
    with dsl.Condition(string == 'x'):
        op1 = producer(string=string)
    op2 = consumer(string=op1.output)
```
because `upstream_groups == ['condition-1', 'producer']` and `upstream_groups[0]` is a `Condition` group, so a Condition exception is raised.
Furthermore, the following pipeline also currently causes an exception to be raised, but it's the `ParallelFor` exception:
```python
@dsl.pipeline
def my_pipeline(string: str = 'string'):
    with dsl.ParallelFor([1]):
        with dsl.Condition(string == 'x'):
            op1 = producer(string=string)
    op2 = consumer(string=op1.output)
```
because `upstream_groups == ['for-loop-2', 'condition-3', 'producer']` and `upstream_groups[0]` is a `ParallelFor` group, so a ParallelFor exception is raised.
Say the `ParallelFor` check were removed to support fan-in. What would happen in the second example? `upstream_groups[0]` would still be a `ParallelFor` group and a condition exception would not be raised, despite this being an invalid topology.
To handle this, I think we should either:
(a) Add a comment and tests to make this clear for future developers.
(b) Add a more robust Condition check that traverses `upstream_groups` and performs the same check on each element (a sketch of this option follows below).
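For what option (b) might look like, here is a rough sketch that reuses the names from this diff (`upstream_groups`, `group_name_to_group`, `tasks_group`); it is an illustration of the idea, not the actual implementation:

```python
# Sketch only: apply the Condition check to every uncommon ancestor group of
# the upstream task instead of just upstream_groups[0], so it no longer
# depends on the ParallelFor check catching nested cases first.
for group_name in upstream_groups:
    dependent_group = group_name_to_group.get(group_name, None)
    if isinstance(dependent_group, tasks_group.Condition):
        raise RuntimeError(
            f'Task {task.name} cannot depend on task {upstream_task.name} '
            f'inside a Condition group: {group_name}.')
```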
```
@@ -427,14 +427,43 @@ def get_dependencies(
        task2=task,
    )

    # a task cannot depend on a task created in a for loop group since individual PipelineTask variables are reassigned after each loop iteration
    # ParralelFor Check
    for parent in task_name_to_parent_groups[upstream_task.name]:
```
I think this logic is mostly implemented by `_get_uncommon_ancestors` already (except for the case of nested `ParallelFor`). `upstream_groups` contains the result -- it's all the ancestors of the upstream task that are not common to the ancestors of the downstream task. I think this is what we're looking for in order to assert that the topology is valid (the upstream cannot be nested under a sub-DAG that the downstream is not nested under, except for in the case of nested `ParallelFor`).
This means that we can know if the upstream is nested in a `ParallelFor`, `Condition`, or `ExitHandler` just by looking at the value of `upstream_groups`:
```python
import copy

uncommon_upstream_groups = copy.deepcopy(upstream_groups)
uncommon_upstream_groups.remove(upstream_task.name)  # because a task's `upstream_groups` contains the task's name
if uncommon_upstream_groups:
    raise ...
```
As far as I can tell nested pipelines are the only case that this would not cover, which would require some special handling:
```python
@dsl.pipeline
def my_pipeline(string: str = 'string'):
    with dsl.ParallelFor([1, 2]):
        one = producer(string='text')
    with dsl.ParallelFor([1, 2]):
        two = consumer(string=one.output)
```
Nice work on this, @JOCSTAA!
Final nit, now that we know we're going to keep this final check.
Excited to merge this one! Thanks for this, @JOCSTAA. This unlocks new pipeline functionality for users.
Can you add a PR description and a release note (`./sdk/RELEASE.md`), describing what sort of functionality this unlocks? Code snippets could be helpful for the description. Then, let's merge!
```
else:
    task_group_type = 'a ' + tasks_group.ParallelFor.__name__

raise RuntimeError(
```
No action required / not your code: I want to note here that I'm not sure if this should use a `RuntimeError`. This is a runtime error in the sense that it results in an ambiguous runtime topology, but it's not a true "runtime" error at the time it's raised.
This file used `RuntimeError` before this PR and, since this PR actually reduces the set of topologies for which this error would be raised, we don't necessarily need to reconsider this in this PR.
Furthermore, an `Exception` is usually used when the error is attributed to user code, whereas an `Error` is usually used when the error is attributed to something else, such as the environment. In this case, this is user code. For this reason, I think it would make sense for this to be a custom `InvalidTopologyException` or something similar.
Relatedly, some of the `ValueError`s from `pipeline_task.py` now become `RuntimeError`s in this PR, so perhaps that is a reason to consider this in the short term.
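For illustration, a minimal sketch of what such a custom exception could look like; the class name follows the suggestion above and the `_check_dependency` helper is hypothetical, not existing KFP code:

```python
class InvalidTopologyException(Exception):
    """Raised when task dependencies form a topology the compiler cannot support."""


# Hypothetical usage, replacing the RuntimeError currently raised in this file:
def _check_dependency(task_name: str, group_name: str) -> None:
    raise InvalidTopologyException(
        f'Task {task_name} cannot depend on any task inside the group: {group_name}.')
```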
/lgtm Thank you, @JOCSTAA!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: chensun
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
* unblock valid topology
* add more tests
* handle git fork
* sample_test_cases
* main
* restore to master
* resolve comments on PR
* resolve conflicts
* resolve conflict 2
* revert conflict fix
* fix changes
* address comments
* review
* docformatter presubmit
* revert docformatter
* update release.md
Description of your changes:
Enables the creation of more complex topologies of task group types that were previously blocked.
Sample topology that is now valid:
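One concrete example, adapted from the review discussion above (with `comp` as a placeholder component): a task nested deeper under a shared `ParallelFor` can now consume an output produced directly under that `ParallelFor`.

```python
from kfp import dsl

# Adapted from the review discussion above; `comp` is a placeholder component.
# `grandchild` sits under a Condition inside the ParallelFor and consumes the
# output of `op1`, which is a direct child of the same ParallelFor.
@dsl.pipeline
def my_pipeline(string: str = 'string'):
    with dsl.ParallelFor([1, 2, 3]):
        op1 = comp()
        with dsl.Condition(string == 'string'):
            grandchild = comp(inp=op1.output)
```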
Checklist: