Allow tasks to provide input for other task variables #20

JeanMertz · 2019-06-18T11:38:27Z

See #20 (comment) for the most up-to-date design decisions.

Solve the general problem of "this pipeline requires me to provide a variable value, that I don't have yet" by allowing pipelines to "expose" their capability to provide output that matches the required input of a different pipeline.

Example

Let's take this example:

Say I want to use the "List Feature Flags" pipeline (second one), and it requires me to provide a customer UUID. However, I currently only have a customer email at my disposal.

Let's imagine there to also be a "Find Customer UUID" pipeline that accepts either an email address or a customer name, and returns the matching UUID.

I can now:

open the Find Customer UUID pipeline (this is assuming I know this pipeline exists),
enter the customer's email address,
run the pipeline,
copy (Add "copy to clipboard" button for job results #17) the returned UUID output,
close this pipeline,
open the List Feature Flags pipeline,
paste the copied UUID,
and run the pipeline.

Proposed Solution

It would be great if we could programmatically prove that there is a relationship between these two pipelines, by adding some extra metadata to the output of pipelines, which can be linked to the input needed by other pipelines.

One possible solution I'm thinking of:

When creating a pipeline, you can optionally name the output of each step in that pipeline.
The output of a pipeline equals the output of the last step in that pipeline, and so the output name of a pipeline equals the one used in the last step.
When opening a different pipeline, and it asks for a variable named Customer UUID, the system will search for one (or more?) pipelines for which the output name is set to that same name.
It will then suggest to you this other pipeline
What happens then in terms of UI, moving back and forth between pipelines, I'm not sure about.
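The matching rule in the steps above could be sketched roughly as follows. This is only an illustration under assumptions: the `Step` and `Pipeline` classes and the `suggest_providers` function are hypothetical names, not part of any existing code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    name: str
    output_name: Optional[str] = None  # naming a step's output is optional


@dataclass
class Pipeline:
    name: str
    variables: List[str] = field(default_factory=list)  # required input names
    steps: List[Step] = field(default_factory=list)

    @property
    def output_name(self) -> Optional[str]:
        # The pipeline's output name is the output name of its last step.
        return self.steps[-1].output_name if self.steps else None


def suggest_providers(variable: str, pipelines: List[Pipeline]) -> List[Pipeline]:
    """Return every pipeline whose output name equals the requested variable."""
    return [p for p in pipelines if p.output_name == variable]


find_uuid = Pipeline(
    name="Find Customer UUID",
    variables=["Customer Email"],
    steps=[Step("Query customer", output_name="Customer UUID")],
)
list_flags = Pipeline(
    name="List Feature Flags",
    variables=["Customer UUID"],
    steps=[Step("Fetch flags")],
)

suggestions = suggest_providers("Customer UUID", [find_uuid, list_flags])
print([p.name for p in suggestions])
```

The lookup is a plain string comparison on names, so it only works if pipeline authors agree on variable naming.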

Here's an example of how such a help message could be displayed. In this case, the field is for the Customer UUID variable in the "List Feature Flags" pipeline, and there is another pipeline called "Find Customer UUID" which has its output name set to Customer UUID:

side note on the second point in the proposed solution:

We can optionally simplify this part by defining the output on the pipeline itself, instead of defining it on the steps and then picking the last step in a pipeline. However, this is error-prone, because step ordering might change, or new steps might be added, at which point the output might no longer match.

I think it's safer to set the output names on the steps themselves. This also provides more composability in the future, when we work on a UI for creating new pipelines, using step templates, for example.

Design Goals

The above proposal is one possible solution, but there are more. In general, I'd like to try and find a solution that matches these goals:

composable – using existing features (or features already proposed elsewhere), little to no new concepts/data structures required
simple – fits well in the UI, is not confusing to use, should be self-explaining
generic – should solve the overarching problem of "I need to provide a value that I don't know yet"
safe – no automated pipeline runs without explicitly triggering them


JeanMertz · 2019-07-13T22:28:21Z

Given this:

When creating a pipeline, you can optionally name the output of each step in that pipeline.

Perhaps this can be combined with #22:

[...] allowing steps to define an output attribute, which determines to which temporary variable the output is assigned.

So, a step would define its output name as Customer UUID, and within the task that the step belongs to, you can reference that output in the template (#23) as {{ var['Customer UUID'] }} (or some other namespace, such as {{ output['Customer UUID'] }}?). If that step is the last step of the task, then any other task that has an input variable named Customer UUID will show a link to the task that provides this variable as its output.

I'm not 100% sold on the idea yet (there's maybe a bit too much coupling between the features, which goes beyond the desired composability), but it seems possible to do this...

JeanMertz · 2019-07-13T22:46:51Z

One thing that could make this work is:

A step output variable cannot exist already within the same task (neither as a task variable nor another similarly named step output variable).
This makes sense, because if Task A accepts Variable 1 as input, then it doesn’t make any sense to have a step in that task output the same variable name (that would mean the step didn’t add any “value” to the task result).
By adding this constraint, it becomes impossible to have duplicate variable names, giving more clarity to what’s going on, and allowing us to use the {{ var['Variable 1'] }} notation.
This constraint is checked when the task is created.
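A minimal sketch of that creation-time check, assuming a task is described by a list of input variable names and a list of (optional) step output names. The function name `validate_step_outputs` is hypothetical.

```python
from typing import Iterable, Optional


def validate_step_outputs(
    task_variables: Iterable[str],
    step_outputs: Iterable[Optional[str]],
) -> None:
    """Raise ValueError if a step output name collides with a task variable
    or with the output name of an earlier step in the same task."""
    seen = set(task_variables)
    for name in step_outputs:
        if name is None:
            continue  # steps without a named output are always allowed
        if name in seen:
            raise ValueError(f"duplicate variable name in task: {name!r}")
        seen.add(name)


# Accepted: the step output introduces a new name.
validate_step_outputs(["Customer Email"], ["Customer UUID", None])

# Rejected: the step would output a name the task already takes as input.
try:
    validate_step_outputs(["Variable 1"], ["Variable 1"])
except ValueError as err:
    print(err)
```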

JeanMertz · 2019-07-14T08:01:12Z

After thinking on this some more, I no longer think it makes sense to combine #22 and this issue.

I've written the new design thoughts in #22 for that issue, and will write the ones for this issue below.

Here's the current design I've come up with:

We introduce a new step_output_advertisements table (name to be bikeshedded, variable_value_advertisements?).
This has the data (id, step_id, variable_name) (variable_name to be bikeshedded, key?).
It has a uniqueness constraint on (step_id, variable_name).
Whenever a task requires input variable X, it will look in this new table to see if there are any advertised variable names matching X.
If there are, the UI shows a list (dropdown?) of task names providing the relevant variable values, somewhat similar to the UI mock-up above (although how this works in the UI is still undecided).
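Here is a rough sketch of what the table and the lookup could look like, using SQLite through Python's `sqlite3` module purely for illustration. Only the `step_output_advertisements` columns and the uniqueness constraint come from the design above; the `tasks` and `steps` tables, the cascade on `step_id`, the sample rows, and the `providers_for` helper are assumptions.

```python
import sqlite3

# Hypothetical minimal schema. The tasks and steps tables are assumed here;
# the real data model may differ.
SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT NOT NULL
);
CREATE TABLE steps (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
    name TEXT NOT NULL
);
CREATE TABLE step_output_advertisements (
    id INTEGER PRIMARY KEY,
    step_id INTEGER NOT NULL REFERENCES steps(id) ON DELETE CASCADE,
    variable_name TEXT NOT NULL,
    UNIQUE (step_id, variable_name)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO tasks VALUES (1, 'Find Customer UUID', 'Searches by email address')")
conn.execute("INSERT INTO tasks VALUES (2, 'List Feature Flags', 'Lists flags for a customer')")
conn.execute("INSERT INTO steps VALUES (1, 1, 'SQL Query')")
conn.execute("INSERT INTO steps VALUES (2, 1, 'Print Output')")
conn.execute("INSERT INTO steps VALUES (3, 2, 'Fetch flags')")
conn.execute(
    "INSERT INTO step_output_advertisements (step_id, variable_name) "
    "VALUES (1, 'Customer UUID')"
)


def providers_for(conn, variable_name):
    """Return (task name, task description) for each task advertising the variable."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.name, t.description
        FROM step_output_advertisements AS a
        JOIN steps AS s ON s.id = a.step_id
        JOIN tasks AS t ON t.id = s.task_id
        WHERE a.variable_name = ?
        ORDER BY t.name
        """,
        (variable_name,),
    )
    return rows.fetchall()


print(providers_for(conn, "Customer UUID"))
```

Because the advertisement references `step_id` with a cascading foreign key, deleting a step also removes its advertisements without any application-level bookkeeping.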

There are several advantages I can see by using this design:

It solves my main concern of merging Allow assigning step output to temporary variables #22 and this feature ("there's maybe a bit too much coupling between the features, which goes beyond the desired composability").
It composes nicely with the current data model of tasks with multiple steps.
It avoids data consistency invariants that cannot be enforced at the data storage level (e.g. removing a step with advertised output does a cascade delete to remove that advertised output).
It allows a single task to provide multiple advertised output values. For example if a task has a step that fetches the Customer UUID and it also fetches the Customer First Name in another step, that task can now be used to provide programmatic access to both those variable values.
There is no coupling between internal variable names (Allow assigning step output to temporary variables #22) used within a task, and any named advertised output. You cannot break one by changing the other.
You are no longer required to use the final output of a task (so the last step in a task) as the advertised output.

This point is significant, because of this (very common) pattern:
You have a task that finds a customer UUID based on a provided email address. You add some "pretty printing" to the output of the task. In this case, your task could look like this:
- Step 1 reads the provided {{ var['Customer Email'] }} variable value, and uses the SQL Query processor to fetch the relevant customer UUID.
- Step 2 uses the Print Output processor to print the final output of the task as Customer UUID: {{ var['Customer UUID'] }}\n\nRemember to [use this customer information responsibly][link to data protection document], thank you.
The output of Step 1 is interesting to other tasks that require a customer UUID as their input variable value. The output of Step 2 is useless for programmatic access, but meant for humans to read.

By allowing Step 1's output to be advertised, we can provide both programmatic access to the relevant data, and make the final output of this task more significant for the person running the task.
A single variable name can be provided by multiple steps (in different tasks). For example, the variable Customer UUID could be provided by a task named Find Customer UUID by Email and one named Find Customer UUID by Username, allowing the person needing this value to decide which one to use, based on the data available to them.

On this point, a design consideration is to show both the task name and description in the dropdown, because both of these tasks might be named "Find Customer UUID", but their description might mention "Searches for a customer UUID based on their {email address,username}"

This is an initial implementation of the concept of "variable advertisements". Check out #20 for a full description of what this enables, but the TL;DR is this: If you have a task `A` that requires a variable `X`, and a task `B` which has a step defined that returns variable `X`, you can now "advertise" that task `B` can provide a value for any other task that requires variable `X` as its input. Once implemented, this allows the client to show a message when someone wants to run task `A`, but they don't know the value of `X`, telling them that they might be able to get that value by running task `B`. A more advanced client-side implementation could automate this process, by allowing someone to run task `B`, take the output of `X` programmatically, and use that value as the input of `X` for task `A`.

When a task advertises its capability to provide a value for a variable, any other task requiring that variable value as its input will add a link to the advertising task. If more than one task advertises this capability for the same variable name, a generated dropdown shows each task's title and description, allowing the person who needs the value for a variable to make an educated choice about which task to use. As an example, if a task requires the UUID of a customer, one task might be able to provide this UUID by searching for an email address, and another might give the UUID based on a first and last name. The person needing the customer UUID can then choose between those two advertisers, based on the customer data they have at hand. This is a first step towards solving #20. For this first implementation, there is no other automation, other than the automated links added to the relevant advertising tasks.
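The link-versus-dropdown decision described above might look something like this on the client side. This is a sketch only: the `Advertiser` type, the `render_hint` function, and the dictionary shape are hypothetical and not taken from the web client.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Advertiser:
    task_id: int
    title: str
    description: str


def render_hint(variable: str, advertisers: List[Advertiser]) -> Optional[dict]:
    """Describe the UI element to show next to an input variable."""
    if not advertisers:
        return None  # nobody advertises this variable: show nothing
    if len(advertisers) == 1:
        only = advertisers[0]
        return {"kind": "link", "variable": variable,
                "task_id": only.task_id, "label": only.title}
    # Several advertisers: include title and description so that similarly
    # named tasks can still be told apart.
    return {
        "kind": "dropdown",
        "variable": variable,
        "options": [
            {"task_id": a.task_id, "label": f"{a.title}: {a.description}"}
            for a in advertisers
        ],
    }


by_email = Advertiser(1, "Find Customer UUID", "Searches by email address")
by_name = Advertiser(2, "Find Customer UUID", "Searches by first and last name")

print(render_hint("Customer UUID", [by_email]))
print(render_hint("Customer UUID", [by_email, by_name]))
```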

JeanMertz · 2019-07-16T17:29:37Z

A first step towards this has been implemented in the web client in 7a92f9f.

A "direct link" will be added to another task, if any step within that task advertises itself as a value provider for a variable in the active task:

If more than one task advertises the value, a drop-down menu is shown instead:

Clicking on the link will open the other task. Other than that, no other integration/automation is added yet. You still need to copy the output of the other task, and move back to the previous task to continue.

(this issue will remain open until the whole loop is implemented and there is as little friction as possible)


JeanMertz · 2019-07-16T20:22:24Z

Clicking on the link will open the other task. Other than that, no other integration/automation is added yet. You still need to copy the output of the other task, and move back to the previous task to continue.

This last part is simplified as of fd2db24.

When you navigate to another task from the current task, clicking the "Back" button now moves you back to the first task. This works for infinitely nested tasks. Once you are back at the original task, clicking "Back" returns you to the home page.
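The "Back" behaviour described here is essentially a stack of opened tasks. A minimal sketch, assuming a hypothetical `Navigation` class (the actual client code in fd2db24 may be structured differently):

```python
from typing import List


class Navigation:
    """Tracks which task is on screen and where "Back" should lead."""

    HOME = "home"

    def __init__(self) -> None:
        self._stack: List[str] = []  # opened tasks, oldest first

    @property
    def current(self) -> str:
        # With no open task, the home page is shown.
        return self._stack[-1] if self._stack else self.HOME

    def open_task(self, task: str) -> None:
        # Following an advertisement link nests the new task on top.
        self._stack.append(task)

    def back(self) -> str:
        # Leave the current task and return to whatever was open before it.
        if self._stack:
            self._stack.pop()
        return self.current


nav = Navigation()
nav.open_task("List Feature Flags")
nav.open_task("Find Customer UUID")  # opened from within the first task
print(nav.back())  # back at the first task
print(nav.back())  # back at the home page
```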

JeanMertz · 2019-07-20T09:26:47Z

There's another thing to consider here.

Say, for example, I want to find the Customer UUID based on some value that might return multiple results (such as searching by Family Name).

In this case, if Task A provides this functionality, it can provide you with a value for the Customer UUID in Task B, but it can return multiple results, and will not always return a single result.

In these cases it can still be useful to be linked to Task A, but there can't be any automation (#33); you'll have to use your own judgement to decide which result to pick (and copy/paste that result over).

We might potentially want to enhance the current variable advertisement functionality introduced in 3d45e61 to allow an advertiser to mark the advertisement as something that can be used automatically (so a single value), or something that has to be picked manually (a set of values, or a value that doesn't match exactly what you'd need for this variable, but can be copy/pasted or inferred from it).
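One way to model that distinction is an extra field on the advertisement. This is a sketch under assumptions: the `AdvertisementKind` enum, the `Advertisement` type, and the `automation_candidates` helper are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AdvertisementKind(Enum):
    AUTOMATIC = "automatic"  # yields exactly one directly usable value
    MANUAL = "manual"        # yields a set of values, or needs human judgement


@dataclass(frozen=True)
class Advertisement:
    task: str
    variable_name: str
    kind: AdvertisementKind


def automation_candidates(ads: List[Advertisement], variable_name: str) -> List[Advertisement]:
    """Advertisements whose output could be passed on without a person picking a value."""
    return [
        ad for ad in ads
        if ad.variable_name == variable_name and ad.kind is AdvertisementKind.AUTOMATIC
    ]


ads = [
    Advertisement("Find Customer UUID by Email", "Customer UUID", AdvertisementKind.AUTOMATIC),
    Advertisement("Find Customer UUID by Family Name", "Customer UUID", AdvertisementKind.MANUAL),
]

print([ad.task for ad in automation_candidates(ads, "Customer UUID")])
```

Manual advertisements would still be shown as links; they would only be excluded from any automated hand-over of values.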

JeanMertz mentioned this issue Jul 14, 2019

Programmatically run tasks #33

Open

JeanMertz mentioned this issue Jul 16, 2019

Support multiple active tasks #34

Closed

JeanMertz changed the title ~~Allow pipelines to provide input for other pipeline variables~~ Allow tasks to provide input for other task variables Jul 24, 2019

JeanMertz mentioned this issue Aug 25, 2019

Rename "Back" to "Home" when applicable #56

Open
