This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Allow tasks to provide input for other task variables #20

Open
JeanMertz opened this issue Jun 18, 2019 · 6 comments

@JeanMertz
Contributor

JeanMertz commented Jun 18, 2019

See #20 (comment) for the most up-to-date design decisions.


Solve the general problem of "this pipeline requires me to provide a variable value, that I don't have yet" by allowing pipelines to "expose" their capability to provide output that matches the required input of a different pipeline.

Example

Let's take this example:

Screenshot 2019-06-08 at 16 12 39

Say I want to use the "List Feature Flags" pipeline (second one), and it requires me to provide a customer UUID. However, I currently only have a customer email at my disposal.

Let's imagine there to also be a "Find Customer UUID" pipeline that accepts either an email address or a customer name, and returns the matching UUID.

I can now:

  1. open the Find Customer UUID pipeline (this is assuming I know this pipeline exists),
  2. enter the customer's email address,
  3. run the pipeline,
  4. copy the returned UUID output (see #17, Add "copy to clipboard" button for job results),
  5. close this pipeline,
  6. open the List Feature Flags pipeline,
  7. paste the copied UUID,
  8. and run the pipeline.

Proposed Solution

It would be great if we could programmatically prove that there is a relationship between these two pipelines by adding some extra metadata to the output of pipelines, which can be linked to the input needed by other pipelines.

One possible solution I'm thinking of:

  • When creating a pipeline, you can optionally name the output of each step in that pipeline.
  • The output of a pipeline equals the output of its last step, so the output name of a pipeline equals the one used in the last step.
  • When you open a different pipeline and it asks for a variable named Customer UUID, the system searches for one (or more?) pipelines whose output name is set to that same name.
  • It then suggests this other pipeline to you.
  • What happens then in terms of UI, moving back and forth between pipelines, I'm not sure about.

Here's an example of how such a help message could be displayed. In this case, the field is for the Customer UUID variable in the "List Feature Flags" pipeline, and there is another pipeline called "Find Customer UUID" which has its output name set to Customer UUID:

Screenshot 2019-06-18 at 13 35 26
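
A minimal sketch of the name-matching idea in the bullets above, assuming a pipeline is just a named list of steps. The `Step`, `Pipeline`, and `suggest_providers` names are made up for illustration and are not taken from the actual code base:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    name: str
    output_name: Optional[str] = None  # naming a step's output is optional


@dataclass
class Pipeline:
    name: str
    variables: list[str] = field(default_factory=list)
    steps: list[Step] = field(default_factory=list)

    @property
    def output_name(self) -> Optional[str]:
        # The pipeline's output name is the output name of its last step.
        return self.steps[-1].output_name if self.steps else None


def suggest_providers(variable: str, current: Pipeline, pipelines: list[Pipeline]) -> list[Pipeline]:
    """Return the other pipelines whose output name equals the requested variable name."""
    return [p for p in pipelines if p is not current and p.output_name == variable]


find_uuid = Pipeline(
    name="Find Customer UUID",
    variables=["Customer Email"],
    steps=[Step("Query customer", output_name="Customer UUID")],
)
list_flags = Pipeline(
    name="List Feature Flags",
    variables=["Customer UUID"],
    steps=[Step("Query flags")],
)

suggestions = suggest_providers("Customer UUID", list_flags, [find_uuid, list_flags])
print([p.name for p in suggestions])
```

The lookup only produces a suggestion; nothing is run automatically, which keeps it in line with the "safe" design goal.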

Side note on the second point in the proposed solution:

We could simplify this part by defining the output name on the pipeline itself, instead of defining it on the steps and then picking the last step in the pipeline. However, this is error-prone, because step ordering might change, or new steps might be added, at which point the output might no longer match.

I think it's safer to set the output names on the steps themselves. This also provides more composability in the future, when we work on a UI for creating new pipelines, using step templates, for example.

Design Goals

The above proposal is one possible solution, but there are more. In general, I'd like to try and find a solution that matches these goals:

  • composable – using existing features (or features already proposed elsewhere), little to no new concepts/data structures required
  • simple – fits well in the UI, is not confusing to use, should be self-explaining
  • generic – should solve the overarching problem of "I need to provide a value that I don't know yet"
  • safe – no automated pipeline runs without explicitly triggering them
@JeanMertz
Contributor Author

JeanMertz commented Jul 13, 2019

Given this:

When creating a pipeline, you can optionally name the output of each step in that pipeline.

Perhaps this can be combined with #22:

[...] allowing steps to define an output attribute, which determines to which temporary variable the output is assigned.

So, a step would define its output name as Customer UUID, and within the task that the step belongs to, you can reference that output in the template (#23) as {{ var['Customer UUID'] }} (or some other namespace, such as {{ output['Customer UUID'] }}?). If that step produces the last output of the task, then other tasks that have an input variable named Customer UUID will show a link to the task that provides this variable as its output.

I'm not 100% sold on the idea yet (there's maybe a bit too much coupling between the features, which goes beyond the desired composability), but it seems possible to do this...

@JeanMertz
Contributor Author

One thing that could make this work is:

  • A step output variable cannot exist already within the same task (neither as a task variable nor another similarly named step output variable).
  • This makes sense, because if Task A accepts Variable 1 as input, then it doesn’t make any sense to have a step in that task output the same variable name (that would mean the step didn’t add any “value” to the task result).
  • By adding this constraint, it becomes impossible to have duplicate variable names, giving more clarity to what’s going on, and allowing us to use the {{ var['Variable 1'] }} notation.
  • This constraint is checked when the task is created.
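
The constraint in the list above could be checked at task-creation time roughly as follows. This is only a sketch under the assumption that a task is described by a list of input variable names and a list of (optional) step output names; `validate_step_outputs` is a hypothetical helper, not an existing function:

```python
def validate_step_outputs(task_variables, step_outputs):
    """Raise ValueError if a step output name collides with a task variable
    or with the output name of another step in the same task."""
    seen = set(task_variables)
    for name in step_outputs:
        if name is None:
            continue  # steps are not required to name their output
        if name in seen:
            raise ValueError(f"duplicate variable name: {name!r}")
        seen.add(name)


# Accepted: the step adds a new name to the task.
validate_step_outputs(["Customer Email"], ["Customer UUID", None])

# Rejected: the step would output a variable the task already takes as input.
try:
    validate_step_outputs(["Customer UUID"], ["Customer UUID"])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```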

@JeanMertz
Contributor Author

After thinking on this some more, I no longer think it makes sense to combine #22 and this issue.

I've written the new design thoughts in #22 for that issue, and will write the ones for this issue below.


Here's the current design I've come up with:

  • We introduce a new step_output_advertisements table (name to be bikeshedded, variable_value_advertisements?).
  • This has the columns (id, step_id, variable_name) (variable_name to be bikeshedded, key?).
  • It has a uniqueness constraint on (step_id, variable_name).
  • Whenever a task requires input variable X, it will look in this new table to see if there are any advertised variable names matching X.
  • If there are, the UI shows a list (dropdown?) of task names providing the relevant variable values, somewhat similar to the UI mock-up above (although how this works in the UI is still undecided).
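
To make the table design above concrete, here is a sketch using Python's built-in sqlite3 module. Only the step_output_advertisements table with its (id, step_id, variable_name) columns and the uniqueness constraint come from the list above; the tasks and steps tables, the cascade rules, and the `advertisers_for` helper are assumptions made for the example, and the real project may use a different database and schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE steps (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id) ON DELETE CASCADE,
    name TEXT NOT NULL
);
CREATE TABLE step_output_advertisements (
    id INTEGER PRIMARY KEY,
    step_id INTEGER NOT NULL REFERENCES steps(id) ON DELETE CASCADE,
    variable_name TEXT NOT NULL,
    UNIQUE (step_id, variable_name)
);
""")

conn.execute("INSERT INTO tasks (id, name) VALUES (1, 'Find Customer UUID')")
conn.execute("INSERT INTO steps (id, task_id, name) VALUES (1, 1, 'SQL Query')")
conn.execute(
    "INSERT INTO step_output_advertisements (step_id, variable_name) VALUES (1, 'Customer UUID')"
)


def advertisers_for(conn, variable_name):
    """Return the names of tasks that have a step advertising the given variable."""
    rows = conn.execute(
        """
        SELECT DISTINCT tasks.name
        FROM step_output_advertisements AS adv
        JOIN steps ON steps.id = adv.step_id
        JOIN tasks ON tasks.id = steps.task_id
        WHERE adv.variable_name = ?
        ORDER BY tasks.name
        """,
        (variable_name,),
    ).fetchall()
    return [name for (name,) in rows]


before = advertisers_for(conn, "Customer UUID")
conn.execute("DELETE FROM steps WHERE id = 1")  # cascades to the advertisement row
after = advertisers_for(conn, "Customer UUID")
print(before, after)
```

Deleting the step removes its advertisement through the foreign key, so no dangling advertisement can point at a step that no longer exists.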

There are several advantages I can see by using this design:

  • It solves my main concern about merging #22 (Allow assigning step output to temporary variables) and this feature ("there's maybe a bit too much coupling between the features, which goes beyond the desired composability").

  • It composes nicely with the current data model of tasks with multiple steps.

  • It avoids data consistency invariants that cannot be enforced at the data storage level (e.g. removing a step with advertised output does a cascade delete to remove that advertised output).

  • It allows a single task to provide multiple advertised output values. For example if a task has a step that fetches the Customer UUID and it also fetches the Customer First Name in another step, that task can now be used to provide programmatic access to both those variable values.

  • There is no coupling between internal variable names (#22, Allow assigning step output to temporary variables) used within a task, and any named advertised output. You cannot break one by changing the other.

  • You are no longer required to use the final output of a task (so the last step in a task) as the advertised output.

    This point is significant, because of this (very common) pattern:

    You have a task that finds a customer UUID based on a provided email address. You add some "pretty printing" to the output of the task. In this case, your task could look like this:

    • Step 1 reads the provided {{ var['Customer Email'] }} variable value, and uses the SQL Query processor to fetch the relevant customer UUID.
    • Step 2 uses the Print Output processor to print the final output of the task as "Customer UUID: {{ var['Customer UUID'] }}\n\nRemember to [use this customer information responsibly][link to data protection document], thank you."

    The output of Step 1 is interesting to other tasks that require a customer UUID as their input variable value. The output of Step 2 is useless for programmatic access, but meant for humans to read.

    By allowing Step 1's output to be advertised, we can provide both programmatic access to the relevant data, and make the final output of this task more significant for the person running the task.

  • A single variable name can be provided by multiple steps (in different tasks). For example, the variable Customer UUID could be provided by a task named Find Customer UUID by Email and one named Find Customer UUID by Username, allowing the person needing this value to decide which one to use, based on the data available to them.

    On this point, a design consideration is to show both the task name and description in the dropdown, because both of these tasks might be named "Find Customer UUID", but their description might mention "Searches for a customer UUID based on their {email address,username}"
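
The "pretty printing" pattern from the list above (Step 1 fetches the UUID, Step 2 formats it for humans) can be sketched like this. The step representation, the `run_task` helper, and the lookup data are all invented for illustration; for simplicity each step receives the previous step's output directly, rather than going through the temporary variables discussed in #22:

```python
def run_task(steps, variables):
    """Run the steps in order and return (final_output, advertised_values)."""
    advertised = {}
    output = ""
    for step in steps:
        output = step["run"](variables, output)
        if step.get("advertises"):
            # Capture this step's raw output under its advertised variable name.
            advertised[step["advertises"]] = output
    return output, advertised


def fetch_uuid(variables, previous_output):
    # Stand-in for the SQL Query processor; the customer data is made up.
    customers = {"jane@example.com": "3f2b6c1e-0000-4000-8000-000000000001"}
    return customers[variables["Customer Email"]]


def pretty_print(variables, previous_output):
    # Stand-in for the Print Output processor.
    return (
        f"Customer UUID: {previous_output}\n\n"
        "Remember to use this customer information responsibly, thank you."
    )


steps = [
    {"run": fetch_uuid, "advertises": "Customer UUID"},
    {"run": pretty_print, "advertises": None},
]
final_output, advertised = run_task(steps, {"Customer Email": "jane@example.com"})
print(advertised)
print(final_output)
```

The human-readable text stays the final output of the task, while the raw UUID from Step 1 is what other tasks would consume.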

JeanMertz added a commit that referenced this issue Jul 14, 2019
This is an initial implementation of the concept of "variable
advertisements".

Check out #20 for a full
description of what this enables, but the TL;DR is this:

If you have a task `A` that requires a variable `X`, and a task `B`
which has a step defined that returns variable `X`, you can now
"advertise" that task `B` can provide a value for any other task that
requires variable `X` as its input.

Once implemented, this allows the client to show a message when someone
wants to run task `A`, but they don't know the value of `X`, telling
them that they might be able to get that value by running task `B`.

A more advanced client-side implementation could automate this process,
by allowing someone to run task `B`, take the output of `X`
programmatically, and use that value as the input of `X` for task `A`.
JeanMertz added a commit that referenced this issue Jul 16, 2019
When a task advertises its capability to provide a value for a variable,
any other task requiring that variable value as its input will add a
link to the advertising task.

If more than one task advertises this capability for the same variable
name, a generated dropdown shows each task's title, and their
description, allowing the person that needs the value for a variable to
make an educated choice which task to choose for their value.

As an example, if a task requires the UUID of a customer, one task might
be able to provide this UUID by searching for an email address, and
another might give the UUID based on a first and last name. The person
needing the customer UUID can then choose between those two advertisers,
based on the customer data they have at hand.

This is a first step towards solving
#20.

For this first implementation, there is no other automation, other than
the automated links added to the relevant advertising tasks.
@JeanMertz
Contributor Author

JeanMertz commented Jul 16, 2019

A first step towards this has been implemented in the web client in 7a92f9f.

A "direct link" will be added to another task, if any step within that task advertises itself as a value provider for a variable in the active task:

Screenshot 2019-07-16 at 19 24 37

If more than one task advertises the value, a drop-down menu is shown instead:

Screenshot 2019-07-16 at 19 25 15

Clicking on the link will open the other task. Other than that, no other integration/automation is added yet. You still need to copy the output of the other task, and move back to the previous task to continue.
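
The link-versus-dropdown behaviour described above could be decided by something like the following sketch. It is written in Python for illustration only and is not the web client's actual code; `Advertiser`, `render_hint`, and the returned dictionary shape are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advertiser:
    task_id: int
    title: str
    description: str


def render_hint(variable: str, advertisers: list[Advertiser]) -> Optional[dict]:
    """Describe the UI hint for a variable: nothing, a direct link, or a dropdown."""
    if not advertisers:
        return None
    if len(advertisers) == 1:
        only = advertisers[0]
        return {"kind": "link", "variable": variable, "task_id": only.task_id, "label": only.title}
    return {
        "kind": "dropdown",
        "variable": variable,
        # Show title and description so similarly named tasks can be told apart.
        "options": [
            {"task_id": a.task_id, "label": f"{a.title}: {a.description}"}
            for a in advertisers
        ],
    }


by_email = Advertiser(1, "Find Customer UUID", "Searches for a customer UUID based on their email address")
by_username = Advertiser(2, "Find Customer UUID", "Searches for a customer UUID based on their username")

single = render_hint("Customer UUID", [by_email])
multiple = render_hint("Customer UUID", [by_email, by_username])
print(single["kind"], multiple["kind"])
```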

(this issue will remain open until the whole loop is implemented and there is as little friction as possible)

@JeanMertz
Contributor Author

Clicking on the link will open the other task. Other than that, no other integration/automation is added yet. You still need to copy the output of the other task, and move back to the previous task to continue.

This last part is simplified as of fd2db24.

When you navigate to another task from the current task, clicking the "Back" button now moves you back to the first task. This works for infinitely nested tasks. Once you are back at the original task, clicking "Back" returns you to the home page.
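
One way to picture the "Back" behaviour described above is as a stack of visited tasks. This is a sketch of the idea only, not the client's implementation in fd2db24; `TaskNavigator` is a made-up name:

```python
class TaskNavigator:
    HOME = "home"

    def __init__(self):
        self._stack = []  # tasks opened on the way to the current one

    @property
    def current(self):
        return self._stack[-1] if self._stack else self.HOME

    def open(self, task):
        """Navigate to a task, remembering where we came from."""
        self._stack.append(task)

    def back(self):
        """Return to the previous task, or to the home page from the original task."""
        if self._stack:
            self._stack.pop()
        return self.current


nav = TaskNavigator()
nav.open("List Feature Flags")
nav.open("Find Customer UUID")
first_back = nav.back()
second_back = nav.back()
print(first_back, second_back)
```

Because every opened task is pushed onto the stack, the same logic covers any depth of nesting.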

@JeanMertz
Contributor Author

There's another thing to consider here.

Say, for example, I want to find the Customer UUID based on some value that might return multiple results (such as searching by Family Name).

In this case, if Task A provides this functionality, it can provide you with a value for the Customer UUID in Task B, but it may return multiple results rather than a single one.

In these cases it can still be useful to be linked to Task A, but there can't be any automation (#33); you'll have to use your own judgement to pick a result (and copy/paste that result over).

We might potentially want to enhance the current variable advertisement functionality introduced in 3d45e61 to allow an advertiser to mark the advertisement as something that can be used automatically (so a single value), or something that has to be picked manually (a set of values, or a value that doesn't match exactly what you'd need for this variable, but can be copy/pasted or inferred from it).
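
A rough sketch of what such a marker could look like, assuming a two-valued mode on each advertisement. None of these names (`Mode`, `Advertisement`, `value_to_prefill`) exist in the project; they only illustrate the distinction between values that can be used automatically and values a person has to pick:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTOMATIC = "automatic"  # a single value that can be used as-is
    MANUAL = "manual"        # a set of values, or an inexact value, picked by a person


@dataclass(frozen=True)
class Advertisement:
    task_name: str
    variable_name: str
    mode: Mode


def value_to_prefill(ad: Advertisement, results: list[str]) -> Optional[str]:
    """Return a value that may be filled in automatically, or None when a person must choose."""
    if ad.mode is Mode.AUTOMATIC and len(results) == 1:
        return results[0]
    return None


by_email = Advertisement("Find Customer UUID by Email", "Customer UUID", Mode.AUTOMATIC)
by_family_name = Advertisement("Find Customer UUID by Family Name", "Customer UUID", Mode.MANUAL)

auto_value = value_to_prefill(by_email, ["uuid-1"])
manual_value = value_to_prefill(by_family_name, ["uuid-1", "uuid-2"])
print(auto_value, manual_value)
```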

JeanMertz changed the title from "Allow pipelines to provide input for other pipeline variables" to "Allow tasks to provide input for other task variables" on Jul 24, 2019