Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write a specification for unsatisfied task inputs. #359

Closed
wants to merge 18 commits into from

Conversation

rhpvorderman
Copy link
Contributor

@rhpvorderman rhpvorderman commented Mar 6, 2020

This PR is a result of the discussion in #347 . This alternate PR was made because the scope shifted during the discussion.

These were some of the points made in the discussion:

  1. Users should be able to know what inputs they should provide by looking at the input section of the top-level workflow. This should not be hindered by implicit optional inputs bubbling up.
  2. For clinically validated workflows only a select number of inputs may ever be changed.
  3. Allowing engines to pursue different models for computing inputs will lead to compatibility problems.
  4. Requiring all task optional inputs to be filled in the top-level workflow input section is infeasible and an enormous hassle.

This PR attempts to address all these points by:

  • Requiring all required task/workflow inputs to be filled by the calling workflow. This way the top-level workflow inputs section will always contain all the inputs necessary for running the workflow. (Solves point 1)
  • Requiring workflows to only accept inputs for the top-level workflow by default (solves 2 and 3)
  • Give engines the flexibility to allow for bubbled-up optional inputs if a configuration option is provided. This gives users the flexibility needed to solve issue 4.

Supporting bubbled-up inputs optionally is feasible in Cromwell thanks to how @cjllanwarne implemented the input calculation. I don't know about the other engines, but since it is optional, it should not be an extra burden.

Pinging @cjllanwarne , @mlin , @orodeh and @DavyCats as they brought up these points in the discussion.

EDIT: I also updated the example to reflect changes in the WDL spec so far.

Checklist

  • Pull request details were added to CHANGELOG.md

versions/development/SPEC.md Outdated Show resolved Hide resolved
versions/development/SPEC.md Outdated Show resolved Hide resolved
versions/development/SPEC.md Outdated Show resolved Hide resolved
Comment on lines 2618 to 2621
* There is currently no way to supply a bubbled-up input in an outer workflow's
call block.
* Example: one cannot say call my_workflow as subworkflow
`{ inputs: my_task.my_task_input=... }`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be made allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes maybe. But we should do that in another PR I think.

versions/development/SPEC.md Outdated Show resolved Hide resolved
Comment on lines 2622 to 2626
* By default an engine only allows inputs that are specified in the input
section of the top-level workflow.
* An engine may optionally support supplying bubbled-up optional inputs, but
this has to be explicitly enabled on the engine (via configuration, command
line flags or otherwise).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel it might be better to add some form of syntax in the WDL file itself. That way one would be able to see that there may be interesting nested inputs in the WDL files and, thus, that it would be good to use an engine which supports them.

In addition, a workflow might be written without the intention for the bubble-up behavior to be used. If someone then uses such a flag or configuration to allow for the bubble-up, they might be able to set optional inputs which will mess up the workflow.

An alternative to the last two points in the list:

  • By default only inputs defined in the top-level workflow are made available to the user.
  • The keyword etc must be added to a workflow's input section to enable the bubble-up behavior described above. Only if this etc keyword is present in the workflow's input section may an engine allow for this bubble-up behavior and an engine may choose not to support to feature at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The etc keyword makes sense to me. However, why not just allow it in the workflow meta section? This way, we don't need new syntax.

Copy link
Contributor Author

@rhpvorderman rhpvorderman Mar 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavyCats good points! @orodeh good suggestion! Setting it in the workflow seems indeed better than setting it in the engine, and using the meta section for this seems the most logical thing to do.

Something like:

meta {
    bubbled-up-inputs: true
}

Except that I don't like the bubbled-up word. It is in the PR, but more on "a lack of a better term" basis.

How about nested inputs?

meta {
     allow-nested-inputs: true
}

Or some other suggestions:

meta {
    allow-hidden-inputs: true
    allow-subworkflow-inputs: true
    allow-extended-inputs: true
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the meta section is the appropriate place for this. The meta section is just a set of key-value pairs that hold no relevance to the execution of a workflow. It is intended to be a place where you might store contact information or a link to the documentation. This was also discussed (briefly) in #351.

This feels more like something that would fit in the runtime section, except that there is no runtime section for workflows right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow-nested-inputs seems fine to me, though I wonder if using -s might cause issues with the grammars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @patmagee for the grammar question.

Well we could consider adding a runtime section for workflows, but that requires another spec/language change. (Which is why I wanted to set it engine-side) . Adding a runtime section could be useful, but there need to be other relevant things stored there. Having an entire runtime section for allow-nested-inputs seems a bit ludicrous to me.

I think the meta section is quite appropriate: It states that this workflow has nice nested inputs. A user does not have to use them. I think it is more documentation than runtime information, since an engine does not have to support nested-inputs per se.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actually really great examples of something that can be added to the hints section, and we leave it up to the engine to decide what to do. We tried to really trim down the runtime lock and make it the bare minimum requirements needed to run a workflow.

We already have the idea of "semi reserved" kw used in the hints section. These are hints that have similar meanings across all engines but do not need to be implemented.

I'd suggest doing something like:

hints {
  allowNestedInputs: true
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patmagee, yes, that seems to be the best section but in the PR for the hints section I only see it at the task level. Will there also be a hints section at the the workflow level?

Copy link
Contributor

@cjllanwarne cjllanwarne Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel like it quite matches the semantics of a "hint" to me to be honest.

A hint to me should mean "if an engine completely ignores this, it might miss out on some runtime optimization but the logical evaluation of the workflow should still be possible".

What we're talking about here is:

  1. a warning to users that their workflow might not work on all engines.
  2. a requirement to supporting engines that they should allow bubbled up inputs.
  3. a signal for engines which don't support this feature to quickly stop processing the workflow/task

I'd almost like to add a way to flag plugins or non-standard-features or engine-requirements. I know Ohad has a number of other ideas for features which would be really cool but tricky for Cromwell. A way to flag these high up in the context of a WDL file (perhaps even above the workflow scope) would be really nice.

Copy link
Contributor

@cjllanwarne cjllanwarne Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eg something like:

version 2.0

plugins {
  bubbled-up-inputs,
  streamable-input-files,
  ???
}

it's not 100% ideal because it allows engines to diverge, but at least it'd be obvious right at the top of the file whether your workflow will work with any particular engine (which would presumably be able to publish or otherwise indicate which features they support). It would also allow engines to "get out quickly" if they see required features which they didn't have.

@orodeh
Copy link
Contributor

orodeh commented Mar 9, 2020

I have joined @patmagee in trying to build a grammar for WDL with ANTRL4. My part is writing scala code that builds an AST from a WDL program (https://github.com/dnanexus-rnd/wdlTools). I now have a much better appreciation for the amount of work it takes to modify and maintain the grammar. Hence my reluctance to add additional syntax, unless we really need it.

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Mar 10, 2020

Given the feedback I coin the following proposal:

If there is no additional use for a hints or runtime section for workflows, then we should settle on the meta block for harboring a key called allowNestedInputs.

If in due time the need arises for a runtime or hints section before WDL2.0 is released, we can always change this. If the need arises after WDL2.0 is released, we can deprecate (but still support) the use of allowNestedInputs in the meta section to provide a smooth transition.

What do you think? If we are all in agreement that the meta section is (for now, in a pragmatic sense) the way to go I will go ahead and update the proposal.

@orodeh
Copy link
Contributor

orodeh commented Mar 10, 2020

Sounds good to me, thanks.

@cjllanwarne
Copy link
Contributor

@rhpvorderman I should have read further down before replying in thread!

Is the idea that an engine could choose either to allow bubbling up only if this flag was set, or else to fail the workflow when it sees that flag?

Your answer seems fine to me for now. The more general "how do we allow for diverging engine features" can be part of a future discussion.

@rhpvorderman
Copy link
Contributor Author

Is the idea that an engine could choose either to allow bubbling up only if this flag was set, or else to fail the workflow when it sees that flag?

I think the engine should allow bubbling up. If it does not support bubbling up I think it should fail with a message, but only if bubbled-up namespaced inputs are actually used. Since all required inputs should be filled by the top-level workflow, this way all workflows can be run by all engines, regardless whether the flag is present or not.

rhpvorderman and others added 2 commits March 11, 2020 07:33
Co-Authored-By: DavyCats <davycats.dc@gmail.com>
@rhpvorderman
Copy link
Contributor Author

I added the flag to the PR. I did not add any stuff about engines optionally supporting this feature yet. Do we want to specify this diverging behavior now, or can this wait for later since WDL2.0 is not released yet?
As it is in the PR currently, all engines should support it, as no exceptions are made.

@DavyCats
Copy link
Contributor

Regarding diverging behavior, I see two viable stances that might be taken:

  • The spec never declares a feature as optional, but an engine may simply choose not to implement a certain feature. The engine wouldn't be fully spec-compliant, but if that's properly documented it is probably okay.
  • In cases where a feature might interfere with the possibility for a workflow to be run on certain infrastructure, such a feature may be declared in the spec as an optional feature that an engine may choose not to implement. Only features who's absence would not interfere with the runnability of a workflow may be optional, ie. even if the feature is not available in an engine the engine should still be able to run a workflow which would otherwise have made use of it, simply with less functionality/convenience/configurability.

@rhpvorderman
Copy link
Contributor Author

I propose we follow @cjllanwarne's suggestion and discuss diverging engine behavior in a future discussion.

I think the PR is ready. Are there any points that need to be addressed on this PR before it gets into voting?

@patmagee patmagee added this to the v2.0 milestone Apr 3, 2020
@DavyCats
Copy link
Contributor

Should voting be opened on this? There hasn't been any further discussion in a while.

@rhpvorderman
Copy link
Contributor Author

rhpvorderman commented Apr 24, 2020

@patmagee Could you initiate the vote on this? It might be accepted or rejected, but it's getting a bit stale at the moment. I would do it myself if I had the label powers 😇. Thanks for your effort!

@patmagee
Copy link
Member

@rhpvorderman can you please update the changelog.md :)? Once you do that, I can opn this up for voting

@rhpvorderman
Copy link
Contributor Author

@patmagee Done!

@orodeh
Copy link
Contributor

orodeh commented May 6, 2020

👍

1 similar comment
@DavyCats
Copy link
Contributor

DavyCats commented May 7, 2020

👍

again with a further-qualified name.
* Example: my_outer_workflow.my_workflow.my_task.my_task_input.
* There is currently no way to supply a bubbled-up input in an outer workflow's
call block.
Copy link

@pieterlukasse pieterlukasse May 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this point. Does it contradict the previous one? I.e. the previous one seems to introduce an option, but this point seems to state it is not possible. Can you please clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first point is about the input json (or whatever other input format)

The outer workflow's call block is in the WDL code itself. When you call a workflow from inside a workflow. In that area there is currently no way to supply nested inputs.

@pieterlukasse
Copy link

thanks @rhpvorderman . Looks good to me 👍
I had only 1 question regarding part of the documentation. Maybe you can take a look at that?

@rhpvorderman
Copy link
Contributor Author

Cromwell implementation pending in broadinstitute/cromwell#5523.

I know this spec change does not generate a lot of buzz. However, it is important that it gets merged as a lot of the behavior it described was previously undefined. Cromwell, for example, allowed for calling tasks without supplying any inputs to the call block. These inputs could then be supplied by the inputs json of the workflow. This carries with it a lot of implicitness that makes workflows very hard to read.
In order to get the implementation to be similar across engines a clear spec is needed.

@patmagee
Copy link
Member

patmagee commented Jun 24, 2020

This has passed with a 3-0 vote. @rhpvorderman would you be able to rebase the changelog so that this can be merged

Currently blocked by: broadinstitute/cromwell#5523

@rhpvorderman
Copy link
Contributor Author

@patmagee done. Thanks

@jdidion jdidion changed the base branch from master to main June 29, 2020 14:05
rhpvorderman added a commit to broadinstitute/cromwell that referenced this pull request Jul 1, 2020
This implements part of the spec change that handles inputs openwdl/wdl#359
Co-authored-by: Miguel Covarrubias <mcovarr@users.noreply.github.com>
@DavyCats
Copy link
Contributor

DavyCats commented Aug 6, 2020

@rhpvorderman This was implemented in cromwell 52, wasn't it?

@rhpvorderman
Copy link
Contributor Author

@DavyCats partially.
The optional inputs are allowed now. The very strict checking on whether all required options are satisfied by the top-level workflow was not.

In this case, the inputs bubble up to become a nested input to the workflow
instead.
* Example: an unsupplied input might have the fully-qualified name
`my_workflow.my_task.my_task_input.`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Period should go outside the backtick, I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this!

* By default an engine only allows inputs that are specified in the input
section of the top-level workflow.
* If a workflow is suitable for use with nested inputs it should be explicitly
stated. This is done by setting `allowNestedInputs` to `true` in the meta
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My problem with this is that things like allowNestedInputs don't really belong in meta - there's just no where else to put them. I am going to open a proposal to add a hints section to workflow with the same semantics as it has in task. I think things like this more appropriately belong in hints than in meta.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when designing the hints section, I ultimately planned on having one in the workflow as well. however it felt like a more natural transition to start with it in the task. In the lont term this is probably a good move. If we add the hints in 1.1, I feel like 2.0 can propogate the hints to the workflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! Meta was always a bit of a "let's put it where it doesn't hurt" manoeuver.

Not having this feature proved to be a major hindrance in getting actual work done. But at least it is implemented in cromwell now, so it is at least possible to get work done until this feature lands.

If the hints proposal is landed I will rewrite this PR.

@jdidion
Copy link
Collaborator

jdidion commented Feb 1, 2021

This is in v1.1 and will be propagated to development.

@jdidion jdidion closed this Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants