Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates, HCA feedback #14

Merged
merged 15 commits into from
Feb 5, 2018
Merged

Updates, HCA feedback #14

merged 15 commits into from
Feb 5, 2018

Conversation

david4096
Copy link
Member

@david4096 david4096 commented Jan 10, 2018

Meant to clean up the repo and make some schema changes to accommodate recent feedback from Marcus at HCA.

  • Removes dead code
  • Begins alignment with TES (thanks @kellrott)
  • Improves comments and protobuf styling

Work in progress

@thejmazz
Copy link

with regards to changing workflow_params from google.protobuf.Struct to string:

Is there an example showing how "JSON you give a WDL executor is not typed, just strings", from here, it looks like JSON?

If it is set to string, would then stringified JSON be compatible?

@tetron
Copy link
Contributor

tetron commented Jan 11, 2018

Changing workflow_parameters from a structured record to a raw string with arbitrary format that has to be parsed is a step backwards in interoperability. This increases the friction involved in connecting a WES endpoint to other systems, as it imposes additional free text input/output format munging. And if the practical outcome is "everyone should use stringified JSON" then why make it a string field in the first place?

@david4096
Copy link
Member Author

Hi @thejmazz and @tetron! Thanks!!!! It looks like the input can be YAML or JSON, and so using an open string allows both cases.

https://github.com/openwdl/wdl/blob/master/versions/draft-2/SPEC.md#specifying-workflow-inputs-in-json

@briandoconnor @mckinsel any opinions regarding this field? I'll roll it back for now!

@mckinsel
Copy link

Hey everybody, thanks for the draft and comments.

I believe this suggestion started with my concern that valid values for workflow_params will vary by platform. The most pressing instance of this for me is that WDL inputs are not explicitly typed, and certain required type conversions can't be implemented easily in some platforms. After some discussion of the issue, I think we decided that this field needed to be maximally flexible.

I don't really feel too strongly about it, but I'm doubtful that there's much of an interoperability difference between an unconstrained string and unconstrained JSON.

Also, I see this change from a struct to a string in the proto file, but the swagger json already has workflow_params as a string. Is it possible for the two WES definitions to get out of sync, or am I looking in the wrong spot?

@david4096
Copy link
Member Author

@mckinsel sounds like a version issue, this PR will add the generated swagger to help avoid this. Sorry!

}

// Enumeration of states for a given workflow request
enum state {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically just recasting the TES state declaration ( https://github.com/ga4gh/task-execution-schemas/blob/master/task_execution.proto#L355 ) but with a different order for the values ( Initializing = 8; vs INITIALIZING = 2; ) and the string equivalents will be 'Initializing' vs 'INITIALIZING'.


// OPTIONAL
// A key-value map of arbitrary metadata outside the scope of the workflow_params but useful to track with this workflow request
map<string,string> key_values = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concept as the TES tags ( https://github.com/ga4gh/task-execution-schemas/blob/master/task_execution.proto#L66 ) but with a different name

// The filesystem protocols supported by this service, currently these may include common
// protocols such as 'http', 'https', 'sftp', 's3', 'gs', 'file', 'synapse', or others as
// supported by this service.
repeated string supported_filesystem_protocols = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a discussion about syncing this concept with the one used in TES: https://github.com/ga4gh/task-execution-schemas/blob/master/task_execution.proto#L527

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an issue #15


// To execute a workflow, send a workflow request including all the details needed to begin downloading
// and executing a given workflow.
message workflow_request {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protobuf style guide suggests camel case for message names ( https://developers.google.com/protocol-buffers/docs/style )

@david4096 david4096 changed the title Updates, HCA feedback (WIP) Updates, HCA feedback Jan 29, 2018
@david4096
Copy link
Member Author

Hi Everyone! I removed the work in progress title and am hoping to merge this today! It's mostly cosmetic changes that remove some dead code, improve style, and align with other GA4GH efforts. The repository should be a little easier to work with now (fewer directions, requirements).

Add requirements txt for installing cwltool
Update swagger json to reflect changes
@briandoconnor
Copy link
Contributor

@tetron I like the idea of having a standardized input JSON regardless of underlying execution system. But for right now, given that both Cromwell and CWLtool use different JSON param formats it means anyone implementing WES will need to translate from WES to the execution-specific format. It seems like we should be able to, as a community, come up with a common JSON parameterization format for CWL and WDL. Do you think that's possible? If we had that, and it happened to be the same format as the WES param JSON that would be the best case scenario.

@briandoconnor
Copy link
Contributor

@david4096 do you have this swagger hosted somewhere? Would be great to look at this API through that interface

@briandoconnor
Copy link
Contributor

@briandoconnor
Copy link
Contributor

Some feedback @david4096 :

  • general
    • I think we need better details in the endpoint descriptions (not a blocker for your pull request, just need this before release)
  • GET /ga4gh/wes/v1/service-info
    • engine_versions -> workflow_engine_versions: Just to make it a little more clear
    • auth_instructions_url: add this so a site can provide a URL a user goes to to log in and generate their auth token... I don't think we do anything more than this right now with auth (other than having certain endpoints that need an authorized bearer token)
    • I thought the feedback from the HCA Driver was they wanted a named key-value reporting what extra params the workflow engine would take, their types, and default values? Is the approach right now to overload the tags with this?
  • POST /ga4gh/wes/v1/workflows
    • I still would prefer workflow_params be a string until we can actually give people a JSON schema used across workflow engines. Either string or JSON I can't parse it at this point since WDL and CWL use different schemas and it's not like we're proposing any structure. I'd prefer a 1.1 version of WES to provide a common JSON format regardless workflow engine (like we do with the output for example).
    • are we using tags to pass extra, optional "hints" to the workflow engine? See comment above in service-info endpoint.
    • workflow_descriptor: we said this would, optionally, accept base64 encoded zip files for systems that support it. What happens if this is a zip? How will the execution system know which WDL/CWL inside the zip is to be executed? In that case do you need to pass in the path that you want to execute via the workflow_url param? Messy but meets the needs of HCA and Marcus

I think that's it. Need responses on these ASAP since we want to cut a release soon to have something stable to share with the larger Commons effort. I think you're really close though! Thanks David and everyone else that's worked hard to push this forward!!

@david4096
Copy link
Member Author

Thanks @briandoconnor I'll incorporate what I can from your comments and push the rest out to issues!

@david4096
Copy link
Member Author

Hi everyone! I updated the OpenAPI description here to include @briandoconnor's comments. Thanks!

@briandoconnor
Copy link
Contributor

Cool @david4096 ! I'll take a look when I get home tonight and merge

@david4096
Copy link
Member Author

FYI I left the workflow_params as a struct, which is a oneof an Object, ListValue, string, etc. which should support the needs of HCA.

@briandoconnor briandoconnor merged commit e6962e8 into ga4gh:develop Feb 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants