Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct literal Revisit - Revisit #297

Closed
wants to merge 7 commits into from

Conversation

patmagee
Copy link
Member

I have fixed the identified typos. @DavyCats @cjllanwarne @mlin @geoffjentry please feel free to scrutinize this before it goes to voting again!

Copy link
Contributor

@DavyCats DavyCats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Person( ... )
```

Arugments placed within the parethesis are key-value pairs, where the key is the name of one of the struct declarations, and the value is the value to set the argument to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one more 😛

Suggested change
Arugments placed within the parethesis are key-value pairs, where the key is the name of one of the struct declarations, and the value is the value to set the argument to.
Arguments placed within the parethesis are key-value pairs, where the key is the name of one of the struct declarations, and the value is the value to set the argument to.

@illusional
Copy link
Contributor

Can you construct Struct literals in the output? Something like:

task output_struct_collection {
    input {
        String firstName = "Peter"
    }
    output {
        Person harry = Person(name="Harry", age=5+7)
        Person constructed = Person(name=firstName, age=88)
    }
}

As far as I can see, the documentation doesn't address this.

@patmagee
Copy link
Member Author

patmagee commented Apr 1, 2019

@illusional I am actually not sure if that is supported at the moment by any of the implementations. To me it would be good eventually to get to the point where constructing struct literals in the output is supported, however it will require a bit of work from the workflow engine. I would be interested to know from others if they this for now this should be allowed or labelled as future support
@cjllanwarne @mlin @geoffjentry

@geoffjentry
Copy link
Member

I agree that I don't think it's currently possible. I think I think it'd be a good idea in the future.

@patmagee
Copy link
Member Author

patmagee commented Apr 1, 2019

@geoffjentry as it is not explicitly stated in the spec whether we allow this or not, do you think it would be safe to assume that it is not allowed, or do you think we need to expressly forbid it for the time?

@geoffjentry
Copy link
Member

I'd double check with @cjllanwarne and @mlin that I haven't misspoken but if indeed it's not supported we should explicitly disallow it

@mlin
Copy link
Member

mlin commented Apr 2, 2019

I don't quite see the argument to forbid using the new struct literal syntax in output sections. Cromwell doesn't support it there, but nor does it implement the new literal syntax anywhere, right?

@illusional
Copy link
Contributor

I think initialisation in the output would be a useful feature of the language, and not just because I'm biased, is there a way it can be supported?

@geoffjentry
Copy link
Member

@illusional anything can be passed via vote, but to be folded in requires implementation. I can't speak for @mlin but from the perspective of Cromwell it's more likely to see implementation of things which require less work (disclaimer: I don't actually know how hard it'd be to implement ... @cjllanwarne ?)

@DavyCats
Copy link
Contributor

DavyCats commented Apr 4, 2019

If I'm not mistaken, currently cromwell does allow for the initialization of structs in the output section using the (in develop removed) object notations.
eg.:

output {
    IndexedBamFile outputBam = object {
        file: bamFile,
        index: bamIndex
    }
}

@mlin
Copy link
Member

mlin commented Apr 8, 2019

With the new syntax we may want some new constraints on name collisions between struct types and standard library function names (#307), eg my empty struct stdout {} appears to have the literal stdout(). There may be a concern about introducing new standard library functions that break existing WDLs with colliding struct names.

@mlin
Copy link
Member

mlin commented Apr 8, 2019

Could someone familiar with Hermes comment on whether the above & related cases complicate language parsing? @cjllanwarne @geoffjentry

@cjllanwarne
Copy link
Contributor

  • On struct literals v. library functions, I think we'll be ok so long as standard library calls look like x(y, z) and structs look like x(a=y, b=z). But for user sanity a blanket "don't use standard library names as struct names (or task names, workflow names, value names, ...)" might be appropriate?

    • To be absolutely sure, we could try to check this works in Hermes before making a final choice - assuming we're fairly sure the vote will pass, it doesn't seem too additionally painful to try it out before - rather than after - voting?
  • On allowing struct literals in outputs: I don't see any reason why we wouldn't allow this. To me a struct literal is just an special case of an expression which evaluates to a struct - and therefore valid anywhere that an expression is valid - runtime attributes, outputs, inputs, value declarations, even String placeholders like String s = "~{Person(name="Harry", age=17).name}" if you really wanted to .... In fact I think it would be harder and require more special logic for us (Cromwell at least) to disallow this.

@cjllanwarne
Copy link
Contributor

cjllanwarne commented Apr 15, 2019

I had a go at implementing this in the grammar - I had a few thoughts:

  • Mainly: it was tricky to distinguish function calls and struct literals in the grammar. You'll notice that I had to introduce identifier_being_assigned token to allow the grammar to distinguish the two parameter list types. I suspect this will cause side-effects elsewhere when I try the result on anything more complicated than my simple test file. This is partially a hermes thing - and partially a "I could probably do better if I looked at it harder"... but to me this is raising a small warning flag that if it's tricky for the grammer, maybe it'll also be tricky for users to distinguish function calls from structs. These suggestions would hopefully make it easier for both:
    • Maybe we could add a keyword to make it obvious that this is a struct being created, eg Person p = new Person(...)?
    • Maybe we could make the syntax a little more distinct, eg Person p = Person((...))
    • Maybe we don't care? (and/or: the syntax proposed is so nice we're willing to put up with any potential ambiguity coming from this?)
  • I don't think we want to allow structs and function calls to share the same names. In fact I'd be in favor of making function names their own token type in the grammar (ie make them keywords) so that we could validate at parsing time whether people are using the right option out of constructor syntax vs function call syntax. But to disclose my own bias: this would simplify the parsing logic too!
  • For the sake of flexibility - and if we can distinguish function calls from struct literals in another way, what do people think about (a) allowing function calls to name parameters too, and/or (b) allowing struct literals to not name parameters if they're simple?





Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of empty lines...?

@patmagee
Copy link
Member Author

@cjllanwarne thanks for taking a stab at this. In regards to the struct literals in the output section, it appears that the general consensus is that this is already supported and there should be no strict reason to disallow it. If this is the case, I will go ahead and add an example to demonstrate this.

As for the naming collisions, this definitely will represent a problem. Engine functions should be considered key words in my opinion. This has the added side effect of course of making the addition of an engine function a breaking change that must wait for a major version release. But I think this is okay and would be accepted by the community. Therefore there should be no collisions within a WDL for a specific version. But if the user wanted to update to the latest WDL syntax, it is possible they will have to alias some of their structs.

I am also now concerned about complicating the struct literal with an expression from what you described. I am not sure what the best way forward? If anyone is a grammar guru and has a clean suggestion that would be appreciated. Otherwise, we might be stuck with updating the syntax of the literal annotation.

A few suggestions:

  1. new Person(name = "john", age = 23) - This is a bit verbose and is not consistent with any other literal implementation
  2. Person< name = "john", age = 23 > - Might as well consider this? I dont really like it necessarily... but others might.
  3. Person { name = "john", age = 23 } - go back to the old object literal syntax... maybe this was not doing anything wrong except having the object name.
  4. Person::(name = "john", age = 23) - I actually kind of like this, as to me It indicates I am calling a function belonging to the Person struct.

@patmagee
Copy link
Member Author

After looking at the example, I think I personally lean towards using Person::(name = "john", age=23) syntax to completely avoid any collisions. Additionally, to me this notation marks this explicitly as a struct literal and prevents any confusion with other syntax (ie other literals, engine functions, call blocks etc)

@mlin
Copy link
Member

mlin commented Apr 16, 2019

One cheesy trick is to formally constrain struct type names to start with an uppercase letter, while function names start with a lowercase letter. 🧀 🧀 🧀

I also quite like Pat's #3 keeping the existing object { [k: v ...] } syntax just replacing the word object with the struct type name...

@patmagee
Copy link
Member Author

@mlin the one concern I had about repurpossing the object literal syntax was that it might get confused with other types of blocks, either by users or at a grammar level. But if you do not see any issues arising from that, it might be our easiest way forward

@cjllanwarne
Copy link
Contributor

FWIW from a grammar point of view, if we had a defined set of function_name keywords then the ambiguity in the grammer will hopefully disappear - because we could key off of the differing token types:

      (*:left) $e = :function_name :lparen $fcall_param_list :rparen -> FunctionCall(name=$0, params=$2)
      (*:left) $e = :identifier <=> :lparen $struct_param_list :rparen -> StructLiteral(name=$0, params=$2)

@cjllanwarne
Copy link
Contributor

FWIW2: I also kind of like "Idea 3: repurpose the old object syntax but with struct names" idea

@patmagee
Copy link
Member Author

@cjllanwarne @mlin the benefit to option 3 imo, is that it would help for users to migrate away from objects, since it would be quite easier.

@geoffjentry @illusional @DavyCats any specific thoughts on this before I try implementing something

@vdauwera
Copy link
Member

vdauwera commented May 1, 2019

Great discussion, folks. From the perspective of community support wrt capitalization, I’m very much for promoting conventions and against enforcing them with requirements. Case is a tricky thing and can easily cause unwarranted difficulty, especially for people from cultures whose language does not include a concept of case.

@rhpvorderman
Copy link
Contributor

I like the overall idea, and I do not have an opinion on syntactic details. It looks quite reasonable and understandable to me. So 👍 and lots of thanks to @patmagee and everyone involved in this discussion.

@patmagee
Copy link
Member Author

patmagee commented May 6, 2019

It looks like general discussion has died down for this PR in its current iteration. I am going to give it 1 more day for any last minute opinions, and then put it to a vote

@DavyCats
Copy link
Contributor

DavyCats commented May 8, 2019

👍

4 similar comments
@orodeh
Copy link
Contributor

orodeh commented May 8, 2019

👍

@geoffjentry
Copy link
Member

👍

@illusional
Copy link
Contributor

👍

@geoffjentry
Copy link
Member

👍

@patmagee
Copy link
Member Author

Voting passes 4-0

@patmagee patmagee mentioned this pull request Nov 20, 2019
@patmagee patmagee mentioned this pull request Nov 20, 2019
@patmagee patmagee added this to the v2.0 milestone Nov 20, 2019
@jdidion jdidion changed the base branch from master to main June 29, 2020 14:07
@patmagee patmagee mentioned this pull request Aug 27, 2020
1 task
mlin added a commit to chanzuckerberg/miniwdl that referenced this pull request Nov 1, 2020
New struct literal syntax similar to the old `object` syntax, but replaces the `object` keyword with the intended struct type name. This allows earlier static typechecking of the struct members, where before we waited until coercing the object to a struct decl. (The `object` syntax remains valid.)

Some tedious surgery was involved to make declared struct types available to `Expr` typechecking logic, where before they'd only been needed in `Tree`.

#113 openwdl/wdl#297
@mlin
Copy link
Member

mlin commented Nov 3, 2020

The new style struct literals are implemented in miniwdl v0.9.0 for version development documents. The struct type name in the literal definitely helps with typechecking.

I think we can merge this if @patmagee would like to rebase on main and unless Cromwell team has any late-breaking concerns (cc @cjllanwarne)

assigned parameters. Struct literal notation attempts to be more declarative to help engine implementations apply the proper
type conversions to nested structs, as well as remove any ambiguity over what the object being constructed represents.

A `Struct Literal` declaration is similar to a map literal except it is preceded by a reference to the struct name. For example, the basic syntax for the `Person` struct would look like the following:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/map/object/

@jdidion
Copy link
Collaborator

jdidion commented Nov 3, 2020

This is implemented in wdlTools

@jdidion
Copy link
Collaborator

jdidion commented Feb 1, 2021

This is in v1.1 and will be propagated to development.

@jdidion jdidion closed this Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.