Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse one-sided formulas correctly and add tests #20

Closed
wants to merge 1 commit into from

Conversation

kleinschmidt
Copy link
Member

@kleinschmidt kleinschmidt commented May 13, 2017

Constructing one-sided formulas was a pain in butt. A one-sided formula is something like @formula(~ 1+x), which should be converted to Formula(nothing, :(1+x)).

This PR adds a helper function to raise non-infix ~ in @formula expressions (which gets attached to the first argument of the formula) to be the outermost call, and check for single-argument (one-sided) calls to ~.

@kleinschmidt
Copy link
Member Author

Failure on mac nightly looks unrelated...

@nalimilan
Copy link
Member

Looks good to me in general, but why do we want/need to support one-sided formulas?

@kleinschmidt
Copy link
Member Author

They're supported in the Formula type (you can set the LHS to nothing, but not the RHS), and (in principle) you might want to do something like dimensionality reduction on a model matrix which doesn't require a response.

@kleinschmidt
Copy link
Member Author

(And in the Terms type, as well: there's the hasresponse field)

@nalimilan
Copy link
Member

If you have a use case in mind, fine by me.

@ararslan
Copy link
Member

ararslan commented May 13, 2017

I guess I could see this being useful for something like unsupervised learning, where there's no response.

As much as I dislike _ having special treatment, now that assignment to it as a variable name is disallowed by the parser, we could have one-sided formulas be something like @formula(_ ~ x1 + ... + xn). Then the check is just lhs === :_ -> nothing.

@kleinschmidt
Copy link
Member Author

kleinschmidt commented May 13, 2017

Right, that's the kind of use case I'm imagining. I'd be fine with making _ for empty LHS, too/instead. I just want to make sure there's some way of constructing them other than manually calling the Formula constructor directly.

@ararslan
Copy link
Member

Using _ seems a bit cleaner, IMO, since it avoids needing to fight with the parser. Plus it makes it a bit clearer that the intention is to omit the LHS rather than accidentally omitting it. I don't feel too strongly though.

@kleinschmidt
Copy link
Member Author

Is ~ still going to be parsed as an infix operator once the macro is moved out of base?

@kleinschmidt
Copy link
Member Author

In general I agree that making it explicit is preferable (just like making the intercept explicit is preferable).

@ararslan
Copy link
Member

Is ~ still going to be parsed as an infix operator once the macro is moved out of base?

As of 0.6 it's no longer a macro, it's just a regular infix operator in Base that has assignment precedence in the parser. That's why when I made @formula I made sure it handled both cases, macro and operator, so we could support 0.5 and 0.6 simultaneously.

@nalimilan
Copy link
Member

_ ~ x doesn't look nice, though. If we go that route, I'd rather use nothing ~ x. But is there a chance the parser could be changed?

@ararslan
Copy link
Member

_ ~ x doesn't look nice

Yeah, I mean it's not perfect, but IMO it's nicer than just ~ x, which looks like you forgot something.

is there a chance the parser could be changed?

Unlikely, given that ~ has a unary meaning, and when parsed as a unary operator it has a different precedence. That is, ~ x1 + x2 is parsed as (~x2) + x2 rather than ~(x1 + x2), and changing that would be quite breaking.

@kleinschmidt
Copy link
Member Author

I'd like to revisit this now that #54 has been merged which (indirectly) removed the most common way that people in practice are constructing one-sided formulae (by calling Formula(nothing, rhs::Expr)). I still prefer allowing a totally empty LHS in the macro, which only requires a little bit of cleanup. _ might be reserved for currying or piping in the future (JuliaLang/julia#24990), and using a literal nothing is inconsistent with how symbols would normally behave since it's not referring to a column :nothing in a dataframe but is a value.

@kleinschmidt
Copy link
Member Author

One thing to consider would be requiring parens around the RHS (as in @formula ~(1+x+y)), in which case no clean-up is necessary. but it could create a "gotcha" if people don't realize it, and if we're checking for missing parens we're really just one step away from the cleanup necessary to support naked one-sided expressions as in this PR...

@ararslan
Copy link
Member

ararslan commented May 9, 2018

Due to other parsing oddities, we encourage fully parenthesizing formulas, e.g. @formula(y ~ x1 + x2). Having @formula(y ~ x1 + x2) and @formula ~(x1 + x2) looks really weird, IMO.

@kleinschmidt
Copy link
Member Author

I'm closing this in favor of using 0 ~ ... for a one sided formula (as documented in #155)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants