parse one-sided formulas correctly and add tests #20

kleinschmidt · 2017-05-13T14:31:40Z

Constructing one-sided formulas was a pain in butt. A one-sided formula is something like @formula(~ 1+x), which should be converted to Formula(nothing, :(1+x)).

This PR adds a helper function to raise non-infix ~ in @formula expressions (which gets attached to the first argument of the formula) to be the outermost call, and check for single-argument (one-sided) calls to ~.

kleinschmidt · 2017-05-13T14:59:49Z

Failure on mac nightly looks unrelated...

nalimilan · 2017-05-13T16:00:21Z

Looks good to me in general, but why do we want/need to support one-sided formulas?

kleinschmidt · 2017-05-13T16:25:42Z

They're supported in the Formula type (you can set the LHS to nothing, but not the RHS), and (in principle) you might want to do something like dimensionality reduction on a model matrix which doesn't require a response.

kleinschmidt · 2017-05-13T16:30:07Z

(And in the Terms type, as well: there's the hasresponse field)

nalimilan · 2017-05-13T16:37:47Z

If you have a use case in mind, fine by me.

ararslan · 2017-05-13T18:14:42Z

I guess I could see this being useful for something like unsupervised learning, where there's no response.

As much as I dislike _ having special treatment, now that assignment to it as a variable name is disallowed by the parser, we could have one-sided formulas be something like @formula(_ ~ x1 + ... + xn). Then the check is just lhs === :_ -> nothing.

kleinschmidt · 2017-05-13T19:03:29Z

Right, that's the kind of use case I'm imagining. I'd be fine with making _ for empty LHS, too/instead. I just want to make sure there's some way of constructing them other than manually calling the Formula constructor directly.

ararslan · 2017-05-13T19:06:13Z

Using _ seems a bit cleaner, IMO, since it avoids needing to fight with the parser. Plus it makes it a bit clearer that the intention is to omit the LHS rather than accidentally omitting it. I don't feel too strongly though.

kleinschmidt · 2017-05-13T19:11:18Z

Is ~ still going to be parsed as an infix operator once the macro is moved out of base?

kleinschmidt · 2017-05-13T19:11:58Z

In general I agree that making it explicit is preferable (just like making the intercept explicit is preferable).

ararslan · 2017-05-13T19:14:16Z

Is ~ still going to be parsed as an infix operator once the macro is moved out of base?

As of 0.6 it's no longer a macro, it's just a regular infix operator in Base that has assignment precedence in the parser. That's why when I made @formula I made sure it handled both cases, macro and operator, so we could support 0.5 and 0.6 simultaneously.

nalimilan · 2017-05-14T14:00:04Z

_ ~ x doesn't look nice, though. If we go that route, I'd rather use nothing ~ x. But is there a chance the parser could be changed?

ararslan · 2017-05-14T15:53:11Z

_ ~ x doesn't look nice

Yeah, I mean it's not perfect, but IMO it's nicer than just ~ x, which looks like you forgot something.

is there a chance the parser could be changed?

Unlikely, given that ~ has a unary meaning, and when parsed as a unary operator it has a different precedence. That is, ~ x1 + x2 is parsed as (~x2) + x2 rather than ~(x1 + x2), and changing that would be quite breaking.

kleinschmidt · 2018-05-09T20:24:25Z

I'd like to revisit this now that #54 has been merged which (indirectly) removed the most common way that people in practice are constructing one-sided formulae (by calling Formula(nothing, rhs::Expr)). I still prefer allowing a totally empty LHS in the macro, which only requires a little bit of cleanup. _ might be reserved for currying or piping in the future (JuliaLang/julia#24990), and using a literal nothing is inconsistent with how symbols would normally behave since it's not referring to a column :nothing in a dataframe but is a value.

kleinschmidt · 2018-05-09T20:26:19Z

One thing to consider would be requiring parens around the RHS (as in @formula ~(1+x+y)), in which case no clean-up is necessary. but it could create a "gotcha" if people don't realize it, and if we're checking for missing parens we're really just one step away from the cleanup necessary to support naked one-sided expressions as in this PR...

ararslan · 2018-05-09T21:52:03Z

Due to other parsing oddities, we encourage fully parenthesizing formulas, e.g. @formula(y ~ x1 + x2). Having @formula(y ~ x1 + x2) and @formula ~(x1 + x2) looks really weird, IMO.

kleinschmidt · 2019-10-02T12:31:56Z

I'm closing this in favor of using 0 ~ ... for a one sided formula (as documented in #155)

parse one-sided formulas correctly and add tests

448611b

kleinschmidt mentioned this pull request Apr 26, 2018

use macro to construct formula in test for StatsModels v0.2.4 JuliaStats/GLM.jl#222

Merged

kleinschmidt mentioned this pull request Aug 2, 2019

One-sided formula, 2.0 #139

Closed

clintonTE mentioned this pull request Aug 26, 2019

one sided formulae no longer work #143

Closed

kleinschmidt closed this Oct 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parse one-sided formulas correctly and add tests #20

parse one-sided formulas correctly and add tests #20

kleinschmidt commented May 13, 2017 •

edited

Loading

kleinschmidt commented May 13, 2017

nalimilan commented May 13, 2017

kleinschmidt commented May 13, 2017

kleinschmidt commented May 13, 2017

nalimilan commented May 13, 2017

ararslan commented May 13, 2017 •

edited

Loading

kleinschmidt commented May 13, 2017 •

edited

Loading

ararslan commented May 13, 2017

kleinschmidt commented May 13, 2017

kleinschmidt commented May 13, 2017

ararslan commented May 13, 2017

nalimilan commented May 14, 2017

ararslan commented May 14, 2017

kleinschmidt commented May 9, 2018

kleinschmidt commented May 9, 2018

ararslan commented May 9, 2018

kleinschmidt commented Oct 2, 2019

parse one-sided formulas correctly and add tests #20

parse one-sided formulas correctly and add tests #20

Conversation

kleinschmidt commented May 13, 2017 • edited Loading

kleinschmidt commented May 13, 2017

nalimilan commented May 13, 2017

kleinschmidt commented May 13, 2017

kleinschmidt commented May 13, 2017

nalimilan commented May 13, 2017

ararslan commented May 13, 2017 • edited Loading

kleinschmidt commented May 13, 2017 • edited Loading

ararslan commented May 13, 2017

kleinschmidt commented May 13, 2017

kleinschmidt commented May 13, 2017

ararslan commented May 13, 2017

nalimilan commented May 14, 2017

ararslan commented May 14, 2017

kleinschmidt commented May 9, 2018

kleinschmidt commented May 9, 2018

ararslan commented May 9, 2018

kleinschmidt commented Oct 2, 2019

kleinschmidt commented May 13, 2017 •

edited

Loading

ararslan commented May 13, 2017 •

edited

Loading

kleinschmidt commented May 13, 2017 •

edited

Loading