AST Format Cleanup #21774
Comments
I had assumed it would be easier to write macros if parsing normalized things to some extent, but if people feel that hasn't been the case then ok. I agree we should try to get rid of the
I assume the alternative parsing for |
I like these being pure sugar and not having to deal with them as separate cases in macros. A tokenizer should be able to round trip these things, but macros shouldn't need to worry about what are semantically just formatting differences in their input.
@JeffBezanson glad to hear that you're on board.
Assuming you're talking about the burden on macro writers, it shouldn't be that bad. It's a trivial utility function in fact (and this generalises to all the discussed changes):
lower_muls(ex) = postwalk(x -> @capture(x, n_Number(y_)) ? :($n*$y) : x, ex)
Then just call
I think it's worth emphasising that macros have to deal with these kinds of things already; this just reduces the number of awkward special cases. I don't expect this to increase the burden on macros (or I wouldn't propose it) so if anyone has concerns about that I'm happy to see examples and work it through.
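To make the shape of such a helper concrete, here is a minimal sketch in plain Julia, without MacroTools. The `postwalk` below is a hand-rolled stand-in for the MacroTools function of the same name, and `is_literal_call` is a hypothetical helper. Because the parser currently rewrites `2(x)` to `2 * x` before a macro sees it, the un-normalised input is built by hand with `Expr`; that shape is an assumption about what a less eager parser might emit.

```julia
# Hand-rolled stand-in for MacroTools.postwalk: rebuild bottom-up, applying f to every node.
postwalk(f, x) = f(x)
postwalk(f, ex::Expr) = f(Expr(ex.head, map(a -> postwalk(f, a), ex.args)...))

# Hypothetical predicate: a one-argument call whose callee is a numeric literal, e.g. 2(x).
is_literal_call(x) =
    x isa Expr && x.head === :call && length(x.args) == 2 && x.args[1] isa Number

# Rewrite every literal "call" into an explicit multiplication.
lower_muls(ex) =
    postwalk(x -> is_literal_call(x) ? :($(x.args[1]) * $(x.args[2])) : x, ex)

# Assumed un-normalised AST for f(2(x)), constructed by hand.
raw = Expr(:call, :f, Expr(:call, 2, :x))
lowered = lower_muls(raw)
println(lowered)
```

A macro that does not care about the distinction would call `lower_muls` on its input first and then proceed as it does today.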
They're only "semantically just formatting differences" if you aren't changing the semantics, which is exactly what things like |
Yichao had a good point about |
I would point out that the round-tripping issue could be addressed by choosing which form to print with based on properties of the expressions. A conditional with multiple statements has to be an if/else; a conditional with single-expression branches can be either one, so if the expressions are short enough, we can format them as a ternary operator.
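That printing rule can be sketched roughly as follows. `format_conditional`, `statements`, and the `maxlen` threshold are hypothetical names chosen for illustration, not part of Julia's printer, and the sketch ignores operator precedence inside the branches.

```julia
# Statements of a branch, ignoring line-number nodes inside :block expressions.
statements(x) = [x]
statements(ex::Expr) =
    ex.head === :block ? filter(a -> !(a isa LineNumberNode), ex.args) : [ex]

# Print a two-branch :if as a ternary when both branches are single, short expressions.
function format_conditional(ex::Expr; maxlen = 40)
    ex.head === :if && length(ex.args) == 3 || error("expected a two-branch :if expression")
    cond, yes, no = ex.args
    ys, ns = statements(yes), statements(no)
    if length(ys) == 1 && length(ns) == 1
        s = string(cond, " ? ", ys[1], " : ", ns[1])
        length(s) <= maxlen && return s
    end
    return string(ex)   # fall back to the default if/else printing
end

short = Meta.parse("if a\n    b\nelse\n    c\nend")
long  = Meta.parse("if a\n    b\n    c\nelse\n    d\nend")
println(format_conditional(short))
println(format_conditional(long))
```

This recovers a plausible surface form but cannot tell which form the user actually wrote, which is the distinction the issue is concerned with.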
There are quite a few more ambiguous cases, e.g.
A couple more things to consider:
gets parsed confusingly as
begin
    a...
    b...
end -> f(a...; b...)
type a
    b
    c
end
shouldn't get parsed with a begin block IMHO because b and c are not executed sequentially. I've been using |
Parsing code blocks as
One more is the arguments to for, which hopefully could be similar to the arguments to :generator
e = quote
    result = 0
    for i = 1:10, j = 11:20
        result = result + i + j
    end
    (i + j for i = 1:10, j = 11:20)
end
e.args[4].args
e.args[6].args
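The comparison can also be made without indexing into a `quote` block, by parsing each form on its own. The sketch below prints whatever the running Julia version stores for the iteration specifications, since that layout has varied between releases, and then builds a `for` loop from a spliced specification with an explicit `Expr`, the workaround the issue describes. The variable names are illustrative only.

```julia
loop = Meta.parse("for i = 1:10, j = 11:20\n    result = result + i + j\nend")
gen  = Meta.parse("(i + j for i = 1:10, j = 11:20)")

# Where each form keeps its iteration specifications.
println(loop.args[1])      # first argument of the :for expression
println(gen.args[2:end])   # everything after the generated expression

# Splicing a whole specification: construct the :for node directly.
spec  = :(k = 1:3)
built = Expr(:for, spec, quote println(k) end)
eval(built)
```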
parse `let` the same as `for`. part of #21774
Status of this:
Actual round-trip-ability is not on the table here; CSTParser handles that very well and moving fully to that is way out of scope for 1.0.
From triage: preserving |
Please mention more specific examples of weird ASTs if anybody finds any, but otherwise I think we're done here.
Awesome. Do I take it we are not in favour of preserving |
Let's discuss cleaning up some kinks in Julia's AST format to make things easier for macro authors.
Specific Issues
In many cases, the parser does too much work, losing information:
:(1(2)) → :(1 * 2)
a ? b : c → if a ...
if a b elseif c d end → if a b else (if c d end) end
f() do ... → f(() -> ...)
I think we should aim for round-trip-ability here and have this work be done only at lowering time. Short function definitions and broadcasting syntax are examples of this being done right.
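These rewrites are easy to check in a REPL. The snippet below is a small sketch that inspects the parsed forms; the first two match the description above, while the representation of `elseif` and `do` has differed between Julia releases, so those are only printed rather than assumed.

```julia
# Rewritten by the parser as described above.
juxtaposed = :(1(2))
ternary    = :(a ? b : c)
println(juxtaposed.head, " ", juxtaposed.args)
println(ternary.head, " ", ternary.args)

# Version-dependent forms: inspect the heads instead of relying on a fixed layout.
chain   = Meta.parse("if a\n    b\nelseif c\n    d\nend")
doblock = Meta.parse("f() do\n    1\nend")
println(chain.args[end].head)
println(doblock.head)
```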
This is more of a parser issue than an AST issue, but it's not currently possible to write for $cond body end as the parser checks for an x = y iteration specification; instead you must use $(Expr(:for, cond, :body)). For the same reason, it's not possible to write @capture(ex, for cond_ body_ end) with MacroTools.
let and for have the opposite ordering for declarations and body.
The existence of both kw and = is strange as it gives us the opposite problem to that above: we have two ways to represent the same surface syntax. Perhaps this is necessary; MacroTools normalises kw to = to avoid surprising failures.
We currently have two ways to represent keyword arguments in calls: f(a, b = c) which has little processing, and f(a; b = c) which becomes f($(Expr(:parameters, Expr(:kw, :b, :c))), a). The latter is one of the few places (let is another) where the ordering/structure of the expression mismatches that of the surface syntax. I suspect a nicer design is possible but I don't have an exact proposal. A matching pattern like f(a, xs__) currently matches kwargs style and not the other, so at least putting the params at the end of the call consistently would help with that.
Motivation
Losing information about the AST limits what macros can do, and/or adds strange edge cases to what they can do. Consider @static if for example:
The fact that we can't tell these two examples apart means that we have to choose a surprising behaviour for one of them.
As another example, due to the parsing of 1(2) you can't use DataFlow.jl's syntax for graphs if the graph contains literal numbers. This isn't a crucial use case, but it illustrates the kind of surprising edge cases this leads to, even for innocuous-looking transformations.
Concerns
I'm not sure to what extent this can be done after 1.0. What levels of AST are part of Julia's public API? People will be relying on it in practice, although the users should be few enough and expert enough to alleviate the impact.
This adds some work for macro authors similar to that caused by short/long-form function definitions, in cases where the difference doesn't matter. This should be very manageable, as it's easy to provide utility functions that do various parts of normalisation (as MacroTools does already for longdef and shortdef).
Let me know if I've missed anything. Input from other macro writers welcome.