Add a line continuation digraph \#
#35336
base: master
Technically this looks pretty good to me, very competently executed!
A package code search indicates that \# is indeed rare. In fact I haven't found a single occurrence in searching over packages:
https://juliahub.com/ui/Code?q=%5E%5B%5E%22%5D%2a%5C%5C%23&w=true&r=true&u=all&t=all
Even allowing for spaces between the \ and #, I don't find any results.
A PkgEval run might still be a good idea though; I'm not sure whether juliahub has full coverage of General yet.
Mainly I think this will come down to a debate over whether a digraph syntax is acceptable, and if so, which digraph to choose. One dimension to consider is whether this is "backward incompatible" in the particular desirable sense that the chosen digraph should ideally be a parser error on older versions. We currently have
julia> Meta.parse("a \\#\nb")
:(a \ b)
so in that sense, :# or .# might be better options. https://juliahub.com/ui/Code?q=%20%5C.%23&w=true&r=true&u=all&t=all looks promising for .#, for example.
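To compare candidates on the "parser error on older versions" criterion, one could run a quick check like the sketch below on a Julia build without this PR. The `classify` helper and the candidate list are illustrative assumptions, and the verdict for each digraph depends on the parser version, so no expected output is shown.

```julia
# Hypothetical helper: report whether `src` parses or is rejected by the current parser.
function classify(src::AbstractString)
    # raise=false returns an Expr(:error, ...) / Expr(:incomplete, ...) instead of throwing
    ex = Meta.parse(src; raise=false)
    rejected = ex isa Expr && ex.head in (:error, :incomplete)
    return rejected ? :rejected : :accepted
end

for digraph in ("\\#", ":#", ".#")
    src = "a $digraph\nb"    # candidate digraph at end of line, continuation on the next line
    println(rpad(digraph, 4), " => ", classify(src))
end
```

A digraph reported as `:rejected` would fail loudly on older versions, which is the property asked for here.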
(if (eof-object? (peek-char port))
    (error "incomplete: expression ends in line continuation"))
(let ((tok (next-token port s)))
  (aset! s 2 #t)
Right, so a line continuation counts as nonzero whitespace? That makes sense.
Exactly. The newline should definitely count as whitespace. This prevents things like
@info "hello" \#
x
being parsed as an invalid string juxtaposition @info "hello"x.
Interestingly, this brings to light an existing behavior of the Julia parser. Whitespace before a comment gets ignored by the next token, so x #==#y is parsed as x*y even though there is whitespace after x.
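A rough way to see the intended semantics (the continuation marker, its trailing comment, and the newline together act as whitespace) is a text-level emulation. This is only a sketch: `join_continuations` is a hypothetical helper, not the PR's implementation (which lives in the flisp parser), and a regex cannot tell whether \# occurs inside a string literal or an existing comment.

```julia
# Hypothetical helper: replace `\#`, the rest of that line, and the newline with one space.
join_continuations(src::AbstractString) = replace(src, r"\\#[^\n]*\n" => " ")

src = "@info \"hello\" \\# message continues below\n    x"
joined = join_continuations(src)

# Because the continuation became whitespace, "hello" and x stay separate macro arguments.
ex = Meta.parse(joined)
println(joined)
println(ex.head, " with last argument ", ex.args[end])
```

Replacing the continuation with an empty string instead of a space would not matter for this input, since whitespace already precedes \#, but it would glue tokens together whenever the marker directly follows a token.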
((eqv? c #\\)
 (let ((c (read-char port)))
   (if (eqv? (peek-char port) #\#)
       (line-continuation-comment port s)
A small nit: Might it be clearer to structure line-continuation-comment more similarly to skip-comment above, where next-token is done separately from skipping the whitespace?
I'm completely fine either way. It probably makes sense to first see what the consensus will be about specific choice of characters, etc. and then perfect the implementation.
Good job with the implementation here. I will warn that this is likely to be controversial. The choice of
Sorry, closed by mistake (GitHub misfire).
Note that we also have
so I think the same considerations apply to
Yeah, this was always going to be controversial. I guess it might be helpful to restate and update the reasoning from #29273 (comment) which leads to considering a digraph: Here are some desirable properties for line continuation syntax, including reasoning:
Now, it's not clear we can satisfy all of these, in particular (4) and (6) are difficult to satisfy at once. But I'd love if we at least had a way forward on this (even if it needed to wait for Julia 2.0). I'm going to restrain myself from suggesting solutions to the above list, for now!
I think this can be avoided by requiring a space before the
Here's a bit of a strange idea: how about https://juliahub.com/ui/Code?q=%23%5C%5C&w=true&r=true&u=all&t=all It fails to be a syntax error in older parsers but aesthetically it's a much closer analogy with our existing
I'm not sure how we ended up with weird digraphs involving None of these strike me as something that anyone would look at and think "ah, I bet that continues the line". The most intuitive option so far would be using Other candidates that look like that are the
The reasoning was laid out by @c42f in the comment above and in #29273 (comment).
(A more comprehensive list is available on Wikipedia). For people familiar with e.g. Python, Ruby, etc. the backslash syntax would be familiar, and so I think either
It's not that this syntax is "so important" that it needs ASCII syntax, but I believe no other language feature is available using only Unicode characters, so it's not clear that this should be an exception. (Issues with non-ASCII syntax include both not being clear how to type the character, and not rendering properly in all situations). Of course, if people prefer a Unicode syntax for this, then there is the PR #29273.
Putting ascii aside, I'm a fan of words.
Because there's resistance to unicode-only syntax, ascii combinations are valuable, ascii Furthermore, the text on the same line after the line continuation should be a comment: what else could it possibly mean?
Just as data points, although for some people https://stackoverflow.com/questions/17630835/python-comments-after-backslash Having said that, I think the biggest issue with
That's because given that
OK, I see. I'm now convinced that it's not reasonable to expect people to guess
I feel this has a major drawback in that it makes combining comments with multi-line statements very awkward. MATLAB "solves" this problem by making text after
Nice idea, I think something like this makes a lot of sense. In the current way things work it's a little subtle to implement parser options via syntax because the parser will see the whole text of the file before anything is ever
That's true, though I think you've done a nicer job of the implementation than I did :-)
A very naive implementation I'd imagine is to have something like
module Future
    const __line_continuation__ = Base._line_continuation # some unique object
end
in
if !(@isdefined(__line_continuation__) && __line_continuation__ === Base._line_continuation)
    error("Syntax `\\#` is disallowed without `using Future: __line_continuation__`. Used at line XXX of file XXX.")
end
(A slightly more (too?) clever implementation may be to look all parent modules of This also disallows old "valid" use of One downside I can imagine is that, if we want a "real" parser switch at some point in 1.x time-frame, maybe it's better to do this per-file using "pragma" (or per-package using
It's unclear to me why Matlab would choose to ignore anything after
I agree this is a problem and it's bad syntax which we certainly shouldn't follow. However I think it's fairly clear why they chose to do this: they asked the question "what could text mean if it came after a line continuation?" and chose the latter of the two possible answers:
Clearly this is a bad idea if your line continuation syntax doesn't look like a comment, and matlab should have required an extra What I am arguing is that
Put two and two together and you end up considering things like
But neither
Array literals (matrix literals more specifically) are one of the cases where line continuation would be most useful. This would cause I definitely like the idea of
Sure, though I fear we've covered this ground before without consensus! In #29273 the typographically-inspired I think the ellipsis would be fine though having it mean something completely different from splatting depending on context doesn't seem great. There's also some "legitimate" uses for it in vcat:
I'm not sure I've seen this exact abuse, though I've seen some pretty weird constructions involving concatenation syntax.
I don't really like the
Triage feels that the particular digraph here is too non-obvious and brittle. Other options that were discussed:
The main use case that's currently unaddressed is array literals where each line is too long to fit on a single line; in all other cases
[(1
2
3)
(4
5
6)]
The rule for this seems like it would be that in parens all whitespace becomes horizontal. No real conclusion reached here, I'm just recording the options discussed so that we don't have to start from scratch in two weeks.
The Markdown hard line break could be a very good fit: two (or more) spaces followed by a newline. And it's still ASCII.
Pretty sure it will not hurt wherever, whatever The return symbol in Unicode, “⏎” (U+23CE), is fancy, but I will never propose a Unicode-only operator (btw
That's a cute syntax! It seems there are some cases where it's not an error, unfortunately:
julia> quote
@info "hi" \/
a=1
end
quote
#= REPL[5]:2 =#
#= REPL[5]:2 =# @info "hi" \ (/)
#= REPL[5]:3 =#
a = 1
end
Closes #18612 and #27533 by providing a two-character combination (\#) for line continuation.
This is based on the suggestions proposed in #29273 (comment). Modifying the implementation to use the alternative suggested character combinations (e.g. .#) is fairly trivial. Anything after \# on the same line is treated as a comment. This should largely play well with existing syntax highlighting.
Use cases:
Applying macros to function definitions
Breaking long macro calls
Newlines in long matrix literals:
(Note about the matrix literals: I agree that having huge chunks of data in your code is not a great idea. But with 16 digit numbers, even 4 or 5 columns is enough to justify a line break. These kinds of matrix literals are encountered very commonly in numerical applications, e.g. Butcher tableaux for Runge-Kutta methods).
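For context, a newline inside a matrix literal currently ends the row, which is why a long row cannot simply be wrapped. The sketch below uses made-up values; the `hcat`/`vcat` form is one workaround available today, shown for comparison only and not something proposed in this PR.

```julia
# A newline inside the brackets starts a new row.
A = [1 2 3
     4 5 6]
@show size(A)    # 2 rows, 3 columns

# Inside call parentheses a newline after a comma is ignored,
# so a long row can be wrapped across lines.
B = vcat(hcat(1, 2,
              3),
         hcat(4, 5, 6))
@show A == B
```

The function-call form works but loses the visual layout of the literal, which is the motivation for a continuation marker.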
Note: technically, \# is already valid syntax, so this would break something like
which parses as A\b. I think this usage is extremely rare (this should be checked with PkgEval), and the original functionality can be restored quite easily by adding a space (A\ #).