Add a line continuation digraph \# #35336

Open
wants to merge 2 commits into
base: master

pazner
Contributor

@pazner pazner commented Apr 1, 2020

Closes #18612 and #27533 by providing a two-character combination (\#) for line continuation.

This is based on the suggestions proposed in #29273 (comment). Modifying the implementation to use the alternative suggested character combinations (e.g. .#) is fairly trivial. Anything after \# on the same line is treated as a comment. This should largely play well with existing syntax highlighting.

Use cases:

  • Applying macros to function definitions

      @foo \#
      function bar()
          ...
      end
    
  • Breaking long macro calls

      @info "A message which could be rather long" \#
            a="something more"                     \#
            b="another thing"
    
  • Newlines in long matrix literals:

      A = [1 2 3 4 5 \#
           6 7 8 9 10]
    

    (Note about the matrix literals: I agree that having huge chunks of data in your code is not a great idea. But with 16 digit numbers, even 4 or 5 columns is enough to justify a line break. These kinds of matrix literals are encountered very commonly in numerical applications, e.g. Butcher tableaux for Runge-Kutta methods).

Note: technically, \# is already valid syntax, so this would break something like

A\#
b

which parses as A\b. I think this usage is extremely rare (this should be checked with PkgEval), and the original functionality can be restored quite easily by adding a space (A\ #).

@JeffBezanson added the parser (Language parsing and surface syntax) and triage (This should be discussed on a triage call) labels Apr 1, 2020
Member

@c42f c42f left a comment

Technically this looks pretty good to me, very competently executed!

A package code search indicates that \# is indeed rare. In fact I haven't found a single occurrence in searching over packages:
https://juliahub.com/ui/Code?q=%5E%5B%5E%22%5D%2a%5C%5C%23&w=true&r=true&u=all&t=all

Even allowing for spaces between the \ and # I don't find any results.

A PkgEval run might still be a good idea though; I'm not sure whether juliahub has full coverage of General yet.

Mainly I think this will come down to a debate over whether a digraph syntax is acceptable, and if so, which digraph to choose. One dimension to consider is whether this is "backward incompatible" in the particular desirable sense that the chosen digraph should ideally be a parser error on older versions. We currently have

julia> Meta.parse("a \\#\nb")
:(a \ b)

so in that sense, :# or .# might be better options. https://juliahub.com/ui/Code?q=%20%5C.%23&w=true&r=true&u=all&t=all looks promising for .#, for example.

(if (eof-object? (peek-char port))
(error "incomplete: expression ends in line continuation"))
(let ((tok (next-token port s)))
(aset! s 2 #t)
Member

Right, so a line continuation counts as nonzero whitespace? That makes sense.

Contributor Author

Exactly. The newline should definitely count as whitespace. This prevents things like

@info "hello" \#
x

being parsed as an invalid string juxtaposition @info "hello"x.

Interestingly, this brings to light an existing behavior of the Julia parser. Whitespace before a comment gets ignored by the next token, so x #==#y is parsed as x*y even though there is whitespace after x.

((eqv? c #\\)
(let ((c (read-char port)))
(if (eqv? (peek-char port) #\#)
(line-continuation-comment port s)
Member

A small nit: Might it be clearer to structure line-continuation-comment more similarly to skip-comment above where next-token is done separately from skipping the whitespace?

Contributor Author

I'm completely fine either way. It probably makes sense to first see what the consensus will be about specific choice of characters, etc. and then perfect the implementation.

@JeffBezanson
Member

Good job with the implementation here.

I will warn that this is likely to be controversial. The choice of \# is kind of clever and I see why it makes some sense, but it seems unlikely that somebody could guess what it means. Let's see if anybody proposes an interesting variation.

@bramtayl
Contributor

bramtayl commented Apr 2, 2020

\; is kinda nice

@pazner pazner closed this Apr 2, 2020
@pazner
Contributor Author

pazner commented Apr 2, 2020

Sorry, closed by mistake (GitHub misfire).

@pazner pazner reopened this Apr 2, 2020
@pazner
Contributor Author

pazner commented Apr 2, 2020

[...] One dimension to consider is whether this is "backward incompatible" in the particular desirable sense that the chosen digraph should ideally be a parser error on older versions. We currently have

julia> Meta.parse("a \\#\nb")
:(a \ b)

so in that sense, :# or .# might be better options. https://juliahub.com/ui/Code?q=%20%5C.%23&w=true&r=true&u=all&t=all looks promising for .#, for example.

Note that we also have

julia> Meta.parse("a.#\nb")
:(a.b)

so I think the same considerations apply to .#. I think :# is better in this regard (at least everything I could come up with involving :# followed by a new line resulted in a ParseError).

@c42f
Member

c42f commented Apr 2, 2020

Yeah, this was always going to be controversial. I guess it might be helpful to restate and update the reasoning from #29273 (comment) which leads to considering a digraph:

Here are some desirable properties for line continuation syntax, including reasoning:

  1. It should be ascii because no other syntax in the language is exclusively unicode. The discussion in Add a line continuation character '⤸' #29273 explored unicode options without reaching a consensus.
  2. It should be short but not steal useful character combinations. Ascii combinations are rare and valuable.
  3. It should incorporate the existing comment character # for consistency with other comments.
  4. People should be able to guess what it means by analogy to other syntax or other languages.
  5. Any text after a line continuation but on the same line should be considered a comment. Languages exist where there's no way to add a comment inside a multiline statement (eg, bash) and this is very annoying in practice.
  6. It should be a syntax error in current julia. This means that non backward compatible code gives a clear syntax error in older versions.

Now, it's not clear we can satisfy all of these, in particular (4) and (6) are difficult to satisfy at once. But I'd love if we at least had a way forward on this (even if it needed to wait for Julia 2.0).
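
Property (6) can be probed mechanically against whatever parser the running Julia ships. The following is only a sketch: the helper names `is_parse_failure` and `is_syntax_error` and the candidate list are illustrative and not part of the PR, and it assumes `Meta.parseall` (available in Julia 1.6 and later). Results depend on the Julia version, so no outputs are claimed for the candidates.

```julia
# Hypothetical helpers (not from the PR).
# True if `x` is an expression the parser uses to report a failure.
is_parse_failure(x) = x isa Expr && x.head in (:error, :incomplete)

# Parse `src` the way a source file is parsed and report whether the
# running parser rejects it.
function is_syntax_error(src::AbstractString)
    ex = try
        Meta.parseall(src)
    catch
        return true  # treat a thrown parse error as a syntax error too
    end
    # Failures may be reported at the top level or as a child of :toplevel.
    return is_parse_failure(ex) || (ex isa Expr && any(is_parse_failure, ex.args))
end

# Candidate digraphs from this thread, each followed by a newline and `b`.
candidates = ["a \\#\nb", "a .#\nb", "a :#\nb", "a #\\\nb"]
for src in candidates
    println(repr(src), " => ", is_syntax_error(src) ? "syntax error" : "parses")
end
```

A candidate that prints "parses" on an old release is one where code written for the new syntax would silently mean something else there, which is the failure mode property (6) is meant to rule out.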

I'm going to restrain myself from suggesting solutions to the above list, for now!

@c42f
Member

c42f commented Apr 3, 2020

Note that we also have

julia> Meta.parse("a.#\nb")
:(a.b)

so I think the same considerations apply to .#.

I think this can be avoided by requiring a space before the .. So in new code, "a.#\nb" would still parse the same as it does currently, but "a .#\nb" (which is currently a syntax error) would parse as line continuation.

@c42f
Member

c42f commented Apr 3, 2020

Here's a bit of a strange idea: how about #\ which, while being a technically-breaking option nevertheless has no hits in julia code on juliahub code search?

https://juliahub.com/ui/Code?q=%23%5C%5C&w=true&r=true&u=all&t=all

It fails to be a syntax error in older parsers but aesthetically it's a much closer analogy with our existing #=, has better analogy to C preprocessor, bash, python and ruby line continuation. Generally it feels like a more forward-looking syntax which I'd be content to wait for 2.0 for, if necessary.

@StefanKarpinski
Member

I'm not sure how we ended up with weird digraphs involving # as the set of options for continuing a line here. Although I do think #\ makes a bit more sense than the others.

None of these strike me as something that anyone would look at and think "ah, I bet that continues the line". The most intuitive option so far would be using ... followed only by whitespace and preceded by the start of the line or whitespace. That is illegal in most syntax contexts currently, invokes splatting in some contexts, but could be deprecated so that splatting requires following the splatted expression immediately. Most importantly, it looks like it might mean "continue this expression on the next line".

Other candidates that look like that are the ⏎ symbol or something like that, but apparently this is so important that not only does it need syntax, it needs ASCII syntax.

@pazner
Contributor Author

pazner commented Apr 3, 2020

I'm not sure how we ended up with weird digraphs involving # as the set of options for continuing a line here. Although I do think #\ makes a bit more sense than the others.
None of these strike me as something that anyone would look at and think "ah, I bet that continues the line".

The reasoning was laid out by @c42f in the comment above and in #29273 (comment).
We can compare to what other "non-semicolon" languages do:

  • Python: \
  • Ruby: \
  • Bash: \
  • Matlab: ...
  • Fortran: &
  • Stata: ///
  • Visual Basic: _
  • AppleScript: ¬

(A more comprehensive list is available on Wikipedia).

For people familiar with e.g. Python, Ruby, etc. the backslash syntax would be familiar, and so I think either \# or #\ makes a nice analogy that is actually quite intuitive.

Other candidates that look like that are the ⏎ symbol or something like that, but apparently this is so important that not only does it need syntax, it needs ASCII syntax.

It's not that this syntax is "so important" that it needs ASCII syntax, but I believe no other language feature is available using only Unicode characters, so it's not clear that this should be an exception. (Issues with non-ASCII syntax include both not being clear how to type the character, and not rendering properly in all situations). Of course, if people prefer a Unicode syntax for this, then there is the PR #29273.

@bramtayl
Contributor

bramtayl commented Apr 3, 2020

Putting ascii aside, I'm a fan of words. continues?

@c42f
Member

c42f commented Apr 3, 2020

I'm not sure how we ended up with weird digraphs involving # as the set of options for continuing a line here.

Because there's resistance to unicode-only syntax, ascii combinations are valuable, ascii # is already taken for comments, there are syntax precedents in digraphs #= and =#.

Furthermore, the text on the same line after the line continuation should be a comment: what else could it possibly mean?

@tkf
Member

tkf commented Apr 3, 2020

Just as data points: although for some people \# may not look like a continuation, it seems that some people were puzzled that \# does not work in Python:

https://stackoverflow.com/questions/17630835/python-comments-after-backslash
https://stackoverflow.com/questions/30050454/comments-in-continuation-lines
https://stackoverflow.com/questions/26985822/commenting-with-line-continuation
https://stackoverflow.com/questions/51988508/put-comments-in-between-multi-line-statement-with-line-continuation
https://stackoverflow.com/questions/45786420/how-to-comment-out-one-line-of-a-multiline-statement
https://mail.python.org/pipermail/python-ideas/2013-May/020885.html

Having said that, I think the biggest issue with \# and the like is that newly written Julia code expecting this to be the continuation can break in old Julia versions in very subtle ways. So, I don't think looking at existing open-source code (e.g., PkgEval) is enough to verify the impact of this. If we can't find a syntax that is ensured to raise parsing error in the old parsers, can we have using Future: __line_continuation__ or something to enable this?

@StefanKarpinski
Member

it seems that some people were puzzled that \# does not work in Python

That's because given that \ works and # is a comment then it's expected that \ followed by a comment should work the same way as \ without the comment, not because \# somehow inherently seems like it should continue a line. Indeed, if \ wasn't already an operator in Julia, it would be a fine choice to continue the line.

@tkf
Member

tkf commented Apr 3, 2020

OK, I see. I'm now convinced that it's not reasonable to expect people to guess \# to be a mark for line continuation.

@c42f
Member

c42f commented Apr 4, 2020

The most intuitive option so far would be using ... followed only by whitespace and preceded by the start of the line or whitespace.

I feel this has a major drawback in that it makes combining comments with multi-line statements very awkward. MATLAB "solves" this problem by making text after ... a comment but I've always found this behavior non-intuitive. From the MATLAB docs:

If three or more periods occur before the end of a line, then MATLAB ignores the rest of the line and continues to the next line. This effectively makes a comment out of anything on the current line that follows the three periods.

If we can't find a syntax that is ensured to raise parsing error in the old parsers, can we have using Future: __line_continuation__ or something to enable this?

Nice idea, I think something like this makes a lot of sense.

In the current way things work it's a little subtle to implement parser options via syntax because the parser will see the whole text of the file before anything is ever eval'd. One option might be to interleave evaluation of top level statements with incremental parsing, and have the runtime keep track of parser options. Another idea would be to have the parser itself interpret certain syntax as parser options. The latter seems a bit like mixing evaluation with parsing, though.
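
The first option, interleaving evaluation with incremental parsing, can be sketched with the positional form of `Meta.parse`, which parses one expression and returns the position where the next one starts. This is an illustration under stated assumptions, not how the Julia runtime loads files: the function name `eval_incrementally` is made up here, and no parser options are actually tracked.

```julia
# Hypothetical sketch: parse one top-level statement, evaluate it, then parse
# the next. Because each statement runs before the next one is parsed, a
# statement could in principle change state that later parsing consults.
function eval_incrementally(mod::Module, src::AbstractString)
    pos = 1
    result = nothing
    while pos <= ncodeunits(src)
        # Parse a single expression starting at `pos`; `pos` advances past it.
        ex, pos = Meta.parse(src, pos)
        ex === nothing && break        # only whitespace or comments remained
        result = Core.eval(mod, ex)    # evaluate before parsing further
    end
    return result
end

scratch = Module(:Scratch)
println(eval_incrementally(scratch, "x = 1\ny = x + 2\n"))
```

In this shape, a parser switch would be some runtime state read between iterations of the loop, which is what makes the approach different from parsing the whole file up front.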

@c42f
Member

c42f commented Apr 4, 2020

Of course, if people prefer a Unicode syntax for this, then there is the PR #29273.

That's true, though I think you've done a nicer job of the implementation than I did :-)

@tkf
Member

tkf commented Apr 4, 2020

In the current way things work it's a little subtle to implement parser options via syntax

A very naive implementation I'd imagine is to have something like

module Future
const __line_continuation__ = Base._line_continuation  # some unique object
end

in Future and then the parser inserts the check like below at the top-level just before the expression using \#

if !(@isdefined(__line_continuation__) && __line_continuation__ === Base._line_continuation)
    error("Syntax `\\#` is disallowed without `using Future: __line_continuation__`.  Used at line XXX of file XXX.")
end

(A slightly more (too?) clever implementation may be to look all parent modules of @__MODULE__.)

This also disallows the old "valid" use of \#, but in this particular case I think it is OK?

One downside I can imagine is that, if we want a "real" parser switch at some point in the 1.x time-frame, maybe it's better to do this per-file using "pragma" (or per-package using Project.toml?). The above implementation works at the per-module (or per-package) level, so we may need to introduce a different solution. It would then be confusing for users that there are different ways to toggle the parser's behavior.
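
The per-module opt-in check described above can be tried out in isolation. This is a self-contained sketch only: `FutureSketch`, `LineContinuationToken`, and `line_continuation_enabled` are hypothetical stand-ins (the real `Future` module has no `__line_continuation__`, and `Base._line_continuation` does not exist), and nothing here touches the parser.

```julia
# Hypothetical stand-in for the proposed addition to `Future`.
module FutureSketch
# A dedicated type, so the marker cannot be confused with an ordinary value.
struct LineContinuationToken end
const __line_continuation__ = LineContinuationToken()
end

# The check the parser would insert: is the marker visible in `mod`, and is it
# the genuine marker object rather than an unrelated binding of the same name?
line_continuation_enabled(mod::Module) =
    isdefined(mod, :__line_continuation__) &&
    getfield(mod, :__line_continuation__) === FutureSketch.__line_continuation__

# A module that opts in, and one that does not.
module OptedIn
using ..FutureSketch: __line_continuation__
end

module NotOptedIn
end

println(line_continuation_enabled(OptedIn))
println(line_continuation_enabled(NotOptedIn))
```

The check is keyed on the module, which is why the concern about per-file or per-package switches applies: a file included into an opted-in module would inherit the setting.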

@StefanKarpinski
Member

I feel this has a major drawback in that it makes combining comments with multi-line statements very awkward. MATLAB "solves" this problem by making text after ... a comment but I've always found this behavior non-intuitive. From the MATLAB docs:

If three or more periods occur before the end of a line, then MATLAB ignores the rest of the line and continues to the next line. This effectively makes a comment out of anything on the current line that follows the three periods.

It's unclear to me why Matlab would choose to ignore anything after ... — that choice seems to be the problem. Compare this with the expectation by Python users that \ # some text works the same as just \ at the end of the line. Can you spot how that's different from the Matlab case where ... some text works the same as ... at the end of the line? There's no comment character in the Matlab case. So why would we do that? If we use ... as a line continuation sequence, then ... # some text could be equivalent but ... some text would not.

@c42f
Member

c42f commented Apr 4, 2020

It's unclear to me why Matlab would choose to ignore anything after ... — that choice seems to be the problem.

I agree this is a problem and it's bad syntax which we certainly shouldn't follow.

However I think it's fairly clear why they chose to do this: they asked the question "what could text mean if it came after a line continuation?" and chose the latter of the two possible answers:

  • A syntax error
  • A comment

Clearly this is a bad idea if your line continuation syntax doesn't look like a comment, and matlab should have required an extra % for clarity. Whatever, we all know matlab has many flaws of consistency.

What I am arguing is that

  • Some digraph of # is available ascii syntax, a rare commodity. It's hard to imagine doing anything else with such digraphs other than making some "funky new kind of comment"
  • It's desirable for the text after line continuation to act like a comment, provided the line continuation syntax is comment-like.

Put two and two together and you end up considering things like #\.

@StefanKarpinski
Member

But neither #\ nor \# is actually available, just manifestly rare. So these aren’t intuitively suggestive of line continuations and they’re technically breaking. If we’re going to do something technically breaking, why not pick something more intuitive and with more precedent, like ...? That’s illegal at the end of line in most contexts and in contexts where it’s legal—eg inside of array literals—you don’t need a line continuation anyway.

@pazner
Contributor Author

pazner commented Apr 4, 2020

That’s illegal at the end of line in most contexts and in contexts where it’s legal—eg inside of array literals—you don’t need a line continuation anyway.

Array literals (matrix literals more specifically) are one of the cases where line continuation would be most useful. This would cause [A...\nA] to parse as [A A] rather than [A...; A], which is a bit of an edge case, but still breaking.
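
The edge case can be seen from the expression heads the current parser produces. A small sketch, using only `Meta.parse`; the variable names are illustrative, and the expressions are parsed but not evaluated:

```julia
# Today a newline inside brackets separates rows, so the splat ends up in a
# vertical concatenation.
multi_line = Meta.parse("[A...\nA]")

# What the same tokens mean when they sit on one line: horizontal concatenation.
single_line = Meta.parse("[A... A]")

println(multi_line.head)
println(single_line.head)
```

If a trailing ... became a line continuation, the first string would have to parse like the second, which is the change in meaning described above.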

I definitely like the idea of ..., but I'm worried that since that exact character combination is used already for splatting it might not be feasible.

@c42f
Member

c42f commented Apr 4, 2020

If we’re going to do something technically breaking, why not pick something more intuitive and with more precedent, like ...?

Sure, though I fear we've covered this ground before without consensus! In #29273 the typographically-inspired --- and -- were also mentioned as possible connectives.

I think the ellipsis would be fine, though having it mean something completely different from splatting depending on context doesn't seem great. There's also some "legitimate" uses for it in vcat:

julia> a = [1 2 3]
1×3 Array{Int64,2}:
 1  2  3

julia> [a...  # without the `...` this is an error
        4]
4-element Array{Int64,1}:
 1
 2
 3
 4

I'm not sure I've seen this exact abuse, though I've seen some pretty weird constructions involving concatenation syntax.

@Keno
Member

Keno commented Apr 9, 2020

I don't really like the \# thing, but I would be ok with #\ where no additional characters are allowed between the \ and the end of the line.

@StefanKarpinski
Member

Triage feels that the particular digraph here is too non-obvious and brittle. Other options that were discussed:

  • ... at the end of line is slightly breaking and this has nothing to do with splatting
  • -- is an invalid operator so that would be non-breaking. However, if we ever added the -- decrement operator, then n-- would be the syntax for decrementing n and would clash with -- at the end of a line continuing the line. In general, it's felt that this syntax is too valuable to be used on such a niche feature as line continuation.

The main use case that's currently unaddressed is array literals where each line is too long to fit on a single line—in all other cases ( ) can be used to continue lines. It would be possible to allow something like this:

[(1
  2
  3)
 (4
  5
  6)]

The rule for this seems like it would be that in parens all whitespace becomes horizontal. No real conclusion reached here, I'm just recording the options discussed so that we don't have to start from scratch in two weeks.
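
For reference, this is how the existing options behave in current Julia. The snippet is a sketch with made-up variable names: it shows parentheses continuing an expression, a newline acting as a row separator inside brackets, and an explicit `hcat` call as one way to split a long row today.

```julia
# Inside parentheses a newline does not end the expression.
total = (1
         + 2
         + 3)

# Inside brackets a newline starts a new row, so a single row cannot simply
# be wrapped onto the next line.
A = [1 2 3
     4 5 6]

# One workaround available today: build the long row from pieces with a
# function call, whose arguments may span lines.
long_row = hcat([1 2 3],
                [4 5 6])

println(total)
println(size(A))
println(size(long_row))
```

The `hcat` form works but no longer looks like a matrix literal, which is the gap the parenthesized-row idea above is aimed at.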

@o314
Contributor

o314 commented Jun 11, 2020

Markdown hard line break can be a very good catch : double space (or more) then a newline. And it's still ascii

foo..
baz

where . means space aka Char(32)

Pretty sure it will not hurt wherever, whatever

The return symbol in unicode “⏎” (U+23CE) is fancy but I will never propose a Unicode-only operator (btw compose as an identifier should be in Base too <eof_digression/>)

@diegozea
Contributor

diegozea commented Aug 3, 2020

\/ is now a syntax error and it looks like going down. I can imagine Julia Mono having a nice rendering for it, similar to the rendering for |>.

@c42f
Member

c42f commented Aug 4, 2020

\/ is now a syntax error and it looks like going down

That's a cute syntax! It seems there are some cases where it's not an error, unfortunately:

julia> quote
           @info "hi" \/
               a=1
       end
quote
    #= REPL[5]:2 =#
    #= REPL[5]:2 =# @info "hi" \ (/)
    #= REPL[5]:3 =#
    a = 1
end

Labels: parser (Language parsing and surface syntax), triage (This should be discussed on a triage call)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

provide a line continuation syntax for nicer macro calls
9 participants