Change raw string literal syntax: [#]\*"
represents single-line string and [#]\*'''
represents block string
Under current design of string literals, users may make assumptions that a
starting [#]*"""
represents a block string and misunderstand the syntax.
The design of
string literals
specifies a block string literal to start with [#]*"""
. Users may take for
granted the other way around, where any string literal starting with [#]*"""
is a block string literal. This does not hold true, however. Two counter-cases
are """abc"""
represents three tokens ""
, "abc"
and ""
, and #"""#
which is equivalent to "\""
. Neither is a block string literal and may be
visually confusing.
Interpret [#]*"
as the start of single-line string literals, and [#]*'''
as
the start of block string literals. Disallow adjacent string literals like
"""abc"""
.
Users can easily distinguish single-line string literals from block string
literals with the proposed change. Confusion on """abc"""
will be eliminated
because adjacent string literals are invalid in the proposal. On the other hand,
#"""#
will be clear to the user of representing "\""
, as """
does not
represent a block string literal any more. More details can be found
here.
This principle helps make Carbon code easy to read, understand, and write, because it avoids confusion on the type of certain string literals.
In addition to the confusion described above, the
current implementation
complicates the lexing. When the lexer sees [#]+"""
, it temporarily accepts
syntax of both string types because the type of the string is undecided.
Specifically, it accepts vertical whitespaces even if they are not allowed in
single-line strings. The type of the string won't be decided until the lexer
sees a closing "#
or a new line. In case of a closing "#
where the string is
single-line, the lexer will look back on the scanned characters for vertical
whitespaces to decide if the single-line string is valid.
This approach loses some convenience in using raw string literals while
addressing the problem. For example, as discussed in
issue #1359,
#"""#
is a natural way to write a string of "
.
Although something similar to C++ style like "(
solves the problem, the syntax
becomes complicated and hurts readability. In addition, "(")"
is no simpler
than "\""
or #"""#
.
When disallowing adjacent string literals, we can additionally allow [#]+'
on
single-line string literals. Another option is to use [#]+'
for single-line
string literals and [#]+"
for block string literals. In general, #'"'#
is
visually confusing with character '"'
and hurts readability.