@b_str removes backslashes twice #39092
Comments
This is intentional and, I believe, documented. |
@vtjnash The help for |
@mgkuhn You are attempting to make this invariant hold? |
Should the invariant also hold that Because I don't think you will be able to achieve both. I can't find a good example to explain my suspicion though. |
@clarkevans @heetbeet No, neither of your suggested invariants is a reasonable goal, and neither is achievable:
(Same with triple quotes, which make no difference here.) |
Okay I see I made a mistake in my code, and I expect the same happened to @clarkevans. Let's try again. Should the invariant hold that
|
In my initial post I expected that this cannot hold for |
@mgkuhn seems like your revised code has exactly this property for the example I tried: Before fixing b_str
After fixing b_str
|
@heetbeet None of your invariants can be true unless you exclude
Same for (I see how the discussion here evolves once more as evidence for widespread misunderstandings of how Julia's many different string literals work and relate to each other.) |
I see what you mean, I forgot about those. |
@vtjnash What was the design rationale for the current behaviour? Wouldn't it be cleaner, for special strings, to separate the following two operations:
? This separation could be introduced in a non-breaking way by offering a new, alternative interface for special-string literal macros, such that existing string literal macros continue to receive what they get at present (i.e., some backslashes removed). |
Perhaps a sensible and generic fix for these kinds of woes is to allow more flexibility in the string delimiters for custom string macros? (Also related: #41041.) Then individual string macros wouldn't need weird heuristics to avoid double escaping - the generic answer if the user is having escaping issues would be to use another set of delimiters.

Which exact delimiters are available? One possibility could be that either `` or "" begins a string when it's followed by the opposite quote type, with the (reversed/same?) delimiter at the other end of the string. It's currently a syntax error to juxtapose string literals, so this syntax is probably available unless I've forgotten something. For example, the string

```
julia> :(x``"hi"``)
ERROR: syntax: cannot juxtapose string literal
Stacktrace:
 [1] top-level scope
   @ none:1
```

I'm imagining this mixed delimiter parsing as

The rule might be that mixed delimiters can be an arbitrarily long sequence of length at least 3, and the user can always arrange for those to not be present in the string they're trying to quote. (This is just one idea - perhaps there are other delimiters available?) |
Ah yes thanks. I thought I'd seen a longer discussion of this somewhere but couldn't find it. |
The byte-array literal syntax `b"..."` is currently implemented by the macro `@b_str()`, which passes the string it receives through `unescape_string()`.
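As a rough sketch (reconstructed from memory of Base at the time, so worth checking against the actual source), the body of `@b_str` amounts to the plain function below. The names `b_bytes` and `received` are made up for this illustration, and `raw"..."` is used only to obtain the string that a non-standard string literal macro is handed by the parser.

```julia
# Hypothetical stand-in for the body of Base's @b_str macro: the macro
# receives the literal's content as `s` and, as far as I recall, wraps
# this result in a QuoteNode.
b_bytes(s::AbstractString) = codeunits(unescape_string(s))

# What the macro receives for the literal b"\\\\\"" (five backslashes, then
# an escaped quote): raw-string processing has already reduced the
# 2n+1 = 5 backslashes to n = 2.
received = raw"\\\\\""
println(length(received))                          # 3: two backslashes and a quote

# unescape_string() then halves the remaining backslashes once more,
# leaving one backslash (0x5c) followed by a quote (0x22).
println(b_bytes(received) == UInt8[0x5c, 0x22])    # true
```

Using a plain function rather than redefining `@b_str` keeps the sketch from shadowing anything in Base.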
This implementation hides a rather counter-intuitive and undocumented property: in certain situations, the unescaping procedure to remove backslashes is applied twice. As a result, a user needs to use no fewer than five (5) backslashes to obtain the byte sequence of the ASCII string `\"`.

Julia's raw strings use the following escaping rule:

- if `"` is preceded by 2n+1 backslashes, these are replaced by n backslashes, and the `"` is passed through literally
- if `"` is preceded by 2n backslashes, these are replaced by n backslashes, and the `"` acts as the string terminator

(This is also the escaping mechanism that the Microsoft C runtime library uses when parsing quoted strings from the Windows command line into `argv`.)

This removal of backslashes before `"` occurs not only in raw strings, but in all non-standard string literals, which are just macros ending in `_str`. This can be seen from the trivial implementation of the macro behind raw string literals, which is just the identity function.

Therefore, when `b"\\\\\""` is processed, backslashes are removed in the following two steps:

1. the raw-string processing replaces the 5 backslashes before the `"` with 2 backslashes
2. the `unescape_string()` function called by macro `@b_str()` replaces the remaining `\\` with `\`.

This duplicate backslash reduction is entirely unnecessary in non-standard string literals where the corresponding macro calls `unescape_string()`, because that function already performs the same `\\` → `\` and `\"` → `"` mapping that is behind the 2n+1 rule of the raw-string processing. This redundant, duplicate processing is also likely to surprise users, especially since the documentation does not warn about this at all. It certainly surprised me!

There is a simple workaround in the case of `@b_str()`, namely to undo the backslash removal performed by the raw-string processing, using `Base.escape_raw_string`. The resulting behaviour seems much more intuitive and unsurprising.
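The 2n+1 / 2n rule and the workaround can be checked without touching Base's `@b_str`. The sketch below uses `raw"..."` to stand in for what a string macro receives. `fixed_b_bytes` is a hypothetical name for the amended macro body, and the code assumes a Julia version that provides `Base.escape_raw_string`.

```julia
# Raw-string rule: 2n+1 backslashes before a quote leave n backslashes
# and a literal quote; 2n backslashes before the closing quote leave n.
@assert raw"\"" == "\""          # n = 0: one backslash, literal quote
@assert raw"\\\"" == "\\\""      # n = 1: three backslashes, literal quote
@assert raw"\\" == "\\"          # two backslashes, then the terminator

# Hypothetical amended macro body: re-escape first, so that only
# unescape_string() ends up removing backslashes.
fixed_b_bytes(s::AbstractString) =
    codeunits(unescape_string(Base.escape_raw_string(s)))

# For b"\\\"" (three backslashes, then an escaped quote) the macro receives
# one backslash and a quote; the amended body returns those same two bytes,
# matching the ordinary string literal "\\\"".
received = raw"\\\""
@assert fixed_b_bytes(received) == codeunits("\\\"")
```

With this arrangement the byte-array literal and the standard string literal need the same number of backslashes for this input.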
But `@b_str()` may be just one example of a type of non-standard string literal that further processes the string received with `unescape_string()`, or with any other function that uses backslashes as escape symbols, and therefore performs the same `\\` → `\` and `\"` → `"` mapping. If this is indeed the case, then perhaps the compiler mechanics behind non-standard string literals should not remove any backslashes at all, and leave this to the author of the macro? The 2n+1 vs 2n rule would then merely be used to identify the terminating quotation mark, but all characters before that would be passed through to the macro untouched.
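To make the proposal concrete, here is a small sketch of how the 2n+1 vs 2n rule could be used only to locate the terminating quotation mark while leaving every backslash in place. `literal_body` is a hypothetical helper written for this illustration; it is not how Julia's parser is implemented.

```julia
# Hypothetical helper: `src` is source text starting just after the opening
# quote. Returns the literal's content with all backslashes untouched, using
# the parity of the backslash run before each quote only to find the end.
function literal_body(src::AbstractString)
    nbackslashes = 0
    for (i, c) in pairs(src)
        if c == '\\'
            nbackslashes += 1
        elseif c == '"' && iseven(nbackslashes)
            # An even run (2n) before a quote means the quote terminates.
            return src[1:prevind(src, i)]
        else
            # Any other character, or a quote after an odd run, resets the count.
            nbackslashes = 0
        end
    end
    error("unterminated string literal")
end

# Source text after the opening quote of b"\\\"": three backslashes, an
# escaped quote, then the terminating quote (written as a standard string).
src = "\\\\\\\"\""
body = literal_body(src)
@assert body == "\\\\\\\""                # all three backslashes survive
@assert unescape_string(body) == "\\\""   # the macro does the only unescaping
```

Under this scheme a macro that calls `unescape_string()` would see each backslash removed once, while a macro such as `@raw_str` would have to apply the 2n+1 reduction itself.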