Invalid char literals are accepted #7945
Comments
"Presumably the manual is the authority and the compiler is wrong to accept" - HA! (Hint: Don't trust either) In all seriousness... I don't think any of those examples can be seen as illegal. Confusing yes, but not wrong: You write a Because it has to be exactly one codepoint, the parser has no problem with it being the same character used to delimit it, and because ASCII is a subset of utf8, both |
The ''' is a less clear case; I'm fine with either behavior as long as we make the docs and the compiler agree. On the other hand, anyone using a literal newline like that deserves to be shot. Various transports used in the real world for code don't preserve line endings: ftp, web services like pastebins, IRC, and anything using a "text" rather than "binary" mode in its I/O libs will corrupt literals of the 0x27 0x0a 0x27 or 0x27 0x0d 0x27 form. In some cases the result is code that no longer compiles (if the sequence becomes 0x27 0x0d 0x0a 0x27); in others the semantics change silently because the 0x0d is converted into 0x0a. Literals of that form also violate indentation, so naïve auto-indent will break them as well. Therefore, even though they are technically legal at present, it seems insane to leave it that way. Languages like C, Java, etc. similarly disallow unescaped 0x0a and 0x0d in char literals.
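The corruption described above can be reproduced on the raw bytes. This is only a sketch: `to_crlf` and `cr_to_lf` are hypothetical stand-ins for whatever a text-mode transport might do, not the behavior of any particular tool.

```rust
/// Hypothetical text-mode transfer: expands every bare LF (0x0a) to CR LF.
fn to_crlf(src: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(src.len() + 1);
    let mut prev = 0u8;
    for &b in src {
        // Only insert a CR when the LF is not already preceded by one.
        if b == 0x0a && prev != 0x0d {
            out.push(0x0d);
        }
        out.push(b);
        prev = b;
    }
    out
}

/// Hypothetical text-mode transfer: rewrites every CR (0x0d) as LF (0x0a).
fn cr_to_lf(src: &[u8]) -> Vec<u8> {
    src.iter().map(|&b| if b == 0x0d { 0x0a } else { b }).collect()
}

fn main() {
    let lf_literal: [u8; 3] = [0x27, 0x0a, 0x27]; // quote, LF, quote
    let cr_literal: [u8; 3] = [0x27, 0x0d, 0x27]; // quote, CR, quote

    // Two code points now sit between the quotes, so this is no longer
    // a single-character literal.
    assert_eq!(to_crlf(&lf_literal), vec![0x27u8, 0x0d, 0x0a, 0x27]);

    // Still a well-formed literal, but it now denotes LF instead of CR.
    assert_eq!(cr_to_lf(&cr_literal), vec![0x27u8, 0x0a, 0x27]);
}
```

The first case is the "won't compile anymore" outcome; the second is the silent change of meaning.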
Nominating for Well-Defined.
One thing first: All the things you talked about are also true for string literals, so we need to think about them too. So, you're right that no one should actually do this, but I don't see that as a reason to only forbid those two. Rust source is UTF-8, so you will have those problems with other byte sequences too. If you are in a situation where But even if it's better to forbid them, it seems arbitrary to only exclude those two codepoints in a literal. What about the other ASCII control characters? All the other UTF-8 sequences that might trip up external tools? A rule like "All non-printable codepoints in the ASCII range need to be written in escaped form" would at least be better in that case.
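The broader rule suggested above could be sketched as a predicate over code points. `needs_escape` is a hypothetical name used for illustration, not anything in the compiler, and "non-printable ASCII" is assumed here to mean the ASCII control range that `char::is_ascii_control` covers (U+0000 to U+001F, plus U+007F).

```rust
/// Hypothetical check for the proposed rule: a non-printable code point
/// in the ASCII range may only appear in a literal in escaped form.
fn needs_escape(c: char) -> bool {
    c.is_ascii_control()
}

fn main() {
    // The two code points singled out in this issue are covered...
    assert!(needs_escape('\n'));
    assert!(needs_escape('\r'));
    // ...and so are the other ASCII control characters.
    assert!(needs_escape('\t'));
    assert!(needs_escape('\0'));
    assert!(needs_escape('\u{7f}'));

    // Printable ASCII and non-ASCII code points are left alone.
    assert!(!needs_escape('a'));
    assert!(!needs_escape('é'));
    // The delimiter itself is printable, so this rule alone says nothing
    // about ''' and that case would need separate handling.
    assert!(!needs_escape('\''));
}
```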
If there's precedent in Java and C disallowing certain character literals, then that's a reasonable argument for us to disallow them as well. But the only reason I say this is because we can cite precedent, because it does seem somewhat arbitrary.
@Kimundi: I agree that we should probably give string literals some related scrutiny. I believe the primary reason they are forbidden in other languages is that character literals are only allowed to span a single line, and these characters are those which terminate lines. @bstrie: None of C, C++, or Java allows unescaped \r or \n in character literals (in C and C++ the interpretation of what constitutes a newline is up to compilers to an extent, but gcc and clang behave as described): From §2.14.3 of the latest C++ draft: And in the Java SE 7 language spec, §3.10.4:
Accepted for well-defined
cc me
As documented in issue #7945, these literal identifiers are all accepted by rust today, but they should probably be disallowed (especially `'''`). This changes all escapable sequences to being *required* to be escaped. Closes #7945 I wanted to write the tests with more exact spans, but I think #9308 will be fixing that?
The byte sequences "0x27 0x0a 0x27", "0x27 0x0d 0x27", and "0x27 0x27 0x27" (newline, carriage return, and single-quote, respectively, sandwiched between single quotes) are accepted as character literals. The former two are, as far as I can tell, allowed per the manual's description of the language, but would not feature in any sane language; I assume this is merely an oversight. The latter is rejected by the grammar described in the manual but accepted by the compiler. Presumably the manual is the authority and the compiler is wrong to accept '''.
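For comparison, the same three characters can be written with escapes, which leaves nothing to the handling of raw bytes between the quotes. A minimal sketch; `escaped_forms` is just an illustrative helper name.

```rust
/// Returns the three characters from this report, written in escaped form.
fn escaped_forms() -> [char; 3] {
    ['\n', '\r', '\'']
}

fn main() {
    let [newline, carriage_return, single_quote] = escaped_forms();
    // Each escape denotes the byte that sat between the quotes above.
    assert_eq!(newline as u32, 0x0a);
    assert_eq!(carriage_return as u32, 0x0d);
    assert_eq!(single_quote as u32, 0x27);
}
```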