Improve diagnostics for character literals containing multiple codepoints #88684

Closed
eggyal opened this issue Sep 6, 2021 · 3 comments · Fixed by #88795
Labels
A-diagnostics (Area: Messages for errors, warnings, and lints)
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Comments

@eggyal
Contributor

eggyal commented Sep 6, 2021

I tried to compile the following code (playground):

const SPADE: char = '♠️';

For the avoidance of doubt, ♠️ is U+2660. I expected the code to compile—but instead, the literal is rejected:

error: character literal may only contain one codepoint
 --> src/lib.rs:1:21
  |
1 | const SPADE: char = '♠️';
  |                     ^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
1 | const SPADE: char = "♠️";
  |                     ^^^

Notably, it is accepted if unicode-escaped:

const SPADE: char = '\u{2660}';

Meta

rustc --version --verbose:

rustc 1.57.0-nightly (e30b68353 2021-09-05)
binary: rustc
commit-hash: e30b68353fe22b00f40d021e7914eeb78473b3c1
commit-date: 2021-09-05
host: x86_64-apple-darwin
release: 1.57.0-nightly
LLVM version: 13.0.0
@eggyal eggyal added the C-bug Category: This is a bug. label Sep 6, 2021
@SNCPlay42
Contributor

SNCPlay42 commented Sep 6, 2021

The spade "character" in your code is not just a single U+2660, it also has a unicode variation selector U+FE0F: playground

On my machine at least, this makes a visible difference in appearance (which GitHub tries to eliminate by adding a U+FE0F unless I "escape" with a code block):

U+2660 U+FE0F: ♠️
U+2660 alone: ♠
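
For readers who want to check this themselves, here is a minimal sketch that lists the codepoints in each form. The helper name `codepoints` is made up for illustration, and the strings are written with escapes so the variation selector cannot be lost in copy and paste.

```rust
/// Illustrative helper (not part of any library): formats each Unicode
/// scalar value in `s` as `U+XXXX`.
fn codepoints(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // The spade followed by variation selector-16, as in the rejected literal.
    let emoji_spade = "\u{2660}\u{FE0F}";
    // The spade on its own, which is a valid `char`.
    let plain_spade = "\u{2660}";

    println!("{:?}", codepoints(emoji_spade)); // ["U+2660", "U+FE0F"]
    println!("{:?}", codepoints(plain_spade)); // ["U+2660"]

    // Two codepoints cannot fit in a `char`; one can.
    println!("{}", emoji_spade.chars().count()); // 2
    println!("{}", plain_spade.chars().count()); // 1
}
```

The first string holds two codepoints, which is why it is rejected as a `char` literal even though it renders as a single glyph.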

@eggyal
Contributor Author

eggyal commented Sep 6, 2021

D'oh! Good spot, thanks @SNCPlay42. Perhaps some improved diagnostics could have helped here?

@SNCPlay42
Contributor

We have similar diagnostics in other cases of misleading Unicode so that seems like a good idea.
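
As a rough illustration of what such a diagnostic might look for, the sketch below flags a literal whose first codepoint is followed only by variation selectors. It is not rustc's implementation: the function name `explain_multi_codepoint`, the message wording, and the restriction to the range U+FE00..=U+FE0F are all assumptions made for this example.

```rust
/// Hypothetical helper, not rustc's actual logic: if `literal` is one
/// codepoint followed only by variation selectors (U+FE00..=U+FE0F),
/// return a note naming the invisible trailing codepoints.
fn explain_multi_codepoint(literal: &str) -> Option<String> {
    let mut chars = literal.chars();
    let base = chars.next()?;
    let rest: Vec<char> = chars.collect();
    if rest.is_empty() {
        // A single codepoint is already a valid `char` literal.
        return None;
    }
    // Simplifying assumption: only variation selectors count as
    // "invisible" here; real combining marks cover far more ranges.
    if !rest.iter().all(|c| ('\u{FE00}'..='\u{FE0F}').contains(c)) {
        return None;
    }
    // Render the trailing codepoints as `\u{...}` escapes.
    let trailing: String = rest.iter().flat_map(|c| c.escape_unicode()).collect();
    Some(format!(
        "`{}` is followed by `{}`; to get a `char`, write '{}'",
        base,
        trailing,
        base.escape_unicode()
    ))
}

fn main() {
    // The literal from this issue: spade plus variation selector-16.
    if let Some(note) = explain_multi_codepoint("\u{2660}\u{FE0F}") {
        println!("note: {}", note);
    }
    // Ordinary multi-character text is left to the existing `str` suggestion.
    assert!(explain_multi_codepoint("ab").is_none());
}
```

A note of this kind would point at the hidden U+FE0F directly, rather than only suggesting double quotes.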

@jyn514 jyn514 added A-diagnostics Area: Messages for errors, warnings, and lints C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels Sep 6, 2021
@eggyal eggyal changed the title Character literal containing only one codepoint rejected (but only if unescaped) Improve diagnostics for character literals containing multiple codepoints Sep 7, 2021
@bors bors closed this as completed in c2cdba4 Sep 22, 2021