Detect `NulInCStr` error earlier. #119172

nnethercote · 2023-12-21T01:12:44Z

By making it an `EscapeError` instead of a `LitError`. This makes it like the other errors produced when checking string literal contents, e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion, which changes behaviour. It will be possible to move the check back to the later point if desired. If that happens, it's likely that all the string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in `report_lit_error` to calculate the span of the nul char from a range. This code used a hardwired `+2` to account for the `c"` at the start of a C string literal, but raw C string literals needed a `+3` to account for the `cr"`. As a result, the caret in `cr"` nul error messages was one short of where it should have been. The new approach doesn't need any of this and avoids the off-by-one error.
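The off-by-one described above can be illustrated with a small sketch. This is a toy model, not the compiler's code: the function names `caret_offset_hardwired` and `caret_offset_prefix_aware` are hypothetical, literals are modelled as plain strings, and `#` delimiters on raw literals are ignored.

```rust
/// Offset of the NUL caret from the start of the literal token, using a
/// hardwired `+2` that is only correct for the `c"` prefix.
/// (Hypothetical helper, for illustration only.)
fn caret_offset_hardwired(nul_index_in_contents: usize) -> usize {
    nul_index_in_contents + 2
}

/// The same calculation with the prefix length chosen by literal kind:
/// 2 for `c"` and 3 for `cr"`. (Hypothetical helper, for illustration only.)
fn caret_offset_prefix_aware(is_raw: bool, nul_index_in_contents: usize) -> usize {
    let prefix_len = if is_raw { r#"cr""#.len() } else { r#"c""#.len() };
    nul_index_in_contents + prefix_len
}

fn main() {
    // A raw C string literal token whose contents are `ab` followed by a
    // literal NUL byte. The NUL sits at index 2 of the contents.
    let raw_token = "cr\"ab\0\"";
    let true_offset = raw_token.find('\0').unwrap();

    // The prefix-aware calculation points at the NUL.
    assert_eq!(caret_offset_prefix_aware(true, 2), true_offset);
    // The hardwired `+2` lands one position short for `cr"` literals.
    assert_eq!(caret_offset_hardwired(2) + 1, true_offset);

    // For a non-raw token the two calculations agree.
    let token = "c\"ab\0\"";
    assert_eq!(caret_offset_hardwired(2), token.find('\0').unwrap());
    println!("raw: hardwired={}, correct={}", caret_offset_hardwired(2), true_offset);
}
```

Reporting the error at the point where the literal contents are checked avoids reconstructing the position from a prefix length at all.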

r? @fee1-dead

nnethercote · 2023-12-21T01:15:55Z

@fee1-dead: there is some additional context in this Zulip thread.

In short, the delayed C NUL str check is inconsistent with all other string literal checks. If it ships in its current state, we're stuck with that behaviour permanently. If we move it earlier right now before it ships, we have the option to delay it (and all other string literal checks) later on (as implemented in #118699). So if we do this in the next few days, we avoid a one-way door shutting.

cc @joshtriplett

nnethercote · 2023-12-21T01:17:30Z

Ugh, the refusal of git and GitHub to show "binary" files is really annoying for this one.

fee1-dead · 2023-12-21T01:18:15Z

Currently on my phone right now, but do we have a test for c"\0" on an earlier edition passed to a macro which then consumes it?

nnethercote · 2023-12-21T01:30:47Z

Ugh, the CI failure is due to a rust-analyzer sync problem #118861, which is blocked by #119124, which is awaiting review.

nnethercote · 2023-12-21T02:50:59Z

> Ugh, the refusal of git and GitHub to show "binary" files is really annoying for this one.

To clarify: the test contains 15 C string literals. Prior to this commit, only the first five have ERROR annotations. This commit adds ERROR annotations to the other ten. And the .stderr file changes from having five errors to fifteen errors.
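The five-versus-fifteen difference can be modelled with a short sketch. It is a toy model under stated assumptions, not how rustc is structured: `Context`, `Literal`, `errors_when_checked_early` and `errors_when_checked_late` are hypothetical names, and "late" simply means "only literals that survive `cfg` stripping and macro expansion are checked".

```rust
/// Where a literal appears in the toy source file.
#[derive(Clone, Copy, PartialEq)]
enum Context {
    Plain,          // ordinary code that survives expansion
    CfgFalse,       // inside an item removed by `#[cfg(FALSE)]`
    MacroDiscarded, // passed to a macro that throws its input away
}

struct Literal {
    contents: &'static str,
    context: Context,
}

fn has_nul(lit: &Literal) -> bool {
    lit.contents.contains('\0')
}

/// Early check: every literal token is validated, whatever happens to it later.
fn errors_when_checked_early(lits: &[Literal]) -> usize {
    lits.iter().filter(|l| has_nul(l)).count()
}

/// Late check: only literals still present after expansion are validated.
fn errors_when_checked_late(lits: &[Literal]) -> usize {
    lits.iter()
        .filter(|l| l.context == Context::Plain && has_nul(l))
        .count()
}

/// Five NUL-containing literals in each of the three contexts, mirroring the
/// shape of the test file described above.
fn sample_file() -> Vec<Literal> {
    let contexts = [Context::Plain, Context::CfgFalse, Context::MacroDiscarded];
    let mut lits = Vec::new();
    for &context in &contexts {
        for _ in 0..5 {
            lits.push(Literal { contents: "\0", context });
        }
    }
    lits
}

fn main() {
    let lits = sample_file();
    println!("late check:  {} errors", errors_when_checked_late(&lits));
    println!("early check: {} errors", errors_when_checked_early(&lits));
}
```

In this model the early check always reports at least as many errors as the late one, which is why moving from early to late is the backward compatible direction.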

nnethercote · 2023-12-21T02:52:58Z

> Currently on my phone right now, but do we have a test for c"\0" on an earlier edition passed to a macro which then consumes it?

I don't think so. Is that interesting? It would produce output like this:

error: expected one of `!`, `.`, `::`, `;`, `?`, `else`, `{`, `}`, or an operator, found `"\0"`
 --> a2.rs:7:10
  |
7 |     _ = c"\0";
  |          ^^^^ expected one of 9 possible tokens

fee1-dead · 2023-12-21T03:03:33Z

I meant something like this:

macro_rules! hi {
    (c $t:tt) => { $t };
}

fn main() {
    println!(hi!(c"hello!\0"))
}

do we already have that in our test suite?
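For readers unfamiliar with why the edition matters here: in editions before 2021, `c"hello!\0"` is lexed as the identifier `c` followed by an ordinary string literal, so the `hi!` macro above can match the `c` and return the string. The sketch below reproduces that two-token form on any edition by putting a space after `c`; the wrapper function `expanded` is a hypothetical name added for illustration.

```rust
// Matches the identifier `c` followed by any single token tree, and expands
// to just that second token.
macro_rules! hi {
    (c $t:tt) => { $t };
}

/// Hypothetical wrapper showing what the macro expands to.
fn expanded() -> &'static str {
    // The space makes `c` and the string literal two separate tokens on every
    // edition, imitating how pre-2021 editions lex `c"hello!\0"`.
    hi!(c "hello!\0")
}

fn main() {
    // The result is an ordinary `&str` with an embedded NUL, which is allowed
    // in regular string literals.
    assert_eq!(expanded(), "hello!\0");
    println!("{:?}", expanded());
}
```

The question in the comment above is whether the test suite covers the un-spaced form on an earlier edition, where no C string literal is involved at all.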

nnethercote · 2023-12-21T03:25:31Z

I don't see anything in tests/ui/ like that, no.

traviscross · 2023-12-21T07:06:07Z

@rustbot labels +I-lang-nominated

Nominating this for us to discuss as there is some time pressure here. If we want to save this space, we should do it before C string literals stabilize in Rust 1.76.

joshtriplett · 2023-12-25T03:17:38Z

Following up on this: the decision made was to defer the stabilization by a release, so that we don't need to rush this change in over the holidays.

rustbot · 2024-01-12T03:39:11Z

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

nnethercote · 2024-01-12T03:42:00Z

The rust-analyzer problem has been fixed, and I have fixed the conflicts. @fee1-dead, this is ready for review. Unfortunately, git and GitHub both refuse to show the changed test file because it contains NUL chars and so is "binary".

Here's the before:

// edition: 2021

fn main() {
    c"\0";
    //~^ ERROR null characters in C string literals

    c"\u{00}";
    //~^ ERROR null characters in C string literals

    c"^@";
    //~^ ERROR null characters in C string literals

    c"\x00";
    //~^ ERROR null characters in C string literals

    cr"^@";
    //~^ ERROR null characters in C string literals
}

macro_rules! empty {
    ($($tt:tt)*) => {};
}

// The cfg consumes the literals before nul checking occurs.
#[cfg(FALSE)]
fn test() {
    c"\0";
    c"\u{00}";
    c"^@";
    c"\x00";
    cr"^@";
}

// The macro consumes the literals before nul checking occurs.
fn test_empty() {
    empty!(c"\0");
    empty!(c"\u{00}");
    empty!(c"^@");
    empty!(c"\x00");
    empty!(cr"^@");
}

and the after:

// edition: 2021

// The null char check for C string literals was originally implemented after
// expansion, which meant the first five strings in this file triggered errors,
// and the remaining ten did not. But this is different to all the other
// content checks done on string literals, such as checks for invalid escapes
// and bare CR chars. So the check was moved earlier. The check can be moved
// back to after expansion at a later date if necessary, because that would be
// a backward compatible change. (In contrast, moving the check from after
// expansion to lexing time would be a backward incompatible change, because it
// could break code that was previously accepted.)

fn main() {
    c"\0";     //~ ERROR null characters in C string literals
    c"\u{00}"; //~ ERROR null characters in C string literals
    c"^@";     //~ ERROR null characters in C string literals
    c"\x00";   //~ ERROR null characters in C string literals
    cr"^@";    //~ ERROR null characters in C string literals
}

macro_rules! empty {
    ($($tt:tt)*) => {};
}

// The cfg does not consume the literals before nul checking occurs.
#[cfg(FALSE)]
fn test() {
    c"\0";     //~ ERROR null characters in C string literals
    c"\u{00}"; //~ ERROR null characters in C string literals
    c"^@";     //~ ERROR null characters in C string literals
    c"\x00";   //~ ERROR null characters in C string literals
    cr"^@";    //~ ERROR null characters in C string literals
}

// The macro does not consume the literals before nul checking occurs.
fn test_empty() {
    empty!(c"\0");     //~ ERROR null characters in C string literals
    empty!(c"\u{00}"); //~ ERROR null characters in C string literals
    empty!(c"^@");     //~ ERROR null characters in C string literals
    empty!(c"\x00");   //~ ERROR null characters in C string literals
    empty!(cr"^@");    //~ ERROR null characters in C string literals
}

as rendered by vim.

petrochenkov · 2024-01-12T15:31:39Z

Blocked on the decisions in #118699, but I'm fine with temporarily landing this if it unblocks stabilizing C-strings sooner.
@rustbot blocked

nnethercote · 2024-01-12T22:11:16Z

> Blocked on the decisions in #118699, but I'm fine with temporarily landing this if it unblocks stabilizing C-strings sooner.

That's exactly the point of this PR: to unblock stabilization of C-string literals. Can you unblock?

petrochenkov · 2024-01-17T19:15:29Z

@bors r+

bors · 2024-01-17T19:15:32Z

📌 Commit 9018d2c has been approved by petrochenkov

It is now in the queue for this repository.

ehuss · 2024-01-17T21:20:48Z

@nnethercote Just want to double check if you can post a reference PR update as mentioned at https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/rfc.203349.3A.20mixed.20utf8.20literals/near/409285306 to update the grammar and remove the sentence about post-lexing validation?

…iaskrgr Rollup of 7 pull requests

Successful merges:

- rust-lang#119172 (Detect `NulInCStr` error earlier.)
- rust-lang#119833 (Make tcx optional from StableMIR run macro and extend it to accept closures)
- rust-lang#119955 (Modify GenericArg and Term structs to use strict provenance rules)
- rust-lang#120021 (don't store const var origins for known vars)
- rust-lang#120038 (Don't create a separate "basename" when naming and opening a MIR dump file)
- rust-lang#120057 (Don't ICE when deducing future output if other errors already occurred)
- rust-lang#120073 (Remove spastorino from users_on_vacation)

r? `@ghost`
`@rustbot` modify labels: rollup

…iaskrgr Rollup of 8 pull requests

Successful merges:

- rust-lang#119172 (Detect `NulInCStr` error earlier.)
- rust-lang#119833 (Make tcx optional from StableMIR run macro and extend it to accept closures)
- rust-lang#119967 (Add `PatKind::Err` to AST/HIR)
- rust-lang#119978 (Move async closure parameters into the resultant closure's future eagerly)
- rust-lang#120021 (don't store const var origins for known vars)
- rust-lang#120038 (Don't create a separate "basename" when naming and opening a MIR dump file)
- rust-lang#120057 (Don't ICE when deducing future output if other errors already occurred)
- rust-lang#120073 (Remove spastorino from users_on_vacation)

r? `@ghost`
`@rustbot` modify labels: rollup

nnethercote · 2024-01-22T03:01:16Z

> @nnethercote Just want to double check if you can post a reference PR update

Thanks for the reminder, done in rust-lang/reference#1450.

rustbot assigned fee1-dead Dec 21, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 21, 2023

rustbot added the I-lang-nominated Nominated for discussion during a lang team meeting. label Dec 21, 2023

petrochenkov self-assigned this Dec 21, 2023

This was referenced Dec 22, 2023

Delay literal unescaping #118699

Closed

Set the in-rust-tree feature for all rust-analyzer{-proc-macro-srv} steps #118861

Merged

onur-ozkan mentioned this pull request Dec 25, 2023

Figure out why x.py test --stage 0 rust-analyzer doesn't work #99610

Open

nnethercote force-pushed the earlier-NulInCStr branch from 07d1088 to 16a7e28 Compare January 12, 2024 03:39

nnethercote force-pushed the earlier-NulInCStr branch from 16a7e28 to 9018d2c Compare January 12, 2024 05:19

rustbot added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 12, 2024

rustbot removed the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label Jan 17, 2024

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jan 17, 2024

compiler-errors mentioned this pull request Jan 17, 2024

Rollup of 8 pull requests #120066

Closed

matthiaskrgr mentioned this pull request Jan 18, 2024

Rollup of 8 pull requests #120087

Closed

matthiaskrgr mentioned this pull request Jan 18, 2024

Rollup of 7 pull requests #120088

Closed

matthiaskrgr mentioned this pull request Jan 18, 2024

Rollup of 8 pull requests #120089

Merged

bors merged commit ff8c7a7 into rust-lang:master Jan 18, 2024
11 checks passed

rustbot added this to the 1.77.0 milestone Jan 18, 2024

nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 22, 2024

Update reference for rust-lang/rust#119172.

491d7e3

Which moved the checking for NUL chars in C string literals earlier.

nnethercote deleted the earlier-NulInCStr branch January 22, 2024 03:01

nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 22, 2024

Update reference for rust-lang/rust#119172.

14f1c05

Which moved the checking for NUL chars in C string literals earlier.

mattheww mentioned this pull request Jan 22, 2024

Character and string token definitions need updating. rust-lang/reference#626

Open

6 tasks

ehuss mentioned this pull request Jan 25, 2024

Update C-String literals to reject NUL rust-lang/reference#1450

Merged

nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 25, 2024

Update reference for rust-lang/rust#119172.

028f106

Which moved the checking for NUL chars in C string literals earlier.

nnethercote added a commit to nnethercote/reference that referenced this pull request Jan 26, 2024

Update reference for rust-lang/rust#119172.

a393aaf

Which moved the checking for NUL chars in C string literals earlier.

traviscross mentioned this pull request Jan 31, 2024

Tracking Issue for c"…" string literals #105723

Closed

12 tasks

xFrednet mentioned this pull request Feb 5, 2024

new lint: manual_c_str_literals rust-lang/rust-clippy#11919

Merged

Tiger0202 added a commit to Tiger0202/rust-lang that referenced this pull request Dec 11, 2024

Update reference for rust-lang/rust#119172.

697d7a5

Which moved the checking for NUL chars in C string literals earlier.
