Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assert_eq! is not 100% hygienic #131446

Open
GrigorenkoPV opened this issue Oct 9, 2024 · 16 comments
Open

assert_eq! is not 100% hygienic #131446

GrigorenkoPV opened this issue Oct 9, 2024 · 16 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-hygiene Area: Macro hygiene A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-bug Category: This is a bug. D-confusing Diagnostics: Confusing error or lint that should be reworked. D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@GrigorenkoPV
Copy link
Contributor

Not sure if this should be considered a bug or a diagnostic issue.

Having a const left_val or const right_val declared breaks assert_eq!. This has to do with its expansion and Rust's rules for macro hygiene: https://sabrinajewson.org/blog/truly-hygienic-let

Consider this code

fn main() {
    let x: u8 = 0;
    assert_eq!(x, 0);
}

according to cargo expand it expands to

#![feature(prelude_import)]
#[prelude_import]
use std::prelude::rust_2021::*;
#[macro_use]
extern crate std;
fn main() {
    let x: u8 = 0;
    match (&x, &0) {
        (left_val, right_val) => {
            if !(*left_val == *right_val) {
                let kind = ::core::panicking::AssertKind::Eq;
                ::core::panicking::assert_failed(
                    kind,
                    &*left_val,
                    &*right_val,
                    ::core::option::Option::None,
                );
            }
        }
    };
}

Since assert_eq! wants to use the value of the provided expressions twice (once for comparison, once for printing the result on failure), but it only wants to evaluate each expression once, it does a match to bind them to a pattern (left_val, right_val). However, having a const named left_val or right_val in scope changes the meaning of the pattern.

fn main() {
    let x: u8 = 0;
    const left_val: i8 = -123;
    assert_eq!(x, 0);
}
error[E0308]: mismatched types
 --> src/main.rs:4:5
  |
3 |     const left_val: i8 = -123;
  |     ------------------ constant defined here
4 |     assert_eq!(x, 0);
  |     ^^^^^^^^^^^^^^^^
  |     |
  |     expected `&u8`, found `i8`
  |     this expression has type `(&u8, &{integer})`
  |     `left_val` is interpreted as a constant, not a new binding
  |     help: introduce a new binding instead: `other_left_val`
  |
  = note: this error originates in the macro `assert_eq` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0614]: type `i8` cannot be dereferenced
 --> src/main.rs:4:5
  |
4 |     assert_eq!(x, 0);
  |     ^^^^^^^^^^^^^^^^
  |
  = note: this error originates in the macro `assert_eq` (in Nightly builds, run with -Z macro-backtrace for more info)

The error message, admittedly, is not very helpful.

Thankfully, you can't use this to make assert_eq pass/fail when it shouldn't. The worst you can achieve is a cryptic error message from the compiler. I think. So this "bug" is not really exploitable, plus chances of accidentally breaking this are probably pretty low (consts are usually named in UPPER_CASE in Rust), but the diagnostic is admittedly not very helpful.

The article I've linked above (https://sabrinajewson.org/blog/truly-hygienic-let) offers a potential solution for this. TL;DR: due to shadowing shenanigans having a function named left_val will prevent left_val from being interpreted as a const in patterns.

@rustbot label A-macros A-diagnostics C-bug D-confusing D-terse

@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. A-diagnostics Area: Messages for errors, warnings, and lints A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-bug Category: This is a bug. D-confusing Diagnostics: Confusing error or lint that should be reworked. D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. labels Oct 9, 2024
@GrigorenkoPV
Copy link
Contributor Author

The article I've linked above (https://sabrinajewson.org/blog/truly-hygienic-let) offers a potential solution for this. TL;DR: due to shadowing shenanigans having a function named left_val will prevent left_val from being interpreted as a const in patterns.

Unfortunately if we do this

--- a/library/core/src/macros/mod.rs
+++ b/library/core/src/macros/mod.rs
@@ -41,6 +41,11 @@ macro_rules! panic {
 #[allow_internal_unstable(panic_internals)]
 macro_rules! assert_eq {
     ($left:expr, $right:expr $(,)?) => {
+        {
+        #[expect(dead_code)]
+        fn left_val(){}
+        #[expect(dead_code)]
+        fn right_val(){}
         match (&$left, &$right) {
             (left_val, right_val) => {
                 if !(*left_val == *right_val) {
@@ -52,6 +57,7 @@ macro_rules! assert_eq {
                 }
             }
         }
+        }
     };
     ($left:expr, $right:expr, $($arg:tt)+) => {
         match (&$left, &$right) {

then this starts to compile

let x: () = ();
assert_eq!(x, {left_val()});

so still not 100% hygienic.

@jieyouxu jieyouxu added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 9, 2024
@GrigorenkoPV
Copy link
Contributor Author

I see 3 ways of dealing with this:

  1. Just ignore it.
  2. Tweak the rules around hygiene in macros. Probably a nightmare for backwards compatibility.
  3. Introduce some way to specify that you explicitly want a pattern to not be a const patter. Some special syntax, or a keyword, or an attribute perhaps.

@jieyouxu jieyouxu removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 9, 2024
@dev-ardi
Copy link
Contributor

dev-ardi commented Oct 9, 2024

If this hasn't been a problem for the past 9 years we could change the name of left_val and right_val to __left_val and __right_val (like c++ does) to make it even less of a problem

@zachs18
Copy link
Contributor

zachs18 commented Oct 9, 2024

We could have assert_eq! (and other macros) use left_val@_ (etc) to give an explicit "match bindings cannot shadow constants" error if someone has a const left_val in scope (the left hand side of a @ pattern is (currently) always an identifer pattern; it is not superseded by path/const patterns like "bare" identifier patterns are).

@GrigorenkoPV
Copy link
Contributor Author

If this hasn't been a problem for the past 9 years we could change the name of left_val and right_val to __left_val and __right_val (like c++ does) to make it even less of a problem

Well, technically C++ also forbids user-created identifiers to start with _ IIRC, but yeah.

We could have assert_eq! (and other macros) use left_val@_ (etc) to give an explicit "match bindings cannot shadow constants" error if someone has a const left_val in scope (the left hand side of a @ pattern is (currently) always an identifer pattern; it is not superseded by path/const patterns like "bare" identifier patterns are).

This sounds like a good option. Not ideal, but given it is an obscure issue in the first place, probably good enough.

@danielhenrymantilla
Copy link
Contributor

fn feed<Arg, R>(
    arg: Arg,
    f: impl FnOnce(Arg) -> R,
) -> R {
    f(arg)
}

macro_rules! assert_eq {
    (
        $left:expr, $right:expr $(,)?
    ) => (
        $crate::feed(
            (&$left, &$right),
            {
                fn _left_val() {}
                fn _right_val() {}
                |(_left_val, _right_val)| {
                    if !(*_left_val == *_right_val) {
                        // …
                    }
                }
            }
        )
    );
}

😰

@Noratrieb Noratrieb added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 14, 2024
@GrigorenkoPV
Copy link
Contributor Author

GrigorenkoPV commented Oct 19, 2024

fn feed<Arg, R>(
    arg: Arg,
    f: impl FnOnce(Arg) -> R,
) -> R {
    f(arg)
}

macro_rules! assert_eq {
    (
        $left:expr, $right:expr $(,)?
    ) => (
        $crate::feed(
            (&$left, &$right),
            {
                fn _left_val() {}
                fn _right_val() {}
                |(_left_val, _right_val)| {
                    if !(*_left_val == *_right_val) {
                        // …
                    }
                }
            }
        )
    );
}
😰

feed would need to be pub.

I wonder how it will play with the stability system.

@rpitasky
Copy link

What if the language had a way to create temporary variables in macros perfectly hygienically? Ex. Suppose we permit creating variables in macro_rules! that start with $, just like the macro arguments (i.e. let $left_val = 10;). These variables would have to have some clever scoping rules to prevent problems across macro expansions, but this would wholly eliminate this hygiene problem (at least, between macros and regular code, because $ is not a valid way to start an identifier in regular code).

I suppose this is a fourth way for your consideration, @GrigorenkoPV.

@GrigorenkoPV
Copy link
Contributor Author

What if the language had a way to create temporary variables in macros perfectly hygienically? Ex. Suppose we permit creating variables in macro_rules! that start with $, just like the macro arguments (i.e. let $left_val = 10;). These variables would have to have some clever scoping rules to prevent problems across macro expansions, but this would wholly eliminate this hygiene problem (at least, between macros and regular code, because $ is not a valid way to start an identifier in regular code).

I suppose this is a fourth way for your consideration, @GrigorenkoPV.

Or just a more concrete version of the 3rd. Technically, from what I can tell, in let a = b, a is treated like a pattern, so it can be

  • an identifier that is not eponymous to any const, in which case it gets used as a name for the new binding
  • an identifier that names come existing const, in which case it turns into a const pattern
  • literal like 1, 'a', or "foo"
  • Something like Typename(inner) or Typename{field: inner}
  • other things, see the reference

none of the above require any external information to disambiguate between, except the first two, which is our case. This, by itself, is not an issue 99.999% of the times, because you are the one who writes the code and controls what identifiers exist, but it turns out to not be the case inside macros.

So without any major changes to scoping rules, this can be fixed by any syntax that would allow to disambiguate the two without relying on the external info. For example, an attribute like #[rustc_this_is_not_const] or some symbol like ~ or $, which would say "never treat this as a const", would do.

As for the details of which exact spelling to use, $ would probably be conflicting with the existing macro syntax a bit.

@GrigorenkoPV
Copy link
Contributor Author

Anyways, cc #39412 as their declarative macros still suffer from this problem, but at least they are still unstable, so it is possible to do breaking changes to the scoping rules and whatnot.

#![feature(decl_macro)]

// const ident: u8 = 2;

macro hygeine($a:expr) {
    let ident = $a;
    println!("{ident}");
}

fn main() {
    hygeine!(1);
}

@zachs18
Copy link
Contributor

zachs18 commented Oct 21, 2024

Anyways, cc #39412 as their declarative macros still suffer from this problem, but at least they are still unstable, so it is possible to do breaking changes to the scoping rules and whatnot.

Actually, it looks like macros 2.0 handles this correctly (or at least, how I would expect)

This one I would expect to fail, since the macro definition also sees the const.

#![feature(decl_macro)]
const ident: u8 = 2;
macro hygeine($a:expr) {
    let ident = $a;
    println!("{ident}");
}
fn main() {
    hygeine!(1); // fails
}

This one works, since the macro definition does not see the const.

#![feature(decl_macro)]
macro hygeine($a:expr) {
    let ident = $a;
    println!("{ident}");
}
fn main() {
    const ident: u8 = 2;
    hygeine!(1); // works
}

@GrigorenkoPV
Copy link
Contributor Author

This one I would expect to fail, since the macro definition also sees the const.

Oh, yes, you are right. I have indeed misplaced it. Thank you for the correction.

That's good news that macros 2.0 solve this problem. So I guess given that this particular issue is probably of a rather low priority, we can afford to just wait however many years it takes for macros 2.0 to get stabilized. Or maybe we can rush ahead and change assert_eq to be a new shiny kind of macro now, but that can probably cause more breakage than it potentially fixes.

@traviscross
Copy link
Contributor

cc @rust-lang/wg-macros

@jhpratt
Copy link
Member

jhpratt commented Nov 15, 2024

My ideal solution to this (and other similar situations) is to give macro authors a way to generate a unique identifier out of thin air, presumably with some sort of seed so that it is deterministic.

@akarahdev
Copy link

My ideal solution to this (and other similar situations) is to give macro authors a way to generate a unique identifier out of thin air, presumably with some sort of seed so that it is deterministic.

I'm not the most versed on how rustc works, but I imagine it could work something like this:

pub macro some_macro() {
  let ${ident(x)} = 10;
  println!("{}", ${ident(x)}); 
}

The ${ident(...)} metavariable expression would allow you to generate a new identifier out of thin air. It could generate a new identifier that's guaranteed to be unique by hashing the identifier through some hashing method to a short string, then prepending various underscores until it's now a unique identifier.

Or, a whole new category of identifiers could be created specifically for this purpose. Since identifiers are a struct rustc_span::symbol::Ident, a new flag could be added to differentiate whether the identifier is coming from a macro or not.

pub struct Ident {
  pub name: String,
  pub span: Span,
  pub origin: Origin
}

enum Origin {
  Code,
  MacroGenerated {
    originating_macro: String,
    originating_id: u64
  }
}

The originating_macro in the MacroGenerated variant would be the name of the macro (or maybe the Id that refers to it in Macros 2.0) to minimize conflict by tracking origin. But what if the same macro nests itself? originating_id would do that. When a macro is expanding, each time it expands, a value somewhere is increased by 1, let's call it MACRO_TOTAL_DEPTH. After expanding, MACRO_TOTAL_DEPTH would not be reset to give each macro a unique expansion.

Again, I'm not entirely sure if this would work with how rustc works, or maybe I have a logic error. This is just an idea I have.

@Veykril
Copy link
Member

Veykril commented Dec 3, 2024

I don't quite see why we'd need to generate unique identifiers for this. The issue here lies with macro_rules macros having mixed site/semi-transparent hygiene which doesn't play well with patterns as is shown that they can refer to local variables (where they'd have def site/opaque hygiene) or consts/items (where they would have call site/transparent hygiene).

If one replaces the macro definition with the unstable macro which uses def site/opaque hygiene the error disappears https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=3eba965a7069578d09a4ba82c4c6016b

#![feature(decl_macro)]

macro assert_eq {
    ($left:expr, $right:expr $(,)?) => {
        match (&$left, &$right) {
            (left_val, right_val) => {
                if !(*left_val == *right_val) {
                    unimplemented!()
                }
            }
        }
    }
}

fn main() {
    let x: u8 = 0;
    const left_val: i8 = -123;
    assert_eq!(x, 0);
}

That is we already have a concept to fix this, its just not finished design wise. So there isn't really a need for any new idea here in my opinion. (Except for possibly a metavar expression to change hygiene of something, as macro as designed today is far too rigid wrt to hygiene to be useful)

@fmease fmease added the A-hygiene Area: Macro hygiene label Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-hygiene Area: Macro hygiene A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) C-bug Category: This is a bug. D-confusing Diagnostics: Confusing error or lint that should be reworked. D-terse Diagnostics: An error or lint that doesn't give enough information about the problem at hand. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests