Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New declarative macros, functions and fields not being recognized #91249

Open
BeastLe9enD opened this issue Nov 26, 2021 · 35 comments
Open

New declarative macros, functions and fields not being recognized #91249

BeastLe9enD opened this issue Nov 26, 2021 · 35 comments
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-macros-2.0 Area: Declarative macros 2.0 (#39412) C-discussion Category: Discussion or questions that doesn't represent real issues. F-decl_macro `#![feature(decl_macro)]` requires-nightly This issue requires a nightly compiler in some way.

Comments

@BeastLe9enD
Copy link

I ran into some issues with rustc 1.58.0-nightly (b426445 2021-11-24).
I have the following code:

#![feature(decl_macro)]

pub macro generate_class_new($name: ident) {
    pub struct $name {
        x: i32
    }

    impl $name {
        pub fn new(x: i32) -> Self {
            Self { x }
        }
    }
}

generate_class_new!(MyTestClass);

fn main() {
    let instance = MyTestClass::new(3);
}

What I try to do here is creating a macro that just creates a struct with a given identifier as a name.
When I call MyTestClass::new(3), the compiler complains that this function does not exist:

error[E0599]: no function or associated item named `new` found for struct `MyTestClass` in the current scope
  --> src/main.rs:18:33
   |
4  |     pub struct $name {
   |     ---------------- function or associated item `new` not found for this
...
18 |     let instance = MyTestClass::new(3);
   |                                 ^^^ function or associated item not found in `MyTestClass`

running cargo expand produces code that should work in my opinion:

#![feature(prelude_import)]
#![feature(decl_macro)]
#[prelude_import]
use std::prelude::rust_2021::*;
#[macro_use]
extern crate std;
pub macro generate_class_new($ name : ident) {
    pub struct $name {
        x: i32,
    }
    impl $name {
        pub fn new(x: i32) -> Self {
            Self { x }
        }
    }
}
pub struct MyTestClass {
    x: i32,
}
impl MyTestClass {
    pub fn new(x: i32) -> Self {
        Self { x }
    }
}
fn main() {
    let instance = MyTestClass::new(3);
}

Not sure if I'm doing something wrong here, but it pretty much looks like a bug for me

@camelid camelid added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-macros-2.0 Area: Declarative macros 2.0 (#39412) labels Nov 26, 2021
@camelid camelid changed the title New declerative macros, functions and fields not being recognized New declarative macros, functions and fields not being recognized Nov 26, 2021
@camelid
Copy link
Member

camelid commented Nov 26, 2021

My recollection is that declarative macros use "def-site" hygiene; that is, identifiers defined in the macro are scoped to the macro and cannot be referenced outside the macro.

Passing new as an argument to the macro makes the program compile successfully:

#![feature(decl_macro)]

pub macro generate_class_new($name: ident, $new: ident) {
    pub struct $name {
        x: i32
    }

    impl $name {
        pub fn $new(x: i32) -> Self {
            Self { x }
        }
    }
}

generate_class_new!(MyTestClass, new);

fn main() {
    let instance = MyTestClass::new(3);
}

@MSxDOS
Copy link

MSxDOS commented Nov 26, 2021

It works fine with old declarative macros: https://rust.godbolt.org/z/35oYrPv7E and that's how they're often used, including std.
So either it's a bug with declarative macros 2.0 or they're fundamentally useless.

@BeastLe9enD
Copy link
Author

Yes, with macro rules, its working as intended. But with the new gen 2.0 macros it doesn't.

@camelid
Copy link
Member

camelid commented Nov 26, 2021

IIRC, part of the reason for adding macros 2.0 was to have def-site hygiene. But I'm not sure, so I'll ask on Zulip.

@camelid camelid added the C-discussion Category: Discussion or questions that doesn't represent real issues. label Nov 26, 2021
@BeastLe9enD
Copy link
Author

Yea, wasn't really sure if its actually a bug or my mistake, but in my opinion it would be great if it would work because I think many people including me use macros for things like that, sure u could use macro_rules in that specific case, but perhaps one could initiate a little discussion in this context

@camelid
Copy link
Member

camelid commented Nov 26, 2021

I started a thread on Zulip about this.

@camelid
Copy link
Member

camelid commented Nov 26, 2021

@bjorn3 replied on Zulip and also thinks this is intended behavior. I'm going to nominate this for T-lang since the behavior seems to be surprising to users.

@camelid camelid added the I-lang-nominated Nominated for discussion during a lang team meeting. label Nov 26, 2021
@BeastLe9enD
Copy link
Author

I don't know if there is a better way to implement this, but you could put a $ in front of the identifier, if the identifier is not in the pattern, the identifier has the same * value * as its name and has a global scope

@camelid
Copy link
Member

camelid commented Nov 26, 2021

I don't know if there is a better way to implement this, but you could put a $ in front of the identifier, if the identifier is not in the pattern, the identifier has the same * value * as its name and has a global scope

I'm pretty sure $foo with no corresponding argument pattern is already accepted to mean $ foo.

@PatchMixolydic
Copy link
Contributor

Perhaps metavariable expression syntax could be used? This would give us something like pub fn ${escaping_ident(new)} (though the final product would probably have a better name).

I do feel that while the limitations are a bit surprising coming from macros 1.0, they ultimately make hygiene more consistent.

@inquisitivecrystal
Copy link
Contributor

It works fine with old declarative macros: https://rust.godbolt.org/z/35oYrPv7E and that's how they're often used, including std.
So either it's a bug with declarative macros 2.0 or they're fundamentally useless.

My impression is that one of the reasons macros 2.0 isn't stable is that their hygine isn't finished yet. I think currently the plain is to make def-site hygine the default, and add some way of distinguishing when you want different hygine. However, people haven't decided on what that way should be. Macros 2.0 haven't even been completely designed yet, let alone completely implemented, and it probably doesn't make sense to judge them on their current functionality.

@crlf0710 crlf0710 added F-decl_macro `#![feature(decl_macro)]` requires-nightly This issue requires a nightly compiler in some way. labels Nov 27, 2021
@BeastLe9enD
Copy link
Author

I don't think it was somebody's intention to judge the macros although they are not finished yet, it is much more about thinking about things and looking for a good solution together.

@lexi-lambda
Copy link

I think this issue is basically a question of “what does hygiene mean for a reference through a qualified name?”

When a macro expands to a free function definition, (definition site) hygiene is straightforward: bindings defined by the macro are not visible at the use site unless the binding identifier was passed as an argument to the macro. For example, if we adjust the program from the start of this thread to

#![feature(decl_macro)]

pub macro generate_struct_new($name: ident) {
    pub struct $name {
        x: i32
    }

    pub fn make_struct(x: i32) -> $name {
        $name { x }
    }
}

generate_struct_new!(MyTestStruct);

fn main() {
    let instance = make_struct(3);
}

then it should be utterly unsurprising that this program fails to compile under definition site hygiene. make_new is defined entirely by the macro and has no direct relationship to any identifier at the use site, so it is only visible to macro-generated code.

But when the declared identifier is a method, things become more complicated. Qualified names are built from multiple identifiers, so it isn’t immediately clear what effect hygiene should have on their scope. Personally, I think that MyTestClass::new should be visible at the use site, and here’s my reasoning:

  • Normally, the scope of a qualified reference depends on the visibility of the first segment of the path, e.g. foo::bar::baz depends on the presence of a foo identifier in scope, but not on the presence of bar or baz identifiers. Indeed, this is the whole point of using a qualified name! In this sense, qualified names are sort of like a foo.bar field access.

  • “Macro hygiene” just means that macros ought to respect the lexical scope of the program: scoping should be determined by the structure of the source code, not the expanded code, and the body of each macro definition has its own scope (in much the same way definitions in function bodies are scoped to the function).

  • In this program, the MyTestClass name is unambiguously in scope, so by the above reasoning, the scoping of new should not matter. To illustrate why, suppose we were to remove the macro altogether and simply define MyTestClass in a separate module:

    mod struct_defn {
      pub struct MyTestClass {
          x: i32
      }
    
      impl MyTestClass {
          pub fn new(x: i32) -> Self {
              Self { x }
          }
      }
    }
    
    use struct_defn::MyTestClass;
    
    fn main() {
        let instance = MyTestClass::new(3);
    }

    This program compiles just fine, as it should, as new is a pub fn, and MyTestClass is visible at the use site.

All this suggests to me that the current behavior is a bug, or at the very least a misfeature. The current behavior does not respect the lexical scope of the program, and therefore it does not respect hygiene.

@lexi-lambda
Copy link

An aside on the philosophy of hygienic macros

To elaborate on my previous comment a little more, I think it’s helpful to make a distinction between scope of bindings and the names of exports.

Macro hygiene is about the scope of bindings. The idea is that we intuitively think about variable scope lexically: variable uses reference the nearest enclosing definition with the same name, and both “nearest” and “enclosing” are words that describe the syntactic structure of the source program. When we add macros, we want that intuition to still apply, but that’s tricky, because macros can rearrange code in complex ways.

“Hygiene” just refers to a system that tracks enough information so that the compiler can figure out which identifiers came from which scopes in the source program and treat them accordingly. Hygiene is not about renaming identifiers, and it’s not about introducing more scoping rules to the language—it’s really about trying to simplify scoping rules by ensuring that macros respect the usual rules of lexical scope. So hygiene is really just a means to an end, and that end is “well-behaved macros”.

Many implementations of hygiene operate by enriching the representation of identifiers. In a language without macros, an identifier is just a symbolic name (i.e. a string), but to preserve hygiene, we need to keep track of extra information about where the identifier came from in the source program. This is the identifier’s “lexical context”, and in Rust’s proc_macro API, it’s stored in the identifier’s Span. This information is used to determine the lexical scope of every identifier at both its definition site and its use sites, which allows the lexical regions to be kept distinct.

But what about modules? Modules export a collection of names, and in a macro-enabled system, we must confront the question of whether modules’ exports correspond to identifiers, enriched with lexical context, or plain symbols, which are just strings. In both Racket and every Scheme implementation I’m aware of, the answer to that question is the latter: exported bindings are identified by plain symbols, not identifiers. This means that the rich, “three dimensional” scoping structure of identifiers is effectively “flattened” whenever identifiers cross a module boundary.

Implications

Racket employs the philosophy I outlined above uniformly. There is a firm separation between bindings, which are associated with identifiers, and exports, which are associated with symbolic names. This has a few implications:

  • Imports behave like definition forms for the purposes of scoping. When you write use foo::bar; in Rust (or (require (only-in foo bar)) in Racket), you are creating a new definition with the lexical scope of the use statement, and that definition refers to the bar export from the foo module.

    In effect, use statements bring flat module exports into the “rich” world of lexical bindings, and the lexical location of the use statement determines the binding’s lexical context. Since imports are like definitions, use statements can appear inside the body of a macro, and the bindings they introduce will be scoped to that macro in just the same way that any other definition would be.

  • Dually, exports behave just like variable uses for the purposes of scoping. In Racket, an identifier named foo is exported by writing (provide foo), which exports foo’s lexical binding at the location of the provide using the symbolic name foo—that is, it brings rich lexical bindings into the flat world of module exports.

    This subtlety is less relevant in Rust, because Rust exports bindings implicitly using pub declarations, but the general idea still applies.

  • Because modules bindings are flat symbols, a single module cannot exports two bindings with the same symbolic name, even if they are defined in different lexical scopes. For example, the following program is invalid:

    macro define_and_export() {
        pub const hello: u32 = 0;
    }
    
    define_and_export!();
    pub const hello: u32 = 1;

    If either of the pub keywords were dropped, this program would be legal, because the lexical bindings are in different scopes and do not conflict. But when they are exported, the lexical information is discarded, so the two definitions conflict—which makes sense, as it would be impossible for an importing module to specify of the two hello bindings it wants to import.

    (Note that this is not the choice implemented by rustc today, which accepts the program and simply makes the exported macro-defined hello binding inaccessible.)

  • In spite of the previous point, macros can expand to identifiers, which allows an identifier to directly refer to an identifier in a different module without going through the flat space of module bindings! For example, consider a program like this:

    mod m {
        macro define_get_hello() {
            const hello: u32 = 0;
    
            pub macro get_hello() {
                hello
            }
        }
    
        define_get_hello!();
        pub const hello: u32 = 1;
    }
    
    fn main() {
        println!("hello is: {}", m::get_hello!());
    }

    This program should print 0, not 1, even though the exported hello binding has the value 1, because the get_hello macro refers to the more-nested const hello: u32 = 0 binding. This is quite subtle, but it is crucial to preserve the idea that hygienic macros must respect lexical scope.

    (Unfortunately, this program does not currently compile at all, as the macro-defined get_hello macro is not visible at its use site for the reason described at the end of the previous bullet point.)

All together, this set of decisions appears to result in a fairly predictable, intuitive model of cross-module hygiene.

What Rust does today

Currently, Rust does not take the approach I’ve described so far. Instead, module exports are rich identifiers, not flat symbols. But as I described in my previous comment, I don’t just think that’s a different design decision, I actually think it’s arguably wrong, because it creates situations that don’t respect the usual scoping rules that apply elsewhere in the language.

I’d therefore propose that cross-module hygiene be revised to work more like it does in other systems, making exports flat symbols, not rich identifiers. However, I’m not sure what impact this would have on backwards compatibility, as it would be a fairly meaningful change in philosophy, and it’s currently unclear to me how much of this behavior affects macro_rules! versus how much is specific to “macros 2.0”.

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 28, 2021

@lexi-lambda

For example, the following program is invalid
If either of the pub keywords were dropped, this program would be legal

macro define_and_export() {
   pub const hello: u32 = 0;
}

define_and_export!();
pub const hello: u32 = 1;

How would the next example behave?

macro define_hello() {
    const hello: u32 = 0;
}

define_hello!();
const hello: u32 = 1;

mod inner {
    const use_hello: u32 = super::hello;
}

In Rust it's pretty deeply ingrained that items are "planted" (or "exported"/"provided" using the terminology above) into modules regardless of their visibility (i.e. pub or not).
So hello produced by a macro and not by a macro are supposed to conflict or not conflict regardless of their visibilities as well.

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 28, 2021

Regarding this specific issue, it doesn't look like a bug in the current system used by Rust, where resolution of module-relative paths (module::foo) and type-relative paths (Type::foo, like in this issue, resolved by type checker) is also affected by hygiene, like it was described above.

(Although implementation for the type-based resolution part in the compiler may be particularly underspecified and buggy compared to others.)

@petrochenkov
Copy link
Contributor

@MSxDOS

So either it's a bug with declarative macros 2.0 or they're fundamentally useless.

That's because a crucial part of the macro 2.0 story - a hygiene opt-out (call-site hygiene for declarative macros) - is still unimplemented. Without it all interactions with the call site has to be done through macro arguments, which is pretty inconvenient, no doubt.

@lexi-lambda
Copy link

How would the next example behave?

macro define_hello() {
    const hello: u32 = 0;
}

const hello: u32 = 1;

mod inner {
    const use_hello: u32 = super::hello;
}

I assume you meant to include a define_hello!() statement in the outer module body. Regardless, I think it’s a good example: it highlights that Rust’s notion of a “module” is not as cleanly decoupled from its notion of lexical scope as Racket modules are, which makes sense, as Rust uses nested modules much more pervasively than Racket does.

But, as it happens, Racket actually does have a direct analogue of a “lexically nested module”. In Racket, if you declare a submodule using the module+ form, it can reference all bindings in any of its parent modules without having to explicitly import them using require.

Using module+, we can construct a Racket program that is analogous to your Rust example:

#lang racket/base

(define-syntax-rule (define-hello)
  (define hello 0))

(define-hello)
(define hello 1)

(module+ inner
  (define use-hello hello))

This is a perfectly legal program, and use-hello holds the value 1, not 0. That’s because a module+ submodule has access to all of the bindings from the enclosing module because it is lexically contained inside it, so the reference to hello on the RHS of use-hello is a direct, lexical reference, and it doesn’t go through the “flat” space of module exports at all.

However, even in a module+ submodule, it’s possible to explicitly import the enclosing module by writing (require (submod "..")), where (submod "..") is like writing super in Rust. And doing that does a meaningfully different thing in Racket, because it accesses the enclosing module’s exports in the same fashion any other require would. So if we change the inner submodule to

(module+ inner
  (require (only-in (submod "..") hello))
  (define use-hello hello))

which explicitly imports hello from the parent module, the program is rejected with a compile-time error:

only-in: identifier `hello' not included in nested require spec
  in: (submod "..")

The error is justified, as the enclosing module has no such export named hello.


All of this illustrates a pretty clear difference between Racket and Rust, namely that Racket has a very explicit distinction between module imports and variable references, whereas Rust has no such distinction. Rather, modules are never explicitly “opened”, so all bindings in all modules are always implicitly “in scope”, but explicit qualification may be needed to reference them, and some of them are illegal to access from certain contexts because of visibility rules.

That difference is fine—obviously not all languages have to work like Racket or Scheme—it just means it’s that much more important to decide what hygiene and lexical scope mean in the context of Rust. Otherwise, after all, the behavior is arbitrary and impossible to really justify beyond “it does what it happens to be programmed to do”.

In my opinion, the behavior of OP’s program seems pretty difficult to justify under any interpretation of lexical scope for the reasons I gave in my first comment—the difference compared to a nested module makes the current behavior pretty intensely suspect—but it’s a strong opinion, weakly held. :) I’m open to being convinced otherwise. However, I don’t find this argument convincing at all:

Regarding this specific issue, it doesn't look like a bug in the current system used by Rust, where resolution of module-relative paths (module::foo) and type-relative paths (Type::foo, like in this issue, resolved by type checker) is also affected by hygiene, like it was described above.

My quibble is that saying something is “affected by hygiene” doesn’t really mean anything, because again, the idea of hygiene is to respect the lexical structure of the program. The whole challenge of pinning down what hygiene means is deciding what the right scoping structure is supposed to be, then designing an algorithm that can match those intuitions. Uniformly hiding identifiers without some principled justification is, in my mind, not really any better than having no hygiene at all and allowing everything to be visible all the time (and in fact I think hiding too much is possibly even less useful than hiding nothing at all).


So let’s ignore what Racket does for a moment—I don’t want to make it sound like I think Racket’s semantics are somehow the only right ones—and instead think about how scope works in Rust. For OP’s program, I already gave a Rust example without macros that illustrates how Rust’s scoping and visibility work, and how they’re inconsistent with the scoping of pub macro macros, which is unintuitive.

Let’s try and apply the same reasoning to your program:

macro define_hello() {
    const hello: u32 = 0;
}

define_hello!();
const hello: u32 = 1;

mod inner {
    const use_hello: u32 = super::hello;
}

I think, intuitively, Rust programmers would expect the use of super::hello to refer to the one defined directly, not the one defined by the macro, as the const hello: u32 = 0 statement clearly appears in a scope that does not enclose the use of super::hello. Therefore, to respect lexical scope, we must not pick the macro-defined definition.

One way to view this in the context of hygiene would be to say that references to enclosing modules actually are lexical references, and qualification just selects a scope to circumvent any shadowing. So far, this is consistent with what I said in my first comment:

Normally, the scope of a qualified reference depends on the visibility of the first segment of the path, e.g. foo::bar::baz depends on the presence of a foo identifier in scope, but not on the presence of bar or baz identifiers.

So super always refers to the enclosing module, wherever it lexically appears. This same style of reasoning also allows us to easily decide what super should do if it appears inside a macro:

const hello: u32 = 0;

mod a {
    pub macro get_hello() {
        super::hello
    }
}

mod b {
    const hello: u32 = 1;

    mod c {
        fn foo() -> u32 {
            super::super::a::get_hello!()
        }
    }
}

Should b::c::foo() return 0 or 1? If we’re being hygienic, there’s no question at all: it must return 0, because if it were to return 1, then the macro system would not be respecting lexical scope. Lexically, the super::hello reference appears inside module a, so it must reference the hello visible from that location.

Now, it’s totally possible that these rules don’t actually reconcile my previous comment suggesting that module exports be plain symbols. It’s possible that’s somehow incompatible with some scoping rules Rust has. But in that case, it’s still necessary to pin down what references to bindings in modules that don’t lexically enclose the current context means from the perspective of lexical scope, since otherwise, trying to come up with a hygiene system that respects it is hopeless.

@inquisitivecrystal
Copy link
Contributor

@lexi-lambda's points seem pretty convincing to me, though I'm by no means an expert on the issue. I also really appreciate the references to racket, as it was my favorite language for many years, even if I never learned how to use it at beyond a beginner level. Racket takes its macros about as seriously as Rust takes ownership and lifetimes. In any case, I'm really glad the lang team has been asked to take a look at this. ❤️

@yanchith
Copy link
Contributor

@lexi-lambda

Let’s try and apply the same reasoning to your program:

macro define_hello() {
    const hello: u32 = 0;
}

define_hello!();
const hello: u32 = 1;

mod inner {
    const use_hello: u32 = super::hello;
}

I think, intuitively, Rust programmers would expect the use of super::hello to refer to the one defined directly, not the one defined by the macro, as the const hello: u32 = 0 statement clearly appears in a scope that does not enclose the use of super::hello. Therefore, to respect lexical scope, we must not pick the macro-defined definition.

Might be just my misunderstanding (or I am missing the greater point), but should this compile at all regardless of mod inner? We are defining the same symbol twice. I imagine this was a stepping stone towards your later point about lexical scoping in macros (and the macro selecting const hello: u32 = 0), which intuitively does make sense:

const hello: u32 = 0;

mod a {
   pub macro get_hello() {
       super::hello
   }
}

mod b {
   const hello: u32 = 1;

   mod c {
       fn foo() -> u32 {
           super::super::a::get_hello!()
       }
   }
}

@lexi-lambda
Copy link

Might be just my misunderstanding (or I am missing the greater point), but should this compile at all regardless of mod inner? We are defining the same symbol twice.

Yes, it should absolutely compile, as we are not defining the same symbol twice! Or, at least, we aren’t defining the same identifier twice. In a hygienic system, two definitions can coexist with the same symbolic name as long as they have different identifiers (which goes right back to my comment from above about the difference between “flat” symbols and “rich” identifiers).

If that weren’t the case, then we’d have lost the key principle of hygienic macros, namely that macros respect lexical scope, since two bindings from two different lexical scopes would conflict with each other, forcing the programmer to worry about such details by manually generating fresh names. That’s not good!

But I think your comment is good, because it highlights a crucial point that has not yet been explicitly made in this conversation: hygienic macros are fundamentally at odds with the notion of canonical paths without enriching paths with lexical context. If path segments are flat symbols, two different bindings could easily have the same canonical path. In my mind, there are really only two ways to resolve this tension:

  1. Path segments must be rich identifiers, not flat symbols. My understanding is that this is the approach currently taken by “macros 2.0”, which works fine to avoid conflicting canonical names, but leads to the questionable behavior exhibited by OP’s program.

  2. Some definitions may lack canonical paths. For example, the macro-introduced hello definition in that example program might lack any canonical path at all. Such identifiers could be referenced lexically (possibly across modules via a macro-introduced lexical reference), but could not be referenced through a qualified path.

I suspect that, to many people, option 2 sounds much more dire than option 1, but I actually think it’s the better tradeoff. And again, the justification ultimately comes down to stepping back and thinking about what hygiene means, because without doing that, it’s all hopeless. So let’s consider, for a moment, a program like this:

fn define_hello() {
    const hello: u32 = 0;
}

Does this definition of hello have a canonical path? Of course not—it’s a local variable defined in a local scope. So, by analogy, a macro system that respects lexical scope should treat macro-local definitions the same way, and therefore there’s nothing wrong with those definitions lacking canonical paths.

To reiterate: programmers who have not spent much time working with truly hygienic macro systems have gotten used to reasoning about macros in terms of what they expand into, but this is really at odds with what hygiene is all about. Hygiene is about making it possible to reason locally about macros just by looking at the structure of the source code, the same way we reason about functions locally by looking at their source code and assuming they respect lexical scope. That goal must always be the ground truth when talking about hygiene—anything else puts the cart before the horse.

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 29, 2021

Does this definition of hello have a canonical path? Of course not

Of course it has! UPD: Didn't notice it was a function.
The current system was made to fit naturally into the rest of the language, I have hard time imagining how you'd implement the new proposed system (although you probably can, with some extra work and increased complexity).

ultimately comes down to stepping back and thinking about what hygiene means

Hey, I'm pretty sure jseyfried (and reviewers) did think about what hygiene means when implementing the logic for multi-segment paths :)
It's just the conclusion were different - it's not only about lexical scope.
I'll try to go through git blame and find the relevant PRs.
(I admit, a lot of the compiler code was written by people who didn't know what they were doing, but this is not the case.)

@lexi-lambda
Copy link

Does this definition of hello have a canonical path? Of course not

Of course it has!

No, it does not. Note that the example I gave there is a function definition, not a module definition. The section of the Rust Reference I linked is quite unambiguous about this:

Implementations and use declarations do not have canonical paths, although the items that implementations define do have them. Items defined in block expressions do not have canonical paths.

It would, in my mind, be quite a departure to say that local definitions inside macro bodies have canonical paths even though local definitions inside functions do not (and hygiene means we really want macros to be as “function-like” in their scoping rules as possible).

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 29, 2021

Note that the example I gave there is a function definition

Yep, you are right, didn't notice it was a function.
(Items defined in in {} blocks, including function bodies, can only be found by lexical search.)

I still disagree that macros are supposed to be this close to functions, they are supposed to generate pieces of code inserted into arbitrary places after all, including items available to other modules (nested or not).

@lexi-lambda
Copy link

I still disagree that macros are supposed to be this close to functions, they are supposed to generate pieces of code inserted into arbitrary places after all, including items available to other modules (nested or not).

Yes, that’s fair—and to be honest, after spending most of the past 6 hours thinking about this and trying out various examples… I think I’m wrong! Because sometimes you really do want such names to be kept distinct, like in a macro like this:

macro define_struct($name: ident, $field_name: ident, $make_name: ident, { $($defn: item)* }) {
    struct $name { $field_name: i32 }

    impl $name {
        pub fn secret() -> Self {
            $name { $field_name: 42 }
        }

        $($defn)*
    }

    fn $make_name() -> $name {
        $name::secret()
    }
}

define_struct!(S, x, make_s, {
    pub fn get_x(self) -> i32 {
        self.x
    }
});

fn main() {
    println!("{}", make_s().get_x());
}

In this case, input to the macro is intermingled with macro-generated definitions inside the impl block, and it is valuable for them to remain distinct. What bothers me is that this means macro scopes behave meaningfully differently from module scopes, as definitions in implementations are visible everywhere, regardless of what module they happen to have been defined within. But I think I’ve convinced myself that’s really a property of modules being unusual, not macros being unusual, and the behavior that macros have now is probably the right one (albeit not particularly helpful without some way to selectively break/bend hygiene).

So I’ll rescind my quibbles for now and think about things more—sorry about the noise. That said, I did find some genuine bugs in the current implementation during my experimentation. For example, I think this program does the wrong thing:

mod m {
    pub const x: i32 = 0;
}

macro mac_a($mac_b: ident) {
    mod m {
        pub const x: i32 = 1;
    }

    macro $mac_b() {
        crate::m::x + m::x
    }
}

mac_a!(mac_b);

fn main() {
    println!("{}", mac_b!());
}

I think it should print 2, but currently it prints 1, which suggests that crate::-qualified paths do in fact resolve non-lexically, whereas they ought to resolve lexically… but that is unrelated to this issue, anyway.

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 29, 2021

which suggests that crate::-qualified paths do in fact resolve non-lexically

The general logic is that the identifier foo in base_path::foo is "adjusted to the base_path" before resolving so paths from macros can refer to any kinds of items from the outer world, like standard library names or whatever, with full paths.
I don't quite remember the details of this operation though (especially in case of type-relative paths).

@danielhenrymantilla
Copy link
Contributor

The current behavior, despite annoying, seems to be the most consistent one: in Rust, all the identifiers / path segments are "rich" (to use @lexi-lambda's terminology) / Span::resolve_at-imbued. pub and other privacy questions are thus orthogonal: "rich" path segments are used to know which item one is talking about, and then privacy rules come into play to verify we are allowed to name that path / have the permission to do so. That being said, I'll be using "private-to-the-macro" terminology to talk about hygiene-hidden paths / identifiers.

  • Granted, making an "exception" for the specific case of associated items (e.g., methods) within an impl or trait definition seems plausible at first glance (in other words: it could seem plausible to think that a macro would not need such a helper definition; that it could do with free items only). But I disagree: having "private-to-the-macro" associated items would not only be often a nice thing to have (no need to reiterate all the generics and churn), it could even become mandatory-ish for things like:

    impl ... {
        def_type!(Foo);
    }

    to be able to define helper assoc items potentially used within the type Foo definition, with such helper assoc items being "private-to-the-macro".

So:

The behavior observed is not a bug, even if cumbersome

The behavior observed is indeed cumbersome, and there should be a way to alleviate / palliate it

This is where "there must be a hygiene opt-out" mechanism for such macros becomes relevant (nowadays, I'd say that the opt-out is to use a macro_rules! macro 🙃(or circumventing the issue altogether with macros fed as input)), such as some special sigil syntax, like fn ${new}, or the already accept macro operators. And at this point I think we're back to discussions / comments already present in the tracking issue for macros 2.0 (so I personally find that this issue could be closed).


FWIW, assuming the existence of match!, for immediately called anonymous macros, I could imagine doing:

match! (new) { ($new:ident) => (
    pub macro generate_class_new($name: ident) {
        pub struct $name {
            x: i32
        }

        impl $name {
            pub fn $new(x: i32) -> Self {
                Self { x }
            }
        }
    }
)}

as a way to solve this identifier issue without polluting all the callsites, and without requiring any extra features.

That being said, this one currently does not work (and this could be considered a bug!): Playground

@nikomatsakis
Copy link
Contributor

We discussed this in our lang team meeting today. I've added a reference to this issue from the tracking issue (#39412) and I'm going to remove the lang team nomination. We are not actively working on the design of hygiene for macros 2.0 right now. I think the general sense is that yes, we will need some form of way to have methods that are "publicly exported" from a macro, but the precise design of such a mechanism is precisely what the macros 2.0 work is all about!

@nikomatsakis nikomatsakis removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Dec 14, 2021
@Ygg01
Copy link

Ygg01 commented Jul 11, 2022

Damn, wish this was the default macro usage.

Looking at:

pub macro generate_class_new($name: ident) {
    pub struct $name {
        x: i32
    }

    impl $name {
        pub fn new(x: i32) -> Self {
            Self { x }
        }
    }
}

Anyway, is there ever a logical reason to write pub fn in a macro without wanting to bypass hygiene rules?

@bjorn3
Copy link
Member

bjorn3 commented Jul 11, 2022

There is for non-pub fn. Macro hygiene can't distinguish between pub and non-pub items, so pub items need to use the same hygiene.

@danielhenrymantilla
Copy link
Contributor

We could imagine something like:

pub macro generate_class_new($name: ident) {
    pub struct $name {
        x: i32
    }

    impl $name {
        pub /* for macro */ fn new(x: i32) -> Self {
            Self { x }
        }
    }
    
    pub macro $name() {
        $name::new(42)
    }
}

which defines the struct Foo, as well as a Foo! constructor. If the Foo! constructor is used from a downstream crate, then the new() it calls would need to be pub.

Oftentimes macro_rules! macros which want to do this thus use:

#[doc(hidden)] /* Not part of the public API */
pub fn __new()

which is less clean and actually brittle if multiple macros pick the same __new() name and collide. Hygiene is the perfect fit and solution for these situations, and as my example showcased, the situation can be needed even in the presence of pub privacy.


More generally, macros operate on tokens, not on semantics, so it would be a very surprising thing to have pub semantics, which are resolved after macro expansion, somehow being able to affect hygiene semantics, which are involved during macro expansion.


But I do agree that there are cases where we want to disregard hygiene, to opt-out of the stricted def-site hygiene semantics, so there should be some mechanism to express this opt-out 💯 (just not a pub beforehand, it's not the right tool for the job)

@Ygg01
Copy link

Ygg01 commented Jul 12, 2022

So what would be best venue to discuss decl macros is Github preferred or Zulip or Discourse?

EDIT: Although one question would be universal. What is the maximum granularity of macro hygiene bending?

  1. Do we want to define it when calling macro?
  2. Define it in the macro itself?
    • If yes - then on macro definition? Method definition? impl block-level definition?

@MatrixDev
Copy link

MatrixDev commented Jul 25, 2022

Maybe we can just "reuse" export keyword inside macro? This way behaviour will be self-documenting and will do almost the sample thing as it does for the macro_rules.

Also errors can just say something like "Can't access non-exported item. Did you forget to add export?" when identifier is requested and present only in the generated code but not exported.

pub macro generate_class_new($name: ident) {
    pub struct $name {
        x: i32
    }

    impl $name {
        pub export fn new(x: i32) -> Self {
            Self { x }
        }
    }
}

@JustusFluegel
Copy link

JustusFluegel commented Apr 23, 2024

First of all I really like the export idea and that seems intuitive to me 👍

But fwiw I would expect the export keyword to come first and not the visibility quantifier, since in my mind the export "annotates" the whole item and basically ( I know that this is probably not what really happens but I think it's a useful analogy ) "copies" the annotated code into call-site scope.

so for me pub export xyz reads more like the export is pub of some kind (which doesn't really make sense imo) and not that we are "exporting" a pub xyz item like export pub xyz does.

tldr; I prefer

macro something() {
    export pub fn hello_world() -> &'static str {
        "hello world"
    }
}

over

macro something() {
    pub export fn hello_world() -> &'static str {
        "hello world"
    }
}

PS: I am not really involved in compiler dev here so I don't know if this is even still being worked on or something completely else was decided so if that is the case just ignore my comments here :)

PPS: If there is like no development happening on this anymore and this is still something that is probably a good idea and could be further pursued and only needs people to work on it, I probably will have a lot of free time on my hands in the next few months and would like to learn a bit more about compiler internal stuff in that time (with the goal of contributing) then I could probably take a look at this in the future. Don't want to thread on anyone's feet here tho, so if someone knows more than me about the current state of things it would be amazing to get a few pointers in the right direction :)

@A4-Tacks
Copy link

over

macro something() {
    pub export fn hello_world() -> &'static str {
        "hello world"
    }
}

Perhaps a method similar to #83527 would be more ideal, like ${export(hello_world)}

macro something() {
    pub fn ${export(hello_world)}() -> &'static str {
        "hello world"
    }
}

Because this directly targets the ident

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-macros-2.0 Area: Declarative macros 2.0 (#39412) C-discussion Category: Discussion or questions that doesn't represent real issues. F-decl_macro `#![feature(decl_macro)]` requires-nightly This issue requires a nightly compiler in some way.
Projects
None yet
Development

No branches or pull requests