rustdoc: Stop cloning the resolver #83761

jyn514 · 2021-04-01T22:24:23Z

Rustdoc currently creates a copy of the resolver to use for intra-doc links:

Lines 350 to 354 in d474075

    
    crate fn create_resolver<'a>(
        externs: config::Externs,
        queries: &Queries<'a>,
        sess: &Session,
    ) -> Rc<RefCell<interface::BoxedResolver>> {

This is a Terrible, Horrible, No Good, Very Bad Idea. In particular, it causes rustdoc's copy of the resolver and the TyCtxt to disagree about what crates exist:

It's also distorting rustc_resolve, since not all of the outputs make sense to clone in the first place: #65625 (comment)

We should refactor rustdoc somehow to allow getting rid of Resolver::clone_outputs. @petrochenkov suggests moving anything that needs to touch the resolver before HIR lowering: #68427 (comment).

@petrochenkov what do you think about @eddyb's comment in #65625 (comment) ?

Why would cloning be needed? Is this that support for resolving things after rustc_resolve finishes?
In that case I'm not sure why we need to clone the outputs - is it to be able to keep the original resolver around?

Would it be possible for lower_to_hir to stop stealing the resolver?

rust/compiler/rustc_interface/src/queries.rs

Line 227 in d474075

let resolver = peeked.1.steal();

Then rustdoc wouldn't need to clone it in the first place, it could just call queries.expansion().peek().1 whenever it needs access to the resolver.
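The difference between stealing a value and peeking at it can be illustrated with a small model. This is only a sketch: the `Steal` type below is a simplified stand-in written for this example, not the type rustc uses, and `peek` is a hypothetical helper rather than the real `Queries` API.

```rust
use std::cell::{Ref, RefCell};

/// Hypothetical, simplified slot whose value can be read or taken exactly once.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Borrow the value without removing it; `None` once it has been stolen.
    fn peek(&self) -> Option<Ref<'_, T>> {
        Ref::filter_map(self.value.borrow(), |slot| slot.as_ref()).ok()
    }

    /// Take ownership of the value, leaving the slot empty for later readers.
    fn steal(&self) -> Option<T> {
        self.value.borrow_mut().take()
    }
}
```

In this model, any code that still wants to `peek` after a `steal` finds nothing, which is roughly the situation rustdoc works around today by keeping its own copy.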

See #68427 for previous discussion.

Implementation History

Initial "early crate loader": rustdoc: Don't load all extern crates unconditionally #83738
Don't require clean::Items before resolving intra-doc links: rustdoc: Store intra-doc links in Cache instead of on items directly #83833
Separate type-relative resolution from path resolution (at least somewhat): rustdoc: Cleanup handling of associated items for intra-doc links #83849
Move early loader into collect_intra_doc_links: rustdoc: Move crate loader to collect_intra_doc_links::early #84101

Move all path resolution from late to early module and add new field on DocContext for partially resolved links

Change preprocess_link to also take into account the hacks for Self:: and crate:

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Line 880 in ef52471

    
           // FIXME(jynelson): this shouldn't go through stringification, rustdoc should just use the DefId directly

,

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 1124 to 1153 in ef52471

    
    let resolved_self;
    // replace `Self` with suitable item's parent name
    let is_lone_self = path_str == "Self";
    let is_lone_crate = path_str == "crate";
    if path_str.starts_with("Self::") || is_lone_self {
        if let Some(ref name) = self_name {
            if is_lone_self {
                path_str = name;
            } else {
                resolved_self = format!("{}::{}", name, &path_str[6..]);
                path_str = &resolved_self;
            }
        }
    } else if path_str.starts_with("crate::") || is_lone_crate {
        use rustc_span::def_id::CRATE_DEF_INDEX;
        // HACK(jynelson): rustc_resolve thinks that `crate` is the crate currently being documented.
        // But rustdoc wants it to mean the crate this item was originally present in.
        // To work around this, remove it and resolve relative to the crate root instead.
        // HACK(jynelson)(2): If we just strip `crate::` then suddenly primitives become ambiguous
        // (consider `crate::char`). Instead, change it to `self::`. This works because 'self' is now the crate root.
        // FIXME(#78696): This doesn't always work.
        if is_lone_crate {
            path_str = "self";
        } else {
            resolved_self = format!("self::{}", &path_str["crate::".len()..]);
            path_str = &resolved_self;
        }
        module_id = DefId { krate, index: CRATE_DEF_INDEX };
    }
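To make the string rewriting above easier to follow in isolation, here is a minimal standalone sketch of the same `Self::` and `crate::` handling. The function name `preprocess_path`, the `Rewritten` struct, and the `from_crate_root` flag are hypothetical names chosen for this example; the flag stands in for the `module_id` reassignment in the real code.

```rust
/// Result of the rewrite: the path to hand to the resolver, and whether
/// resolution should start at the crate root instead of the current module.
#[derive(Debug, PartialEq)]
struct Rewritten {
    path: String,
    from_crate_root: bool,
}

/// Hypothetical standalone version of the `Self` / `crate` hacks.
fn preprocess_path(path_str: &str, self_name: Option<&str>) -> Rewritten {
    if path_str == "Self" || path_str.starts_with("Self::") {
        // Replace `Self` with the name of the enclosing item, if one is known.
        let path = match (self_name, path_str.strip_prefix("Self::")) {
            (Some(name), Some(rest)) => format!("{}::{}", name, rest),
            (Some(name), None) => name.to_string(),
            (None, _) => path_str.to_string(),
        };
        return Rewritten { path, from_crate_root: false };
    }
    if path_str == "crate" {
        return Rewritten { path: "self".to_string(), from_crate_root: true };
    }
    if let Some(rest) = path_str.strip_prefix("crate::") {
        // Use `self::` rather than stripping the prefix, so that paths such as
        // `crate::char` are not confused with primitives.
        return Rewritten { path: format!("self::{}", rest), from_crate_root: true };
    }
    Rewritten { path: path_str.to_string(), from_crate_root: false }
}
```

Moving this step into `preprocess_link` would presumably let both the early and late passes see the already-rewritten path.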


petrochenkov · 2021-04-02T06:00:25Z

I'm not sure what you mean by freezing the resolver outputs, but I meant rustc's regular mode of operation:

resolver is unique and is not kept under Rc
therefore to_resolver_outputs always results in calling Resolver::into_outputs rather than Resolver::clone_outputs
Resolver::into_outputs consumes the resolver with ResolverOutputs being the only thing left from it, i.e. it's a "frozen" summary of the resolver's work at a certain point

So we need to achieve "freezing" the resolver outputs in rustdoc rather than to avoid it.
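The unique-versus-shared distinction described here can be modelled in a few lines. The sketch below uses toy `Resolver` and `ResolverOutputs` types invented for illustration (a list of crate names stands in for everything the real resolver tracks), and `load_crate` is a hypothetical method. It shows how a shared resolver forces the clone path and how the clone can then drift from the original.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Toy stand-in for the frozen summary of the resolver's work.
#[derive(Clone, Debug, PartialEq)]
struct ResolverOutputs {
    crates: Vec<String>,
}

/// Toy stand-in for the resolver; it only tracks loaded crates.
struct Resolver {
    crates: Vec<String>,
}

impl Resolver {
    /// Consumes the resolver: nothing can change the outputs afterwards.
    fn into_outputs(self) -> ResolverOutputs {
        ResolverOutputs { crates: self.crates }
    }

    /// Copies the outputs while the resolver stays alive and mutable.
    fn clone_outputs(&self) -> ResolverOutputs {
        ResolverOutputs { crates: self.crates.clone() }
    }

    /// Hypothetical later resolution step that loads another crate.
    fn load_crate(&mut self, name: &str) {
        self.crates.push(name.to_string());
    }
}

fn to_resolver_outputs(resolver: Rc<RefCell<Resolver>>) -> ResolverOutputs {
    match Rc::try_unwrap(resolver) {
        // Unique handle: freeze the resolver by consuming it.
        Ok(cell) => cell.into_inner().into_outputs(),
        // Someone else still holds the resolver: fall back to a copy.
        Err(shared) => {
            let outputs = shared.borrow().clone_outputs();
            outputs
        }
    }
}
```

If a second `Rc` handle is kept and `load_crate` is called on it after `to_resolver_outputs` returns, the copied outputs no longer list the same crates as the live resolver, which mirrors the disagreement described at the top of this issue.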

Regarding eddyb's comment.
Yes, rustdoc resolves things after creating tcx (which needs ResolverOutputs to be created), so we have to keep the resolver.
This needs to be avoided by moving all name resolution activity in rustdoc to an earlier point.

Would it be possible for lower_to_hir to stop stealing the resolver?
Then rustdoc wouldn't need to clone it in the first place, it could just call queries.expansion().peek().1 whenever it needs access to the resolver.

Not sure about the details of rustc_interface's work, but yes, resolver should be unique and should not be cloned.

jyn514 · 2021-04-02T06:29:19Z

So we need to achieve "freezing" the resolver outputs in rustdoc rather than to avoid it.

Ok, I see, I updated the title.

Yes, rustdoc resolves things after creating tcx (which needs ResolverOutputs to be created), so we have to keep the resolver. This needs to be avoided by moving all name resolution activity in rustdoc to an earlier point.

I'm not quite sure I follow - is there a reason the tcx has to freeze the resolver into ResolverOutputs? Or could the tcx store the resolver itself? If it stored the resolver rustdoc could stop cloning it without major changes.

To be clear, I don't yet know how much work it would be for rustdoc to move all resolution before creating a TyCtxt, but the model where TyCtxt stores a resolver makes more sense in my head; or at least, it's more flexible.

Possible scheme for rustdoc to move resolution before TyCtxt

Remove clean::Attributes::links
Define a new type:

enum EarlyIntraDocLink {
  Resolved(clean::ItemLink),
  AssocItem { base_ty: Res, item_name: Symbol, ns: Namespace, /* possibly more fields are necessary */ },
  Variant { parent_def: Res, variant: Symbol }, // this might be able to be resolved early since the resolver implements DefIdTree, see https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L2121 for the current logic
  // This can't be emitted early because it's a HIR lint, it needs a TyCtxt available.
  Error { kind: ResolutionFailure, diag: DiagnosticInfo, path_str: String, disambiguator: Disambiguator },
}

Add a new field to DocContext: intra_doc_links: FxHashMap<DefId, Vec<EarlyIntraDocLink>>
Expand IntraDocCrateLoader (from rustdoc: Don't load all extern crates unconditionally #83738) to fill out that field by handling more things that are currently in collect_intra_doc_links. I don't think it will be able to handle anything more than resolve_path:

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Line 469 in d1065e6

fn resolve_path(&self, path_str: &str, ns: Namespace, module_id: DefId) -> Option<Res> {
Change collect_intra_doc_links to remove all uses of the resolver.
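The two-phase scheme listed above could look roughly like the following toy model. Everything here is an assumption made for illustration: `DefId` is a plain integer, the `paths` map stands in for the resolver, the `assoc_items` map stands in for the type information only available from a TyCtxt, and `EarlyLink` is a cut-down version of the proposed `EarlyIntraDocLink`.

```rust
use std::collections::HashMap;

type DefId = u32; // stand-in for the real DefId

/// Cut-down version of the proposed early link representation.
#[derive(Debug, PartialEq)]
enum EarlyLink {
    Resolved(DefId),
    AssocItem { base: DefId, item_name: String },
    Error { path_str: String },
}

/// Early pass: only plain path resolution is available.
fn resolve_early(paths: &HashMap<&str, DefId>, link: &str) -> EarlyLink {
    if let Some(&id) = paths.get(link) {
        return EarlyLink::Resolved(id);
    }
    // Otherwise try to resolve everything but the last segment and leave the
    // last segment for the type-relative lookup in the late pass.
    if let Some((base, item)) = link.rsplit_once("::") {
        if let Some(&base) = paths.get(base) {
            return EarlyLink::AssocItem { base, item_name: item.to_string() };
        }
    }
    EarlyLink::Error { path_str: link.to_string() }
}

/// Late pass: finishes partially resolved links without touching the resolver.
fn resolve_late(
    assoc_items: &HashMap<(DefId, String), DefId>,
    link: &EarlyLink,
) -> Option<DefId> {
    match link {
        EarlyLink::Resolved(id) => Some(*id),
        EarlyLink::AssocItem { base, item_name } => {
            assoc_items.get(&(*base, item_name.clone())).copied()
        }
        // The real code would report a lint here, which needs a TyCtxt.
        EarlyLink::Error { .. } => None,
    }
}
```

The point of the split is that `resolve_late` never needs the `paths` table, so in this model the resolver could be dropped once the early pass has run.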

For context, these are the current uses of the resolver in collect_intra_doc_links:

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 762 to 772 in d1065e6

    
    let mut resolver = cx.resolver.borrow_mut();
    let in_scope_traits = cx.module_trait_cache.entry(module).or_insert_with(|| {
        resolver.access(|resolver| {
            let parent_scope = &ParentScope::module(resolver.get_module(module), resolver);
            resolver
                .traits_in_scope(None, parent_scope, SyntaxContext::root(), None)
                .into_iter()
                .map(|candidate| candidate.def_id)
                .collect()
        })
    });

(shouldn't need access to tcx)

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 425 to 431 in d1065e6

    
    fn resolve_macro(
        &self,
        path_str: &'a str,
        module_id: DefId,
    ) -> Result<Res, ResolutionFailure<'a>> {
        let path = ast::Path::from_ident(Ident::from_str(path_str));
        self.cx.enter_resolver(|resolver| {

(doesn't need access to tcx)

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 319 to 323 in d1065e6

    
    let ty_res = self
        .cx
        .enter_resolver(|resolver| {
            resolver.resolve_str_path_error(DUMMY_SP, &path, TypeNS, module_id)
        })

(only needs tcx after using resolver)

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 469 to 472 in d1065e6

    
    fn resolve_path(&self, path_str: &str, ns: Namespace, module_id: DefId) -> Option<Res> {
        let result = self.cx.enter_resolver(|resolver| {
            resolver
                .resolve_str_path_error(DUMMY_SP, &path_str, ns, module_id)

(doesn't need tcx)

petrochenkov · 2021-04-02T21:26:50Z

Or could the tcx store the resolver itself?

It may be possible.
Resolver contains a lot of early compilation stuff like NodeIds that is out of place in tcx though.
Also it may be preferable to finish all the resolution activities as early as possible (at least those that may result in loading new crates).
I think we may end up with some scheme working similarly to this if incremental compilation and queries advance so much that we'll start skipping resolution of paths if they are not actually used, but that would probably be a very different resolver.

For now, I'd prefer to normalize to what rustc does and have a single scheme with resolver working early instead.
Paths in doc comments are not fundamentally different from paths in e.g. expressions, and if we visit and resolve paths early in general, then paths in doc comments can be visited and resolved early as well.
(If we migrate to something like "resolver in tcx" in the future, then rustc and rustdoc will also migrate to it together.)

jyn514 · 2021-04-04T02:21:24Z

Remove clean::Attributes::links

Done in #83833 (although I need to clean it up before it can be merged).

jyn514 · 2021-04-04T03:01:57Z

Expand IntraDocCrateLoader (from #83738) to fill out that field by handling more things that are currently in collect_intra_doc_links. I don't think it will be able to handle anything more than resolve_path:

This isn't currently possible because resolving Self goes through the TyCtxt to get the name instead of the resolver:

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 868 to 885 in 2616ab1

    
    // FIXME(jynelson): this shouldn't go through stringification, rustdoc should just use the DefId directly
    let self_name = self_id.and_then(|self_id| {
        if matches!(self.cx.tcx.def_kind(self_id), DefKind::Impl) {
            // using `ty.to_string()` (or any variant) has issues with raw idents
            let ty = self.cx.tcx.type_of(self_id);
            let name = match ty.kind() {
                ty::Adt(def, _) => Some(self.cx.tcx.item_name(def.did).to_string()),
                other if other.is_primitive() => Some(ty.to_string()),
                _ => None,
            };
            debug!("using type_of(): {:?}", name);
            name
        } else {
            let name = self.cx.tcx.opt_item_name(self_id).map(|sym| sym.to_string());
            debug!("using item_name(): {:?}", name);
            name
        }
    });

Here is a fix (it uses a tcx currently, but the same API is available on CStore): 5d5be02

Original ramblings before I realized this was easy

There are two possible ways forward:

Fix the FIXME at the top and use the DefId directly instead of stringifying. That requires finding a way to call .def_kind with only a Resolver - looks like I can do that with CStore::def_kind, and I can get a CStore from Resolver::cstore.
Find a way to get the stringified name of Self with the resolver directly (just an equivalent of opt_item_name should be enough). (This is 5d5be02)

jyn514 · 2021-04-04T03:05:16Z

Oh wait no I'm dumb, 5d5be02 only helps if the item isn't an impl block. @petrochenkov Is there a way to resolve the Self type of an impl block? I tried resolver.resolve_str_path_error(DUMMY_SP, "Self", TypeNS, self_id), but that panics because it's not a module.

If not, it's not a giant deal; I'll just have to figure out how to fix the FIXME on line 868.

jyn514 · 2021-04-04T04:13:08Z

Ugh, I have to figure out how to replicate this bit somehow:

rust/src/librustdoc/passes/collect_intra_doc_links.rs

Lines 1824 to 1833 in 2616ab1

    
    for &ns in &[TypeNS, ValueNS, MacroNS] {
        if let Some(res) =
            collector.check_full_res(ns, &start, module_id, &None)
        {
            debug!("found partial_res={:?}", res);
            *partial_res = Some(res);
            *unresolved = end.into();
            break 'outer;
        }
    }

Maybe the early pass could do that? I think the late pass for type-relative paths will need to have a partial resolution anyway, so it shouldn't be too much more work.

petrochenkov · 2021-04-04T08:22:53Z

Is there a way to resolve the Self type of an impl block?

rustc keeps the "current impl" (or other item providing Self) in the late resolution visitor walking AST (see fn with_self_rib_ns).
The analogous visitor in rustdoc is IntraLinkCrateLoader right now.

I'm not sure how rustdoc resolves Self now, but it cannot do it through Resolver because resolver doesn't provide any methods able to resolve it. So it's probably fine for rustdoc to just continue doing what it's doing now.
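The idea of a visitor that remembers the "current impl" while walking the AST can be sketched on a toy tree. The `Item` enum, the `Visitor` struct, and the `with_self` helper below are all hypothetical names for this example; they only imitate the shape of what `with_self_rib_ns` is described as doing, not its actual implementation.

```rust
/// Toy AST: impls provide a `Self` type, doc links may mention `Self`.
enum Item {
    Impl { self_ty: String, items: Vec<Item> },
    Module { items: Vec<Item> },
    Doc { link: String },
}

struct Visitor {
    /// The item currently providing `Self`, if any.
    current_self: Option<String>,
    /// Each doc link paired with the `Self` that was in scope when it was seen.
    seen: Vec<(String, Option<String>)>,
}

impl Visitor {
    /// Run `f` with `self_ty` as the current `Self`, then restore the old one.
    fn with_self<R>(&mut self, self_ty: Option<String>, f: impl FnOnce(&mut Self) -> R) -> R {
        let previous = std::mem::replace(&mut self.current_self, self_ty);
        let result = f(self);
        self.current_self = previous;
        result
    }

    fn visit(&mut self, item: &Item) {
        match item {
            Item::Impl { self_ty, items } => self.with_self(Some(self_ty.clone()), |this| {
                for item in items {
                    this.visit(item);
                }
            }),
            Item::Module { items } => {
                for item in items {
                    self.visit(item);
                }
            }
            Item::Doc { link } => self.seen.push((link.clone(), self.current_self.clone())),
        }
    }
}
```

A visitor shaped like this knows which item provides `Self` for every link it encounters, without asking the resolver or a TyCtxt for it.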

jyn514 · 2021-04-04T12:37:56Z

rustc keeps the "current impl" (or other item providing Self) in the late resolution visitor walking AST (see fn with_self_rib_ns).

Thanks, this is helpful!

I'm not sure how rustdoc resolves Self now, but it cannot do it through Resolver because resolver doesn't provide any methods able to resolve it. So it's probably fine for rustdoc to just continue doing what it's doing now.

Hmm, ok. It can't do precisely the same thing because it uses the tyctxt to look up the type - maybe it could look up the DefKind while visiting the parent item? I have a few different ideas that could work (but I will probably end up just going with "use the DefId directly instead of stringifying" since I think I'll need it anyway).

rustdoc: Cleanup handling of associated items for intra-doc links Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time. r? `@bugadani` maybe? Feel free to reassign :)

rustdoc: Store intra-doc links in Cache instead of on items directly Items are first built after rustdoc creates the TyCtxt. To allow resolving the links before the TyCtxt is built, the links can't be stored on `clean::Item` directly. Helps with rust-lang#83761. Opening this early because I think it might decrease memory usage.

This commit implements MCP rust-lang/compiler-team#584. It also removes code that is no longer used, including the code that cloned the resolver, so issue rust-lang#83761 is fixed.

jyn514 added C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Apr 1, 2021

jyn514 mentioned this issue Apr 1, 2021

rustdoc: Don't load all extern crates unconditionally #83738

Merged

jyn514 added the A-resolve Area: Name resolution label Apr 2, 2021

jyn514 changed the title ~~rustdoc: Don't freeze resolver outputs~~ rustdoc: Stop cloning the resolver Apr 2, 2021

jyn514 self-assigned this Apr 2, 2021

jyn514 mentioned this issue Apr 4, 2021

rustdoc: Store intra-doc links in Cache instead of on items directly #83833

Merged

jyn514 mentioned this issue Apr 4, 2021

rustdoc: Cleanup handling of associated items for intra-doc links #83849

Merged

jyn514 mentioned this issue Apr 10, 2021

Rustdoc ICE: Referencing a workspace crate's root in an intra-doc link triggers an ICE #84046

Closed

jyn514 added a commit to jyn514/rust that referenced this issue Apr 11, 2021

[very broken] Fixup resolve() in late pass

d1b781b

mentioning rust-lang#83761

petrochenkov mentioned this issue Jan 12, 2023

give the resolver access to TyCtxt #105462

Merged

petrochenkov mentioned this issue Jan 31, 2023

MCP: Resolve documentation links in rustc and store the results in metadata rust-lang/compiler-team#584

Closed

3 tasks

bors closed this as completed in 5b45024 Feb 11, 2023
