Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustdoc: Stop cloning the resolver #83761

Closed
4 of 6 tasks
jyn514 opened this issue Apr 1, 2021 · 13 comments · Fixed by #94857
Closed
4 of 6 tasks

rustdoc: Stop cloning the resolver #83761

jyn514 opened this issue Apr 1, 2021 · 13 comments · Fixed by #94857
Assignees
Labels
A-resolve Area: Name resolution C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@jyn514
Copy link
Member

jyn514 commented Apr 1, 2021

Rustdoc currently creates a copy of the resolver to use for intra-doc links:

rust/src/librustdoc/core.rs

Lines 350 to 354 in d474075

crate fn create_resolver<'a>(
externs: config::Externs,
queries: &Queries<'a>,
sess: &Session,
) -> Rc<RefCell<interface::BoxedResolver>> {

This is a Terrible, Horrible, No Good, Very Bad Idea. In particular, it causes rustdoc's copy of the resolver and the TyCtxt to disagree about what crates exist:

It's also distorting rustc_resolve, since not all of the outputs make sense to clone in the first place: #65625 (comment)

We should refactor rustdoc somehow to allow getting rid of Resolver::clone_outputs. @petrochenkov suggests moving anything that needs to touch the resolver before HIR lowering: #68427 (comment).

@petrochenkov what do you think about @eddyb's comment in #65625 (comment) ?

Why would cloning be needed? Is this that support for resolving things after rustc_resolve finishes?
In that case I'm not sure why we need to clone the outputs - is it to be able to keep the original resolver around?

Would it be possible for lower_to_hir to stop stealing the resolver?

let resolver = peeked.1.steal();

Then rustdoc wouldn't need to clone it in the first place, it could just call queries.expansion().peek().1 whenever it needs access to the resolver.

See #68427 for previous discussion.


Implementation History

  • Initial "early crate loader": rustdoc: Don't load all extern crates unconditionally #83738
  • Don't require clean::Items before resolving intra-doc links: rustdoc: Store intra-doc links in Cache instead of on items directly #83833
  • Separate type-relative resolution from path resolution (at least somewhat): rustdoc: Cleanup handling of associated items for intra-doc links #83849
  • Move early loader into collect_intra_doc_links: rustdoc: Move crate loader to collect_intra_doc_links::early  #84101
  • Move all path resolution from late to early module and add new field on DocContext for partially resolved links
    • Change preprocess_link to also take into account the hacks for Self:: and crate:
      // FIXME(jynelson): this shouldn't go through stringification, rustdoc should just use the DefId directly
      ,
      let resolved_self;
      // replace `Self` with suitable item's parent name
      let is_lone_self = path_str == "Self";
      let is_lone_crate = path_str == "crate";
      if path_str.starts_with("Self::") || is_lone_self {
      if let Some(ref name) = self_name {
      if is_lone_self {
      path_str = name;
      } else {
      resolved_self = format!("{}::{}", name, &path_str[6..]);
      path_str = &resolved_self;
      }
      }
      } else if path_str.starts_with("crate::") || is_lone_crate {
      use rustc_span::def_id::CRATE_DEF_INDEX;
      // HACK(jynelson): rustc_resolve thinks that `crate` is the crate currently being documented.
      // But rustdoc wants it to mean the crate this item was originally present in.
      // To work around this, remove it and resolve relative to the crate root instead.
      // HACK(jynelson)(2): If we just strip `crate::` then suddenly primitives become ambiguous
      // (consider `crate::char`). Instead, change it to `self::`. This works because 'self' is now the crate root.
      // FIXME(#78696): This doesn't always work.
      if is_lone_crate {
      path_str = "self";
      } else {
      resolved_self = format!("self::{}", &path_str["crate::".len()..]);
      path_str = &resolved_self;
      }
      module_id = DefId { krate, index: CRATE_DEF_INDEX };
      }
@jyn514 jyn514 added C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Apr 1, 2021
@jyn514 jyn514 added the A-resolve Area: Name resolution label Apr 2, 2021
@petrochenkov
Copy link
Contributor

I'm not sure what do you mean by freezing the resolver outputs, but I meant the regular rustc's mode of operation

  • resolver is unique and is not kept under Rc
  • therefor to_resolver_outputs always results in calling Resolver::into_outputs rather than Resolver::clone_outputs
  • Resolver::into_outputs consumes the resolver with ResolverOutputs being the only thing left from it, i.e. it's a "frozen" summary of the resolver's work at a certain point

So we need to achieve "freezing" the resolver outputs in rustdoc rather than to avoid it.

Regarding eddyb's comment.
Yes, rustdoc resolves things after creating tcx (which needs ResolverOutputs to be created), so we have to keep the resolver.
This needs to be avoided by moving all name resolution activity in rustdoc to an earlier point.

Would it be possible for lower_to_hir to stop stealing the resolver?
Then rustdoc wouldn't need to clone it in the first place, it could just call queries.expansion().peek().1 whenever it needs access to the resolver.

Not sure about the details of rustc_interface's work, but yes, resolver should be unique and should not be cloned.

@jyn514 jyn514 changed the title rustdoc: Don't freeze resolver outputs rustdoc: Stop cloning the resolver Apr 2, 2021
@jyn514
Copy link
Member Author

jyn514 commented Apr 2, 2021

So we need to achieve "freezing" the resolver outputs in rustdoc rather than to avoid it.

Ok, I see, I updated the title.

Yes, rustdoc resolves things after creating tcx (which needs ResolverOutputs to be created), so we have to keep the resolver. This needs to be avoided by moving all name resolution activity in rustdoc to an earlier point.

I'm not quite sure I follow - is there a reason the tcx has to freeze the resolver into ResolverOutputs? Or could the tcx store the resolver itself? If it stored the resolver rustdoc could stop cloning it without major changes.

To be clear, I don't yet know how much work it would be for rustdoc to move all resolution before creating a TyCtxt, but the model where TyCtxt stores a resolver makes more sense in my head; or at least, it's more flexible.

Possible scheme for rustdoc to move resolution before TyCtxt
  1. Remove clean::Attributes::links
  2. Define a new type:
enum EarlyIntraDocLink {
  Resolved(clean::ItemLink),
  AssocItem { base_ty: Res, item_name: Symbol, ns: Namespace, /* possibly more fields are necessary */ },
  Variant { parent_def: Res, variant: Symbol }, // this might be able to be resolved early since the resolver implements DefIdTree, see https://github.com/rust-lang/rust/blob/d1065e6cefa41fe6c55c9819552cdd61529096fc/src/librustdoc/passes/collect_intra_doc_links.rs#L2121 for the current logic
  // This can't be emitted early because it's a HIR lint, it needs a TyCtxt available.
  Error { kind: ResolutionFailure, diag: DiagnosticInfo, path_str: String, disambiguator: Disambiguator },
}
  1. Add a new field to DocContext: intra_doc_links: FxHashMap<DefId, Vec<EarlyIntraDocLink>>
  2. Expand IntraDocCrateLoader (from rustdoc: Don't load all extern crates unconditionally #83738) to fill out that field by handling more things that are currently in collect_intra_doc_links. I don't think it will be able to handle anything more than resolve_path:
    fn resolve_path(&self, path_str: &str, ns: Namespace, module_id: DefId) -> Option<Res> {
  3. Change collect_intra_doc_links to remove all uses of the resolver.

For context, these are the current uses of the resolver in collect_intra_doc_links:

  • let mut resolver = cx.resolver.borrow_mut();
    let in_scope_traits = cx.module_trait_cache.entry(module).or_insert_with(|| {
    resolver.access(|resolver| {
    let parent_scope = &ParentScope::module(resolver.get_module(module), resolver);
    resolver
    .traits_in_scope(None, parent_scope, SyntaxContext::root(), None)
    .into_iter()
    .map(|candidate| candidate.def_id)
    .collect()
    })
    });
    (shouldn't need access to tcx)
  • fn resolve_macro(
    &self,
    path_str: &'a str,
    module_id: DefId,
    ) -> Result<Res, ResolutionFailure<'a>> {
    let path = ast::Path::from_ident(Ident::from_str(path_str));
    self.cx.enter_resolver(|resolver| {
    (doesn't need access to tcx)
  • let ty_res = self
    .cx
    .enter_resolver(|resolver| {
    resolver.resolve_str_path_error(DUMMY_SP, &path, TypeNS, module_id)
    })
    (only needs tcx after using resolver)
  • fn resolve_path(&self, path_str: &str, ns: Namespace, module_id: DefId) -> Option<Res> {
    let result = self.cx.enter_resolver(|resolver| {
    resolver
    .resolve_str_path_error(DUMMY_SP, &path_str, ns, module_id)
    (doesn't need tcx)

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 2, 2021

Or could the tcx store the resolver itself?

It may be possible.
Resolver contains a lot of early compilation stuff like NodeIds that is out of place in tcx though.
Also it may be preferable to finish all the resolution activities as early as possible (at least those that may result in loading new crates).
I think we may end up with some scheme working similarly to this if incremental compilation and queries advance so much that we'll start skipping resolution of paths if they are not actually used, but that would probably be a very different resolver.

For now, I'd prefer to normalize to what rustc does and have a single scheme with resolver working early instead.
Paths in doc comments are not fundamentally different from paths in e.g. expressions, and if we visit and resolve paths early in general, then paths in doc comments can be visited and resolved early as well.
(If we migrate to something like "resolver in tcx" in the future, then rustc and rustdoc will also migrate to it together.)

@jyn514
Copy link
Member Author

jyn514 commented Apr 4, 2021

Remove clean::Attributes::links

Done in #83833 (although I need to clean it up before it can be merged).

@jyn514
Copy link
Member Author

jyn514 commented Apr 4, 2021

Expand IntraDocCrateLoader (from #83738) to fill out that field by handling more things that are currently in collect_intra_doc_links. I don't think it will be able to handle anything more than resolve_path:

This isn't currently possible because resolving Self goes through the TyCtxt to get the name instead of the resolver:

// FIXME(jynelson): this shouldn't go through stringification, rustdoc should just use the DefId directly
let self_name = self_id.and_then(|self_id| {
if matches!(self.cx.tcx.def_kind(self_id), DefKind::Impl) {
// using `ty.to_string()` (or any variant) has issues with raw idents
let ty = self.cx.tcx.type_of(self_id);
let name = match ty.kind() {
ty::Adt(def, _) => Some(self.cx.tcx.item_name(def.did).to_string()),
other if other.is_primitive() => Some(ty.to_string()),
_ => None,
};
debug!("using type_of(): {:?}", name);
name
} else {
let name = self.cx.tcx.opt_item_name(self_id).map(|sym| sym.to_string());
debug!("using item_name(): {:?}", name);
name
}
});

Here is a fix (it uses a tcx currently, but the same API is available on CStore): 5d5be02

Original ramblings before I realized this was easy

There are two possible ways forward:

  1. Fix the FIXME at the top and use the DefId directly instead of stringifying. That requires finding a way to call .def_kind with only a Resolver - looks like I can do that with CStore::def_kind, and I can get a CStore from Resolver::cstore.
  2. Find a way to get the stringified name of Self with the resolver directly (just an equivalent of opt_item_name should be enough). (This is 5d5be02)

@jyn514
Copy link
Member Author

jyn514 commented Apr 4, 2021

Oh wait no I'm dumb, 5d5be02 only helps if the item isn't an impl block. @petrochenkov Is there a way to resolve the Self type of an impl block? I tried resolver.resolve_str_path_error(DUMMY_SP, "Self", TypeNS, self_id), but that panics because it's not a module.

If not it's not a giant deal, I'll just have to figure out how to fix the FIXME on line 868.

@jyn514
Copy link
Member Author

jyn514 commented Apr 4, 2021

Ugh, I have to figure out how to replicate this bit somehow:

for &ns in &[TypeNS, ValueNS, MacroNS] {
if let Some(res) =
collector.check_full_res(ns, &start, module_id, &None)
{
debug!("found partial_res={:?}", res);
*partial_res = Some(res);
*unresolved = end.into();
break 'outer;
}
}

Maybe the early pass could do that? I think the late pass for type-relative paths will need to have a partial resolution anyway, so it shouldn't be too much more work.

@petrochenkov
Copy link
Contributor

Is there a way to resolve the Self type of an impl block?

rustc keeps the "current impl" (or other item providing Self) in the late resolution visitor walking AST (see fn with_self_rib_ns).
The analogous visitor in rustdoc is IntraLinkCrateLoader right now.

I'm not sure how rustdoc resolves Self now, but it cannot do it through Resolver because resolver doesn't provide any methods able to resolve it. So it's probably fine for rustdoc to just continue doing what it's doing now.

@jyn514
Copy link
Member Author

jyn514 commented Apr 4, 2021

rustc keeps the "current impl" (or other item providing Self) in the late resolution visitor walking AST (see fn with_self_rib_ns).

Thanks, this is helpful!

I'm not sure how rustdoc resolves Self now, but it cannot do it through Resolver because resolver doesn't provide any methods able to resolve it. So it's probably fine for rustdoc to just continue doing what it's doing now.

Hmm, ok. It can't do precisely the same thing because it uses the tyctxt to look up the type - maybe it could look up the DefKind while visiting the parent item? I have a few different ideas that could work (but I will probably end up just going with "use the DefId directly instead of stringifying" since I think I'll need it anyway).

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
rustdoc: Cleanup handling of associated items for intra-doc links

Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time.

r? `@bugadani` maybe? Feel free to reassign :)
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
rustdoc: Cleanup handling of associated items for intra-doc links

Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time.

r? ``@bugadani`` maybe? Feel free to reassign :)
jyn514 pushed a commit to jyn514/rust that referenced this issue Apr 5, 2021
rustdoc: Cleanup handling of associated items for intra-doc links

Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time.

r? ```@bugadani``` maybe? Feel free to reassign :)
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
rustdoc: Cleanup handling of associated items for intra-doc links

Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time.

r? `@bugadani` maybe? Feel free to reassign :)
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2021
rustdoc: Cleanup handling of associated items for intra-doc links

Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time.

r? ``@bugadani`` maybe? Feel free to reassign :)
JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 5, 2021
rustdoc: Cleanup handling of associated items for intra-doc links

Helps with rust-lang#83761 (right now the uses of the resolver are all intermingled with uses of the tyctxt). Best reviewed one commit at a time.

r? ```@bugadani``` maybe? Feel free to reassign :)
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 7, 2021
rustdoc: Store intra-doc links in Cache instead of on items directly

Items are first built after rustdoc creates the TyCtxt. To allow
resolving the links before the TyCtxt is built, the links can't be
stored on `clean::Item` directly.

Helps with rust-lang#83761. Opening this early because I think it might decrease memory usage.
jyn514 added a commit to jyn514/rust that referenced this issue Apr 11, 2021
petrochenkov added a commit to petrochenkov/rust that referenced this issue Feb 10, 2023
This commit implements MCP rust-lang/compiler-team#584

It also removes code that is no longer used, and that includes code cloning resolver, so issue rust-lang#83761 is fixed.
@bors bors closed this as completed in 5b45024 Feb 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-resolve Area: Name resolution C-cleanup Category: PRs that clean code up or issues documenting cleanup. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants