Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustdoc-search: simplify rules for generics and type params #127589

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

notriddle
Copy link
Contributor

@notriddle notriddle commented Jul 10, 2024

Heads up!: This PR is a follow-up that depends on #124544. It adds 12dc24f, a change to the filtering behavior, and 9900ea4, a minor ranking tweak.

Part of rust-lang/rust-project-goals#112

This PR overturns #109802

Preview

image

Description

This commit is a response to feedback on the displayed type signatures results, by making generics act stricter.

  • Order within generics is significant. This means Vec<Allocator> now matches only with a true vector of allocators, instead of matching the second type param. It also makes unboxing within generics stricter, so Result<A, B> only matches if B is in the error type and A is in the success type. The top level of the function search is unaffected.
  • Generics are only "unboxed" if a type is explicitly opted into it. References and tuples are hardcoded to allow unboxing, and Box, Rc, Arc, Option, Result, and Future are opted in with an unstable attribute. Search result unboxing is the process that allows you to search for i32 -> str and get back a function with the type signature &Future<i32> -> Box<str>.
  • Instead of ranking by set overlap, it ranks by the number of items in the type signature. This makes it easier to find single type signatures like transmute.

Find the discussion on

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jul 10, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jul 10, 2024

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha

Some changes occurred in GUI tests.

cc @GuillaumeGomez

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jul 27, 2024

☔ The latest upstream changes (presumably #128253) made this pull request unmergeable. Please resolve the merge conflicts.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

Generics are tightened by making order significant. This means Vec<Allocator> now matches only with a true vector of allocators, instead of matching the second type param. It also makes unboxing within generics stricter, so Result<A, B> only matches if B is in the error type and A is in the success type. The top level of the function search is unaffected.

Excellent, this is great—thanks! Although do you mind clarifying what you mean by "The top level of the function search is unaffected"? Could you give an example?

Type params are loosened by allowing a single function type param to match multiple query params. In other words, the relationship is loosened to N:1 instead of the older 1:1 rule. This change also allows a type param to be unboxed in one spot and matched in another.

Hmm, this seems like the opposite of what we discussed in the meeting. The goal is to have a 1:1 matching, not N:1, so (using your example) Box<[A]> -> Vec<B> would return nothing because the query asks for something more powerful (transforming any A into a potentially different B). However, Box<[A]> -> Vec<A> would work because the function it describes does not transform the inner element. Functions like transmute would be the only ones matched by a query like A -> B. This is what @fmease described in his Zulip post that you linked to:

I would've expected Box<[T]> -> Vec (etc.) to return no results at all because there is no such signature. Only Box<[T]> → Vec would match.

The older rule seems to be N:1 too based on checking the nightly std docs, so I'm not sure what change this PR makes. Please correct me if I misunderstood.

One more thing: What does it mean for a generic parameter to be "unboxed" as compared to being "matched"?

@notriddle
Copy link
Contributor Author

notriddle commented Jul 31, 2024

Excellent, this is great—thanks! Although do you mind clarifying what you mean by "The top level of the function search is unaffected"? Could you give an example?

The actual function parameters can be given in any order. vec<t>, usize -> t gives you remove, swap_remove, split_off, and drain. usize, vec<t> -> t gives the exact same set of results.

The stuff within the <> is required to be in a the same order, but the function parameters themselves are order-agnostic.

Hmm, this seems like the opposite of what we discussed in the meeting. The goal is to have a 1:1 matching, not N:1, so (using your example) Box<[A]> -> Vec<B> would return nothing because the query asks for something more powerful (transforming any A into a potentially different B).

That's how it currently works in the master branch, but #124544 (comment) by @jsha asked for this change. "One last suggestion: What if we removed the feature of requiring T and Z to match distinct types?"

The older rule seems to be N:1 too based on checking the nightly std docs, so I'm not sure what change this PR makes. Please correct me if I misunderstood.

The older rule was 1:1. I suspect that almost all of the confusion about rustdoc's type-driven search comes from reordering, which is why I'm happy to remove it.

One more thing: What does it mean for a generic parameter to be "unboxed" as compared to being "matched"?

Given a function with this signature: fn run<I: Iterator<Item=u32>>(i: I) -> bool

I can search for it with this query: iterator<item=u32> -> bool

That is, the generic parameter I has been "unboxed" into its where clause.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

The actual function parameters can be given in any order. vec<t>, usize -> t gives you remove, swap_remove, split_off, and drain. usize, vec<t> -> t gives the exact same set of results.

The stuff within the <> is required to be in a the same order, but the function parameters themselves are order-agnostic.

Sounds good 👍

That's how it currently works in the master branch, but #124544 (comment) by @jsha asked for this change. "One last suggestion: What if we removed the feature of requiring T and Z to match distinct types?"

Hmm, seems like we'll need to have more team discussion on this point then. It sounds like fmease and I are in favor of the stricter "T and Z are parametrically polymorphic" approach (like in Rust generics, where we must consider each to be instantiated any way the user likes), whereas jsha prefers the looser "T and Z are just placeholders" approach. I see advantages both ways, but I prefer the former because it matches the language itself better, and it also hews more closely to prior art (like Hoogle).

However, it seems that implementing the former approach properly might require rethinking some aspects of the search design and implementation. Namely, it seems that currently A is considered a placeholder for a type and may match a concrete type rather than just a parameter. I expected type-based search to behave more like Hoogle (note: the following links are to equivalent Hoogle searches), where Vec<T> -> A matches nothing (except transmute, or in Hoogle's case also a version of len that is generic over the number), Vec<T> -> T matches things like get but not len, and Vec<T> -> usize matches len.

This may all be too difficult to implement especially given the constraints of using browser JS. Or we may decide that we want a looser behavior in fact. But I wanted to lay this out to make sure we're explicit about the decision.

The older rule was 1:1. I suspect that almost all of the confusion about rustdoc's type-driven search comes from reordering, which is why I'm happy to remove it.

That is probably true, and I agree that removing reordering is the most important change. Figuring out how exactly generics are interpreted is a bigger discussion that may be best handled in a future PR.

Given a function with this signature: fn run<I: Iterator<Item=u32>>(i: I) -> bool

I can search for it with this query: iterator<item=u32> -> bool

That is, the generic parameter I has been "unboxed" into its where clause.

Thanks 👍

@camelid
Copy link
Member

camelid commented Jul 31, 2024

Btw there does seem to be a bug in the implementation of this PR since A is actually matching VecDeque<T, A>.

image

We may also consider hiding parameters that have defaults (like Vec's A) that were not matched in the query.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

That's how it currently works in the master branch, but #124544 (comment) by @jsha asked for this change. "One last suggestion: What if we removed the feature of requiring T and Z to match distinct types?"

After posting my reply, I went to read @jsha's original comment, and it made me wonder if what actually bothered him was reordering. I suspect that after reordering is removed, it behaves fine in general and with defaulted type parameters. IMO, requiring them to match different types is also not unusual since it more closely matches (though definitely not completely) the normal semantics of generics in Rust. Allowing T and Z to match the same type makes them behave more like existentially-quantified types (i.e., dyn Trait) than universally-quantified types (i.e., fn foo<A>(x: A)).

One last suggestion: What if we removed the feature of requiring T and Z to match distinct types? I get that it's more expressive, but it's unusual, and it interacts poorly with other features like reordering and defaulted type parameters (like Vec's Allocator parameter, which people don't often think about).

@notriddle
Copy link
Contributor Author

Namely, it seems that currently A is considered a placeholder for a type and may match a concrete type rather than just a parameter.

It only matches parameters. For example, Arc<File> -> bool has two results, while Arc<File> -> T has none.

Btw there does seem to be a bug in the implementation of this PR since A is actually matching VecDeque<T, A>.

As it says in the where clause, "A matches T", the VecDequeue's first type parameter, not VecDequeue itself. That's also why the word "VecDequeue" is not bold.

@notriddle
Copy link
Contributor Author

Figuring out how exactly generics are interpreted is a bigger discussion that may be best handled in a future PR.

We're going to need to decide this at some point. This PR is as good a point as any, provided the other PR, adding the user-visible type signatures to the search results, isn't blocked.

@camelid
Copy link
Member

camelid commented Jul 31, 2024

It only matches parameters. For example, Arc<File> -> bool has two results, while Arc<File> -> T has none.

Ah, I think I was confused because A -> B has a lot of results that are not just transmute. This is probably because we encode Self as a generic rather than its own thing. I could open a PR to fix that if you think that'd make sense?

As it says in the where clause, "A matches T", the VecDequeue's first type parameter, not VecDequeue itself. That's also why the word "VecDequeue" is not bold.

So when we search, it can match something inside another type? That seems a bit counterintuitive to me and like it would generate excess results. Was this added so that T would show results with Option<T>? If so perhaps it might make more sense to just special-case a few types like Option and Result.

@notriddle
Copy link
Contributor Author

So when we search, it can match something inside another type? That seems a bit counterintuitive to me and like it would generate excess results. Was this added so that T would show results with Option<T>? If so perhaps it might make more sense to just special-case a few types like Option and Result.

You're basically correct about the reasoning, yes.

Hoogle mentioned it in their design docs. https://ndmitchell.com/downloads/slides-hoogle_finding_functions_from_types-16_may_2011.pdf "A note on 'boxing'". For example, Num i => [a] -> i includes a result with a Maybe return value.

However, it looks like they only do it with special-cased types? https://github.com/ndmitchell/hoogle/blob/8149c93c40a542bf8f098047e1acbc347fc9f4e6/docs/TypeSearch.md#rewrites describes a special rule, and when I try to test it, I can only see boxing with Maybe and IO.

I didn't special-case anything because it's not clear which things should be special-cased for Rust.

Ah, I think I was confused because A -> B has a lot of results that are not just transmute. This is probably because we encode Self as a generic rather than its own thing. I could open a PR to fix that if you think that'd make sense?

Yup, that's bad. The blanket impl is being translated wrong.

ChrisDenton added a commit to ChrisDenton/rust that referenced this pull request Aug 5, 2024
rustdoc: Fix handling of `Self` type in search index and refactor its representation

### Summary

- Add enum variant `clean::Type::SelfTy` and use it instead of `clean::Type::Generic(kw::SelfUpper)`.
- Stop treating `Self` as a generic in the search index.
- Remove struct formerly known as `clean::SelfTy` (constructed as representation of function receiver type). We're better off without it.

### Before

![image](https://github.com/user-attachments/assets/d257bdd8-3a62-4c71-84a5-9c950f2e4f00)

### After

![image](https://github.com/user-attachments/assets/8f6d3f22-92c1-41e3-9ab8-a881b66816c0)

r? `@notriddle`
cc rust-lang#127589 (comment)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 5, 2024
rustdoc: Fix handling of `Self` type in search index and refactor its representation

### Summary

- Add enum variant `clean::Type::SelfTy` and use it instead of `clean::Type::Generic(kw::SelfUpper)`.
- Stop treating `Self` as a generic in the search index.
- Remove struct formerly known as `clean::SelfTy` (constructed as representation of function receiver type). We're better off without it.

### Before

![image](https://github.com/user-attachments/assets/d257bdd8-3a62-4c71-84a5-9c950f2e4f00)

### After

![image](https://github.com/user-attachments/assets/8f6d3f22-92c1-41e3-9ab8-a881b66816c0)

r? ``@notriddle``
cc rust-lang#127589 (comment)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 5, 2024
rustdoc: Fix handling of `Self` type in search index and refactor its representation

### Summary

- Add enum variant `clean::Type::SelfTy` and use it instead of `clean::Type::Generic(kw::SelfUpper)`.
- Stop treating `Self` as a generic in the search index.
- Remove struct formerly known as `clean::SelfTy` (constructed as representation of function receiver type). We're better off without it.

### Before

![image](https://github.com/user-attachments/assets/d257bdd8-3a62-4c71-84a5-9c950f2e4f00)

### After

![image](https://github.com/user-attachments/assets/8f6d3f22-92c1-41e3-9ab8-a881b66816c0)

r? ```@notriddle```
cc rust-lang#127589 (comment)
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Aug 5, 2024
Rollup merge of rust-lang#128471 - camelid:rustdoc-self, r=notriddle

rustdoc: Fix handling of `Self` type in search index and refactor its representation

### Summary

- Add enum variant `clean::Type::SelfTy` and use it instead of `clean::Type::Generic(kw::SelfUpper)`.
- Stop treating `Self` as a generic in the search index.
- Remove struct formerly known as `clean::SelfTy` (constructed as representation of function receiver type). We're better off without it.

### Before

![image](https://github.com/user-attachments/assets/d257bdd8-3a62-4c71-84a5-9c950f2e4f00)

### After

![image](https://github.com/user-attachments/assets/8f6d3f22-92c1-41e3-9ab8-a881b66816c0)

r? ```@notriddle```
cc rust-lang#127589 (comment)
@jsha
Copy link
Contributor

jsha commented Aug 12, 2024

To clarify my previous comment:

  • If A shows up in a search, it should match the same thing each time. For instance, Vec<A> -> Option<A> should find Vec::first but not (e.g.) Vec::as_ascii.
  • If A and B show up in a search, they should match for both A == B and A != B. For instance, Vec<A> -> Option<B> should find Vec::first and Vec::as_ascii. My understanding is that the the current logic requires A != B.

After posting my reply, I went to read @jsha's original comment, and it made me wonder if what actually bothered him was reordering. I suspect that after reordering is removed, it behaves fine in general and with defaulted type parameters.

Maybe after removing reordering, the A != B requirement doesn't wind up being that important either way. My main motivation here is to look for ways to keep the search rules simple enough for people to understand and learn easily. And also simple enough to implement correctly and keep correct even as we add features).

IMO, requiring them to match different types is also not unusual since it more closely matches (though definitely not completely) the normal semantics of generics in Rust.

I don't understand this. It seems to me the opposite is true. I wrote previously:

if I wrote struct Point<T, U>, it would be valid for T and U to be the same type.

So for instance let x: Point<u32, u32> = ... would be valid.

@rust-log-analyzer

This comment has been minimized.

@notriddle
Copy link
Contributor Author

I've added the unboxing flag feature in another commit (and also rebased onto head). It now does unboxing on Box, &, Future, Option, Result, and (Tuple), but not anything else.

As for the A != B constraint, the reason I originally wrote it that way was the broad idea that you aren't required to specify generics at all, so if you are specifying them, you probably have some specific meaning in mind.

  • If A shows up in a search, it should match the same thing each time. For instance, Vec<A> -> Option<A> should find Vec::first but not (e.g.) Vec::as_ascii.
  • If A and B show up in a search, they should match for both A == B and A != B. For instance, Vec<A> -> Option<B> should find Vec::first and Vec::as_ascii. My understanding is that the the current logic requires A != B.

It sounds like you also want type params in the search query to match with concrete types in the function, and not just with the function's own type params.

I'm not sure that's very useful. First of all, it's not how Hoogle works. If I run a search ParsecT [Tok] s m Tok, I get back pEscaped. If I turn one of the Tok into t (which turns it into a type param), making ParsecT [Tok] s m t, I get no results.

Second of all, the use case I have in mind is that someone's searching for a function because they plan to call it, so the stuff before the arrow is what they have, while the stuff after the arrow is what they want to get. If they're writing a type param, it's probably because they are themselves authoring a generic function.

@bors
Copy link
Contributor

bors commented Aug 31, 2024

☔ The latest upstream changes (presumably #129809) made this pull request unmergeable. Please resolve the merge conflicts.

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Oct 1, 2024

☔ The latest upstream changes (presumably #131078) made this pull request unmergeable. Please resolve the merge conflicts.

@lolbinarycat
Copy link
Contributor

It sounds like you also want type params in the search query to match with concrete types in the function, and not just with the function's own type params.

this is ultimately an issue of universal qualification vs existential qualification vs some halfway compromise that may be less confusing for new users, but also lacks formal backing

formally speaking, there are two ways type variables can exist:

there's the regular "for all T" type variables that rust tends to use (universal qualification)

there's also "there exists some T" type variables (existential qualification). in rust these corrospond to impl Trait in return position

if rustdoc search treats type variables as universally quantified, it makes sense for them to match only other generics, and also to have the A != B

if rustdoc search treats them as existential, removing A != B makes sense, but it would also make sense for them to be able to match concrete types

This reduces code size while still matching the common case
for plain, concrete types.
This commit is a response to feedback on the displayed type
signatures results, by making generics act stricter.

Generics are tightened by making order significant. This means
`Vec<Allocator>` now matches only with a true vector of allocators,
instead of matching the second type param. It also makes unboxing
within generics stricter, so `Result<A, B>` only matches if `B`
is in the error type and `A` is in the success type. The top level
of the function search is unaffected.

Find the discussion on:

* <https://rust-lang.zulipchat.com/#narrow/stream/393423-t-rustdoc.2Fmeetings/topic/meeting.202024-07-08/near/449965149>
* <rust-lang#124544 (comment)>
* <https://rust-lang.zulipchat.com/#narrow/channel/266220-t-rustdoc/topic/deciding.20on.20semantics.20of.20generics.20in.20rustdoc.20search/near/476841363>
@notriddle
Copy link
Contributor Author

It seems like everyone but @jsha wants A != B semantics for search. Am I understanding that correctly?

@rfcbot
Copy link

rfcbot commented Oct 30, 2024

Team member @notriddle has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 30, 2024
@fmease
Copy link
Member

fmease commented Oct 31, 2024

@rfcbot reviewed

@camelid
Copy link
Member

camelid commented Nov 1, 2024

@rfcbot reviewed

Thanks for the hard work! The results look great.

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Nov 1, 2024
@rfcbot
Copy link

rfcbot commented Nov 1, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants