Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustdoc-search: count path edits with separate edit limit #119331

Merged
merged 1 commit into from
Dec 28, 2023

Conversation

notriddle
Copy link
Contributor

Avoids strange-looking results like this one, where the path component seems to be ignored:

image

Since the two are counted separately elsewhere, they should get their own limits, too. The biggest problem with combining them is that paths are loosely checked by not requiring every component to match, which means that if they are short and matched loosely, they can easily find "drunk typist" matches that make no sense, like this old result:

std::collections::btree_map::itermut matching slice::itermut
maxEditDistance = ("slice::itermut".length) / 3 = 14 / 3 = 4
editDistance("std", "slice") = 4
editDistance("itermut", "itermut") = 0
    4 + 0 <= 4 PASS

Of course, slice::itermut should not match stuff from btreemap. slice should not match std.

The new result counts them separately:

maxPathEditDistance = "slice".length / 3 = 5 / 3 = 1
maxEditDistance = "itermut".length / 3 = 7 / 3 = 2
editDistance("std", "slice") = 4
    4 <= 1 FAIL

Effectively, this makes path queries less "typo-resistant". It's not zero, but it means vec won't match the v1 prelude.

This commit also adds substring matching to paths. It's stricter than the substring matching in the main part, but loose enough that what I expect to match does.

Queries without parent paths are unchanged.

@rustbot
Copy link
Collaborator

rustbot commented Dec 27, 2023

r? @jsha

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Dec 27, 2023
@rustbot
Copy link
Collaborator

rustbot commented Dec 27, 2023

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha

@rust-log-analyzer

This comment has been minimized.

Since the two are counted separately elsewhere, they should get
their own limits, too. The biggest problem with combining them
is that paths are loosely checked by not requiring every component
to match, which means that if they are short and matched loosely,
they can easily find "drunk typist" matches that make no sense,
like this old result:

    std::collections::btree_map::itermut matching slice::itermut
    maxEditDistance = ("slice::itermut".length) / 3 = 14 / 3 = 4
    editDistance("std", "slice") = 4
    editDistance("itermut", "itermut") = 0
        4 + 0 <= 4 PASS

Of course, `slice::itermut` should not match stuff from btreemap.
`slice` should not match `std`.

The new result counts them separately:

    maxPathEditDistance = "slice".length / 3 = 5 / 3 = 1
    maxEditDistance = "itermut".length / 3 = 7 / 3 = 2
    editDistance("std", "slice") = 4
        4 <= 1 FAIL

Effectively, this makes path queries less "typo-resistant".
It's not zero, but it means `vec` won't match the `v1` prelude.

Queries without parent paths are unchanged.
@GuillaumeGomez
Copy link
Member

Change look good to me. Can you run JS perf check too please?

@notriddle
Copy link
Contributor Author

@GuillaumeGomez
Copy link
Member

There isn't really a case "like std" right?

@notriddle
Copy link
Contributor Author

I haven't included the standard library in this benchmark tool, because rigging it up would be complicated and not worth it.

The complexity comes from it being developed alongside the compiler, which means if I want to do a comparison benchmark across a bootstrap compiler version bump, I can't build the same version of the standard library with two different versions of rustdoc (unstable features are unstable) but I also shouldn't try to compare two different versions of the standard library (not an apples-to-apples comparison of the search engine changes).

The other reason is that the standard library isn't small enough to be a microbenchmark, or big enough to be a stress test.

Project Search index size
ripgrep benchmark 552K
standard library docs 4.8M
arti benchmark 1 9.1M
compiler-docs 2 11M
https://doc.servo.org/search-index.js 3 2 38M
stm32f4 benchmark 1 61M

Footnotes

  1. this benchmark includes all transitive dependencies and all features; the arti docs published by the Tor project, and stm32f4 docs on docs.rs, are smaller because they don't 2

  2. not currently part of the benchmark suite, because it also requires unstable compiler features, and I've been avoiding unstable features in the benchmark suite to make my life easier 2

  3. servo actually does include all of their transitive dependencies in their officially-published docs

@GuillaumeGomez
Copy link
Member

That's a good point. Well, let's consider the bench results not changing as a green light for going forward with it then. Thanks!

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Dec 28, 2023

📌 Commit 0ea58e2 has been approved by GuillaumeGomez

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2023
@bors
Copy link
Contributor

bors commented Dec 28, 2023

⌛ Testing commit 0ea58e2 with merge 5c0907b...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 28, 2023
…nce, r=GuillaumeGomez

rustdoc-search: count path edits with separate edit limit

Avoids strange-looking results like this one, where the path component seems to be ignored:

![image](https://github.com/rust-lang/rust/assets/1593513/f0ef077a-6e09-4d67-a29d-8cabc1495f66)

Since the two are counted separately elsewhere, they should get their own limits, too. The biggest problem with combining them is that paths are loosely checked by not requiring every component to match, which means that if they are short and matched loosely, they can easily find "drunk typist" matches that make no sense, like this old result:

    std::collections::btree_map::itermut matching slice::itermut
    maxEditDistance = ("slice::itermut".length) / 3 = 14 / 3 = 4
    editDistance("std", "slice") = 4
    editDistance("itermut", "itermut") = 0
        4 + 0 <= 4 PASS

Of course, `slice::itermut` should not match stuff from btreemap. `slice` should not match `std`.

The new result counts them separately:

    maxPathEditDistance = "slice".length / 3 = 5 / 3 = 1
    maxEditDistance = "itermut".length / 3 = 7 / 3 = 2
    editDistance("std", "slice") = 4
        4 <= 1 FAIL

Effectively, this makes path queries less "typo-resistant". It's not zero, but it means `vec` won't match the `v1` prelude.

This commit also adds substring matching to paths. It's stricter than the substring matching in the main part, but loose enough that what I expect to match does.

Queries without parent paths are unchanged.
@bors
Copy link
Contributor

bors commented Dec 28, 2023

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 28, 2023
@notriddle
Copy link
Contributor Author

@bors retry

dist-apple-various timed out on #119359 too.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2023
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 28, 2023
…iaskrgr

Rollup of 5 pull requests

Successful merges:

 - rust-lang#119331 (rustdoc-search: count path edits with separate edit limit)
 - rust-lang#119359 (Simplify Parser::ident_or_error)
 - rust-lang#119376 (Add regression test for rust-lang#106630)
 - rust-lang#119379 (Update `parse_seq` doc)
 - rust-lang#119380 (Don't suggest writing a bodyless arm if the pattern can never be a never pattern)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 2f51bad into rust-lang:master Dec 28, 2023
11 of 12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Dec 28, 2023
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Dec 28, 2023
Rollup merge of rust-lang#119331 - notriddle:notriddle/maxpatheditdistance, r=GuillaumeGomez

rustdoc-search: count path edits with separate edit limit

Avoids strange-looking results like this one, where the path component seems to be ignored:

![image](https://github.com/rust-lang/rust/assets/1593513/f0ef077a-6e09-4d67-a29d-8cabc1495f66)

Since the two are counted separately elsewhere, they should get their own limits, too. The biggest problem with combining them is that paths are loosely checked by not requiring every component to match, which means that if they are short and matched loosely, they can easily find "drunk typist" matches that make no sense, like this old result:

    std::collections::btree_map::itermut matching slice::itermut
    maxEditDistance = ("slice::itermut".length) / 3 = 14 / 3 = 4
    editDistance("std", "slice") = 4
    editDistance("itermut", "itermut") = 0
        4 + 0 <= 4 PASS

Of course, `slice::itermut` should not match stuff from btreemap. `slice` should not match `std`.

The new result counts them separately:

    maxPathEditDistance = "slice".length / 3 = 5 / 3 = 1
    maxEditDistance = "itermut".length / 3 = 7 / 3 = 2
    editDistance("std", "slice") = 4
        4 <= 1 FAIL

Effectively, this makes path queries less "typo-resistant". It's not zero, but it means `vec` won't match the `v1` prelude.

This commit also adds substring matching to paths. It's stricter than the substring matching in the main part, but loose enough that what I expect to match does.

Queries without parent paths are unchanged.
@notriddle notriddle deleted the notriddle/maxpatheditdistance branch December 28, 2023 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants