Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc_errors: use perfect hashing for character replacements #128463

Closed
wants to merge 2 commits into from

Conversation

GrigorenkoPV
Copy link
Contributor

@GrigorenkoPV GrigorenkoPV commented Jul 31, 2024

The correctness of code in #128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with // tidy-alphabetical (and characters being written in \u{XXXX} form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced.

A const assert or a test can be added checking that (implemented in #128465).

But this PR tries to use perfect hashing instead.

The performance implications are unclear. Asymptotically it's faster, but in reality we should just benchmark. Plus if there are no significant performance wins, this entire things is probably not even worse the additional dependencies it brings.

UPD: funnily enough, there's a PR optimizing the binary search implementation (#128254) in the queue right now. So I guess we have to wait until that is merged too before benchmarking this.

@rustbot
Copy link
Collaborator

rustbot commented Jul 31, 2024

r? @fmease

rustbot has assigned @fmease.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 31, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jul 31, 2024

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@GrigorenkoPV
Copy link
Contributor Author

r? @ghost

@rustbot
Copy link
Collaborator

rustbot commented Jul 31, 2024

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

1 similar comment
@rustbot
Copy link
Collaborator

rustbot commented Jul 31, 2024

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@GrigorenkoPV
Copy link
Contributor Author

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

So I guess it only works when it is present in the first version of the OP. Cool.

@workingjubilee
Copy link
Member

was about to queue this and noticed that the other PR still hadn't landed...

@workingjubilee workingjubilee added the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Aug 6, 2024
@GrigorenkoPV GrigorenkoPV marked this pull request as ready for review August 6, 2024 06:34
@rustbot
Copy link
Collaborator

rustbot commented Aug 6, 2024

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging.

cc @davidtwco, @wesleywiser

@GrigorenkoPV GrigorenkoPV marked this pull request as draft August 6, 2024 06:35
@GrigorenkoPV
Copy link
Contributor Author

was about to queue this and noticed that the other PR still hadn't landed...

Now it has! Could you please queue it for benchmarking? I do not think I have enough rights to do it myself, nor do I really remember how it is done.

@estebank
Copy link
Contributor

estebank commented Aug 6, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 6, 2024
rustc_errors: use perfect hashing for character replacements

The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced.

A const assert or a test can be added checking that (implemented in rust-lang#128465).

But this PR tries to use [perfect hashing](https://en.wikipedia.org/wiki/Perfect_hash_function) instead.

The performance implications are unclear. Asymptotically it's faster, but in reality we should just benchmark. Plus if there are no significant performance wins, this entire things is probably not even worse the additional dependencies it brings.

UPD: funnily enough, there's a PR optimizing the binary search implementation (rust-lang#128254) in the queue right now. So I guess we have to wait until that is merged too before benchmarking this.
@bors
Copy link
Contributor

bors commented Aug 6, 2024

⌛ Trying commit 4108ac4 with merge 1de00dc...

@bors
Copy link
Contributor

bors commented Aug 6, 2024

☀️ Try build successful - checks-actions
Build commit: 1de00dc (1de00dcc88616d20eed9528aee78fef6370f7cf0)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1de00dc): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [0.2%, 1.8%] 11
Regressions ❌
(secondary)
0.4% [0.4%, 0.4%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.0% [0.2%, 1.8%] 11

Max RSS (memory usage)

Results (primary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [2.1%, 2.1%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.1% [2.1%, 2.1%] 1

Cycles

Results (primary -4.2%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
5.1% [4.7%, 5.5%] 2
Improvements ✅
(primary)
-4.2% [-4.2%, -4.2%] 1
Improvements ✅
(secondary)
-2.5% [-2.8%, -2.3%] 3
All ❌✅ (primary) -4.2% [-4.2%, -4.2%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 760.99s -> 765.201s (0.55%)
Artifact size: 336.87 MiB -> 336.89 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Aug 6, 2024
@GrigorenkoPV
Copy link
Contributor Author

GrigorenkoPV commented Aug 6, 2024

Well, the regressions are on syn again (#128169 (comment)), and there are no improvements whatsoever. At least on the instruction count. Number of cycles seem to have gotten a bit better overall, but not significantly, so probably not worth all the additional dependencies.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 8, 2024
Some `const { }` asserts for rust-lang#128200

The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced.

This PR changes it to using a `const{ }` assertion (and also checks for duplicate entries). Sadly, we cannot use the recently-stabilized `is_sorted_by_key` here, because it is not const (but it would not allow us to check for uniqueness anyways). Instead, let's write a manual loop.

Alternative approach (perfect hash function): rust-lang#128463

r? `@ghost`
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 8, 2024
Some `const { }` asserts for rust-lang#128200

The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced.

This PR changes it to using a `const{ }` assertion (and also checks for duplicate entries). Sadly, we cannot use the recently-stabilized `is_sorted_by_key` here, because it is not const (but it would not allow us to check for uniqueness anyways). Instead, let's write a manual loop.

Alternative approach (perfect hash function): rust-lang#128463

r? `@ghost`
@GrigorenkoPV GrigorenkoPV deleted the perfect-hash branch August 10, 2024 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc perf-regression Performance regression. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants