rustc_errors: use perfect hashing for character replacements #128463

GrigorenkoPV · 2024-07-31T21:35:08Z

The correctness of the code in #128200 relies on an array being sorted (so that it can be binary-searched later), which is currently enforced with `// tidy-alphabetical` (together with writing the characters in `\u{XXXX}` form), and on the absence of duplicate entries with conflicting keys, which is not currently enforced at all.
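For context, the kind of lookup that depends on these invariants can be sketched as follows. This is a minimal illustration, assuming a table of `(char, &str)` pairs; the `REPLACEMENTS` contents and the `replacement_for` name are hypothetical and not the actual table in `rustc_errors`.

```rust
// Hypothetical replacement table: (character, replacement) pairs.
// Binary search is only correct if this is sorted by key, and it only
// gives a well-defined answer if no key appears twice.
const REPLACEMENTS: &[(char, &str)] = &[
    ('\u{0000}', "␀"),
    ('\u{0007}', "␇"),
    ('\u{0009}', "    "),
    ('\u{001B}', "␛"),
    ('\u{200D}', ""),
];

/// Looks up the replacement for `c` with a binary search over the sorted keys.
fn replacement_for(c: char) -> Option<&'static str> {
    REPLACEMENTS
        .binary_search_by_key(&c, |&(k, _)| k)
        .ok()
        .map(|i| REPLACEMENTS[i].1)
}
```

If the table were not sorted, `binary_search_by_key` could fail to find a key that is present; if a key appeared twice with different replacements, the standard library allows it to return any of the matching indices, so which replacement wins would be unspecified.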

A const assert or a test could be added to check this (implemented in #128465).
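A compile-time check of both invariants might look roughly like this. It is a sketch, not the code from #128465: the table and the `is_strictly_sorted` helper are hypothetical. Requiring keys to be *strictly* increasing covers sortedness and uniqueness in one pass, and the check is a manual `while` loop because `for` loops over iterators are not usable in a `const fn`.

```rust
// Hypothetical table with the same shape as the real one.
const REPLACEMENTS: &[(char, &str)] = &[
    ('\u{0000}', "␀"),
    ('\u{0009}', "    "),
    ('\u{200D}', ""),
];

/// Returns true if the keys are strictly increasing, i.e. the table is
/// sorted AND contains no duplicate keys.
const fn is_strictly_sorted(table: &[(char, &str)]) -> bool {
    let mut i = 1;
    while i < table.len() {
        // Compare as u32 to keep the comparison a plain integer operation.
        if table[i - 1].0 as u32 >= table[i].0 as u32 {
            return false;
        }
        i += 1;
    }
    true
}

// Evaluated at compile time: an out-of-order or duplicated key fails the build.
const _: () = assert!(is_strictly_sorted(REPLACEMENTS));
```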

But this PR tries to use perfect hashing instead.
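To illustrate what perfect hashing means here, below is a toy, dependency-free sketch: it brute-forces a seed for a simple multiplicative hash until every key in the fixed set lands in its own slot, after which a lookup is one hash, one slot read, and one key comparison. This is an assumption-laden illustration, not what the PR does (the PR brings in third-party dependencies for this); `PerfectMap`, `slot`, the hash constant, and the 2x table size are all hypothetical choices, and the table is built at runtime for simplicity rather than at compile time.

```rust
/// Toy perfect hash table over a fixed set of `char` keys.
struct PerfectMap {
    seed: u64,
    // Each slot holds at most one (key, value) pair; `None` marks an empty slot.
    slots: Vec<Option<(char, &'static str)>>,
}

/// Maps `c` to a slot index for a given seed (multiplicative hashing).
fn slot(c: char, seed: u64, n: usize) -> usize {
    let h = ((c as u64) ^ seed).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    ((h >> 32) as usize) % n
}

impl PerfectMap {
    /// Tries seeds until every key maps to a distinct slot (no collisions).
    /// Panics if no such seed is found, which is guaranteed to happen when
    /// `entries` contains a duplicate key.
    fn new(entries: &[(char, &'static str)]) -> PerfectMap {
        // Spare slots make a collision-free seed easier to find.
        let n = (entries.len() * 2).max(1);
        for seed in 0..100_000u64 {
            let mut slots = vec![None; n];
            let ok = entries.iter().all(|&(k, v)| {
                let i = slot(k, seed, n);
                if slots[i].is_some() {
                    return false; // collision: try the next seed
                }
                slots[i] = Some((k, v));
                true
            });
            if ok {
                return PerfectMap { seed, slots };
            }
        }
        panic!("no collision-free seed found (duplicate keys?)");
    }

    /// Constant-time lookup. The stored key is still compared, because a
    /// character outside the key set can hash to an occupied slot.
    fn get(&self, c: char) -> Option<&'static str> {
        match self.slots[slot(c, self.seed, self.slots.len())] {
            Some((k, v)) if k == c => Some(v),
            _ => None,
        }
    }
}
```

One side effect of this design is that the "no duplicate keys" invariant is enforced by construction: two identical keys always hash to the same slot, so building the table fails instead of silently picking one replacement.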

The performance implications are unclear. Asymptotically it's faster (O(1) lookups instead of O(log n) for binary search), but in practice we should just benchmark. Plus, if there are no significant performance wins, this entire thing is probably not even worth the additional dependencies it brings.

UPD: funnily enough, there's a PR optimizing the binary search implementation (#128254) in the queue right now. So I guess we have to wait until that is merged too before benchmarking this.

rustbot · 2024-07-31T21:35:16Z

r? @fmease

rustbot has assigned @fmease.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2024-07-31T21:35:40Z

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

GrigorenkoPV · 2024-07-31T22:13:54Z

r? @ghost

rustbot · 2024-07-31T22:13:57Z

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

rustbot · 2024-07-31T22:15:03Z

Failed to set assignee to ghost: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

GrigorenkoPV · 2024-07-31T22:16:49Z

> Failed to set assignee to ghost: invalid assignee
>
> Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

So I guess it only works when it is present in the first version of the OP. Cool.

workingjubilee · 2024-08-06T04:56:16Z

was about to queue this and noticed that the other PR still hadn't landed...

rustbot · 2024-08-06T06:35:00Z

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging.

cc @davidtwco, @wesleywiser

GrigorenkoPV · 2024-08-06T06:37:16Z

> was about to queue this and noticed that the other PR still hadn't landed...

Now it has! Could you please queue it for benchmarking? I do not think I have enough rights to do it myself, nor do I really remember how it is done.

estebank · 2024-08-06T16:39:02Z

@bors try @rust-timer queue


bors · 2024-08-06T16:40:14Z

⌛ Trying commit 4108ac4 with merge 1de00dc...

bors · 2024-08-06T18:34:53Z

☀️ Try build successful - checks-actions
Build commit: 1de00dc (1de00dcc88616d20eed9528aee78fef6370f7cf0)

rust-timer · 2024-08-06T20:23:57Z

Finished benchmarking commit (1de00dc): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.2%, 1.8%]	11
Regressions ❌ (secondary)	0.4%	[0.4%, 0.4%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	1.0%	[0.2%, 1.8%]	11

Max RSS (memory usage)

Results (primary 2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.1%	[2.1%, 2.1%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.1%	[2.1%, 2.1%]	1

Cycles

Results (primary -4.2%, secondary 0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	5.1%	[4.7%, 5.5%]	2
Improvements ✅ (primary)	-4.2%	[-4.2%, -4.2%]	1
Improvements ✅ (secondary)	-2.5%	[-2.8%, -2.3%]	3
All ❌✅ (primary)	-4.2%	[-4.2%, -4.2%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 760.99s -> 765.201s (0.55%)
Artifact size: 336.87 MiB -> 336.89 MiB (0.01%)

GrigorenkoPV · 2024-08-06T20:31:45Z

Well, the regressions are on `syn` again (#128169 (comment)), and there are no improvements whatsoever, at least in instruction count. The number of cycles seems to have gotten a bit better overall, but not significantly, so this is probably not worth all the additional dependencies.

Some `const { }` asserts for rust-lang#128200

The correctness of code in rust-lang#128200 relies on an array being sorted (so that it can be used in binary search later), which is currently enforced with `// tidy-alphabetical` (and characters being written in `\u{XXXX}` form), as well as lack of duplicate entries with conflicting keys, which is not currently enforced.

This PR changes it to using a `const{ }` assertion (and also checks for duplicate entries). Sadly, we cannot use the recently-stabilized `is_sorted_by_key` here, because it is not const (but it would not allow us to check for uniqueness anyways). Instead, let's write a manual loop.

Alternative approach (perfect hash function): rust-lang#128463

r? `@ghost`

rustbot assigned fmease Jul 31, 2024

GrigorenkoPV force-pushed the perfect-hash branch from b8da67c to 03906c5 July 31, 2024 22:12

GrigorenkoPV mentioned this pull request Jul 31, 2024

Some const { } asserts for #128200 #128465

Merged

lqd unassigned fmease Jul 31, 2024

GrigorenkoPV mentioned this pull request Aug 1, 2024

Change output normalization logic to be linear against size of output #128200

Merged

workingjubilee closed this Aug 6, 2024

workingjubilee reopened this Aug 6, 2024

workingjubilee added the S-blocked label Aug 6, 2024

GrigorenkoPV added 2 commits August 6, 2024 09:34

rustc_errors: use perfect hashing for character replacements

789baed

rustc_errors: fix inaccurate comment

4108ac4

GrigorenkoPV force-pushed the perfect-hash branch from 03906c5 to 4108ac4 August 6, 2024 06:34

GrigorenkoPV marked this pull request as ready for review August 6, 2024 06:34

GrigorenkoPV marked this pull request as draft August 6, 2024 06:35


rustbot added the S-waiting-on-perf label Aug 6, 2024


rustbot added perf-regression and removed S-waiting-on-perf labels Aug 6, 2024

GrigorenkoPV closed this Aug 6, 2024

GrigorenkoPV deleted the perfect-hash branch August 10, 2024 20:18
