Store core::str::CharSearcher::utf8_size as u8 #119808

Merged
merged 1 commit into from
Feb 19, 2024

Conversation

GnomedDev
Contributor

@GnomedDev GnomedDev commented Jan 10, 2024

This value is already relied on to fit in a u8 because of the `safety invariant: utf8_size must be less than 5`, so storing it as a u8 helps LLVM optimize and may improve copies, since the leftover bytes become padding rather than unused bytes.

@rustbot
Collaborator

rustbot commented Jan 10, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @thomcc (or someone else) soon.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 10, 2024
@GnomedDev GnomedDev force-pushed the encode-charsearcher-size-in-type branch 3 times, most recently from adddd38 to 71a1558 Compare January 10, 2024 12:58
Comment on lines 352 to 358
enum Utf8Size {
    // Values are indexes, so `- 1`
    One = 0,
    Two = 1,
    Three = 2,
    Four = 3,
}
Contributor Author

If wanted, I can change this to use a rustc_scalar_valid_range attribute, but it seems safer to use the workaround that everyone else in the ecosystem has to perform.

Member

Another option would be One = 1 and so on, skipping 0. I don't know if that makes this clearer or less clear.
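
For illustration, a minimal sketch of the variant suggested here, with discriminants equal to the byte count rather than the index. This is not the code from the PR: the `new` and `as_usize` helpers are hypothetical, and `#[repr(u8)]` is an assumption made so that the enum occupies a single byte.

```rust
/// Number of bytes in a char's UTF-8 encoding (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Utf8Size {
    One = 1,
    Two = 2,
    Three = 3,
    Four = 4,
}

impl Utf8Size {
    /// Hypothetical constructor: classify a char by its encoded length.
    fn new(c: char) -> Self {
        match c.len_utf8() {
            1 => Utf8Size::One,
            2 => Utf8Size::Two,
            3 => Utf8Size::Three,
            // `char::len_utf8` only returns values in 1..=4.
            _ => Utf8Size::Four,
        }
    }

    /// Hypothetical accessor: the length as a `usize`, e.g. for slicing.
    fn as_usize(self) -> usize {
        self as usize
    }
}
```

With this numbering, `as_usize` needs no `+ 1` adjustment, while the enum still restricts the stored value to the four valid lengths.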

@thomcc
Member

thomcc commented Feb 1, 2024

I'm going to be away for a few months, so I'm rerolling my PRs so that folks don't have to wait for me. Sorry/thanks.

r? libs

@rustbot rustbot assigned joshtriplett and unassigned thomcc Feb 1, 2024
@joshtriplett
Member

r? libs

@Mark-Simulacrum
Member

Is there some evidence of downstream (i.e., runtime) benefits from this change? My sense is that the added code complexity isn't a great tradeoff without clear justification - the safety invariant is pretty easy to show being true with or without this.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 11, 2024
@GnomedDev
Contributor Author

No, I simply saw an opportunity where there was an inefficiency and fixed it. If you want me to reduce code complexity, I can simply make this a one line PR to change the int size to u8, but that loses the safety benefit.

@Mark-Simulacrum
Member

I'm ok landing just the u8, but in general it feels like this kind of optimization isn't worthwhile to me without a clear justification, given that it's not changing the overall size of the type.

@GnomedDev
Contributor Author

As I have mentioned in other PRs, I'm a massive fan of small incremental improvement. Reducing this int size may allow reducing the type size in the future, or at least allows more optimized codegen.
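
To make the size question concrete, here is a rough sketch that compares two stand-in structs. The struct names and `searcher_sizes` are hypothetical, and the field list is only an assumption loosely modeled on a char searcher; the real `CharSearcher` may differ.

```rust
use std::mem::size_of;

/// Hypothetical stand-in with the encoded length stored as `usize`.
#[allow(dead_code)]
struct SearcherWide<'a> {
    haystack: &'a str,
    finger: usize,
    finger_back: usize,
    needle: char,
    utf8_size: usize,
    utf8_encoded: [u8; 4],
}

/// Same fields, but with the encoded length stored as `u8`.
#[allow(dead_code)]
struct SearcherNarrow<'a> {
    haystack: &'a str,
    finger: usize,
    finger_back: usize,
    needle: char,
    utf8_size: u8,
    utf8_encoded: [u8; 4],
}

/// Returns (size with `usize`, size with `u8`) in bytes for the current target.
fn searcher_sizes() -> (usize, usize) {
    (
        size_of::<SearcherWide<'static>>(),
        size_of::<SearcherNarrow<'static>>(),
    )
}
```

On a typical 64-bit target the two sizes are likely to be equal, because alignment rounds the struct up to a multiple of 8 bytes. That would be consistent with the observation that the overall size does not change, while the narrower field leaves padding that a later layout change could reclaim.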

@Noratrieb
Member

When these changes increase code complexity, it's a tradeoff that often goes the way of rejecting them. But changing an obviously bigger-than-necessary int to a smaller one doesn't increase complexity, so that sounds like a good idea. I'm not sure whether it's necessarily the best use of our limited review bandwidth though.
As Mark said, changing this PR to only change the int type to a smaller one is the way forward here.

@GnomedDev GnomedDev closed this Feb 13, 2024
@GnomedDev GnomedDev force-pushed the encode-charsearcher-size-in-type branch from 71a1558 to e927184 Compare February 13, 2024 18:26
@GnomedDev GnomedDev changed the title Encode core::str::CharSearcher::utf8_size as enum Store core::str::CharSearcher::utf8_size as u8 Feb 13, 2024
@GnomedDev GnomedDev reopened this Feb 13, 2024
@GnomedDev GnomedDev force-pushed the encode-charsearcher-size-in-type branch from 4893842 to 601f2d1 Compare February 13, 2024 18:29
@GnomedDev
Contributor Author

Done. After a mix-up (I forgot to commit before force-pushing), this now only changes the field to u8, so there is much less complexity.

@GnomedDev
Contributor Author

@rustbot review

@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 18, 2024
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 18, 2024
@@ -40,6 +40,7 @@

use crate::cmp;
use crate::cmp::Ordering;
use crate::convert::TryInto as _;
Member

I'm surprised this is necessary, TryInto should be in the prelude.

Contributor Author

It seems core isn't using edition 2021? That's odd, though it may just have been rust-analyzer misbehaving in the Rust codebase.
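
For context, a minimal sketch of the kind of narrowing conversion that needs `TryInto` in scope, assuming the length comes from `char::encode_utf8`. The function name `encode_with_size` is hypothetical, and the example imports from `std` rather than `crate`. The explicit import matters only on editions before 2021, where `TryInto` is not in the prelude.

```rust
// Needed on editions before 2021; redundant on 2021 and later.
use std::convert::TryInto as _;

/// Hypothetical helper: encode `c` and return the buffer plus its length as a `u8`.
fn encode_with_size(c: char) -> ([u8; 4], u8) {
    let mut utf8_encoded = [0u8; 4];
    // `encode_utf8` writes into the buffer and returns the encoded `&mut str`;
    // its length is at most 4, so the conversion to `u8` cannot fail.
    let utf8_size: u8 = c
        .encode_utf8(&mut utf8_encoded)
        .len()
        .try_into()
        .expect("UTF-8 length of a char is at most 4");
    (utf8_encoded, utf8_size)
}
```

A plain `as u8` cast would also work here; `try_into` makes the range assumption explicit at the cost of requiring the trait import.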

Member

@Mark-Simulacrum
Member

@bors r+ rollup

@bors
Contributor

bors commented Feb 18, 2024

📌 Commit 601f2d1 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 18, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Feb 19, 2024
…-in-type, r=Mark-Simulacrum

Store core::str::CharSearcher::utf8_size as u8

This is already relied on being smaller than u8 due to the `safety invariant: utf8_size must be less than 5`, so this helps LLVM optimize and maybe improve copies due to padding instead of unused bytes.
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2024
…iaskrgr

Rollup of 6 pull requests

Successful merges:

 - rust-lang#119808 (Store core::str::CharSearcher::utf8_size as u8)
 - rust-lang#121032 (Continue reporting remaining errors instead of silently dropping them)
 - rust-lang#121041 (Add `Future` and `IntoFuture` to the 2024 prelude)
 - rust-lang#121230 (Extend Level API)
 - rust-lang#121272 (Add diagnostic items for legacy numeric constants)
 - rust-lang#121275 (add test for panicking attribute macros)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit c5da038 into rust-lang:master Feb 19, 2024
11 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Feb 19, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 19, 2024
Rollup merge of rust-lang#119808 - GnomedDev:encode-charsearcher-size-in-type, r=Mark-Simulacrum

Store core::str::CharSearcher::utf8_size as u8

This is already relied on being smaller than u8 due to the `safety invariant: utf8_size must be less than 5`, so this helps LLVM optimize and maybe improve copies due to padding instead of unused bytes.
@GnomedDev GnomedDev deleted the encode-charsearcher-size-in-type branch February 19, 2024 16:34
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
7 participants