Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the codebase's use of chars & put range asserts on chars #12086

Merged
merged 4 commits into from
Feb 8, 2014

Conversation

huonw
Copy link
Member

@huonw huonw commented Feb 7, 2014

The lexer and json were using transmute(-1): char as a sentinel value for EOF, which is invalid since char is strictly a unicode codepoint.

Fixing this allows for range asserts on chars since they always lie between 0 and 0x10FFFF.

}
bump(rdr);
accum_int *= 16;
accum_int += hex_digit_val(n);
i -= 1u;
}
if i != 0 && is_eof(rdr) {
fatal_span(rdr, start_bpos, rdr.last_pos.get(),
~"unterminated numeric characer escape");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

@lilyball
Copy link
Contributor

lilyball commented Feb 8, 2014

LGTM, r=me after squash

huonw added 4 commits February 8, 2014 12:13
Avoid using -1 as a char sentinel, when Option<char> is the perfect
thing.
The transmute was unsound.

There are many instances of .unwrap_or('\x00') for "ignoring" EOF which
either do not make the situation worse than it was (well, actually make
it better, since it's easy to grep for places that don't handle EOF) or
can never ever be read.

Fixes rust-lang#8971.
A `char` is a Unicode codepoint, and so ranges from 0--0x10FFFF (with
the surrogate gaps): we may as well inform LLVM of this.
Apparently loading them signed will break if/when they become i1.
bors added a commit that referenced this pull request Feb 8, 2014
The lexer and json were using `transmute(-1): char` as a sentinel value for EOF, which is invalid since `char` is strictly a unicode codepoint.

Fixing this allows for range asserts on chars since they always lie between 0 and 0x10FFFF.
@bors bors closed this Feb 8, 2014
@bors bors merged commit 5e2de79 into rust-lang:master Feb 8, 2014
@SimonSapin
Copy link
Contributor

What’s a range assert and when is it used? Should it also exclude the 0xD800 to 0xDFFF range (surrogate code points), matching std::char::from_u32?

@huonw huonw deleted the safe-json branch February 13, 2014 01:53
@huonw
Copy link
Member Author

huonw commented Feb 13, 2014

@SimonSapin A range assert tells LLVM that a char is always in [0-0x10FFFF] which allows LLVM to possibly perform other optimisations, i.e. it is purely an optimisation, not a semantic thing (although it does mean attempts to (unsafely) put a value outside this range in a char become undefined behaviour).

Excluding the surrogate range is possible, but it seems unlikely that there will be much performance gained from just excluding an interior range, while telling LLVM that char only has 21 used bits (which the range assert here does) apparently does give higher performance in some scenarios.

@SimonSapin
Copy link
Contributor

Perfect answer, thanks @huonw!

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 25, 2022
infer from RPIT bounds of _this_ function

Collect obligations from RPITs (Return Position `impl Trait`) of a function which is being inferred.
This allows inferring {unknown}s from RPIT bounds.

Closes rust-lang#8403
flip1995 pushed a commit to flip1995/rust that referenced this pull request Jan 11, 2024
Update copyright year for Clippy (2024 edition)

Our license was outdated by a year (+ change in the main README.md)

changelog:none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants