Regenerate character tables for Unicode 12.1 #62641

cuviper · 2019-07-12T23:50:49Z

No description provided.

rust-highfive · 2019-07-12T23:50:52Z

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

cuviper · 2019-07-22T21:14:09Z

r? @SimonSapin

matklad · 2019-07-24T10:30:39Z

Let's just r+ this? I am not an expert in Unicode, but this seems straightforward, and it blocks progress on #62848 :)

@bors r+ rollup

bors · 2019-07-24T10:30:41Z

📌 Commit de1e489 has been approved by matklad

@ghost

Rollup of 10 pull requests

Successful merges:

- rust-lang#62641 (Regenerate character tables for Unicode 12.1)
- rust-lang#62716 (state also in the intro that UnsafeCell has no effect on &mut)
- rust-lang#62738 (Remove uses of mem::uninitialized from std::sys::cloudabi)
- rust-lang#62772 (Suggest trait bound on type parameter when it is unconstrained)
- rust-lang#62890 (Normalize use of backticks in compiler messages for libsyntax/*)
- rust-lang#62905 (Normalize use of backticks in compiler messages for doc)
- rust-lang#62916 (Add test `self-in-enum-definition`)
- rust-lang#62917 (Always emit trailing slash error)
- rust-lang#62926 (Fix typo in mem::uninitialized doc)
- rust-lang#62927 (use PanicMessage in MIR, kill InterpError::description)

Failed merges:

r? @ghost

@matklad

Use unicode-xid crate instead of libcore

This PR proposes to remove the `char::is_xid_start` and `char::is_xid_continue` functions from `libcore` and use the `unicode_xid` crate from crates.io instead (note that this crate is already present in rust-lang/rust's Cargo.lock).

Reasons to do this:

* removing rustc-binary-specific stuff from libcore
* making sure that, across the ecosystem, there's a single definition of what a Rust identifier is (`unicode-xid` has almost 10 million downloads, as a `proc_macro2` dependency)
* making it easier to share the `rustc_lexer` crate with rust-analyzer: no need to `#[cfg]` if we are building as a part of the compiler

Reasons not to do this:

* increased maintenance burden: we'll need to upgrade the Unicode version both in libcore and in unicode-xid. However, this shouldn't be too heavy a burden: it is just a matter of running `./unicode.py` after each new Unicode version. I (@matklad) am ready to be a t-compiler side maintainer of unicode-xid. Moreover, given that unicode-xid is an important dependency of syn, *someone* needs to maintain it anyway.
* the unicode-xid implementation is significantly slower. It uses a more compact table with binary search, instead of a trie. However, this shouldn't matter in practice, because we have a fast path for ASCII anyway, and the code size savings are a plus. Moreover, in rust-lang#59706 not using libcore turned out to be *faster*, presumably because checking for whitespace with `match` is even faster.

<details> <summary>old description</summary> Followup to rust-lang#59706 r? @eddyb Note that this doesn't actually remove tables from libcore, to avoid conflict with rust-lang#62641. cc unicode-rs/unicode-xid#11 </details>
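To make the trade-off in the description above concrete, here is a minimal sketch of the "compact table with binary search plus ASCII fast path" idea. It is not the actual `unicode-xid` or `rustc_lexer` code: the names `XID_START_SAMPLE`, `in_ranges`, and `is_ident_start_sketch` are hypothetical, and the table is a tiny hand-picked sample rather than the generated Unicode data. Treating `_` as an identifier start is likewise an assumption about the lexer made for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical, heavily truncated sample of inclusive code point ranges,
/// sorted and non-overlapping. A real table would be generated from the
/// Unicode Character Database and would be far longer.
const XID_START_SAMPLE: &[(char, char)] = &[
    ('A', 'Z'),
    ('a', 'z'),
    ('\u{aa}', '\u{aa}'),
    ('\u{b5}', '\u{b5}'),
    ('\u{ba}', '\u{ba}'),
    ('\u{c0}', '\u{d6}'),
    ('\u{d8}', '\u{f6}'),
    ('\u{f8}', '\u{2c1}'),
];

/// Binary search over the sorted range table: O(log n) comparisons per
/// lookup, with the table itself stored as plain pairs.
fn in_ranges(c: char, ranges: &[(char, char)]) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less // this range lies entirely below `c`
            } else if lo > c {
                Ordering::Greater // this range lies entirely above `c`
            } else {
                Ordering::Equal // `c` falls inside this range
            }
        })
        .is_ok()
}

/// ASCII fast path first; only non-ASCII characters reach the table lookup.
fn is_ident_start_sketch(c: char) -> bool {
    match c {
        'a'..='z' | 'A'..='Z' | '_' => true,
        _ if c.is_ascii() => false,
        _ => in_ranges(c, XID_START_SAMPLE),
    }
}
```

With this shape, typical ASCII-heavy source code rarely touches the table at all, which is why the slower lookup is argued not to matter in practice.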

cuviper added 2 commits July 12, 2019 16:28

Update unicode scripts for the current coding style

76128c3

Regenerate character tables for Unicode 12.1

de1e489

rust-highfive assigned bluss Jul 12, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 12, 2019

matklad mentioned this pull request Jul 21, 2019

Use unicode-xid crate instead of libcore #62848

Merged

rust-highfive assigned SimonSapin and unassigned bluss Jul 22, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2019

Centril added a commit to Centril/rust that referenced this pull request Jul 24, 2019

Rollup merge of rust-lang#62641 - cuviper:unicode-12.1, r=matklad

21caaba

Regenerate character tables for Unicode 12.1

Centril mentioned this pull request Jul 24, 2019

Rollup of 10 pull requests #62935

Merged

bors merged commit de1e489 into rust-lang:master Jul 24, 2019

cuviper deleted the unicode-12.1 branch April 3, 2020 18:40

KamilaBorowska mentioned this pull request Jul 22, 2020

Support std::char::UNICODE_VERSION jhpratt/standback#9

Closed
