Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the size of static data in std_unicode::tables #38781

Merged
merged 2 commits into from
Jan 7, 2017

Conversation

SimonSapin
Copy link
Contributor

BoolTrie works well for sets of code points spread out through most of Unicode’s range, but is uses a lot of space for sets with few, mostly low, code points.

This switches a few of its instances to a similar but simpler trie data structure.

CC @raphlinus, who wrote the original BoolTrie.

Before

size_of::<BoolTrie>() is 1552, which is added to table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8:

  • Cc_table: 1632
  • White_Space_table: 1656
  • Pattern_White_Space_table: 1640
  • Total: 4928 bytes

After

size_of::<SmallBoolTrie>() is 32, which is added to t.r1.len() + t.r2.len() * 8:

  • Cc_table: 51
  • White_Space_table: 273
  • Pattern_White_Space_table: 193
  • Total: 517 bytes

Difference

Every Rust program with std statically linked should be about 4 KB smaller.

It was used to measure before/after size in cfaf66c.
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

Copy link
Contributor

@raphlinus raphlinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I thought about doing this but was thinking more about code complexity than static size. The latter is important.

pub_string = ""
if is_pub:
pub_string = "pub "
f.write(" %sconst %s: &'static super::SmallBoolTrie = &super::SmallBoolTrie {\n" % (pub_string, name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/checkout/src/etc/unicode.py:461: line longer than 100 chars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

`BoolTrie` works well for sets of code points spread out through
most of Unicode’s range, but is uses a lot of space for sets
with few, mostly low, code points.

This switches a few of its instances to a similar but simpler trie
data structure.

 ## Before

`size_of::<BoolTrie>()` is 1552, which is added to
`table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`:

* `Cc_table`: 1632
* `White_Space_table`: 1656
* `Pattern_White_Space_table`: 1640
* Total: 4928 bytes

 ## After

`size_of::<SmallBoolTrie>()` is 32, which is added to
`t.r1.len() + t.r2.len() * 8`:

* `Cc_table`: 51
* `White_Space_table`: 273
* `Pattern_White_Space_table`: 193
* Total: 517 bytes

 ## Difference

Every Rust program with `std` statically linked should be about 4 KB smaller.
@nikomatsakis
Copy link
Contributor

This seems good to me, but I feel like someone from @rust-lang/libs should weigh in.

r? @alexcrichton

@alexcrichton
Copy link
Member

@bors: r+

Awesome wins, thanks @SimonSapin! Also thanks for the sanity check as well @raphlinus :)

@bors
Copy link
Contributor

bors commented Jan 3, 2017

📌 Commit 3b208d2 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Jan 5, 2017

⌛ Testing commit 3b208d2 with merge 582733d...

@bors
Copy link
Contributor

bors commented Jan 5, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

alexcrichton commented Jan 5, 2017 via email

@bors
Copy link
Contributor

bors commented Jan 5, 2017

⌛ Testing commit 3b208d2 with merge e9a3cb4...

@bors
Copy link
Contributor

bors commented Jan 5, 2017

💔 Test failed - status-travis

@alexcrichton
Copy link
Member

@bors: retry

  • network failure

@bors
Copy link
Contributor

bors commented Jan 7, 2017

⌛ Testing commit 3b208d2 with merge 0896826...

bors added a commit that referenced this pull request Jan 7, 2017
Reduce the size of static data in std_unicode::tables

`BoolTrie` works well for sets of code points spread out through most of Unicode’s range, but is uses a lot of space for sets with few, mostly low, code points.

This switches a few of its instances to a similar but simpler trie data structure.

CC @raphlinus, who wrote the original `BoolTrie`.

## Before

`size_of::<BoolTrie>()` is 1552, which is added to `table.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`:

* `Cc_table`: 1632
* `White_Space_table`: 1656
* `Pattern_White_Space_table`: 1640
* Total: 4928 bytes

## After

`size_of::<SmallBoolTrie>()` is 32, which is added to `t.r1.len() + t.r2.len() * 8`:

* `Cc_table`: 51
* `White_Space_table`: 273
* `Pattern_White_Space_table`: 193
* Total: 517 bytes

## Difference

Every Rust program with `std` statically linked should be about 4 KB smaller.
@bors
Copy link
Contributor

bors commented Jan 7, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 0896826 to master...

@bors bors merged commit 3b208d2 into rust-lang:master Jan 7, 2017

bytes_old = 0
bytes_new = 0
import fileinput, re, os, sys, operator, math

preamble = '''// Copyright 2012-2016 The Rust Project Developers. See the COPYRIGHT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, looks like the copyrights are lagging behind the new year.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter? Also, this PR is merged, so if you want something to happen you should open a new issue or PR.

@SimonSapin SimonSapin deleted the unishrink branch January 11, 2017 21:40
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Mar 20, 2017
Version 1.16.0 (2017-03-16)
===========================

Language
--------

* Lifetimes in statics and consts default to `'static`. [RFC 1623]
* [The compiler's `dead_code` lint now accounts for type aliases][38051].
* [Uninhabitable enums (those without any variants) no longer permit wildcard
  match patterns][38069]
* [Clean up semantics of `self` in an import list][38313]
* [`Self` may appear in `impl` headers][38920]
* [`Self` may appear in struct expressions][39282]

Compiler
--------

* [`rustc` now supports `--emit=metadata`, which causes rustc to emit
  a `.rmeta` file containing only crate metadata][38571]. This can be
  used by tools like the Rust Language Service to perform
  metadata-only builds.
* [Levenshtein based typo suggestions now work in most places, while
  previously they worked only for fields and sometimes for local
  variables][38927]. Together with the overhaul of "no
  resolution"/"unexpected resolution" errors (#[38154]) they result in
  large and systematic improvement in resolution diagnostics.
* [Fix `transmute::<T, U>` where `T` requires a bigger alignment than
  `U`][38670]
* [rustc: use -Xlinker when specifying an rpath with ',' in it][38798]
* [`rustc` no longer attempts to provide "consider using an explicit
  lifetime" suggestions][37057]. They were inaccurate.

Stabilized APIs
---------------

* [`VecDeque::truncate`]
* [`VecDeque::resize`]
* [`String::insert_str`]
* [`Duration::checked_add`]
* [`Duration::checked_sub`]
* [`Duration::checked_div`]
* [`Duration::checked_mul`]
* [`str::replacen`]
* [`str::repeat`]
* [`SocketAddr::is_ipv4`]
* [`SocketAddr::is_ipv6`]
* [`IpAddr::is_ipv4`]
* [`IpAddr::is_ipv6`]
* [`Vec::dedup_by`]
* [`Vec::dedup_by_key`]
* [`Result::unwrap_or_default`]
* [`<*const T>::wrapping_offset`]
* [`<*mut T>::wrapping_offset`]
* `CommandExt::creation_flags`
* [`File::set_permissions`]
* [`String::split_off`]

Libraries
---------

* [`[T]::binary_search` and `[T]::binary_search_by_key` now take
  their argument by `Borrow` parameter][37761]
* [All public types in std implement `Debug`][38006]
* [`IpAddr` implements `From<Ipv4Addr>` and `From<Ipv6Addr>`][38327]
* [`Ipv6Addr` implements `From<[u16; 8]>`][38131]
* [Ctrl-Z returns from `Stdin.read()` when reading from the console on
  Windows][38274]
* [std: Fix partial writes in `LineWriter`][38062]
* [std: Clamp max read/write sizes on Unix][38062]
* [Use more specific panic message for `&str` slicing errors][38066]
* [`TcpListener::set_only_v6` is deprecated][38304]. This
  functionality cannot be achieved in std currently.
* [`writeln!`, like `println!`, now accepts a form with no string
  or formatting arguments, to just print a newline][38469]
* [Implement `iter::Sum` and `iter::Product` for `Result`][38580]
* [Reduce the size of static data in `std_unicode::tables`][38781]
* [`char::EscapeDebug`, `EscapeDefault`, `EscapeUnicode`,
  `CaseMappingIter`, `ToLowercase`, `ToUppercase`, implement
  `Display`][38909]
* [`Duration` implements `Sum`][38712]
* [`String` implements `ToSocketAddrs`][39048]

Cargo
-----

* [The `cargo check` command does a type check of a project without
  building it][cargo/3296]
* [crates.io will display CI badges from Travis and AppVeyor, if
  specified in Cargo.toml][cargo/3546]
* [crates.io will display categories listed in Cargo.toml][cargo/3301]
* [Compilation profiles accept integer values for `debug`, in addition
  to `true` and `false`. These are passed to `rustc` as the value to
  `-C debuginfo`][cargo/3534]
* [Implement `cargo --version --verbose`][cargo/3604]
* [All builds now output 'dep-info' build dependencies compatible with
  make and ninja][cargo/3557]
* [Build all workspace members with `build --all`][cargo/3511]
* [Document all workspace members with `doc --all`][cargo/3515]
* [Path deps outside workspace are not members][cargo/3443]

Misc
----

* [`rustdoc` has a `--sysroot` argument that, like `rustc`, specifies
  the path to the Rust implementation][38589]
* [The `armv7-linux-androideabi` target no longer enables NEON
  extensions, per Google's ABI guide][38413]
* [The stock standard library can be compiled for Redox OS][38401]
* [Rust has initial SPARC support][38726]. Tier 3. No builds
  available.
* [Rust has experimental support for Nvidia PTX][38559]. Tier 3. No
  builds available.
* [Fix backtraces on i686-pc-windows-gnu by disabling FPO][39379]

Compatibility Notes
-------------------

* [Uninhabitable enums (those without any variants) no longer permit wildcard
  match patterns][38069]
* In this release, references to uninhabited types can not be
  pattern-matched. This was accidentally allowed in 1.15.
* [The compiler's `dead_code` lint now accounts for type aliases][38051].
* [Ctrl-Z returns from `Stdin.read()` when reading from the console on
  Windows][38274]
* [Clean up semantics of `self` in an import list][38313]

[37057]: rust-lang/rust#37057
[37761]: rust-lang/rust#37761
[38006]: rust-lang/rust#38006
[38051]: rust-lang/rust#38051
[38062]: rust-lang/rust#38062
[38062]: rust-lang/rust#38622
[38066]: rust-lang/rust#38066
[38069]: rust-lang/rust#38069
[38131]: rust-lang/rust#38131
[38154]: rust-lang/rust#38154
[38274]: rust-lang/rust#38274
[38304]: rust-lang/rust#38304
[38313]: rust-lang/rust#38313
[38314]: rust-lang/rust#38314
[38327]: rust-lang/rust#38327
[38401]: rust-lang/rust#38401
[38413]: rust-lang/rust#38413
[38469]: rust-lang/rust#38469
[38559]: rust-lang/rust#38559
[38571]: rust-lang/rust#38571
[38580]: rust-lang/rust#38580
[38589]: rust-lang/rust#38589
[38670]: rust-lang/rust#38670
[38712]: rust-lang/rust#38712
[38726]: rust-lang/rust#38726
[38781]: rust-lang/rust#38781
[38798]: rust-lang/rust#38798
[38909]: rust-lang/rust#38909
[38920]: rust-lang/rust#38920
[38927]: rust-lang/rust#38927
[39048]: rust-lang/rust#39048
[39282]: rust-lang/rust#39282
[39379]: rust-lang/rust#39379
[`<*const T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset
[`<*mut T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset
[`Duration::checked_add`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_add
[`Duration::checked_div`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_div
[`Duration::checked_mul`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_mul
[`Duration::checked_sub`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_sub
[`File::set_permissions`]: https://doc.rust-lang.org/std/fs/struct.File.html#method.set_permissions
[`IpAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv4
[`IpAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv6
[`Result::unwrap_or_default`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or_default
[`SocketAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv4
[`SocketAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv6
[`String::insert_str`]: https://doc.rust-lang.org/std/string/struct.String.html#method.insert_str
[`String::split_off`]: https://doc.rust-lang.org/std/string/struct.String.html#method.split_off
[`Vec::dedup_by_key`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by_key
[`Vec::dedup_by`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by
[`VecDeque::resize`]:  https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.resize
[`VecDeque::truncate`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.truncate
[`str::repeat`]: https://doc.rust-lang.org/std/primitive.str.html#method.repeat
[`str::replacen`]: https://doc.rust-lang.org/std/primitive.str.html#method.replacen
[cargo/3296]: rust-lang/cargo#3296
[cargo/3301]: rust-lang/cargo#3301
[cargo/3443]: rust-lang/cargo#3443
[cargo/3511]: rust-lang/cargo#3511
[cargo/3515]: rust-lang/cargo#3515
[cargo/3534]: rust-lang/cargo#3534
[cargo/3546]: rust-lang/cargo#3546
[cargo/3557]: rust-lang/cargo#3557
[cargo/3604]: rust-lang/cargo#3604
[RFC 1623]: https://github.com/rust-lang/rfcs/blob/master/text/1623-static.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants