Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase Span from 4 bytes to 8 bytes. #59693

Merged
merged 1 commit into from
Apr 14, 2019
Merged

Conversation

nnethercote
Copy link
Contributor

This increases the size of some important types, such as ast::Expr and
mir::Statement. However, it drastically reduces how much the interner
is used, and the fields are more natural sizes that don't require bit
operations to extract.

As a result, instruction counts drop across a range of workloads, by as
much as 10% for script-servo incremental builds.

Peak memory usage goes up a little for some cases, but down by more for
some other cases -- as much as 18% for non-incremental builds of
packed-simd.

The commit also:

  • removes the repr(packed), because it has negligible effect, but can
    cause undefined behaviour;
  • replaces explicit impls of common traits (Copy, PartialEq, etc.)
    with derived ones.

r? @petrochenkov

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 4, 2019
@nnethercote
Copy link
Contributor Author

nnethercote commented Apr 4, 2019

I was pleasantly surprised how large the improvements are. Some instruction counts results for Check builds:

script-servo-check
        avg: -7.3%      min: -10.4%     max: -4.6%
inflate-check
        avg: -0.1%      min: -5.4%      max: 1.2%
crates.io-check
        avg: -2.4%      min: -4.5%      max: -1.1%
packed-simd-check
        avg: -1.3%      min: -4.3%      max: -0.1%
serde-check
        avg: -1.0%      min: -3.6%      max: -0.0%
syn-check
        avg: -1.9%      min: -3.4%      max: -1.0%
helloworld-check
        avg: -2.8%      min: -2.9%      max: -2.6%
unify-linearly-check
        avg: -1.9%      min: -2.8%      max: -1.3%
deeply-nested-check
        avg: -1.7%      min: -2.8%      max: -1.4%
tokio-webpush-simple-check
        avg: -1.3%      min: -2.6%      max: -0.7%
sentry-cli-check
        avg: -1.3%      min: -2.5%      max: -0.6%
issue-46449-check
        avg: -1.8%      min: -2.4%      max: -1.5%
hyper-check
        avg: -1.3%      min: -2.3%      max: -0.7%
keccak-check
        avg: 0.7%       min: -1.8%      max: 1.6%
encoding-check
        avg: -1.0%      min: -1.7%      max: -0.5%
regression-31157-check
        avg: -0.8%      min: -1.6%      max: -0.2%
unicode_normalization-check
        avg: 0.9%       min: 0.1%       max: 1.2%
style-servo-check
        avg: -0.4%      min: -1.1%      max: -0.2%

Here is part of a Cachegrind diff showing the changes for a Check-CleanIncr build of crates.io, a workload showing one of the biggest improvements:

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
 -438,273,987 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir                      file:function
--------------------------------------------------------------------------------
 -111,549,470 (25.45%)  /build/glibc-B9XfQf/glibc-2.28/elf/../elf/dl-tls.c:_dl_update_slotinfo
  -50,377,180 (11.49%)  /build/glibc-B9XfQf/glibc-2.28/elf/../elf/dl-tls.c:update_get_addr
  -39,659,873 ( 9.05%)  /build/glibc-B9XfQf/glibc-2.28/elf/../sysdeps/x86_64/tls_get_addr.S:__tls_get_addr
  -38,864,570 ( 8.87%)  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:syntax_pos::span_encoding::SpanInterner::intern
  -33,838,278 ( 7.72%)  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:syntax_pos::span_encoding::SpanInterner::intern
  -18,501,545 ( 4.22%)  /home/njn/moz/rustN/src/libcore/num/mod.rs:syntax_pos::span_encoding::SpanInterner::intern
  -18,263,306 ( 4.17%)  /home/njn/moz/rustN/src/libsyntax_pos/span_encoding.rs:syntax_pos::span_encoding::SpanInterner::intern
  -17,991,850 ( 4.11%)  /build/glibc-B9XfQf/glibc-2.28/elf/../sysdeps/x86_64/dl-tls.c:__tls_get_addr_slow
  -15,465,924 ( 3.53%)  /home/njn/moz/rustN/<::std::thread::local::__thread_local_inner macros>:syntax_pos::GLOBALS::FOO::__getit
  -15,317,482 ( 3.49%)  /home/njn/moz/rustN/src/libcore/intrinsics.rs:syntax_pos::span_encoding::SpanInterner::intern
  -14,302,079 ( 3.26%)  /home/njn/moz/rustN/src/libcore/ptr.rs:syntax_pos::span_encoding::SpanInterner::intern
  -11,684,952 ( 2.67%)  /home/njn/moz/rustN/src/libstd/thread/local.rs:syntax_pos::span_encoding::Span::data
   10,634,440 (-2.43%)  /build/glibc-B9XfQf/glibc-2.28/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
  -10,149,203 ( 2.32%)  /home/njn/moz/rustN/src/libstd/collections/hash/table.rs:std::collections::hash::map::HashMap<K,V,S>::try_resize
    9,268,103 (-2.11%)  /home/njn/moz/rustN/src/libsyntax_pos/span_encoding.rs:syntax_pos::span_encoding::Span::new
   -9,263,604 ( 2.11%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:syntax_pos::symbol::Symbol::as_u32
   -8,866,856 ( 2.02%)  /home/njn/moz/rustN/src/libstd/collections/hash/map.rs:std::collections::hash::map::HashMap<K,V,S>::try_resize
   -8,355,096 ( 1.91%)  /build/glibc-B9XfQf/glibc-2.28/string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_erms
   -7,787,340 ( 1.78%)  /home/njn/moz/rustN/src/libcore/cell.rs:syntax_pos::span_encoding::Span::data
   -7,717,192 ( 1.76%)  /home/njn/moz/rustN/src/libsyntax_pos/span_encoding.rs:syntax_pos::span_encoding::Span::data
    6,211,410 (-1.42%)  /home/njn/moz/rustN/src/libsyntax_pos/symbol.rs:syntax_pos::symbol::Ident::modern
    5,992,135 (-1.37%)  /home/njn/moz/rustN/src/libsyntax/ext/tt/macro_parser.rs:syntax::ext::tt::macro_parser::parse
   -5,842,476 ( 1.33%)  /home/njn/moz/rustN/src/libcore/slice/mod.rs:syntax_pos::span_encoding::Span::data
   -5,042,332 ( 1.15%)  /home/njn/moz/rustN/src/libcore/ptr.rs:std::collections::hash::map::HashMap<K,V,S>::try_resize
   -4,837,572 ( 1.10%)  /home/njn/moz/rustN/src/libcore/cell.rs:syntax_pos::span_encoding::Span::new
   -4,641,315 ( 1.06%)  /home/njn/moz/rustN/src/libstd/thread/local.rs:syntax_pos::span_encoding::Span::new

Mostly TLS stuff and interning stuff, which makes sense.

@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Apr 4, 2019

⌛ Trying commit 7f2ca81 with merge cfa7765...

bors added a commit that referenced this pull request Apr 4, 2019
Increase `Span` from 4 bytes to 8 bytes.

This increases the size of some important types, such as `ast::Expr` and
`mir::Statement`. However, it drastically reduces how much the interner
is used, and the fields are more natural sizes that don't require bit
operations to extract.

As a result, instruction counts drop across a range of workloads, by as
much as 10% for `script-servo` incremental builds.

Peak memory usage goes up a little for some cases, but down by more for
some other cases -- as much as 18% for non-incremental builds of
`packed-simd`.

The commit also:
- removes the `repr(packed)`, because it has negligible effect, but can
  cause undefined behaviour;
- replaces explicit impls of common traits (`Copy`, `PartialEq`, etc.)
  with derived ones.

r? @petrochenkov
@nnethercote
Copy link
Contributor Author

It's interesting that TLS is so expensive. Makes me wonder if other TLS data (e.g. symbols) could be done some other way.

@nnethercote
Copy link
Contributor Author

Actually, I don't understand scoped_thread_local!... why is it needed around a global variable like GLOBALS?

src/libsyntax_pos/span_encoding.rs Outdated Show resolved Hide resolved
src/libsyntax_pos/span_encoding.rs Outdated Show resolved Hide resolved
src/libsyntax_pos/span_encoding.rs Outdated Show resolved Hide resolved
src/libsyntax_pos/span_encoding.rs Outdated Show resolved Hide resolved
src/libsyntax_pos/span_encoding.rs Outdated Show resolved Hide resolved
src/libsyntax_pos/span_encoding.rs Outdated Show resolved Hide resolved
@petrochenkov
Copy link
Contributor

Previous discussion for reference - #58458.

@petrochenkov
Copy link
Contributor

@nnethercote

Actually, I don't understand scoped_thread_local!... why is it needed around a global variable like GLOBALS?

That's entirely @Zoxc's creation, I don't understand our threading setup.
AFAIR, GLOBALS is not entirely global.

@petrochenkov
Copy link
Contributor

@nnethercote
Any specific motivation for donating the tag bit from len rather than from ctxt?

@Zoxc
Copy link
Contributor

Zoxc commented Apr 4, 2019

GLOBALS is just a thread-local reference (think &Globals) to the real globals.

Is the interned case uncommon enough to throw likely! and cold_path at this?

@bors
Copy link
Contributor

bors commented Apr 4, 2019

☀️ Try build successful - checks-travis
Build commit: cfa7765

@nnethercote
Copy link
Contributor Author

Any specific motivation for donating the tag bit from len rather than from ctxt?

Yes! It's very rare for len to hit 16 bits in any code. But ctxt depends on the crate size, so in very large crates a decent number of ctxts could require 16 bits. In fact, ideally I'd probably use something like 11 bits for len and 20 for ctxt, but those sizes aren't as natural.

I'll add a comment about this.

@nnethercote
Copy link
Contributor Author

Is the interned case uncommon enough to throw likely! and cold_path at this?

It's fine by me, though I thought they were generally frowned upon in Rust code...

@nnethercote
Copy link
Contributor Author

@rust-timer build cfa7765

@rust-timer
Copy link
Collaborator

Success: Queued cfa7765 with parent f717b58, comparison URL.

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 4, 2019

@nnethercote

But ctxt depends on the crate size, so in very large crates a decent number of ctxts could require 16 bits.

That's what I expected, ctxt behaves in a similar way to base and can just grow and grow, and that's different from len.
When I measured all ctxts in rustc/std fit into 14 bits, but lens sometimes required 24 bits, but larger crates can turn the picture given the nature of ctxt growth, so it's pretty reasonable to give it more bits.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit cfa7765

@nnethercote
Copy link
Contributor Author

The CI results are even better than what I saw locally, yay.

This increases the size of some important types, such as `ast::Expr` and
`mir::Statement`. However, it drastically reduces how much the interner
is used, and the fields are more natural sizes that don't require bit
operations to extract.

As a result, instruction counts drop across a range of workloads, by as
much as 12% for incremental "check" builds of `script-servo`.

Peak memory usage goes up a little for some cases, but down by more for
some other cases -- as much as 18% for non-incremental builds of
`packed-simd`.

The commit also:
- removes the `repr(packed)`, because it has negligible effect, but can
  cause undefined behaviour;
- replaces explicit impls of common traits (`Copy`, `PartialEq`, etc.)
  with derived ones.
@nnethercote
Copy link
Contributor Author

@petrochenkov: I have updated the comments as per the suggestions above.

And thanks for suggesting 64-bits in the first place! :)

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 5, 2019

The previous measurements (#44646 (comment)) for 64 bits looked very different (that's why 32 was selected in the first place).

I wonder what changed since Sep 2017.
I see that the span interner used thread_local instead of scoped_thread_local at least, and was a standalone static rather than a part of GLOBALS.
Or maybe spans are used (decoded) more often since Idents got them, not sure.

(I'll review the code this evening.)

@petrochenkov
Copy link
Contributor

Code LGTM, but I'd want to run #59749 through perf before merging this.

@petrochenkov petrochenkov added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 6, 2019
@petrochenkov
Copy link
Contributor

I see, then I'm not sure why it's per-session and not global.

@Zoxc
Copy link
Contributor

Zoxc commented Apr 12, 2019

There can be multiple rustc sessions in a process. RLS and rustdoc does that.

@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 12, 2019

I mean, why can't multiple sessions share a single interner?
It doesn't need any session-specific configuration and can live for the lifetime of the process.

@Mark-Simulacrum
Copy link
Member

@bors rollup

Centril added a commit to Centril/rust that referenced this pull request Apr 12, 2019
…enkov

Increase `Span` from 4 bytes to 8 bytes.

This increases the size of some important types, such as `ast::Expr` and
`mir::Statement`. However, it drastically reduces how much the interner
is used, and the fields are more natural sizes that don't require bit
operations to extract.

As a result, instruction counts drop across a range of workloads, by as
much as 10% for `script-servo` incremental builds.

Peak memory usage goes up a little for some cases, but down by more for
some other cases -- as much as 18% for non-incremental builds of
`packed-simd`.

The commit also:
- removes the `repr(packed)`, because it has negligible effect, but can
  cause undefined behaviour;
- replaces explicit impls of common traits (`Copy`, `PartialEq`, etc.)
  with derived ones.

r? @petrochenkov
@nnethercote
Copy link
Contributor Author

It's too late now, but for next time: PRs with significant perf effects are better landed on their own, rather than in a rollup, to keep the perf data clear. For example, if some other PR in the rollup introduces a perf regression, that would be masked by the improvement in this PR.

@petrochenkov
Copy link
Contributor

@bors rollup-
(In case the rollup fails.)

@Centril
Copy link
Contributor

Centril commented Apr 13, 2019

@bors p=1

bors added a commit that referenced this pull request Apr 14, 2019
Increase `Span` from 4 bytes to 8 bytes.

This increases the size of some important types, such as `ast::Expr` and
`mir::Statement`. However, it drastically reduces how much the interner
is used, and the fields are more natural sizes that don't require bit
operations to extract.

As a result, instruction counts drop across a range of workloads, by as
much as 10% for `script-servo` incremental builds.

Peak memory usage goes up a little for some cases, but down by more for
some other cases -- as much as 18% for non-incremental builds of
`packed-simd`.

The commit also:
- removes the `repr(packed)`, because it has negligible effect, but can
  cause undefined behaviour;
- replaces explicit impls of common traits (`Copy`, `PartialEq`, etc.)
  with derived ones.

r? @petrochenkov
@bors
Copy link
Contributor

bors commented Apr 14, 2019

⌛ Testing commit fd7f605 with merge c7aa8af...

@bors
Copy link
Contributor

bors commented Apr 14, 2019

💔 Test failed - checks-travis

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-aux of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
[01:57:32]   Downloaded thread-id v3.0.0
[01:57:32]   Downloaded adler32 v0.3.0
[01:57:33]   Downloaded expat-sys v2.1.4
[01:57:40]   Downloaded term_size v0.3.0
[01:57:59] warning: spurious network error (2 tries remaining): failed to get 200 response from `https://static.crates.io/crates/plane-split/plane-split-0.6.0.crate`, got 504
[01:57:59] warning: spurious network error (1 tries remaining): failed to get 200 response from `https://static.crates.io/crates/plane-split/plane-split-0.6.0.crate`, got 504
[01:57:59] warning: spurious network error (2 tries remaining): failed to get 200 response from `https://static.crates.io/crates/ron/ron-0.1.3.crate`, got 504
[01:58:00] error: failed to download from `https://crates.io/api/v1/crates/plane-split/0.6.0/download`
[01:58:00] Caused by:
[01:58:00]   failed to get 200 response from `https://static.crates.io/crates/plane-split/plane-split-0.6.0.crate`, got 504
[01:58:00] thread 'main' panicked at 'tests failed for https://github.com/servo/webrender', src/tools/cargotest/main.rs:88:9
[01:58:00] note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
[01:58:00] note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
[01:58:00] 
[01:58:00] 
[01:58:00] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/cargotest" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "/checkout/obj/build/ct"
[01:58:00] expected success, got: exit code: 101
[01:58:00] 
[01:58:00] 
[01:58:00] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/test/run-pass/pretty src/test/run-fail/pretty src/test/run-pass-valgrind/pretty src/test/run-pass-fulldeps/pretty src/tools/cargo src/tools/cargotest
[01:58:00] Build completed unsuccessfully in 0:33:46
[01:58:00] make: *** [check-aux] Error 1
[01:58:00] Makefile:50: recipe for target 'check-aux' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:070e52e2
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Apr 14 05:57:48 UTC 2019
---
travis_time:end:07a8627f:start=1555221471783419688,finish=1555221471793654426,duration=10234738
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:00df7b86
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:start:crashlog
obj/cores/core.8939.!checkout!obj!build!x86_64-unknown-linux-gnu!stage2!bin!rustc
[New LWP 8942]
[New LWP 8939]
warning: Could not load shared library symbols for 8 libraries, e.g. /lib/x86_64-linux-gnu/libc.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `rustc --crate-name foo src/lib.rs --color never --crate-type lib --emit=dep-inf'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fe0a15ca428 in ?? ()
[Current thread is 1 (LWP 8942)]
#0  0x00007fe0a15ca428 in ?? ()
#1  0x00007fe0a15cc02a in ?? ()
#2  0x0000000000000020 in ?? ()
#3  0x0000000000000000 in ?? ()
travis_time:end:00df7b86:start=1555221471799456023,finish=1555221473931422896,duration=2131966873
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:1be93acc
travis_time:start:1be93acc
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:202cae64
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 14, 2019
@nnethercote
Copy link
Contributor Author

"warning: spurious network error"

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 14, 2019
@bors
Copy link
Contributor

bors commented Apr 14, 2019

⌛ Testing commit fd7f605 with merge 60076bb...

bors added a commit that referenced this pull request Apr 14, 2019
Increase `Span` from 4 bytes to 8 bytes.

This increases the size of some important types, such as `ast::Expr` and
`mir::Statement`. However, it drastically reduces how much the interner
is used, and the fields are more natural sizes that don't require bit
operations to extract.

As a result, instruction counts drop across a range of workloads, by as
much as 10% for `script-servo` incremental builds.

Peak memory usage goes up a little for some cases, but down by more for
some other cases -- as much as 18% for non-incremental builds of
`packed-simd`.

The commit also:
- removes the `repr(packed)`, because it has negligible effect, but can
  cause undefined behaviour;
- replaces explicit impls of common traits (`Copy`, `PartialEq`, etc.)
  with derived ones.

r? @petrochenkov
@bors
Copy link
Contributor

bors commented Apr 14, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: petrochenkov
Pushing 60076bb to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 14, 2019
@bors bors merged commit fd7f605 into rust-lang:master Apr 14, 2019
@nnethercote nnethercote deleted the 64-bit-Spans branch April 14, 2019 23:50
@eddyb
Copy link
Member

eddyb commented Apr 16, 2019

Btw, there may be a way to fit SpanData value in 8 bytes: #53560.
(and maybe go back to a 4 bytes compression for Span, if we want to at all)

The reason is that we could allocate new "files" (or maybe even just one chunk, since it should often suffice) for the version with a SyntaxContext (we wouldn't duplicate any of the original file contents or precomputed data, just point to it).

And so two values (file + SyntaxContext) would be combined into one index space, split into chunks, and SpanData would just be Range<ChunkAndOffset>, where ChunkAndOffset can easily fit in u32 IMO.

kdy1 added a commit to kdy1/swc that referenced this pull request Nov 29, 2019
kdy1 added a commit to swc-project/swc that referenced this pull request Nov 29, 2019
swc_common:
 - apply patch from rust-lang/rust#59693

swc:
 - use &Options instead of Options
 - configures commons::CM
 - exposes `handler`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants