Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catching panics is eating into Servo's DOM performance #34727

Closed
2 of 3 tasks
jdm opened this issue Jul 8, 2016 · 21 comments
Closed
2 of 3 tasks

Catching panics is eating into Servo's DOM performance #34727

jdm opened this issue Jul 8, 2016 · 21 comments
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@jdm
Copy link
Contributor

jdm commented Jul 8, 2016

I have a profile showing that of 60% of execution time spent under a Rust callback from SpiderMonkey that corresponds to invoking a method on a JS object inside of a hot loop, fully half of that time is spent inside __rust_try, __rust_maybe_catch_panic, and std::panicking::PANIC_COUNT::__getit. Since we go Rust -> C++ -> JS -> C++ -> Rust when executing JS that invokes a DOM method, every loop iteration we call catch_unwind in the Rust callbacks invoked by C. This appears to be impacting our potential maximum performance significantly.

Issues to solve

@eefriedman
Copy link
Contributor

I assume you mean that std::panic::catch_unwind is slow, even if the called code doesn't panic? That's bad. Probably code which should be inlined isn't getting inlined for some reason.

CC @alexcrichton

@jdm
Copy link
Contributor Author

jdm commented Jul 9, 2016

Yes, that's correct.

@alexcrichton
Copy link
Member

Unfortunately there's not a lot we can do here to easily make this better. The catch_unwind function was not prioritized to be fast as it was assumed it's only crossed at isolation boundaries, which should be relatively rare in theory.

That's not to say it can never be fast, just that it's going to be difficult. The reasons you're seeing this today are:

  • Thread locals across crates are not easily inlined
  • With the design of panic runtimes, the call to __rust_maybe_catch_panic is unconditional and will never go away unless you LTO. This prevents some inlining.
  • The __rust_try function is generated by the compiler specially because it has a different personality function than all the rest of the Rust code in the world. As a result, LLVM cannot inline this because there are differing personality functions.

There are a few local optimizations possible, such as not using options here, but I've tried that out locally and it doesn't really buy you much.

LTO will buy you the most in terms of optimizations because thread local access can get inlined as well as __rust_maybe_catch_panic. At that point all that can't be inlined is __rust_try. We could try to finagle only using one personality routine so this could get inlined, but it may not be trivial.

That's at least my current thoughts for now, unfortunately that may not help much, but hopefully it's at least a little illuminating!

@alexcrichton
Copy link
Member

Some ideas for how to get rid of TLS:

We don't actually need to hit TLS when we run catch_unwind, the only thing it's doing is resetting the panic counter to 0. This is if you have something like this:

use std::panic::catch_unwind;
struct A;

let _a = A;
panic!();

impl Drop for A {
    fn drop(&mut self) {
        catch_unwind(|| panic!());
    }
}

That program doesn't abort. Now whether or not that program aborts with a double panic seems a bit iffy to me. This means that a catch_unwind changes thread::panicking to false which is arguably the wrong thing to do because the thread is still panicking. I'd be fine saying that something like this is akin to double panic, but others may feel differently.

Basically, what I'm getting at is that we could try to remove the hit to TLS in catch_unwind if possible. All we need to do then is to be able to detect when panicking if we're already panicking. This may be possible with some sort of stack unwinding (maybe throw a "pseudo-exception" to see if something catches it?) or something like that. Haven't thought this through all the way, but it should be fine for us to make panics basically as arbitrarily slow as we can.

Next, to enable all the inlining we need to get rid of our two personality functions and only have one. This should be possible as C++ does it all the time. The implementation of our personality function would then have to inspect the "language specific data area" I believe (the LSDA) and basically see if the function the personality is running for contains a catch. I believe this is encoded in DWARF somewhere and we just need to read it out, I do not understand the specifics of doing so however.

If we can get both of those done then we the only point where we can't inline is the __rust_maybe_catch_panic boundary, which seems fine as LTO easily erases it.

@arielb1
Copy link
Contributor

arielb1 commented Jul 9, 2016

Can't you compile servo with -C panic=abort?

@jdm
Copy link
Contributor Author

jdm commented Jul 9, 2016

That would remove a significant reason for Servo to be written in Rust; thread isolation from panics is one of our selling points.

bors-servo pushed a commit to servo/servo that referenced this issue Jul 10, 2016
Improve performance of HTMLDivElement constructor

These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however.

---
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes do not require tests because we don't have performance tests and these are only optimizations

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/12374)
<!-- Reviewable:end -->
@brson brson added I-slow Issue: Problems and improvements with respect to performance of generated code. A-libs A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows labels Jul 11, 2016
@alexcrichton
Copy link
Member

alexcrichton commented Jul 12, 2016

I've updated the description of this issue to list the primary issues to solve to close out this issue. I figure we can leave this open though for discussion about perhaps different strategies for optimization.

@alexcrichton
Copy link
Member

With @vadimcn's and @cynicaldevil's awesome work, some secret sauce of mine I'll land after @cynicaldevil's PR, and LTO, we can optimize this to a noop:

std::panic::catch_unwind(|| {})

That I believe means we've given LLVM all the inlining opportunities we need, so this should be close to getting fixed. With @cynicaldevil's PR we may not need to tackle cross-crate thread-locals at all!

alexcrichton added a commit to alexcrichton/rust that referenced this issue Aug 11, 2016
The previous implementation of this function was overly conservative with
liberal usage of `Option` and `.unwrap()` which in theory never triggers. This
commit essentially removes the `Option`s in favor of unsafe implementations,
improving the code generation of the fast path for LLVM to see through what's
happening more clearly.

cc rust-lang#34727
Manishearth added a commit to Manishearth/rust that referenced this issue Aug 13, 2016
… r=brson

std: Optimize panic::catch_unwind slightly

The previous implementation of this function was overly conservative with
liberal usage of `Option` and `.unwrap()` which in theory never triggers. This
commit essentially removes the `Option`s in favor of unsafe implementations,
improving the code generation of the fast path for LLVM to see through what's
happening more clearly.

cc rust-lang#34727
Manishearth added a commit to Manishearth/rust that referenced this issue Aug 13, 2016
… r=brson

std: Optimize panic::catch_unwind slightly

The previous implementation of this function was overly conservative with
liberal usage of `Option` and `.unwrap()` which in theory never triggers. This
commit essentially removes the `Option`s in favor of unsafe implementations,
improving the code generation of the fast path for LLVM to see through what's
happening more clearly.

cc rust-lang#34727
eddyb added a commit to eddyb/rust that referenced this issue Aug 14, 2016
… r=brson

std: Optimize panic::catch_unwind slightly

The previous implementation of this function was overly conservative with
liberal usage of `Option` and `.unwrap()` which in theory never triggers. This
commit essentially removes the `Option`s in favor of unsafe implementations,
improving the code generation of the fast path for LLVM to see through what's
happening more clearly.

cc rust-lang#34727
eddyb added a commit to eddyb/rust that referenced this issue Aug 14, 2016
… r=brson

std: Optimize panic::catch_unwind slightly

The previous implementation of this function was overly conservative with
liberal usage of `Option` and `.unwrap()` which in theory never triggers. This
commit essentially removes the `Option`s in favor of unsafe implementations,
improving the code generation of the fast path for LLVM to see through what's
happening more clearly.

cc rust-lang#34727
@alexcrichton
Copy link
Member

alexcrichton commented Aug 16, 2016

Current status:

toolchain + flags time for one catch_unwind
stable, no LTO 8ns
stable, LTO 5ns
nightly, no LTO 3ns
nightly, LTO 0ns

That is, I think we're as fast as we're gonna get. Without LTO we can't inline the panic runtime itself (because it's swappable), and in this case we're about as fast as we're gonna get. With LTO, however, we can inline everything and optimize the catch_unwind away entirely.

@jdm when you get a chance could you try re-benchmarking? Is the performance acceptable now?

@Amanieu
Copy link
Member

Amanieu commented Aug 16, 2016

We could go one step further and move __rust_maybe_catch_panic into libstd and mark it with #[inline]. This would bring the non-LTO overhead to 0ns as well.

@alexcrichton
Copy link
Member

I don't think we can do that unfortunately due to how panic runtimes. work. The fact that switching the panic runtime is a compiler switch means that we have to have a strict ABI between the two, which means that #[inline] isn't possible.

@Amanieu
Copy link
Member

Amanieu commented Aug 16, 2016

What I mean is that we could use intrinsics::try for both panic runtimes. This will have practically zero cost in the case of panic_abort since the unwind path will never be called. However it will allow the try to be inlined entirely into the calling crate.

@alexcrichton
Copy link
Member

Perhaps! If we wanna change that ABI then I'd love to solve this issue to get solved even without LTO

@jdm
Copy link
Contributor Author

jdm commented Aug 22, 2016

Instruments.app results are looking much better. I still see 5% of time spent in tlv_get_addr underneath std::panicking::try_call, though, which is worrying. LTO is on by default for optimized build, right?

@TimNN
Copy link
Contributor

TimNN commented Aug 22, 2016

@jdm: from what http://doc.crates.io/manifest.html#the-profile-sections says, I don't think LTO is enabled in release mode by default.

@jdm
Copy link
Contributor Author

jdm commented Aug 22, 2016

With LTO enabled that tlv_get_addr is gone. Thanks for investigating and fixing this, @alexcrichton and @cynicaldevil!

@eddyb
Copy link
Member

eddyb commented Aug 22, 2016

@alexcrichton @jdm What's the tlv_get_addr call for?
Shouldn't the counter not be accessed at all in the non-panicking case now?

@alexcrichton
Copy link
Member

@jdm yeah as @eddyb mentioned we shouldn't be hitting that function at all any more unless something actually panics...

@jdm
Copy link
Contributor Author

jdm commented Aug 22, 2016

I've seen tlv_get_addr under call stacks that are Servo's code, rather than directly from try_call, so maybe it was misreported or something.

@steveklabnik steveklabnik added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-libs labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 25, 2017
@alexcrichton
Copy link
Member

IIRC everything here was handled except for #25088, so closing in favor of #25088

gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 1, 2019
…or (from jdm:jsup); r=Manishearth

These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however.

---
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes do not require tests because we don't have performance tests and these are only optimizations

Source-Repo: https://github.com/servo/servo
Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc

UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 1, 2019
…or (from jdm:jsup); r=Manishearth

These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however.

---
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes do not require tests because we don't have performance tests and these are only optimizations

Source-Repo: https://github.com/servo/servo
Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc

UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 1, 2019
…or (from jdm:jsup); r=Manishearth

These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however.

---
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes do not require tests because we don't have performance tests and these are only optimizations

Source-Repo: https://github.com/servo/servo
Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc

UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0
@purplesyringa
Copy link
Contributor

What is the evidence that throwing a panic inside a cleanup pad is unsound on MSVC? I could not find this in MSVC docs and I'm wondering what this claim is based on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests