-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catching panics is eating into Servo's DOM performance #34727
Comments
I assume you mean that |
Yes, that's correct. |
Unfortunately there's not a lot we can do here to easily make this better. The That's not to say it can never be fast, just that it's going to be difficult. The reasons you're seeing this today are:
There are a few local optimizations possible, such as not using options here, but I've tried that out locally and it doesn't really buy you much. LTO will buy you the most in terms of optimizations because thread local access can get inlined as well as That's at least my current thoughts for now, unfortunately that may not help much, but hopefully it's at least a little illuminating! |
Some ideas for how to get rid of TLS: We don't actually need to hit TLS when we run use std::panic::catch_unwind;
struct A;
let _a = A;
panic!();
impl Drop for A {
fn drop(&mut self) {
catch_unwind(|| panic!());
}
} That program doesn't abort. Now whether or not that program aborts with a double panic seems a bit iffy to me. This means that a Basically, what I'm getting at is that we could try to remove the hit to TLS in Next, to enable all the inlining we need to get rid of our two personality functions and only have one. This should be possible as C++ does it all the time. The implementation of our personality function would then have to inspect the "language specific data area" I believe (the LSDA) and basically see if the function the personality is running for contains a catch. I believe this is encoded in DWARF somewhere and we just need to read it out, I do not understand the specifics of doing so however. If we can get both of those done then we the only point where we can't inline is the |
Can't you compile servo with |
That would remove a significant reason for Servo to be written in Rust; thread isolation from panics is one of our selling points. |
Improve performance of HTMLDivElement constructor These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however. --- - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [X] These changes do not require tests because we don't have performance tests and these are only optimizations <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/12374) <!-- Reviewable:end -->
I've updated the description of this issue to list the primary issues to solve to close out this issue. I figure we can leave this open though for discussion about perhaps different strategies for optimization. |
With @vadimcn's and @cynicaldevil's awesome work, some secret sauce of mine I'll land after @cynicaldevil's PR, and LTO, we can optimize this to a noop: std::panic::catch_unwind(|| {}) That I believe means we've given LLVM all the inlining opportunities we need, so this should be close to getting fixed. With @cynicaldevil's PR we may not need to tackle cross-crate thread-locals at all! |
The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727
… r=brson std: Optimize panic::catch_unwind slightly The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727
… r=brson std: Optimize panic::catch_unwind slightly The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727
… r=brson std: Optimize panic::catch_unwind slightly The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727
… r=brson std: Optimize panic::catch_unwind slightly The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727
Current status:
That is, I think we're as fast as we're gonna get. Without LTO we can't inline the panic runtime itself (because it's swappable), and in this case we're about as fast as we're gonna get. With LTO, however, we can inline everything and optimize the @jdm when you get a chance could you try re-benchmarking? Is the performance acceptable now? |
We could go one step further and move |
I don't think we can do that unfortunately due to how panic runtimes. work. The fact that switching the panic runtime is a compiler switch means that we have to have a strict ABI between the two, which means that |
What I mean is that we could use |
Perhaps! If we wanna change that ABI then I'd love to solve this issue to get solved even without LTO |
Instruments.app results are looking much better. I still see 5% of time spent in tlv_get_addr underneath std::panicking::try_call, though, which is worrying. LTO is on by default for optimized build, right? |
@jdm: from what http://doc.crates.io/manifest.html#the-profile-sections says, I don't think LTO is enabled in release mode by default. |
With LTO enabled that tlv_get_addr is gone. Thanks for investigating and fixing this, @alexcrichton and @cynicaldevil! |
@alexcrichton @jdm What's the |
I've seen tlv_get_addr under call stacks that are Servo's code, rather than directly from try_call, so maybe it was misreported or something. |
…or (from jdm:jsup); r=Manishearth These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however. --- - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [X] These changes do not require tests because we don't have performance tests and these are only optimizations Source-Repo: https://github.com/servo/servo Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0
…or (from jdm:jsup); r=Manishearth These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however. --- - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [X] These changes do not require tests because we don't have performance tests and these are only optimizations Source-Repo: https://github.com/servo/servo Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0
…or (from jdm:jsup); r=Manishearth These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however. --- - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [X] These changes do not require tests because we don't have performance tests and these are only optimizations Source-Repo: https://github.com/servo/servo Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0
What is the evidence that throwing a panic inside a cleanup pad is unsound on MSVC? I could not find this in MSVC docs and I'm wondering what this claim is based on. |
I have a profile showing that of 60% of execution time spent under a Rust callback from SpiderMonkey that corresponds to invoking a method on a JS object inside of a hot loop, fully half of that time is spent inside __rust_try, __rust_maybe_catch_panic, and std::panicking::PANIC_COUNT::__getit. Since we go Rust -> C++ -> JS -> C++ -> Rust when executing JS that invokes a DOM method, every loop iteration we call catch_unwind in the Rust callbacks invoked by C. This appears to be impacting our potential maximum performance significantly.
Issues to solve
catch_unwind
#34787The text was updated successfully, but these errors were encountered: