Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance regression from #89695 #89732

Closed
GuillaumeGomez opened this issue Oct 10, 2021 · 9 comments
Closed

Fix performance regression from #89695 #89732

GuillaumeGomez opened this issue Oct 10, 2021 · 9 comments
Labels
perf-regression Performance regression. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@GuillaumeGomez
Copy link
Member

In #89695, we extended the usage of tera. However, it impacted the doc build time badly, so we need to fix it.

cc @jsha

@GuillaumeGomez GuillaumeGomez added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. perf-regression Performance regression. labels Oct 10, 2021
@jsha
Copy link
Contributor

jsha commented Oct 10, 2021

From https://perf.rust-lang.org/compare.html?start=6928fafe06e4ab29317f75194e1bf67c119dccdc&end=c1cb97481a633bdfdf3d6b57c6dcebfdfadbcfdf (linked by @rust-timer in #89695 (comment)), it looks like these are the most-affected doc builds:

Name & Profile Scenario #89661 #89695 % Change Significance Factor?
externs-doc full 4189786474 4344364681 3.69% 47.95x
stm32f4-doc full 56031165508 57091116514 1.89% 18.54x
style-servo-doc full 59526418329 60422392129 1.51% 20.36x
html5ever-doc full 5015123198 5043144386 0.56% 6.64x

How can I find the commands I should run to reproduce these builds? In particular I'd like to try and reproduce the externs-doc build since that's most affected. I'd also appreciate any pointers on differential profiling so I can narrow in on the differences.

@the8472
Copy link
Member

the8472 commented Oct 10, 2021

https://github.com/rust-lang/rustc-perf/tree/master/collector has instructions how to run them locally

@jsha
Copy link
Contributor

jsha commented Oct 11, 2021

Thanks! I've cloned that repo and got it working, but I can't find any mention of externs-doc in it. Where would I find details of how externs-doc is run?

@the8472
Copy link
Member

the8472 commented Oct 11, 2021

"externs-doc" means: benchmark running rustdoc on the externs crate. Select the crate with --include externs and the benchmark mode via --runs Doc

@jsha
Copy link
Contributor

jsha commented Oct 12, 2021

Thanks to tips in #87244 (comment), I've run a profile comparison with cachegrind:comparison.txt

Some highlights:

15,264,218 ( 8.70%)  ./malloc/malloc.c:_int_free                                                                                                                                                                   
 9,694,689 ( 5.53%)  ???:<regex::re_trait::Matches<R> as core::iter::traits::iterator::Iterator>::next                                                                                                             
 9,251,931 ( 5.28%)  ???:alloc::collections::btree::map::BTreeMap<K,V>::insert                                                                                                                                     
 9,026,812 ( 5.15%)  ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe                                                                                                                  
 8,679,939 ( 4.95%)  ./malloc/malloc.c:malloc                                                                                                                                                                      
 7,931,216 ( 4.52%)  ???:tera::renderer::processor::Processor::render_node                                                                                                                                         
 6,247,456 ( 3.56%)  ./malloc/malloc.c:free                                                                                                                                                                        
 6,031,251 ( 3.44%)  ./malloc/malloc.c:_int_malloc                                                                                                                                                                 
 5,809,794 ( 3.31%)  ???:alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::Edge>::insert_recursing                                                                                                                                                            
 5,701,810 ( 3.25%)  ???:tera::renderer::call_stack::CallStack::lookup                                                                                                                                             
 4,808,736 ( 2.74%)  ???:tera::utils::escape_html                                                                                                                                                                  
 4,516,246 ( 2.57%)  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms                                                                                                 
 4,387,412 ( 2.50%)  ???:<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write                                                                                                                 
 4,332,914 ( 2.47%)  ???:tera::renderer::stack_frame::StackFrame::find_value                                                                                                                                       
 4,180,122 ( 2.38%)  ???:tera::renderer::processor::Processor::eval_expression                                                                                                                                     
 4,096,170 ( 2.34%)  ???:std::io::Write::write_fmt                                                                                                                                                                 
 3,814,135 ( 2.17%)  ???:<serde_json::value::ser::SerializeMap as serde::ser::SerializeStruct>::serialize_field                                                                                                    
 3,628,023 ( 2.07%)  ???:hashbrown::map::make_hash                                                                                                                                                                 
 3,168,934 ( 1.81%)  /rustc///library/core/src/fmt/mod.rs:core::fmt::write  

Some interesting things that seems to point to are use of regex in Tera (used to process dotted field names, like component.path) and building of BTrees. BTree is used when building a tera::Context, which is the mapping from field names to field values. We do this on each render, with substantively the same tree structure. We might be able to speed things up by reusing the same Context and inserting different data into it.

There is also a lot of malloc and free going on. The Tera API inherently involves a lot of Strings, because it doesn't take ownership of its inputs, so it has to copy them into Strings (or other owned types).

@the8472
Copy link
Member

the8472 commented Oct 12, 2021

rustc is compiled with jemalloc in linux release builds, does that also apply to rustdoc? If so you have to manually enable that in your config toml to get comparable results.

@jsha
Copy link
Contributor

jsha commented Oct 13, 2021

Good tip, thanks! Here's what it looks like after setting jemalloc = true in config.toml and rebuilding rustdoc:

 9,694,689 ( 6.29%)  ???:<regex::re_trait::Matches<R> as core::iter::traits::iterator::Iterator>::next
 9,251,931 ( 6.00%)  ???:alloc::collections::btree::map::BTreeMap<K,V>::insert
 9,026,861 ( 5.86%)  ./string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe
 7,931,216 ( 5.15%)  ???:tera::renderer::processor::Processor::render_node
 5,809,794 ( 3.77%)  ???:alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::Edge>::insert_recursing
 5,701,810 ( 3.70%)  ???:tera::renderer::call_stack::CallStack::lookup
 5,249,612 ( 3.41%)  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms
 4,808,736 ( 3.12%)  ???:tera::utils::escape_html
 4,387,412 ( 2.85%)  ???:<std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write
 4,332,914 ( 2.81%)  ???:tera::renderer::stack_frame::StackFrame::find_value
 4,180,122 ( 2.71%)  ???:tera::renderer::processor::Processor::eval_expression
 4,096,170 ( 2.66%)  ???:std::io::Write::write_fmt
 3,814,135 ( 2.47%)  ???:<serde_json::value::ser::SerializeMap as serde::ser::SerializeStruct>::serialize_field
 3,628,023 ( 2.35%)  ???:hashbrown::map::make_hash
 3,168,934 ( 2.06%)  /rustc///library/core/src/fmt/mod.rs:core::fmt::write
 2,958,538 ( 1.92%)  ???:alloc::str::<impl str>::replace
 2,835,875 ( 1.84%)  ???:alloc::collections::btree::map::entry::VacantEntry<K,V>::insert
 2,835,875 ( 1.84%)  ???:alloc::collections::btree::navigate::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Dying,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::Edge>>::deallocating_next_unchecked
 2,811,791 ( 1.82%)  ???:<alloc::collections::btree::map::BTreeMap<K,V> as core::ops::drop::Drop>::drop

comparison2.txt

@the8472
Copy link
Member

the8472 commented Oct 13, 2021

BTree is used when building a tera::Context, which is the mapping from field names to field values. We do this on each render, with substantively the same tree structure.

Is the ordering essential to all use-cases? Maybe the option to use a hashmap instead of a tree map could speed it up.

@GuillaumeGomez
Copy link
Member Author

It was fixed in #92526 when we switched to askama.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants