Performance audit, Spring 2017 #41410
(rust_highfive has picked a reviewer for you, use r? to override)
@arielb1,
I'm doing measurements locally, but after this is done it'll show up on perf.rust-lang.org (or the mirror https://perf-rlo.herokuapp.com).
This recovers the 10% of LLVM performance lost during the shimmir transition.
This improves typeck & trans performance by 1%. This looked hotter on callgrind than it is on a CPU.
Force-pushed from ce14cf8 to c8fe505.
WOW!
Force-pushed from c8fe505 to 71d3270.
-(self.tcx().mk_region(ty::ReStatic),
- self.tcx().mk_region(ty::ReStatic))
+(self.tcx().types.re_static,
+ self.tcx().types.re_static)
I just noticed that we don't have these constants pre-interned. This is going to conflict like hell with one of my in-progress branches, but oh well. =)
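As a rough illustration of what "pre-interned" means here, the sketch below keeps ready-made handles for the most common values next to a general interning path. It is a simplified model under stated assumptions: `Region`, `Interner`, and `re_erased` are hypothetical stand-ins, and rustc's real interner is arena-based rather than `Rc`-based.

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical stand-in for rustc's region kinds; not the real `ty::Region`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Region {
    Static,
    Erased,
    Var(u32),
}

pub struct Interner {
    set: HashSet<Rc<Region>>,
    // Pre-interned handles: reading a field avoids hashing and a table lookup.
    pub re_static: Rc<Region>,
    pub re_erased: Rc<Region>,
}

impl Interner {
    pub fn new() -> Self {
        let mut set = HashSet::new();
        let re_static = Rc::new(Region::Static);
        let re_erased = Rc::new(Region::Erased);
        set.insert(Rc::clone(&re_static));
        set.insert(Rc::clone(&re_erased));
        Interner { set, re_static, re_erased }
    }

    /// General path: hash the value, then reuse the existing handle or insert a new one.
    pub fn mk_region(&mut self, r: Region) -> Rc<Region> {
        if let Some(existing) = self.set.get(&r) {
            return Rc::clone(existing);
        }
        let interned = Rc::new(r);
        self.set.insert(Rc::clone(&interned));
        interned
    }
}
```

Both paths hand back the same shared value; the field access simply skips the hash and lookup, which matters when the value is requested on a hot path.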
r=me on the stuff so far
@@ -573,7 +589,7 @@ pub fn shift_regions<'a, 'gcx, 'tcx, T>(tcx: TyCtxt<'a, 'gcx, 'tcx>,
           value, amount);

    value.fold_with(&mut RegionFolder::new(tcx, &mut false, &mut |region, _current_depth| {
        tcx.mk_region(shift_region(*region, amount))
Oh wow. By-value `shift_region` may have been used by elision or something. Can we just kill it?
src/librustc/ty/layout.rs
Outdated
@@ -1942,6 +1940,6 @@ impl<'a, 'tcx> TyLayout<'tcx> {
     }

     pub fn field<C: LayoutTyper<'tcx>>(&self, cx: C, i: usize) -> C::TyLayout {
-        cx.layout_of(self.field_type(cx, i))
+        cx.layout_of(cx.normalize_associated_type(self.field_type(cx, i)))
Why do you need to do it here, when `layout_of` does it at the start?
Now it doesn't. Types in trans are always normalized.
if let Some(&layout) = self.tcx().layout_cache.borrow().get(&ty) {
    return TyLayout { ty: ty, layout: layout, variant_index: None };
}

self.tcx().infer_ctxt((), traits::Reveal::All).enter(|infcx| {
Is creating the `infer_ctxt` costly?
Yes
Can you (trans-)normalize before checking the cache above? Would that solve the problem?
I'd still like to get rid of the `normalize_associated_type` method - can't `field_of` rely on `layout_of` normalizing before checking the cache? `if !ty.has_projection_types() {` is really fast, right?
Except when there are projection types (nested binders) etc. This method is hot.
Is `has_projection_types` anything other than a flag check? I'm not sure I understand what's going on. Can the cache be hit with the unnormalized type if `has_projection_types` returns true?
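The ordering being debated here (a cheap projection check, then normalization only when needed, then the cache lookup) can be sketched roughly as follows. This is a toy model, not rustc's code: `Ty`, `LayoutCx`, `normalize`, and `compute_size` are hypothetical, the "layout" is just a byte size, and the projection check walks the type where the thread suggests the real one is a flag test.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

// Hypothetical miniature type representation; rustc's real `Ty` is far richer.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Ty {
    U32,
    /// Stand-in for an unnormalized associated type such as `<T as Trait>::Out`.
    Projection(Box<Ty>),
    Pair(Box<Ty>, Box<Ty>),
}

impl Ty {
    /// Toy version of the check: walks the type instead of testing a cached flag.
    pub fn has_projection_types(&self) -> bool {
        match self {
            Ty::U32 => false,
            Ty::Projection(_) => true,
            Ty::Pair(a, b) => a.has_projection_types() || b.has_projection_types(),
        }
    }
}

/// Toy "normalization": pretend every projection resolves to its inner type.
fn normalize(ty: &Ty) -> Ty {
    match ty {
        Ty::U32 => Ty::U32,
        Ty::Projection(inner) => normalize(inner),
        Ty::Pair(a, b) => Ty::Pair(Box::new(normalize(a)), Box::new(normalize(b))),
    }
}

/// Toy "layout": a size in bytes.
fn compute_size(ty: &Ty) -> u64 {
    match ty {
        Ty::U32 => 4,
        Ty::Projection(inner) => compute_size(inner),
        Ty::Pair(a, b) => compute_size(a) + compute_size(b),
    }
}

pub struct LayoutCx {
    cache: RefCell<HashMap<Ty, u64>>,
    /// Counts how often the expensive normalization step actually ran.
    pub normalizations: Cell<u32>,
}

impl LayoutCx {
    pub fn new() -> Self {
        LayoutCx { cache: RefCell::new(HashMap::new()), normalizations: Cell::new(0) }
    }

    pub fn layout_of(&self, ty: &Ty) -> u64 {
        // Only pay for normalization when a projection is present.
        let ty = if ty.has_projection_types() {
            self.normalizations.set(self.normalizations.get() + 1);
            normalize(ty)
        } else {
            ty.clone()
        };
        // The cache is keyed by the normalized type only.
        if let Some(&size) = self.cache.borrow().get(&ty) {
            return size;
        }
        let size = compute_size(&ty);
        self.cache.borrow_mut().insert(ty, size);
        size
    }

    pub fn cache_len(&self) -> usize {
        self.cache.borrow().len()
    }
}
```

In this model an unnormalized type never reaches the cache as a key, so a type with projections and its normalized form share one entry.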
        ref item => bug!("trait_impl_polarity: {:?} not an impl", item)
    }
} else {
    self.sess.cstore.impl_polarity(id)
Can you remove the `CrateStore` method? They keep piling up.
I'm not sure this commit does what it's supposed to.
@@ -438,6 +437,38 @@ impl Rc<str> {
     }
 }

impl<T> Rc<[T]> {
Out of curiosity, for the purposes of the compiler, does `Rc<[T]>` provide a measurable improvement over `Rc<Vec<T>>`?
This may also be more easily implementable by consuming `Vec<T>` as you've got to copy data anyway. With `Vec<T>` the `box_free` also doesn't need to be exposed as you can just `.set_len(0)` to drop all the elements.
Didn't bother checking. But `Rc<Vec<T>>` is too ugly to me.
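For readers comparing the two shapes, here is a small sketch using today's standard library, where `Rc<[T]>` can be built from a `Vec<T>` via `From`. That conversion is an assumption relative to this PR, which adds its own constructor; the helper names `share_slice` and `share_vec` are hypothetical.

```rust
use std::rc::Rc;

/// Shared slice: the elements live in the same allocation as the reference
/// counts, so reading an element follows one pointer. The `From<Vec<T>>`
/// conversion moves the elements into that new allocation.
pub fn share_slice(items: Vec<u32>) -> Rc<[u32]> {
    Rc::from(items)
}

/// Shared vector: the `Rc` allocation holds the `Vec` header, which in turn
/// points at a separate element buffer, so reads follow two pointers.
pub fn share_vec(items: Vec<u32>) -> Rc<Vec<u32>> {
    Rc::new(items)
}

/// Both forms deref to a slice, so callers can stay agnostic.
pub fn sum(xs: &[u32]) -> u32 {
    xs.iter().sum()
}
```

The trade-off is that `Rc<[T]>` is a fat pointer (data pointer plus length) while `Rc<Vec<T>>` is a thin one; whether either difference is measurable in the compiler is exactly the question left open above.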
Force-pushed from 82092ca to c357feb.
I think this is enough for one PR.
That method is *incredibly* hot, so this ends up saving 10% of trans time. BTW, we really should be doing dependency tracking there - and possibly be taking the respective perf hit (got to find a way to make DTMs fast), but `layout_cache` is a non-dep-tracking map.
Force-pushed from c357feb to f964da5.
@bors r=nikomatsakis,eddyb
📌 Commit f964da5 has been approved by
…nikomatsakis,eddyb Performance audit, Spring 2017 Fix up some quite important performance "surprises" I've found running callgrind on rustc. This really should land in 1.18.
@bors r- Build failed.
 #[inline]
-unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
+pub(crate) unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
     let size = size_of_val(&*ptr);
     let align = min_align_of_val(&*ptr);
These two functions are not imported during testing (so this fails to compile then).
This avoids parsing item attributes on each call to `item_attrs`, which cuts translation time by 33% (!) and trans-item collection time by 50% (!).
Improves trans performance by *another* 10%.
This improves trans performance by *another* 10%.
This is another one of these things that looks *much* worse on valgrind.
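The attribute-caching idea from the first of these commit messages can be sketched as a memoized lookup: parse once per item, then hand out a cheap shared handle. Everything below is hypothetical (the `Metadata` type, the comma-separated "encoding", the `parses` counter) and only illustrates the pattern, not rustc's metadata decoder.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical item identifier; rustc uses `DefId`.
pub type ItemId = u32;

pub struct Metadata {
    /// Raw, unparsed attribute text per item (stand-in for encoded crate metadata).
    raw: HashMap<ItemId, String>,
    /// Parsed attributes, filled in lazily on first request.
    parsed: RefCell<HashMap<ItemId, Rc<[String]>>>,
    /// Counts how many times the expensive parse actually ran.
    pub parses: Cell<u32>,
}

impl Metadata {
    pub fn new(raw: HashMap<ItemId, String>) -> Self {
        Metadata { raw, parsed: RefCell::new(HashMap::new()), parses: Cell::new(0) }
    }

    pub fn item_attrs(&self, id: ItemId) -> Rc<[String]> {
        // Fast path: a hit costs one hash lookup and a refcount bump.
        if let Some(hit) = self.parsed.borrow().get(&id) {
            return Rc::clone(hit);
        }
        // Slow path: parse once, then remember the result.
        self.parses.set(self.parses.get() + 1);
        let attrs: Rc<[String]> = self
            .raw
            .get(&id)
            .map(|text| {
                text.split(',')
                    .map(|a| a.trim().to_string())
                    .filter(|a| !a.is_empty())
                    .collect::<Vec<_>>()
            })
            .unwrap_or_default()
            .into();
        self.parsed.borrow_mut().insert(id, Rc::clone(&attrs));
        attrs
    }
}
```

Returning a shared slice rather than a freshly built `Vec` means repeated callers neither re-parse nor re-allocate.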
Force-pushed from f964da5 to dae49f1.
@bors r=eddyb
📌 Commit dae49f1 has been approved by
…eddyb Performance audit, Spring 2017 Fix up some quite important performance "surprises" I've found running callgrind on rustc. This really should land in 1.18.
Force-pushed from 9cab5fd to 461bee5.
So now I solved the "specialization caching" problem for real.
In some cases (e.g. <[int-var] as Add<[int-var]>>), selection can turn up a large number of candidates. Bailing out early avoids O(n^2) performance. This improves item-type checking time by quite a bit, resulting in ~2% of total time-to-typeck.
Force-pushed from 461bee5 to 1b207ca.
@bors r=nikomatsakis,eddyb
📌 Commit 1b207ca has been approved by
🔒 Merge conflict
☔ The latest upstream changes (presumably #41464) made this pull request unmergeable. Please resolve the merge conflicts.
Moving the branch over to arielb1/rust.
Fix up some quite important performance "surprises" I've found running callgrind on rustc.
This really should land in 1.18.