-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A pretty big performance hit after a recent change (PR #51361) #52849
Comments
Are these CTFE-heavy crates? Still it's somewhat strange as only the final result of CTFE is supposed to be checked, not every intermediate value. (That's something I wanted to add, actually, but maybe that should be a miri-only feature.) I opened #52291 because I wasn't sure we are stressing CTFE enough in perf, but maybe we are? |
static BIG_ARRAY_OF_TUPLES: [(i32, (f64, f64, f64)); 0xffff] = [
(641161, (2.988, 10.544, 29.598)),
...
(209150, (5.216, 15.229, 23.389)),
];
fn main() {
assert!(BIG_ARRAY_OF_TUPLES.len() == 0xffff);
}
pub static MAPPING: [&'static str; 0xffff] = [
"\u{0}",
...
"",
];
fn main() { } |
Yea. A regression there is pretty much expected. But we can do better. Right now we'd keep reopening the allocation at least 2^20 times in the tuple-stress case. We should move a bunch of methods from Memory to Allocation and just keep working the Allocation. Candidates are the methods reading integers, pointers or primvals in general |
That's just |
|
Ah, so it's just a big constant. |
This looks like the same regression: The following code regressed from the previous nightly (2018-07-24) to today's (2018-07-30): #![feature(const_fn, const_let)]
// Try to make CTFE actually do a lot of computation, without producing a big result.
// And without support for loops.
macro_rules! const_repeat {
// Recursive case: Take a 16
([16 $($n: tt)*] $e: expr, $T: ty) => {{
const fn e() -> $T { const_repeat!([$($n)*] $e, $T) }
e(); e(); e(); e();
e(); e(); e(); e();
e(); e(); e(); e();
e(); e(); e(); e()
}};
// Recursive case: Take a 2
([2 $($n: tt)*] $e: expr, $T: ty) => {{
const fn e() -> $T { const_repeat!([$($n)*] $e, $T) }
e(); e()
}};
// Base case
([] $e: expr, $T: ty) => {$e};
// 16^4 * 2^2 = 2^(4*4+2) = 2^18 = 256k repetitions
($e: expr, $T: ty) => (const_repeat!([16 16 16 16 2 2] $e, $T));
}
macro_rules! expensive_static {
($name: ident : $T: ty = $e : expr) =>
(pub static $name : $T = const_repeat!($e, $T);)
}
pub trait Trait: Sync {}
impl Trait for u32 {}
const fn nop<T>(t: T) -> T { t }
const fn inc(i: i32) -> i32 { i + 1 }
expensive_static!(RELOCATIONS : &'static str = "hello");
expensive_static!(FIELDS: &'static i32 = &("bar", 42, "foo", 3.14).1);
expensive_static!(CHECKED_INDEX: u8 = b"foomp"[3]);
expensive_static!(UNSIZING: &'static [u8] = b"foo");
expensive_static!(UNSIZE_TRAIT: &'static Trait = &42u32);
expensive_static!(CHAIN: usize = 42i32 as u8 as u64 as i8 as isize as usize);
expensive_static!(OPS: i32 = ((((10 >> 1) + 3) * 7) / 2 - 12) << 4);
expensive_static!(FORCE_ALLOC: i32 = *****(&&&&&5));
expensive_static!(CONST_FN_BASELINE: i32 = nop(42));
expensive_static!(CONST_FN_SIMPLE: i32 = inc(42)); Old:13s With 2^20 runs, it doesn't even seem to terminate at all now (5min and still going, likely in That's definitely fishy, because the final constant (that is supposed to be sanity checked only) is pretty small. |
Is it possible that it is now doing something for every allocation, even those that are not reachable from the final constant? That would explain why |
Interestingly, the "it diverges" part of the regression goes away when I change the macro to just repeat Function calls seem to have regressed a lot. Also, according to perf, all that time it diverges is spent hashing. Might be related to #51702? However, the "good" nightly 2018-07-24 already contains that PR as well. |
Presumably the "Sanity-check all constants" commit is the one responsible, given that |
html5ever also took a smaller (but still substantial) hit:
|
Here's the top part of a Cachegrind diff, showing where the increase came from. The numbers are instruction counts. Lots of stuff in librustc_mir.
|
I verified (using cargo-bisect-rustc) that the regression I reported above is also caused by the same PR:
|
At least for my benchmark, the regression is entirely caused by https://github.com/rust-lang/rust/pull/51361/files#diff-1651ba5b47f9ffa648d59265d3f94becR1671: This leads to statics being CTFE'd at all in |
For example, with generic statics (which promoteds can create) or generic consts (which are proposed, AFAIK), this check couldn't do anything. There, we'd have to check on instantiation, right? |
visited for triage. P-high. Haven't decided whom if anyone to assign this to; hopefully that will be addressed soonish. |
I'm mostly finished by the way, the only missing piece is to write a convenience macro to avoid code duplication, but I've been lacking the time to work on it. Hopefully by the end of the week I'll have it ready for review. |
According to @oli-obk, @RalfJung has turned this from a perf regression into a perf win. Way to go! |
For anyone interested, the relevant PR is #53424 |
Uh, I think it is still slower than without the sanity check... but that is expected. However, there is also a known actual issue still open: #52849 (comment). I am going to reopen this issue until we got that solved. I think we should stop hashing memory, also see #52626 (comment). |
PR for fixing this in beta is #53425. |
That's not correct. That PR fixes a functional bug, it does not help with perf. |
Uh, ok. So, since #53424 is not going to be backported, will the perf regression be fixed in another PR or should I mark this as "won't fix"? |
The full perf regression is not fixed in master yet but I plan to do so -- I am essentially waiting for #52626 because that touches the same code (and actually that one might already do it). That will still be slower than it used to but that's expected, it is doing more work now. As for beta -- @oli-obk any plans? We could (should, I think) backport #52925, that would help. Beyond that, the only measures I can think of is to reduce what validation does -- make it stop at pointers, or disable it entirely. |
visiting for triage. Assigning to @oli-obk to take the lead on the backport. |
visited for triage: I think @RalfJung has one more fix in this direction planned? |
Do I? Not that I know of.^^ I consider this done with #52626. |
Awesome! I'll close this when #52626 is merged |
Fix issue #52475: Make loop detector only consider reachable memory As [suggested](#51702 (comment)) by @oli-obk `alloc_id`s should be ignored by traversing all `Allocation`s in interpreter memory at a given moment in time, beginning by `ByRef` locals in the stack. - [x] Generalize the implementation of `Hash` for `EvalSnapshot` to traverse `Allocation`s - [x] Generalize the implementation of `PartialEq` for `EvalSnapshot` to traverse `Allocation`s - [x] Commit regression tests Fixes #52626 Fixes #52849
Browsing the perf page I noticed a pretty big regression:
I have tracked it down to one of these changes. This might be a necessary evil, but I'd rather make sure.
cc @RalfJung, @oli-obk
The text was updated successfully, but these errors were encountered: