Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwinding through fn() references across crates causes the behavior to depend on the optimization level #29485

Closed
mahkoh opened this issue Oct 30, 2015 · 33 comments · Fixed by #31545
Assignees
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@mahkoh
Copy link
Contributor

mahkoh commented Oct 30, 2015

In a.rs

#![crate_type = "lib"]

pub struct X(pub u8);

impl Drop for X {
    fn drop(&mut self) {
        println!("{}", self.0);
    }
}

pub fn f(x: &mut X, g: fn()) {
    x.0 = 1;
    g();
    x.0 = 0;
}

In b.rs

extern crate a;

fn main() {
    a::f(&mut a::X(0), g);
}

fn g() {
    panic!();
}
$ rustc -O a.rs && rustc -L . b.rs && ./b
thread '<main>' panicked at 'explicit panic', b.rs:8
0
$ rustc a.rs && rustc -L . b.rs && ./b 
thread '<main>' panicked at 'explicit panic', b.rs:8
1
$ rustc -V
rustc 1.6.0-nightly (2e07996a9 2015-10-29)

NB: Some pre-1.0 nightlies already behaved this way.

@alexcrichton
Copy link
Member

Looks like a codegen bug related to the noalias attribute we place on the IR. It also looks like this isn't necessarily related to a function pointer. The codegen for this function is a little messy but when cleaned up it looks like:

%X = type { i8, i8 }                                  

declare void @foo()                                   

define void @f(%X* noalias) {                         
  %x = alloca %X*                                     
  store %X* %0, %X** %x                               
  %2 = load %X*, %X** %x                              
  %3 = getelementptr inbounds %X, %X* %2, i32 0, i32 0
  store i8 1, i8* %3                                  
  call void @foo()                                    
  %4 = load %X*, %X** %x                              
  %5 = getelementptr inbounds %X, %X* %4, i32 0, i32 0
  store i8 0, i8* %5                                  
  ret void                                            
}                                                     

When optimized, we get:

%X = type { i8, i8 }

declare void @foo()

define void @f(%X* noalias nocapture) {
  %2 = getelementptr inbounds %X, %X* %0, i64 0, i32 0
  tail call void @foo()
  store i8 0, i8* %2, align 1
  ret void
}

And as we can see, the store of 1 has been optimized away. If we instead remove the noalias attribute on the pointer, the store remains

%X = type { i8, i8 }

declare void @foo()

define void @f(%X* nocapture) {
  %2 = getelementptr inbounds %X, %X* %0, i64 0, i32 0
  store i8 1, i8* %2, align 1
  tail call void @foo()
  store i8 0, i8* %2, align 1
  ret void
}

cc @dotdash, perhaps we can't place noalias on &mut? That seems unfortunate, but I guess there's also a possibility this is a bug in LLVM (but seems unlikely).

@alexcrichton alexcrichton added A-codegen Area: Code generation I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 31, 2015
@Gankra
Copy link
Contributor

Gankra commented Oct 31, 2015

I'm no expert, but isn't the bug here that we're emitting call instead of invoke?

AFAIK marking &mut's noalias is correct. It seems to me that LLVM is using the fact that a call can't throw to determine that the first store is unobservable. If we emited invoke it would be forced to conservatively assume that g can throw and perform the store.

I'm wrong.

@eefriedman
Copy link
Contributor

An LLVM IR call is semantically the same as an invoke with a trivial landing pad.

There's a decent argument that LLVM's treatment of noalias is wrong. Consider the following C++ (GNU-dialect) testcase:

#include <iostream>
using namespace std;

void xx() { 
    throw 1;
}

void (*volatile g)() = xx;

void f(int* __restrict x) {
    *x = 1;
    g();
    *x = 0;
}

int main() {
    int x = 0;
    try { f(&x); } catch(...) {}
    std::cout << x;
    return 0;
}

gcc prints 1; clang prints 0.

On the other hand, you're likely to run into resistance proposing any restriction on how alias analysis can reason about noalias pointers.

@sunfishcode
Copy link
Member

This isn't just a noalias issue; it can occur with other kinds of alias information as well. For example, the first store gets deleted in this code too, even though the call to foo may unwind.

declare void @foo()

@g = external global i8

define void @f() {
  store i8 1, i8* @g
  call void @foo() readnone
  store i8 0, i8* @g
  ret void
}

MemoryDependenceAnalysis in LLVM currently says that the second store here has a direct local dependence on the first store. There's a sense in which that is a correct answer, but DeadStoreElimination inteprets this answer to mean that the first store is dead. It's not immediately obvious which of these two should be fixed.

call having an implicit control path requires special handling in a other passes too (eg., the inliner has to know how to convert call into invoke if it's being inlined into a context where it'll have a cleanup), so it'd be cool if LLVM had something like an invoke with only one successor, which the optimizer could treat like a kind of conditional return, however, that's a much bigger change.

@nikomatsakis
Copy link
Contributor

triage: P-high

@rust-highfive rust-highfive added P-high High priority and removed I-nominated labels Nov 5, 2015
@nikomatsakis
Copy link
Contributor

Is the problem here that LLVM is optimizing too heavily, or that we are providing incorrect annotations?

Either way, it may be prudent to supply fewer annotations for the time being. That'd certainly be an easy patch to do.

@Gankra
Copy link
Contributor

Gankra commented Nov 5, 2015

I asked sunfish to clarify on IRC, and he believes this is 100% an llvm bug.

@sunfishcode
Copy link
Member

I've now filed an LLVM bug: https://llvm.org/bugs/show_bug.cgi?id=25422

@nikomatsakis
Copy link
Contributor

We were discussing in the compiler team meeting. Do we expect LLVM to fix this in a short time frame? I would think probably not -- we were thinking therefore that it would be prudent to workaround in the time being. Would this mean removing the aliasing annotations and thus taking some speed hit? Are there other solutions? Other implications?

@eefriedman
Copy link
Contributor

Do we expect LLVM to fix this in a short time frame?

No; fixing DSE is probably easy enough... but trusting noalias to work correctly alongside unwinding would probably require auditing every use of alias analysis in LLVM (for example, I'm pretty sure LICM is also broken). In any case, no matter how fast this is fixed, Rust can't unconditionally depend on any fix because we support linking against 3.7.

Would this mean removing the aliasing annotations and thus taking some speed hit?

Yes.

Are there other solutions?

No.

Other implications?

I don't think so.

@sunfishcode
Copy link
Member

LICM is not obviously broken; see eg. isGuaranteedToExecute in LICM.cpp; it explicitly checks for instructions which may unwind before doing related things.

The bug here does appear to arise from the difference between MemDep's way of viewing the world and DSE's.

I don't know how conservative Rust wants to be here, but there is room for optimism that this may just be an isolated DSE bug (or a MemDep bug which only affects DSE).

@arielb1
Copy link
Contributor

arielb1 commented Nov 12, 2015

Won't removing noalias kill our performance?

@eefriedman
Copy link
Contributor

Okay, LICM's is in fact correct... the definition of isGuaranteedToExecute is overly conservative in a way which happens to cover up this issue.

That said, my non-comprehensive audit shows -memcpyopt, -mldst-motion, and -loop-idiom are broken. (Granted, -mldst-motion isn't part of the default pass pipeline.) A couple more testcases:

#![crate_type="lib"]
// Optimized to memset before loop, but g() could panic.
pub fn f(x: &mut [i32; 1000], g: fn() -> i32) {
    for i in 0..1000 {
        x[i] = 0;
        g();
    }
}
#![crate_type="lib"]
// Optimized to output result of h() directly into x, but g() could panic.
pub fn f(x: &mut [i32; 100], g: fn() -> i32, h: fn() -> [i32; 100]) {
    let a = h();
    g();
    *x = a;
}

@sunfishcode
Copy link
Member

Wow. I revoke my optimism then.

@nikomatsakis
Copy link
Contributor

@arielb1

Won't removing noalias kill our performance?

I am curious what effect it will have, I don't really have a good feeling
at all for how important this is.

On Thu, Nov 12, 2015 at 7:39 PM, Dan Gohman notifications@github.com
wrote:

Wow. I revoke my optimism then.


Reply to this email directly or view it on GitHub
#29485 (comment).

@arielb1 arielb1 added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Nov 13, 2015
@arielb1
Copy link
Contributor

arielb1 commented Nov 13, 2015

This looks rather unambiguously like an LLVM bug (though we may want to work around it).

@pnkfelix
Copy link
Member

(a reasonable subtask for a relative newcomer to the project could be to evaluate the actual performance impact removing the noalias attribute here.)

@Gankra Gankra added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Nov 22, 2015
@dotdash
Copy link
Contributor

dotdash commented Dec 4, 2015

For rust itself, the text segment in the .so files grows by 1-3%. Runtime for rustc -Zno-trans also goes up by 1-3%, depending on the crate.

@arielb1
Copy link
Contributor

arielb1 commented Dec 4, 2015

@dotdash

Could we have concrete numbers (-Z time-passes output)? This is much lower than I thought - what change have you done?

@arielb1
Copy link
Contributor

arielb1 commented Dec 4, 2015

Of course, 1-3% perf losses still suck - this might be a reason to make official rustc builds use -Z no-landing-pads.

@dotdash
Copy link
Contributor

dotdash commented Dec 7, 2015

@arielb1 I removed all noalias attributes from function arguments and return values.

diff --git a/src/librustc_trans/trans/attributes.rs b/src/librustc_trans/trans/attributes.rs
index 28dfa4e..32dd31d 100644
--- a/src/librustc_trans/trans/attributes.rs
+++ b/src/librustc_trans/trans/attributes.rs
@@ -10,7 +10,7 @@
 //! Set and unset common attributes on LLVM values.

 use libc::{c_uint, c_ulonglong};
-use llvm::{self, ValueRef, AttrHelper};
+use llvm::{self, ValueRef};
 use middle::ty;
 use middle::infer;
 use session::config::NoDebugInfo;
@@ -117,7 +117,6 @@ pub fn from_fn_attrs(ccx: &CrateContext, attrs: &[ast::Attribute], llfn: ValueRe
                                                llvm::ColdAttribute as u64)
             }
         } else if attr.check_name("allocator") {
-            llvm::Attribute::NoAlias.apply_llfn(llvm::ReturnIndex as c_uint, llfn);
         } else if attr.check_name("unwind") {
             unwind(llfn, true);
         }
@@ -190,24 +189,12 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
             // invisible to the program. We also know it's nonnull as well
             // as how many bytes we can dereference
             attrs.arg(1, llvm::Attribute::StructRet)
-                 .arg(1, llvm::Attribute::NoAlias)
                  .arg(1, llvm::Attribute::NoCapture)
                  .arg(1, llvm::DereferenceableAttribute(llret_sz));

             // Add one more since there's an outptr
             idx += 1;
         } else {
-            // The `noalias` attribute on the return value is useful to a
-            // function ptr caller.
-            match ret_ty.sty {
-                // `Box` pointer return values never alias because ownership
-                // is transferred
-                ty::TyBox(it) if common::type_is_sized(ccx.tcx(), it) => {
-                    attrs.ret(llvm::Attribute::NoAlias);
-                }
-                _ => {}
-            }
-
             // We can also mark the return value as `dereferenceable` in certain cases
             match ret_ty.sty {
                 // These are not really pointers but pairs, (pointer, len)
@@ -233,8 +220,7 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
                 // For non-immediate arguments the callee gets its own copy of
                 // the value on the stack, so there are no aliases. It's also
                 // program-invisible so can't possibly capture
-                attrs.arg(idx, llvm::Attribute::NoAlias)
-                     .arg(idx, llvm::Attribute::NoCapture)
+                attrs.arg(idx, llvm::Attribute::NoCapture)
                      .arg(idx, llvm::DereferenceableAttribute(llarg_sz));
             }

@@ -244,8 +230,6 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx

             // `Box` pointer parameters never alias because ownership is transferred
             ty::TyBox(inner) => {
-                attrs.arg(idx, llvm::Attribute::NoAlias);
-
                 if common::type_is_sized(ccx.tcx(), inner) {
                     let llsz = machine::llsize_of_real(ccx, type_of::type_of(ccx, inner));
                     attrs.arg(idx, llvm::DereferenceableAttribute(llsz));
@@ -265,10 +249,6 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
                 // on memory dependencies rather than pointer equality
                 let interior_unsafe = mt.ty.type_contents(ccx.tcx()).interior_unsafe();

-                if mt.mutbl == hir::MutMutable || !interior_unsafe {
-                    attrs.arg(idx, llvm::Attribute::NoAlias);
-                }
-
                 if mt.mutbl == hir::MutImmutable && !interior_unsafe {
                     attrs.arg(idx, llvm::Attribute::ReadOnly);
                 }
diff --git a/src/librustc_trans/trans/foreign.rs b/src/librustc_trans/trans/foreign.rs
index 95e9e85..39858e9 100644
--- a/src/librustc_trans/trans/foreign.rs
+++ b/src/librustc_trans/trans/foreign.rs
@@ -363,8 +363,7 @@ pub fn trans_native_call<'blk, 'tcx>(bcx: Block<'blk, 'tcx>,
         // The outptr can be noalias and nocapture because it's entirely
         // invisible to the program. We also know it's nonnull as well
         // as how many bytes we can dereference
-        attrs.arg(1, llvm::Attribute::NoAlias)
-             .arg(1, llvm::Attribute::NoCapture)
+        attrs.arg(1, llvm::Attribute::NoCapture)
              .arg(1, llvm::DereferenceableAttribute(llret_sz));
     };

Sizes before:

    text      data     bss       dec  filename
    9021     38464       8     47493  lib/libarena-71b07a99.so
   32863     12766       8     45637  lib/libflate-71b07a99.so
   19286     40241       4     59531  lib/libfmt_macros-71b07a99.so
   60656     70240       8    130904  lib/libgetopts-71b07a99.so
   12498     47964       4     60466  lib/libgraphviz-71b07a99.so
   25740     23790      24     49554  lib/liblog-71b07a99.so
   49478    104506       4    153988  lib/librbml-71b07a99.so
 3367632   3223067      32   6590731  lib/librustc-71b07a99.so
  216766     87705       4    304475  lib/librustc_back-71b07a99.so
  304387    176979       4    481370  lib/librustc_borrowck-71b07a99.so
   18597    149256       8    167861  lib/librustc_data_structures-71b07a99.so
 1926412    121359       8   2047779  lib/librustc_driver-71b07a99.so
  768321   1497412      16   2265749  lib/librustc_front-71b07a99.so
  259104     65749       8    324861  lib/librustc_lint-71b07a99.so
29222128   1897150  105744  31225022  lib/librustc_llvm-71b07a99.so
 1611292    218121       4   1829417  lib/librustc_metadata-71b07a99.so
  420463    165511       8    585982  lib/librustc_mir-71b07a99.so
  926977     39857       4    966838  lib/librustc_platform_intrinsics-71b07a99.so
   39491     11789       8     51288  lib/librustc_plugin-71b07a99.so
   89545     21788       4    111337  lib/librustc_privacy-71b07a99.so
  409397    164117       8    573522  lib/librustc_resolve-71b07a99.so
 2262107    715243      12   2977362  lib/librustc_trans-71b07a99.so
 1749382    562100       4   2311486  lib/librustc_typeck-71b07a99.so
 3166949    705446      24   3872419  lib/librustdoc-71b07a99.so
  163400    500427      16    663843  lib/libserialize-71b07a99.so
 1623203   1825834    5912   3454949  lib/libstd-71b07a99.so
 4221433   3111003      24   7332460  lib/libsyntax-71b07a99.so
  149020     80788       4    229812  lib/libterm-71b07a99.so
  215043    126814      16    341873  lib/libtest-71b07a99.so
53340591  15805486  111932  69258009  (TOTALS)

Sizes after:

    text      data     bss       dec  filename
    9245     38469       8     47722  lib/libarena-71b07a99.so
   32903     12772       4     45679  lib/libflate-71b07a99.so
   20786     40241       4     61031  lib/libfmt_macros-71b07a99.so
   63584     70235       4    133823  lib/libgetopts-71b07a99.so
   12744     47949       8     60701  lib/libgraphviz-71b07a99.so
   26132     23790      24     49946  lib/liblog-71b07a99.so
   49860    104494       8    154362  lib/librbml-71b07a99.so
 3393388   3223197      32   6616617  lib/librustc-71b07a99.so
  219222     87724       4    306950  lib/librustc_back-71b07a99.so
  317487    176994       4    494485  lib/librustc_borrowck-71b07a99.so
   18965    149257       4    168226  lib/librustc_data_structures-71b07a99.so
 1980840    121366       8   2102214  lib/librustc_driver-71b07a99.so
  777253   1497469      16   2274738  lib/librustc_front-71b07a99.so
  261536     65773       8    327317  lib/librustc_lint-71b07a99.so
29222240   1897162  105744  31225146  lib/librustc_llvm-71b07a99.so
 1661684    218169       4   1879857  lib/librustc_metadata-71b07a99.so
  427259    165522       4    592785  lib/librustc_mir-71b07a99.so
  927005     39868       4    966877  lib/librustc_platform_intrinsics-71b07a99.so
   39967     11805       8     51780  lib/librustc_plugin-71b07a99.so
   90133     21789       8    111930  lib/librustc_privacy-71b07a99.so
  407357    164139       4    571500  lib/librustc_resolve-71b07a99.so
 2351875    715590       8   3067473  lib/librustc_trans-71b07a99.so
 1763514    562155       4   2325673  lib/librustc_typeck-71b07a99.so
 3172349    705522      24   3877895  lib/librustdoc-71b07a99.so
  165452    500460      16    665928  lib/libserialize-71b07a99.so
 1642203   1825793    5912   3473908  lib/libstd-71b07a99.so
 4319225   3111203      24   7430452  lib/libsyntax-71b07a99.so
  151144     80785       4    231933  lib/libterm-71b07a99.so
  218399    126818      16    345233  lib/libtest-71b07a99.so
53743751  15806510  111920  69662181  (TOTALS)

Ignoring the rustc_llvm crate, which is pretty much unaffected by this change, the size of the text segment is 1.67% larger without the noalias attributes.

I'll gather new timings in the evening. It'll probably be output from perf stat though, as the -Ztime-passes times didn't seem stable enough to be useful. I'll check them again though.

@dotdash
Copy link
Contributor

dotdash commented Dec 7, 2015

More numbers. I went with the -Ztime-passes times, because this time they seemed rather stable. Maybe I previously forgot to turn off something that ran in the background and messed with the timings.

lilbcore

With noalias Without noalias Change Pass
0.095 0.099 +4.21% parsing
0.034 0.034 +0.00% configuration 1
0.000 0.000 +0.00% recursion limit
0.002 0.002 +0.00% gated macro checking
0.000 0.000 +0.00% crate injection
0.001 0.001 +0.00% macro loading
0.000 0.000 +0.00% plugin loading
0.000 0.000 +0.00% plugin registration
0.634 0.649 +2.36% expansion
0.012 0.013 +8.33% complete gated feature checking 1
0.146 0.149 +2.05% configuration 2
0.000 0.000 +0.00% gated configuration checking
0.071 0.072 +1.40% maybe building test harness
0.000 0.000 +0.00% prelude injection
0.006 0.006 +0.00% checking that all macro invocations are gone
0.000 0.000 +0.00% checking for inline asm in case the target doesn't support it
0.013 0.013 +0.00% complete gated feature checking 2
0.067 0.068 +1.49% assigning node ids
0.046 0.046 +0.00% lowering ast -> hir
0.101 0.102 +0.99% indexing hir
0.000 0.000 +0.00% attribute checking
0.023 0.023 +0.00% early lint checks
0.000 0.000 +0.00% external crate/lib resolution
0.004 0.004 +0.00% language item collection
0.092 0.093 +1.08% resolution
0.011 0.011 +0.00% lifetime resolution
0.000 0.000 +0.00% looking for entry point
0.000 0.000 +0.00% looking for plugin registrar
0.030 0.030 +0.00% region resolution
0.004 0.004 +0.00% loop checking
0.004 0.004 +0.00% static item recursion checking
0.105 0.105 +0.00% type collecting
0.001 0.001 +0.00% variance inference
0.451 0.469 +3.99% coherence checking
0.553 0.579 +4.70% wf checking (old)
0.729 0.735 +0.82% item-types checking
4.844 4.987 +2.95% item-bodies checking
0.000 0.000 +0.00% drop-impl checking
0.999 1.035 +3.60% wf checking (new)
0.194 0.198 +2.06% const checking
0.055 0.055 +0.00% privacy checking
0.009 0.009 +0.00% stability index
0.041 0.042 +2.43% intrinsic checking
0.011 0.011 +0.00% effect checking
0.070 0.071 +1.42% match checking
0.258 0.262 +1.55% MIR dump
0.025 0.026 +4.00% liveness checking
0.408 0.416 +1.96% borrow checking
0.204 0.207 +1.47% rvalue checking
0.092 0.093 +1.08% reachability checking
0.033 0.033 +0.00% death checking
0.031 0.032 +3.22% stability checking
0.000 0.000 +0.00% unused lib feature checking
0.696 0.725 +4.16% lint checking
11.300 11.609 +2.73% total real
11.158 11.696 +4.82% total user

libstd

With noalias Without noalias Change Pass
0.200 0.226 +13.00% parsing
0.065 0.065 +0.00% configuration 1
0.000 0.000 +0.00% recursion limit
0.003 0.003 +0.00% gated macro checking
0.000 0.000 +0.00% crate injection
0.004 0.004 +0.00% macro loading
0.000 0.000 +0.00% plugin loading
0.000 0.000 +0.00% plugin registration
0.248 0.253 +2.01% expansion
0.009 0.009 +0.00% complete gated feature checking 1
0.068 0.068 +0.00% configuration 2
0.000 0.000 +0.00% gated configuration checking
0.031 0.031 +0.00% maybe building test harness
0.029 0.029 +0.00% prelude injection
0.004 0.004 +0.00% checking that all macro invocations are gone
0.000 0.000 +0.00% checking for inline asm in case the target doesn't support it
0.009 0.009 +0.00% complete gated feature checking 2
0.029 0.029 +0.00% assigning node ids
0.025 0.025 +0.00% lowering ast -> hir
0.013 0.014 +7.69% indexing hir
0.000 0.000 +0.00% attribute checking
0.013 0.013 +0.00% early lint checks
0.001 0.001 +0.00% external crate/lib resolution
0.003 0.003 +0.00% language item collection
0.071 0.071 +0.00% resolution
0.005 0.005 +0.00% lifetime resolution
0.000 0.000 +0.00% looking for entry point
0.000 0.000 +0.00% looking for plugin registrar
0.015 0.016 +6.66% region resolution
0.002 0.002 +0.00% loop checking
0.002 0.002 +0.00% static item recursion checking
0.038 0.038 +0.00% type collecting
0.001 0.001 +0.00% variance inference
0.049 0.045 -8.17% coherence checking
0.061 0.063 +3.27% wf checking (old)
0.085 0.086 +1.17% item-types checking
1.529 1.570 +2.68% item-bodies checking
0.000 0.000 +0.00% drop-impl checking
0.171 0.181 +5.84% wf checking (new)
0.143 0.150 +4.89% const checking
0.030 0.029 -3.34% privacy checking
0.006 0.004 -33.34% stability index
0.013 0.014 +7.69% intrinsic checking
0.006 0.006 +0.00% effect checking
0.036 0.037 +2.77% match checking
0.153 0.158 +3.26% MIR dump
0.015 0.016 +6.66% liveness checking
0.262 0.271 +3.43% borrow checking
0.180 0.190 +5.55% rvalue checking
0.008 0.008 +0.00% reachability checking
0.018 0.018 +0.00% death checking
0.019 0.019 +0.00% stability checking
0.000 0.000 +0.00% unused lib feature checking
0.151 0.153 +1.32% lint checking
3.898 4.010 +2.87% total real
3.821 3.929 +2.82% total user

libsyntax

With noalias Without noalias Change Pass
0.167 0.176 +5.38% parsing
0.058 0.061 +5.17% configuration 1
0.000 0.000 +0.00% recursion limit
0.004 0.003 -25.00% gated macro checking
0.000 0.000 +0.00% crate injection
0.006 0.007 +16.66% macro loading
0.000 0.000 +0.00% plugin loading
0.000 0.000 +0.00% plugin registration
0.741 0.773 +4.31% expansion
0.019 0.019 +0.00% complete gated feature checking 1
0.159 0.167 +5.03% configuration 2
0.000 0.000 +0.00% gated configuration checking
0.078 0.080 +2.56% maybe building test harness
0.072 0.074 +2.77% prelude injection
0.009 0.009 +0.00% checking that all macro invocations are gone
0.000 0.000 +0.00% checking for inline asm in case the target doesn't support it
0.019 0.019 +0.00% complete gated feature checking 2
0.073 0.075 +2.73% assigning node ids
0.067 0.067 +0.00% lowering ast -> hir
0.056 0.057 +1.78% indexing hir
0.000 0.000 +0.00% attribute checking
0.035 0.033 -5.72% early lint checks
0.003 0.003 +0.00% external crate/lib resolution
0.006 0.006 +0.00% language item collection
0.251 0.256 +1.99% resolution
0.012 0.013 +8.33% lifetime resolution
0.000 0.000 +0.00% looking for entry point
0.000 0.000 +0.00% looking for plugin registrar
0.060 0.063 +5.00% region resolution
0.006 0.006 +0.00% loop checking
0.006 0.006 +0.00% static item recursion checking
0.064 0.064 +0.00% type collecting
0.001 0.001 +0.00% variance inference
0.062 0.059 -4.84% coherence checking
0.072 0.077 +6.94% wf checking (old)
0.128 0.139 +8.59% item-types checking
9.500 9.826 +3.43% item-bodies checking
0.000 0.000 +0.00% drop-impl checking
0.535 0.560 +4.67% wf checking (new)
0.510 0.513 +0.58% const checking
0.082 0.079 -3.66% privacy checking
0.011 0.011 +0.00% stability index
0.033 0.029 -12.13% intrinsic checking
0.025 0.024 -4.00% effect checking
0.115 0.115 +0.00% match checking
0.560 0.568 +1.42% MIR dump
0.075 0.075 +0.00% liveness checking
1.085 1.113 +2.58% borrow checking
0.530 0.548 +3.39% rvalue checking
0.024 0.024 +0.00% reachability checking
0.065 0.060 -7.70% death checking
0.060 0.062 +3.33% stability checking
0.000 0.000 +0.00% unused lib feature checking
0.435 0.440 +1.14% lint checking
16.035 16.514 +2.98% total real
15.430 15.897 +3.02% total user

@nikomatsakis
Copy link
Contributor

@dotdash great, thanks! This is sort of inline with my expectations. I also expect we'll see more impact on smaller examples.

@nikomatsakis
Copy link
Contributor

As an aside, at the work week I got a lot of great suggestions for benchmarks, I have to try and draw them up together into a central list now.

@nikomatsakis nikomatsakis self-assigned this Jan 7, 2016
@nikomatsakis
Copy link
Contributor

Assigning to myself for the purpose of gathering up a few benchmarks.

@RalfJung
Copy link
Member

Doesn't the above patch remove way more noalias annotations than is necessary? In particular, The noalias on returns from Box::new and the noalias on shared references should be much less critical, and not be affected by the LLVM bug at hand. I can't quite tell where many of the other noalias used to be added.

@dotdash
Copy link
Contributor

dotdash commented Jan 28, 2016

So I ran the benchmarks from https://github.com/cristicbz/rust-doom/tree/rustc-benchmark
with a current master build and one with the above patch applied.

master:

test freedoom1 ... bench: 396,057,607 ns/iter (+/- 14,013,493)
test freedoom2 ... bench: 508,963,398 ns/iter (+/- 19,068,029)

Patched:

test freedoom1 ... bench: 416,906,637 ns/iter (+/- 5,445,396)
test freedoom2 ... bench: 530,530,952 ns/iter (+/- 8,757,330)

@nikomatsakis
Copy link
Contributor

That is interesting. Looks like about a 5% slowdown.

@RalfJung
Copy link
Member

RalfJung commented Feb 1, 2016

This patch also removes NoAlias for "Box pointer return values" and for shared references. Neither of them are affected by the problem discussed here, and I assume at least Box return values are not even controversial. (The other places NoAlias is removed, I could not recognize; mind you I did not spend much time analyzing that.)

Maybe it'd be interesting to see how big the impact of just disabling NoAlias for shared borrows is?

EDIT: oops obviously I mixed up shared and mutable references,

@dotdash
Copy link
Contributor

dotdash commented Feb 4, 2016

Time for some new numbers. This time only removing the noalias attribute from mutable borrows as suggested by @RalfJung. I've been slightly sceptical about that the return value ones being ok to be marked as noalias, but that was because I thought of them as an out-parameter (like in the LLVM IR) instead of the return values they are in the rust code. The problematic cases don't arise with return values because we only write to them at the end of the function.

diff --git a/src/librustc_trans/trans/attributes.rs b/src/librustc_trans/trans/attributes.rs
index 28dfa4e..95c3941 100644
--- a/src/librustc_trans/trans/attributes.rs
+++ b/src/librustc_trans/trans/attributes.rs
@@ -265,7 +265,7 @@ pub fn from_fn_type<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, fn_type: ty::Ty<'tcx
                 // on memory dependencies rather than pointer equality
                 let interior_unsafe = mt.ty.type_contents(ccx.tcx()).interior_unsafe();

-                if mt.mutbl == hir::MutMutable || !interior_unsafe {
+                if mt.mutbl != hir::MutMutable && !interior_unsafe {
                     attrs.arg(idx, llvm::Attribute::NoAlias);
                 }

Current master (rustc 1.8.0-dev (d0ef74026 2016-02-04)):

test freedoom1 ... bench: 384,162,874 ns/iter (+/- 11,155,763)
test freedoom2 ... bench: 492,713,445 ns/iter (+/- 18,825,658)

Patched master:

test freedoom1 ... bench: 387,141,369 ns/iter (+/- 4,549,142)
test freedoom2 ... bench: 493,736,564 ns/iter (+/- 6,199,499)

So basically no difference in performance. I'll make an PR for this.

@Gankra
Copy link
Contributor

Gankra commented Feb 5, 2016

Thanks for persevering, @dotdash!

@nikomatsakis
Copy link
Contributor

On Thu, Feb 04, 2016 at 02:48:48PM -0800, Björn Steinbrink wrote:

So basically no difference in performance. I'll make an PR for this.

Great.

dotdash added a commit to dotdash/rust that referenced this issue Feb 10, 2016
LLVM's memory dependence analysis doesn't properly account for calls
that could unwind and thus effectively act as a branching point. This
can lead to stores that are only visible when the call unwinds being
removed, possibly leading to calls to drop() functions with b0rked
memory contents.

As there is no fix for this in LLVM yet and we want to keep
compatibility to current LLVM versions anyways, we have to workaround
this bug by omitting the noalias attribute on &mut function arguments.
Benchmarks suggest that the performance loss by this change is very
small.

Thanks to @RalfJung for pushing me towards not removing too many
noalias annotations and @alexcrichton for helping out with the test for
this bug.

Fixes rust-lang#29485
bors added a commit that referenced this issue Feb 11, 2016
LLVM's memory dependence analysis doesn't properly account for calls
that could unwind and thus effectively act as a branching point. This
can lead to stores that are only visible when the call unwinds being
removed, possibly leading to calls to drop() functions with b0rked
memory contents.

As there is no fix for this in LLVM yet and we want to keep
compatibility to current LLVM versions anyways, we have to workaround
this bug by omitting the noalias attribute on &mut function arguments.
Benchmarks suggest that the performance loss by this change is very
small.

Thanks to @RalfJung for pushing me towards not removing too many
noalias annotations and @alexcrichton for helping out with the test for
this bug.

Fixes #29485
@bluss
Copy link
Member

bluss commented Mar 13, 2016

#32155 could be a regression from this change. It's not that particular code regresses all the time, but it makes the optimization flaky, and it depends on the surrounding program.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.