Track TBAA metadata in jl_cgval_t #16230

Merged: 1 commit, May 7, 2016

yuyichao commented May 6, 2016

Track the TBAA metadata in jl_cgval_t so that field loads can be marked with the correct TBAA.

This fixes the issue @JeffBezanson noticed in #15402 (comment).

My repro of the issue now gives the following (yes, it vectorizes):

define %jl_value_t* @"julia_collect_to2!_51446"(%jl_value_t*, %Gen*) #0 {
top:
  %2 = getelementptr inbounds %Gen, %Gen* %1, i64 0, i32 0, i32 0
  %3 = load i64, i64* %2, align 8
  %4 = getelementptr inbounds %Gen, %Gen* %1, i64 0, i32 0, i32 1
  %5 = load i64, i64* %4, align 8
  %6 = add i64 %5, 1
  %7 = icmp eq i64 %3, %6
  br i1 %7, label %L2, label %if.lr.ph

if.lr.ph:                                         ; preds = %top
  %8 = bitcast %jl_value_t* %0 to double**
  %9 = load double*, double** %8, align 8
  %10 = load i64, i64* %4, align 8
  %11 = sub i64 %10, %3
  %backedge.overflow = icmp eq i64 %11, -1
  %12 = add i64 %10, 1
  br i1 %backedge.overflow, label %scalar.ph, label %overflow.checked

overflow.checked:                                 ; preds = %if.lr.ph
  %13 = sub i64 %12, %3
  %n.vec = and i64 %13, -8
  %end.idx.rnd.down = add i64 %n.vec, %3
  %cmp.zero = icmp eq i64 %n.vec, 0
  %ind.end = or i64 %n.vec, 1
  br i1 %cmp.zero, label %middle.block, label %vector.ph

vector.ph:                                        ; preds = %overflow.checked
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ %3, %vector.ph ], [ %index.next, %vector.body ]
  %normalized.idx = sub i64 %index, %3
  %broadcast.splatinsert8 = insertelement <4 x i64> undef, i64 %index, i32 0
  %broadcast.splat9 = shufflevector <4 x i64> %broadcast.splatinsert8, <4 x i64> undef, <4 x i32> zeroinitializer
  %induction10 = add <4 x i64> %broadcast.splat9, <i64 0, i64 1, i64 2, i64 3>
  %induction11 = add <4 x i64> %broadcast.splat9, <i64 4, i64 5, i64 6, i64 7>
  %14 = sitofp <4 x i64> %induction10 to <4 x double>
  %15 = sitofp <4 x i64> %induction11 to <4 x double>
  %16 = fdiv <4 x double> %14, <double 1.000000e+01, double 1.000000e+01, double 1.000000e+01, double 1.000000e+01>
  %17 = fdiv <4 x double> %15, <double 1.000000e+01, double 1.000000e+01, double 1.000000e+01, double 1.000000e+01>
  %18 = getelementptr double, double* %9, i64 %normalized.idx
  %19 = bitcast double* %18 to <4 x double>*
  store <4 x double> %16, <4 x double>* %19, align 8
  %20 = getelementptr double, double* %18, i64 4
  %21 = bitcast double* %20 to <4 x double>*
  store <4 x double> %17, <4 x double>* %21, align 8
  %index.next = add i64 %index, 8
  %22 = icmp eq i64 %index.next, %end.idx.rnd.down
  br i1 %22, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body, %overflow.checked
  %resume.val = phi i64 [ 1, %overflow.checked ], [ %ind.end, %vector.body ]
  %resume.val5 = phi i64 [ %3, %overflow.checked ], [ %end.idx.rnd.down, %vector.body ]
  %trunc.resume.val = phi i64 [ %3, %overflow.checked ], [ %end.idx.rnd.down, %vector.body ]
  %cmp.n = icmp eq i64 %12, %resume.val5
  br i1 %cmp.n, label %L2.loopexit, label %scalar.ph

scalar.ph:                                        ; preds = %middle.block, %if.lr.ph
  %bc.resume.val = phi i64 [ %resume.val, %middle.block ], [ 1, %if.lr.ph ]
  %bc.trunc.resume.val = phi i64 [ %trunc.resume.val, %middle.block ], [ %3, %if.lr.ph ]
  br label %if

L2.loopexit:                                      ; preds = %middle.block, %if
  br label %L2

L2:                                               ; preds = %L2.loopexit, %top
  ret %jl_value_t* %0

if:                                               ; preds = %scalar.ph, %if
  %i.04 = phi i64 [ %bc.resume.val, %scalar.ph ], [ %28, %if ]
  %st.03 = phi i64 [ %bc.trunc.resume.val, %scalar.ph ], [ %23, %if ]
  %23 = add i64 %st.03, 1
  %24 = sitofp i64 %st.03 to double
  %25 = fdiv double %24, 1.000000e+01
  %26 = add i64 %i.04, -1
  %27 = getelementptr double, double* %9, i64 %26
  store double %25, double* %27, align 8
  %28 = add i64 %i.04, 1
  %29 = icmp eq i64 %st.03, %10
  br i1 %29, label %L2.loopexit, label %if
}
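
For readers less used to LLVM IR, the loop above can be read roughly as the C++ below. This is only a sketch based on the IR, not the original Julia repro: the `Gen` struct layout, the function name `collect_to`, and the zero-based destination index are assumptions made for illustration. It shows the same aliasing situation: the stores write `double` values while the loop bounds are `int64_t` fields, so a type-based alias analysis may conclude that the stores cannot change the bounds, load them once before the loop (as the IR does in `top` and `if.lr.ph`), and vectorize the body.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the %Gen struct in the IR: two i64 fields,
// read here as the first and last value of the iteration range.
struct Gen {
    std::int64_t start;
    std::int64_t stop;
};

// Writes st / 10.0 into dest for every st in [g.start, g.stop].
// Assumes g.stop < INT64_MAX so that ++st cannot overflow, and that dest
// has room for (g.stop - g.start + 1) elements.
//
// The stores go through a double*, while the bounds are read as int64_t.
// Under type-based alias rules the compiler may assume the stores never
// modify g.stop, so the bound does not have to be reloaded per iteration.
void collect_to(double* dest, const Gen& g) {
    std::int64_t i = 0;
    for (std::int64_t st = g.start; st <= g.stop; ++st) {
        dest[i++] = static_cast<double>(st) / 10.0;
    }
}
```

Without that aliasing information the bound would have to be reloaded after every store, which is the kind of pattern that blocks the vectorizer.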

Marked as WIP since I still want to double-check that the TBAA assignments are all valid.

yuyichao added the labels performance (Must go faster) and compiler:codegen (Generation of LLVM IR and native code) on May 6, 2016.
yuyichao changed the title from "WIP: Track TBAA metadata in jl_cgval_t" to "Track TBAA metadata in jl_cgval_t" on May 6, 2016.
yuyichao commented May 6, 2016

I took this chance to clean up the TBAA tree a little: I added a few nodes for memory accesses that we weren't marking before and deleted a few nodes that are no longer used. A summary of the differences:

  1. (Maybe the most important one) The array buffer is no longer allowed to alias jl_value_t* (why would you do that...).
  2. Accesses to jl_datatype_t use tbaa_const (I don't think we mutate it after type construction time, which should be invisible to codegen).
  3. Accesses to the type tag are marked with tbaa_tag (it's still valid to unsafe_load it, though; unsafe_store! is also valid from a TBAA point of view, but you'll run into other issues very quickly...).
  4. Replace tbaa_user with tbaa_mutab for mutable types; both inherit from tbaa_value.
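
To make the effect of the tree concrete, here is a minimal sketch of how a scalar TBAA tree is usually queried: two accesses may alias only if one tag is the other or an ancestor of it. The node names `tbaa_value` and `tbaa_mutab` come from the list above; the root node, the name `tbaa_arraybuf`, and its placement as a sibling of `tbaa_value` are assumptions for illustration and may not match the actual tree in codegen.

```cpp
#include <cassert>

// One node of a (scalar) TBAA tree: a name plus a link to its parent.
struct TbaaNode {
    const char* name;
    const TbaaNode* parent;  // nullptr for the root
};

// True if `target` is `from` itself or one of its ancestors.
static bool reaches(const TbaaNode* from, const TbaaNode* target) {
    for (const TbaaNode* n = from; n != nullptr; n = n->parent) {
        if (n == target) return true;
    }
    return false;
}

// Usual scalar TBAA rule: two tags may alias only if one of them is an
// ancestor of (or the same node as) the other.
bool may_alias(const TbaaNode& a, const TbaaNode& b) {
    return reaches(&a, &b) || reaches(&b, &a);
}

// Illustrative tree only; the root and tbaa_arraybuf are hypothetical.
const TbaaNode tbaa_root{"root", nullptr};
const TbaaNode tbaa_value{"tbaa_value", &tbaa_root};
const TbaaNode tbaa_mutab{"tbaa_mutab", &tbaa_value};
const TbaaNode tbaa_arraybuf{"tbaa_arraybuf", &tbaa_root};

void check_example_tree() {
    // A mutable object is still a Julia value, so the tags may alias.
    assert(may_alias(tbaa_mutab, tbaa_value));
    // Array buffer accesses sit on a separate branch from Julia values.
    assert(!may_alias(tbaa_arraybuf, tbaa_value));
}
```

Under this model, placing the array buffer on its own branch is what lets the optimizer assume that a store into array data does not change a field loaded from a Julia object.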

Local tests passed (CI seems to be struggling with the long queue...).

So that field load can be marked with the correct TBAA.
Also clean up the tbaa tree. Array buffer and normal julia objects
are not allowed to alias now.
vtjnash merged commit d29f997 into master on May 7, 2016.
vtjnash deleted the yyc/codegen/cgval-tbaa branch on May 7, 2016 at 17:59.