Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FXML.1923: PDLL support for native constraints with attribute results #24

Conversation

martin-luecke
Copy link

This PR adds support for PDLL native constraints to return results.
This is useful for situations where a pattern checks for certain constraints of multiple interdependent attributes and computes a new attribute value based on them. Currently we do this check natively in C++ during matching and after a successful match have to escape to native C++ again to do the computation during the rewriting part of the pattern. With this PR we can do the computation in C++ during matching and use the result in the rewriting part of the pattern. Effectively this enables a choice in the trade-off of memory consumption during matching vs recomputation of values.

This is a simplified example of a situation where this is useful: We have two operations with certain attributes that have interdependent constraints. For instance attr_foo: one_of [0, 2, 4, 8], attr_bar: one_of [0, 2, 4, 8] and attr_foo == attr_bar. The pattern should only match if all conditions are true. The new operation should be created with a new attribute which is computed from the two matched attributes e.g. attr_baz = attr_foo * attr_bar. For the check we already escape to native C++ and have all values at hand so it makes sense to directly compute the new attribute value as well:

Constraint checkAndCompute(attr0: Attr, attr1 : Attr) -> Attr;

Pattern example with benefit(1) { 
    let foo = op<dialect.foo>() {attr = attr_foo : Attr}; 
    let bar = op<dialect.bar>(foo) {attr = attr_bar : Attr}; 
    let attr_baz = checkAndCompute(attr_foo, attr_bar); 
    rewrite bar with { 
        let baz = op<dialect.baz>; 
        setAttribute(baz, attr<"\"attr\"">, attr_baz); 
        replace bar with baz; 
    }; 
} 

To achieve this the following changes were necessary:

  • Remove simple check in PDLL parser to allow native constraints to return results

  • Change PDL definition of pdl.apply_native_constraint to allow variadic results

  • Change PDL_interp definition of pdl_interp.apply_constraint to allow variadic results

  • Adjust PDLToPDLInterp Pass:
    The input to the pass is an arbitrary number of PDL patterns. The pass collects all predicates
    that are required to match all of the pdl patterns and tries to establish an ordering that allows
    creation of a single efficient matcher function to match all of them. Hence, the pass does not support
    the creation of a value on the matching side of a pattern out of the box. Simply referring to
    the pdl.apply_native_constraint operation is also not possible because the pdl patterns are deleted
    before the pdl_interp dialect operations are created.
    To solve this we record the type of the results in the predicate for the constraint and
    create positions (how the pass refers to values that are not materialized yet) for them.
    When a position is evaluated (i.e. used by an op on the rhs of the pattern) we emit a placeholder value
    to enable the creation of the pdl_interp dialect operation. The placeholder value is replaced later
    when the actual pdl_interp.apply_constraint operation is created.
    This is required in all scenarios where a native constraint returns a result that is used on the rhs
    because said result has to be an input the the pdl_interp rewriter function that is created for the pattern.
    However, the call to this function is always generated before the pdl_interp.apply_constraint operation is created.

  • Modify Bytecode generator and interpreter:
    Constraint functions which return results have a different type compared to existing constraint functions.
    They have the same type as native rewrite functions. For this reason they are for now registered as rewrite functions.
    Other options:

    • change the type for all native constraints to include results
    • Note:
      Now that constraints may produce results it is hard to say what is the exact difference between native constraints
      and native rewrites. The only difference is that native constraints are evaluated during matching and native rewrites
      are evaluated during rewriting. So it might even make sense to unify the two concepts and have a single type for both
      and only a single function to register them, i.e. registerNativeFunction instead of registerConstraintFunction and registerRewriteFunction.

    ByteCode generation:

    • constraint lookup in ConstrainFunctions or RewriteFunctions depending on whether the constraint produces results
    • allocate memory for results in byte code

    ByteCode execution:

    • execute constraint as ConstraintFunction or RewriteFunction depending on whether it produces results
    • save results in byte code

For a follow-up PR:

  • add support for constraints which create and return new operations
  • add support for returning ranges from constraints

@martin-luecke martin-luecke self-assigned this Apr 18, 2023
@martin-luecke martin-luecke requested a review from ljfitz April 18, 2023 09:18
@ljfitz ljfitz requested review from maxbartel and ehsan-toosi April 19, 2023 09:02
Copy link

@ehsan-toosi ehsan-toosi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thank you very much for your explanation in the PR description. I appreciate that a lot. That's a nice feature. I have a few questions/suggestions:

  • Not for this work perhaps: How hard is it to give these third-kind of native a different name (i.e. utility native) and not mixing what we want with constraints?
  • I don't see a useful case at the moment to have declarations of natives for PDLL. When we add new natives, we have to define them here as well which wasn't the case for PDL.
  • While I was working with PDLL and sometimes converted it to pdl_interp, I realized that if you have a constraint that is not dependent on the operation or values or attribute you are capturing, after lowering it to pdl_interp, the order of the natives are not matched with the one in PDLL. In fact, it moves it to the beginning of the matching section and that was problematic for some cases. I can help you to recreate that case and see if it would be problematic here as well.
  • Important: What if we enable rewrite failure instead of adding this kind of constraint? This is not supported by PDL now. So, the idea is, we match, we go the rewrite section and there we can fail and stop rewriting and we go to the next matching.

@martin-luecke
Copy link
Author

Not for this work perhaps: How hard is it to give these third-kind of native a different name (i.e. utility native) and not mixing what we want with constraints?

It really depends on how we want to handle this operation:
In the PDLL frontend the operation is not named anyway so there would be only be changes to the parser as to which operation it emits.
If the aim of your proposal is to not touch how the existing operations are represented and handled we would need to introduce a lot of duplicated code in the lowering pass and in the byte code interpreter as we would handle his new operations almost completely similar to the existing ones.
However, I think this is a quite natural extension to the existing constraints (in the existing code there was even a reference that constraints with results is a TODO). Also note that the semantics of this does not yield a "produce a value on the lhs of the Pattern" behaviour but really still "check a native constraint and possibly return a value". i.e. the pdl.apply_native_constraint still always requires at least one operand to check a constraint on.

While I was working with PDLL and sometimes converted it to pdl_interp, I realized that if you have a constraint that is not dependent on the operation or values or attribute you are capturing, after lowering it to pdl_interp, the order of the natives are not matched with the one in PDLL. In fact, it moves it to the beginning of the matching section and that was problematic for some cases. I can help you to recreate that case and see if it would be problematic here as well.

A constraint is meant to always check something on the IR that you are matching (which is why it requires at least one operand) and will by the pass be ordered to a location where all of its operands have been checked and evaluated, i.e. when there are no operands to the beginning of the pattern (Note: This is also influenced by other patterns in the same file, especially when they call the same native constraint). I think what you are referring to are the constraints that should check for state in the pass it is applied in. They are quite special and out of scope for this extension. Nothing with regard to that has been changed so I expect the issues will be similar.

Important: What if we enable rewrite failure instead of adding this kind of constraint? This is not supported by PDL now. So, the idea is, we match, we go the rewrite section and there we can fail and stop rewriting and we go to the next matching.

The semantics of a PDL pattern is: if we have a successful match of the lhs then the rhs should be able to completely perform the rewrite. This clear separation of matching and rewriting enables the creation of an efficient matcher for a set of patterns and avoiding error handling and aborting rewriting in the bytecode interpreter (which would introduce a runtime penalty). I think we should avoid mixing these concepts more than absolutely required as to not lose these properties.

@martin-luecke
Copy link
Author

One possible change that I would like your opinion on would be the registering of constraint functions with results.
With this proposal they have to be registered using PDLPatternModule::registerRewriteFunction. This seems very unnatural from the user perspective as they want to register a constraint, not a rewrite.
We could introduce a new PDLPatternModule::registerConstraintFunctionWithResults so the user does not have to think about how constraints with results are handled under the hood.

@ehsan-toosi
Copy link

I agree. That would be a better design IMO too. Separating them from the user's perspective reduces the confusion.

Copy link

@maxbartel maxbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really impressive! My only questions are about signaling a failure in a constraint with results.

mlir/lib/Rewrite/ByteCode.cpp Outdated Show resolved Hide resolved
mlir/include/mlir/IR/PatternMatch.h Outdated Show resolved Hide resolved
/// the low-level PDLValue form, and the results are manually appended to
/// the given result list.
///
/// * `ResultT (PatternRewriter &, ValueTs... values)`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we signal a failure with this interface? Don't we need something similar like FailureOr<ResultT>?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
I think it makes sense to remove the template magic and under the hood conversion that is enabled by this. The problem is that not all valid native rewrite functions are also valid constraint with results functions. We would have to adjust how the conversion from PDL structures to built-in structure is done for this.
So for the scope of this PR we will only support constraints with results in form of
LogicalResult (PatternRewriter &, PDLResultList &, ArrayRef<PDLValue>)

Comment on lines +372 to +376
// At this point in time the corresponding pdl.ApplyNativeConstraint op has
// been deleted and the new pdl_interp.ApplyConstraint has not been created
// yet. To enable use of results created by these operations we build a
// placeholder value that will be replaced when the actual
// pdl_interp.ApplyConstraint operation is created.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about PDLInterp, could you give me an example when this is relevant?

Copy link
Author

@martin-luecke martin-luecke Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my description on top:

This is required in all scenarios where a native constraint returns a result that is used on the rhs
because said result has to be an input the the pdl_interp rewriter function that is created for the pattern.
However, the call to this function is always generated before the pdl_interp.apply_constraint operation is created.

An example for the output of this pass is the following:
PDL input:

module {
  pdl.pattern @simple : benefit(1) {
    %0 = operands
    %1 = operation "test.op"(%0 : !pdl.range<value>) 
    %attr = apply_native_constraint "NativeConstraint"(%1 : !pdl.operation) : !pdl.attribute 
    rewrite %1 {
      %3 = operation "test.success"  {"someAttr" = %attr}
      replace %1 with %3
    }
  }
}

The PDL_interp output is the following:

module {
  pdl_interp.func @matcher(%arg0: !pdl.operation) {
    pdl_interp.check_operation_name of %arg0 is "test.op" -> ^bb2, ^bb1
  ^bb1:  // 4 preds: ^bb0, ^bb2, ^bb3, ^bb4
    pdl_interp.finalize
  ^bb2:  // pred: ^bb0
    pdl_interp.check_result_count of %arg0 is 0 -> ^bb3, ^bb1
  ^bb3:  // pred: ^bb2
    %0 = pdl_interp.apply_constraint "NativeConstraint"(%arg0 : !pdl.operation) : !pdl.attribute -> ^bb4, ^bb1
  ^bb4:  // pred: ^bb3
    pdl_interp.record_match @rewriters::@pdl_generated_rewriter(%0, %arg0 : !pdl.attribute, !pdl.operation) : benefit(1), generatedOps(["test.success"]), loc([%arg0]), root("test.op") -> ^bb1
  }
  module @rewriters {
    pdl_interp.func @pdl_generated_rewriter(%arg0: !pdl.attribute, %arg1: !pdl.operation) {
      %0 = pdl_interp.get_results of %arg1 : !pdl.range<value>
      %1 = pdl_interp.get_value_type of %0 : !pdl.range<type>
      %2 = pdl_interp.create_operation "test.success" {"someAttr" = %arg0}  -> (%1 : !pdl.range<type>)
      pdl_interp.erase %arg1
      pdl_interp.finalize
    }
  }
}

The pdl_interp.record_match operation and the block it is in is not the last thing this pass generates for a matcher. This is done so the operation that is the last check before a successful match can have this block (^bb4) as successor (e.g. %0 = pdl_interp.apply_constraint here). But the pdl_interp.record_match needs the result of %0 = pdl_interp.apply_constraint as operand.
So there is a cyclic dependency here, which we solve using the placeholder.

In short:

  • %0 = pdl_interp.apply_constraint needs the success branch as successor.
  • pdl_interp.record_match is created with the success branch and needs the result %0 as operand

@martin-luecke martin-luecke merged commit 7cc5626 into feature/fused-ops Apr 26, 2023
@martin-luecke martin-luecke deleted the origin/martin.FXML-1923.PDL_native_constraint_with_results branch April 26, 2023 11:18
mgehre-amd pushed a commit that referenced this pull request Jul 17, 2023
`__xray_customevent` and `__xray_typedevent` are built-in functions in Clang.
With -fxray-instrument, they are lowered to intrinsics llvm.xray.customevent and
llvm.xray.typedevent, respectively. These intrinsics are then lowered to
TargetOpcode::{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL}. The target is
responsible for generating a code sequence that calls either
`__xray_CustomEvent` (with 2 arguments) or `__xray_TypedEvent` (with 3
arguments).

Before patching, the code sequence is prefixed by a branch instruction that
skips the rest of the code sequence. After patching
(compiler-rt/lib/xray/xray_AArch64.cpp), the branch instruction becomes a NOP
and the function call will take effects.

This patch implements the lowering process for
{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL} and implements the runtime.

```
// Lowering of PATCHABLE_EVENT_CALL
.Lxray_sled_N:
  b  #24
  stp x0, x1, [sp, #-16]!
  x0 = reg of op0
  x1 = reg of op1
  bl __xray_CustomEvent
  ldrp x0, x1, [sp], #16
```

As a result, two updated tests in compiler-rt/test/xray/TestCases/Posix/ now
pass on AArch64.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D153320
mgehre-amd pushed a commit that referenced this pull request Feb 27, 2024
…lvm#80904)"

This reverts commit b1ac052.

This commit breaks coroutine splitting for non-swift calling convention
functions. In this example:

```ll
; ModuleID = 'repro.ll'
source_filename = "stdlib/test/runtime/test_llcl.mojo"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@0 = internal constant { i32, i32 } { i32 trunc (i64 sub (i64 ptrtoint (ptr @craSH to i64), i64 ptrtoint (ptr getelementptr inbounds ({ i32, i32 }, ptr @0, i32 0, i32 1) to i64)) to i32), i32 64 }

define dso_local void @af_suspend_fn(ptr %0, i64 %1, ptr %2) #0 {
  ret void
}

define dso_local void @craSH(ptr %0) #0 {
  %2 = call token @llvm.coro.id.async(i32 64, i32 8, i32 0, ptr @0)
  %3 = call ptr @llvm.coro.begin(token %2, ptr null)
  %4 = getelementptr inbounds { ptr, { ptr, ptr }, i64, { ptr, i1 }, i64, i64 }, ptr poison, i32 0, i32 0
  %5 = call ptr @llvm.coro.async.resume()
  store ptr %5, ptr %4, align 8
  %6 = call { ptr, ptr, ptr } (i32, ptr, ptr, ...) @llvm.coro.suspend.async.sl_p0p0p0s(i32 0, ptr %5, ptr @ctxt_proj_fn, ptr @af_suspend_fn, ptr poison, i64 -1, ptr poison)
  ret void
}

define dso_local ptr @ctxt_proj_fn(ptr %0) #0 {
  ret ptr %0
}

; Function Attrs: nomerge nounwind
declare { ptr, ptr, ptr } @llvm.coro.suspend.async.sl_p0p0p0s(i32, ptr, ptr, ...) #1

; Function Attrs: nounwind
declare token @llvm.coro.id.async(i32, i32, i32, ptr) #2

; Function Attrs: nounwind
declare ptr @llvm.coro.begin(token, ptr writeonly) #2

; Function Attrs: nomerge nounwind
declare ptr @llvm.coro.async.resume() #1

attributes #0 = { "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { nomerge nounwind }
attributes #2 = { nounwind }
```

This verifier crashes after the `coro-split` pass with

```
cannot guarantee tail call due to mismatched parameter counts
  musttail call void @af_suspend_fn(ptr poison, i64 -1, ptr poison)
LLVM ERROR: Broken function
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: opt ../../../reduced.ll -O0
 #0 0x00007f1d89645c0e __interceptor_backtrace.part.0 /build/gcc-11-XeT9lY/gcc-11-11.4.0/build/x86_64-linux-gnu/libsanitizer/asan/../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4193:28
 #1 0x0000556d94d254f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22
 #2 0x0000556d94d19a2f llvm::sys::RunSignalHandlers() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Signals.cpp:105:20
 #3 0x0000556d94d1aa42 SignalHandler(int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:371:36
 #4 0x00007f1d88e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007f1d88e969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #6 0x00007f1d88e969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #7 0x00007f1d88e969fc pthread_kill ./nptl/pthread_kill.c:89:10
 #8 0x00007f1d88e42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007f1d88e287f3 abort ./stdlib/abort.c:81:7
 #10 0x0000556d8944be01 std::vector<llvm::json::Value, std::allocator<llvm::json::Value>>::size() const /usr/include/c++/11/bits/stl_vector.h:919:40
 #11 0x0000556d8944be01 bool std::operator==<llvm::json::Value, std::allocator<llvm::json::Value>>(std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&, std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&) /usr/include/c++/11/bits/stl_vector.h:1893:23
 #12 0x0000556d8944be01 llvm::json::operator==(llvm::json::Array const&, llvm::json::Array const&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Support/JSON.h:572:69
 #13 0x0000556d8944be01 llvm::json::operator==(llvm::json::Value const&, llvm::json::Value const&) (.cold) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/JSON.cpp:204:28
 #14 0x0000556d949ed2bd llvm::report_fatal_error(char const*, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/ErrorHandling.cpp:82:70
 #15 0x0000556d8e37e876 llvm::SmallVectorBase<unsigned int>::size() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:91:32
 #16 0x0000556d8e37e876 llvm::SmallVectorTemplateCommon<llvm::DiagnosticInfoOptimizationBase::Argument, void>::end() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:282:41
 #17 0x0000556d8e37e876 llvm::SmallVector<llvm::DiagnosticInfoOptimizationBase::Argument, 4u>::~SmallVector() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:1215:24
 #18 0x0000556d8e37e876 llvm::DiagnosticInfoOptimizationBase::~DiagnosticInfoOptimizationBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:413:7
 #19 0x0000556d8e37e876 llvm::DiagnosticInfoIROptimization::~DiagnosticInfoIROptimization() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:622:7
 #20 0x0000556d8e37e876 llvm::OptimizationRemark::~OptimizationRemark() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:689:7
 #21 0x0000556d8e37e876 operator() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2213:14
 #22 0x0000556d8e37e876 emit<llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::CGSCCAnalysisManager&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)::<lambda()> > /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h:83:12
 #23 0x0000556d8e37e876 llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2212:13
 #24 0x0000556d8c36ecb1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CoroSplitPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #25 0x0000556d91c1a84f llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:90:12
 #26 0x0000556d8c3690d1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #27 0x0000556d91c2162d llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:278:18
 #28 0x0000556d8c369035 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #29 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20
 #30 0x0000556d8e30979e llvm::CoroConditionalWrapper::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroConditionalWrapper.cpp:19:74
 #31 0x0000556d8c365755 llvm::detail::PassModel<llvm::Module, llvm::CoroConditionalWrapper, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3
 #32 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20
 #33 0x0000556d89818556 llvm::SmallPtrSetImplBase::isSmall() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:196:33
 #34 0x0000556d89818556 llvm::SmallPtrSetImplBase::~SmallPtrSetImplBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:84:17
 #35 0x0000556d89818556 llvm::SmallPtrSetImpl<llvm::AnalysisKey*>::~SmallPtrSetImpl() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:321:7
 #36 0x0000556d89818556 llvm::SmallPtrSet<llvm::AnalysisKey*, 2u>::~SmallPtrSet() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:427:7
 #37 0x0000556d89818556 llvm::PreservedAnalyses::~PreservedAnalyses() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/Analysis.h:109:7
 #38 0x0000556d89818556 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/NewPMDriver.cpp:532:10
 #39 0x0000556d897e3939 optMain /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/optdriver.cpp:737:27
 #40 0x0000556d89455461 main /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/opt.cpp:25:33
 #41 0x00007f1d88e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
 #42 0x00007f1d88e29e40 call_init ./csu/../csu/libc-start.c:128:20
 #43 0x00007f1d88e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
 #44 0x0000556d897b6335 _start (/home/ubuntu/modular/.derived/third-party/llvm-project/build-relwithdebinfo-asan/bin/opt+0x150c335)
Aborted (core dumped)
mgehre-amd pushed a commit that referenced this pull request Mar 11, 2024
TestCases/Misc/Linux/sigaction.cpp fails because dlsym() may call malloc
on failure. And then the wrapped malloc appears to access thread local
storage using global dynamic accesses, thus calling
___interceptor___tls_get_addr, before REAL(__tls_get_addr) has
been set, so we get a crash inside ___interceptor___tls_get_addr. For
example, this can happen when looking up __isoc23_scanf which might not
exist in some libcs.

Fix this by marking the thread local variable accessed inside the
debug checks as "initial-exec", which does not require __tls_get_addr.

This is probably a better alternative to llvm#83886.

This fixes a different crash but is related to llvm#46204.

Backtrace:
```
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6a9d89e in ___interceptor___tls_get_addr (arg=0x7ffff6b27be8) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2759
#2 0x00007ffff6a46bc6 in __sanitizer::CheckedMutex::LockImpl (this=0x7ffff6b27be8, pc=140737331846066) at /path/to/llvm/compiler-rt/lib/sanitizer_common/sanitizer_mutex.cpp:218
#3 0x00007ffff6a448b2 in __sanitizer::CheckedMutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:129
#4 __sanitizer::Mutex::Lock (this=0x7ffff6b27be8, this@entry=0x730000000580) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:167
#5 0x00007ffff6abdbb2 in __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock (mu=0x730000000580, this=<optimized out>) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_mutex.h:383
#6 __sanitizer::SizeClassAllocator64<__tsan::AP64>::GetFromAllocator (this=0x7ffff7487dc0 <__tsan::allocator_placeholder>, stat=stat@entry=0x7ffff570db68, class_id=11, chunks=chunks@entry=0x7ffff5702cc8, n_chunks=n_chunks@entry=128) at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_primary64.h:207
#7 0x00007ffff6abdaa0 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Refill (this=<optimized out>, c=c@entry=0x7ffff5702cb8, allocator=<optimized out>, class_id=<optimized out>)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:103
#8 0x00007ffff6abd731 in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >::Allocate (this=0x7ffff6b27be8, allocator=0x7ffff5702cc8, class_id=140737311157448)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_local_cache.h:39
#9 0x00007ffff6abc397 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate (this=0x7ffff5702cc8, cache=0x7ffff6b27be8, size=<optimized out>, size@entry=175, alignment=alignment@entry=16)
 at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_allocator_combined.h:69
#10 0x00007ffff6abaa6a in __tsan::user_alloc_internal (thr=0x7ffff7ebd980, pc=140737331499943, sz=sz@entry=175, align=align@entry=16, signal=true) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:198
#11 0x00007ffff6abb0d1 in __tsan::user_alloc (thr=0x7ffff6b27be8, pc=140737331846066, sz=11, sz@entry=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cpp:223
#12 0x00007ffff6a693b5 in ___interceptor_malloc (size=175) at /path/to/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:666
#13 0x00007ffff7fce7f2 in malloc (size=175) at ../include/rtld-malloc.h:56
#14 __GI__dl_exception_create_format (exception=exception@entry=0x7fffffffd0d0, objname=0x7ffff7fc3550 "/path/to/llvm/compiler-rt/cmake-build-all-sanitizers/lib/linux/libclang_rt.tsan-x86_64.so",
 fmt=fmt@entry=0x7ffff7ff2db9 "undefined symbol: %s%s%s") at ./elf/dl-exception.c:157
#15 0x00007ffff7fd50e8 in _dl_lookup_symbol_x (undef_name=0x7ffff6af868b "__isoc23_scanf", undef_map=<optimized out>, ref=0x7fffffffd148, symbol_scope=<optimized out>, version=<optimized out>, type_class=0, flags=2, skip_map=0x7ffff7fc35e0) at ./elf/dl-lookup.c:793
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x00007ffff656d6ed in do_sym (handle=<optimized out>, name=0x7ffff6af868b "__isoc23_scanf", who=0x7ffff6a3bb84 <__interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long)+36>, vers=vers@entry=0x0, flags=flags@entry=2) at ./elf/dl-sym.c:146
#17 0x00007ffff656d9dd in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>) at ./elf/dl-sym.c:195
#18 0x00007ffff64a2854 in dlsym_doit (a=a@entry=0x7fffffffd3b0) at ./dlfcn/dlsym.c:40
#19 0x00007ffff7fcc489 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffd310, operate=0x7ffff64a2840 <dlsym_doit>, args=0x7fffffffd3b0) at ./elf/dl-catch.c:237
#20 0x00007ffff7fcc5af in _dl_catch_error (objname=0x7fffffffd368, errstring=0x7fffffffd370, mallocedp=0x7fffffffd367, operate=<optimized out>, args=<optimized out>) at ./elf/dl-catch.c:256
#21 0x00007ffff64a2257 in _dlerror_run (operate=operate@entry=0x7ffff64a2840 <dlsym_doit>, args=args@entry=0x7fffffffd3b0) at ./dlfcn/dlerror.c:138
#22 0x00007ffff64a28e5 in dlsym_implementation (dl_caller=<optimized out>, name=<optimized out>, handle=<optimized out>) at ./dlfcn/dlsym.c:54
#23 ___dlsym (handle=<optimized out>, name=<optimized out>) at ./dlfcn/dlsym.c:68
#24 0x00007ffff6a3bb84 in __interception::GetFuncAddr (name=0x7ffff6af868b "__isoc23_scanf", trampoline=140737311157448) at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:42
#25 __interception::InterceptFunction (name=0x7ffff6af868b "__isoc23_scanf", ptr_to_real=0x7ffff74850e8 <__interception::real___isoc23_scanf>, func=11, trampoline=140737311157448)
 at /path/to/llvm/compiler-rt/lib/interception/interception_linux.cpp:61
#26 0x00007ffff6a9f2d9 in InitializeCommonInterceptors () at /path/to/llvm/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:10315
```

Reviewed By: vitalybuka, MaskRay

Pull Request: llvm#83890
cferry-AMD pushed a commit that referenced this pull request Jun 19, 2024
In a similar manner as in https://reviews.llvm.org/D133494
use `TBL` to place bytes in the *upper* part of `i32` elements
and then convert to float using fixed-point `scvtf`, i.e.

    scvtf Vd.4s, Vn.4s, #24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants