[AutoBump] Merge with 267de854 (May 22) (50) #309

mgehre-amd · 2024-08-24T10:35:26Z

No description provided.

…y implicit class member access expressions (llvm#92318)

According to [expr.prim.id.general] p2:

> If an _id-expression_ `E` denotes a non-static non-type member of some class `C` at a point where the current class is `X` and
> - `E` is potentially evaluated or `C` is `X` or a base class of `X`, and
> - `E` is not the _id-expression_ of a class member access expression, and
> - if `E` is a _qualified-id_, `E` is not the un-parenthesized operand of the unary `&` operator,
>
> the _id-expression_ is transformed into a class member access expression using `(*this)` as the object expression.

Consider the following:
```
struct A {
  void f0();
  template<typename T> void f1();
};

template<typename T>
struct B : T {
  auto g0() -> decltype(T::f0()); // ok
  auto g1() -> decltype(T::template f1<int>()); // error: call to non-static member function without an object argument
};

template struct B<A>;
```
Clang incorrectly rejects the call to `f1` in the _trailing-return-type_ of `g1`. Furthermore, the following snippet results in a crash during codegen:
```
struct A {
  void f();
};

template<typename T>
struct B : T {
  template<typename U> static void g();
  template<> void g<int>() {
    return T::f(); // crash here
  }
};

template struct B<A>;
```
This happens because we unconditionally build a `CXXDependentScopeMemberExpr` (with an implicit object expression) for `T::f` when parsing the template definition, even though we don't know whether `g` is an implicit object member function yet. This patch fixes these issues by instead building `DependentScopeDeclRefExpr`s for such expressions, and only transforming them into implicit class member access expressions during instantiation.
Since we implemented the MS "unqualified lookup into dependent bases" extension by building an implicit class member access (and relying on the first component name of the _nested-name-specifier_ to be looked up in the context of the object expression during instantiation), we instead pre-append a fake _nested-name-specifier_ that refers to the injected-class-name of the enclosing class. This patch also refactors `Sema::BuildQualifiedDeclarationNameExpr` and `Sema::BuildQualifiedTemplateIdExpr`, streamlining their implementation and removing any redundant checks.

…m#92742) Previously, `report_fatal_error` was used to report that something went wrong in the backend, but this is confusing because `report_fatal_error` basically means that something unexpected happened and the backend crashed. So, turn this "crash" into an elegant error report. After this patch, clang can diagnose it: bpf-crash.c:4:30: error: Invalid usage of the XADD return value 4 | u32 next_event_id() { return __sync_fetch_and_add(&GLOBAL_EVENT_ID, 1); } | ^ 1 error generated.

I still don't see why we need to select to different Real instructions on different targets, but at least this is less verbose.

This amends 702a2b6 to hopefully get the test passing for Windows again.

Related to the poor performance of MCAssembler-based constant folding (see `bool MCExpr::evaluateAsAbsolute(int64_t &Res, const MCAssembler *Asm) const` and `AttemptToFoldSymbolOffsetDifference`), commit 9500a5d (llvm#91082) caused a -O0 -g compile-time regression. 9500a5d special-cased .eh_frame FDE emission. This patch adds a special case for .debug_* emission as well, to mitigate the rest of the regression. The MCAssembler-based constant folding strategy should be improved so that the two special cases can be removed.

This allows use at other places, in particular an updated version of llvm#92307.

…se class (llvm#92597)

Consider the following:
```
template<typename T>
struct A {
  struct B : A { };
};
```
According to [class.derived.general] p2:

> [...] A _class-or-decltype_ shall denote a (possibly cv-qualified) class type that is not an incompletely defined class; any cv-qualifiers are ignored. [...]

Although GCC and EDG reject this, Clang accepts it. This is incorrect, as `A` is incomplete within its own definition (outside of a complete-class context). This patch correctly diagnoses instances where the current instantiation is used as a base class before it is complete.

Conversely, Clang erroneously rejects the following:
```
template<typename T>
struct A {
  struct B;
  struct C : B { };
  struct B : C { }; // error: circular inheritance between 'C' and 'A::B'
};
```
Though it may seem like no valid specialization of this template can be instantiated, an explicit specialization of either member class for an implicitly instantiated specialization of `A` would permit the definition of the other member class to be instantiated, e.g.:
```
template<> struct A<int>::B { };
A<int>::C c; // ok
```
So this patch also does away with this error. This means that circular inheritance is diagnosed during instantiation of the definition as a consequence of requiring the base class type to be complete (matching the behavior of GCC and EDG).

…2739) Removes two XFAILed tests, the other tests are marked UNSUPPORTED only on windows.

…ction template explicit specializations after C++14 (llvm#92449)

Clang incorrectly accepts the following when using C++14 or later:
```
struct A {
  template<typename T> void f() const;
  template<> constexpr void f<int>();
};
```
Non-static member functions declared `constexpr` are only implicitly `const` in C++11. This patch makes clang reject the explicit specialization of `f` in language modes after C++11.
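The language rule behind this fix can be seen in a small standalone sketch. This is not code from the patch; `Counter`, `bump`, `get`, and `bumpTwice` are hypothetical names chosen for illustration, and the snippet assumes a compiler in C++14 mode or later.

```cpp
// Compile with -std=c++14 or later.
struct Counter {
  int value = 0;

  // Since C++14, `constexpr` no longer implies `const`, so this member
  // function is allowed to modify the object it is called on.
  constexpr void bump() { ++value; }

  // A read-only accessor has to spell out `const` explicitly.
  constexpr int get() const { return value; }
};

// Exercises the non-const constexpr member during constant evaluation.
constexpr int bumpTwice() {
  Counter c{};
  c.bump();
  c.bump();
  return c.get();
}

static_assert(bumpTwice() == 2, "bump() mutated the object at compile time");
```

Because `bump()` is not implicitly `const` here, a declaration like it cannot match a `const`-qualified primary template member, which is the kind of mismatch the commit message describes.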

Doh! CMake cache scripts don't have generator variables set yet, so the script can't depend on the generator variables. Instead I've added a variable that a user can specify to enable the distribution settings.

It's really great that we have the same information duplicated in TargetLibraryInfo and RuntimeLibcalls which both assume everything by default. Should fix issue reported after llvm#92287

Fixes error in GlobalISel CTLZ lowering caused by [llvm#88512](llvm#88512). --------- Co-authored-by: Leon Clark <leoclark@amd.com>

…ewritePattern (llvm#91987)
* Implements `TransferWritePermutationLowering`, `TransferReadPermutationLowering` and `TransferWriteNonPermutationLowering` as a MaskableOpRewritePattern. This allows exiting gracefully when such a use of a xferOp is inside a `vector::MaskOp`.
* Updates MaskableOpRewritePattern to handle MemRefs and buffer semantics: providing an empty `Value()` as the return value of `matchAndRewriteMaskableOp` now represents a successful rewrite with no value to replace the original op.

Split of llvm#90835

…92619) The current definition is a bit fuzzy... replace it with something that's somewhat rigorous. For functions, the definition is pretty narrow; as a consequence of language-level non-determinism, it's impossible to tell whether two functions are equivalent, so just embrace the non-determinism. For constants, we're pretty strict; otherwise you end up concluding constants can actually change value, which is bad for alias analysis. I think the C++ standard doesn't allow any non-deterministic operations in constants, so we should be okay there? Poison is per-byte to allow some ambiguity in the way padding is defined.

Similar to llvm#92613, but for types. Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

…92595) We need to insert a constrained canonicalize. Depends llvm#92594

The ops supported are: `add`, `sub`, `xor`, `or`, `umax`, `uadd.sat` Proofs: https://alive2.llvm.org/ce/z/8ZMSRg The `add` case actually comes up in SPECInt, the rest are here mostly for completeness. Closes llvm#88579

…2738) This avoids the following build time warning, when building with the latest nightly Clang: warning: cast from 'FARPROC' (aka 'int (*)() __attribute__((stdcall))') to 'GetSystemTimeAsFileTimePtr' (aka 'void (*)(_FILETIME *) __attribute__((stdcall))') converts to incompatible function type [-Wcast-function-type-mismatch] This warning seems to have appeared since Clang commit 999d4f8, which restructured. The GetProcAddress function returns a `FARPROC` type, which is `int (WINAPI *)()`. Directly casting this to another function pointer type triggers this warning, but casting to a `void*` in between avoids this issue. (On Unix-like platforms, dlsym returns a `void*`, which doesn't exhibit this casting problem.)
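The casting pattern described above can be sketched portably, without the Windows headers. This is an illustration only, not the patched code: `GenericProc` stands in for `FARPROC`, `lookupProc` stands in for `GetProcAddress`, and `addOne`, `AddOnePtr`, and `resolveAddOne` are hypothetical names. It assumes a platform where function pointers can round-trip through `void*`, which holds on the usual Windows and POSIX targets.

```cpp
// Stand-in for FARPROC: a generic "function taking nothing, returning int".
using GenericProc = int (*)();

// The real signature of the function we actually want to call.
using AddOnePtr = int (*)(int);

int addOne(int x) { return x + 1; }

// Stand-in for GetProcAddress: returns the function as a generic pointer.
GenericProc lookupProc() {
  void *raw = reinterpret_cast<void *>(&addOne);
  return reinterpret_cast<GenericProc>(raw);
}

AddOnePtr resolveAddOne() {
  // Casting GenericProc directly to AddOnePtr is the pattern that
  // -Wcast-function-type-mismatch complains about. Going through void*
  // first expresses the same conversion without a direct cast between
  // two incompatible function pointer types.
  void *raw = reinterpret_cast<void *>(lookupProc());
  return reinterpret_cast<AddOnePtr>(raw);
}
```

Calling `resolveAddOne()(41)` invokes `addOne` through its original type, so the call itself is well-defined; only the spelling of the cast changes.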

…ps (llvm#90814) Implement folding and rewrite logic to eliminate no-op tensor and memref operations. This handles two specific cases:
1. tensor.insert_slice operations where the size of the inserted slice is known to be 0.
2. memref.copy operations where either the source or target memrefs are known to be empty.

Co-authored-by: Spenser Bauman <sabauma@fastmail>

In building AddrSpaceQualType (llvm#90048), there is a bug in removeAddrSpaceQualType() for arrays. Arrays are weird because qualifiers on the element type also count as qualifiers on the type, so getSingleStepDesugaredType() can't remove the sugar on arrays. This results in an infinite loop in removeAddrSpaceQualType. To fix the issue, we use ASTContext::getUnqualifiedArrayType instead, which strips the qualifier off the element type, then reconstruct the array type.

…92796) This would consistently fail for me locally, to the point where I could not run ninja libc-unit-tests without ninja libc_setjmp_unittests failing. Turns out that since I enabled -ftrivial-auto-var-init=pattern in commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (llvm#78776)") this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized, so we wound up clobbering these registers and instead backing up 0xAAAAAAAAAAAAAAAA rather than the actual register value. The implementation should be rewritten entirely. I've proposed three different ways to do so (linked below). Until we decide which way to go, at least disable this hardening feature for this function for now so that the unit tests go back to green. Link: llvm#87837 Link: llvm#88054 Link: llvm#88157 Fixes: llvm#91164

These are untested and unsupported platforms. The pattern used makes sense for platform specific error numbers, but these are platforms we do not support. Excise this code. Link: llvm#91150

) This patch changes uses of llvm::function_ref for std::function when storing the callback inside of a class. The LLVM Programmer's manual mentions that llvm::function_ref is not safe to store as it contains pointers to external memory that are not guaranteed to exist in the future when it is stored. This causes issues when setting callbacks inside of a class that manages MCA state. Passing a lambda directly to the set callback functions will end up causing UB/segfaults when the lambda is called as some external memory is now invalid. This is easy to work around (create a separate std::function, pass that into the function setting the callback), but isn't ideal.
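The lifetime issue can be illustrated with a minimal standalone sketch that uses only the standard library. It is not the MCA code: `CallbackHolder`, `setCallback`, `run`, and `demo` are hypothetical names. The point it shows is that `std::function` owns a copy of the callable, so a lambda passed as a temporary remains valid after the statement that created it, whereas a non-owning reference type such as `llvm::function_ref` would be left pointing at a destroyed temporary.

```cpp
#include <functional>
#include <utility>

// A class that stores a callback for later use.
class CallbackHolder {
public:
  // std::function takes ownership of a copy of the callable, including
  // any values the lambda captured.
  void setCallback(std::function<int(int)> CB) { Callback = std::move(CB); }

  // Invokes the stored callback; falls back to identity if none is set.
  int run(int X) const { return Callback ? Callback(X) : X; }

private:
  std::function<int(int)> Callback;
};

int demo() {
  CallbackHolder Holder;
  int Offset = 10;
  // The lambda is a temporary that dies at the end of this statement,
  // but the holder keeps its own copy, so calling it later is safe.
  Holder.setCallback([Offset](int X) { return X + Offset; });
  return Holder.run(5);
}
```

The trade-off is that `std::function` may allocate and is heavier than a non-owning reference, which is why the non-owning type remains the usual choice for callbacks that are only used during the call they are passed to.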

Currently only linalg.copy is recognized when trying to specialize linalg.generics back to named op. This diff enables recognition of more generic to named op e.g. linalg.fill, elemwise unary/binary.

Use it for 2 places in LegalizeIntegerTypes that created a VP_AND.

…NFC (llvm#92816)

@nikic

This reverts commit 89e1f77. llvm#88270 (comment) llvm#88270 (comment) Main concerns from @nikic are the interaction between the 'IndVars' and 'LoopDeletion' passes, increasing build times and adding extra complexity.

…ysis tests.

This change adds bindings for `mlirDenseElementsAttrGet` which accepts a list of MLIR attributes and constructs a DenseElementsAttr. This allows for creating `DenseElementsAttr`s of types not natively supported by Python (e.g. BF16) without requiring other dependencies (e.g. `numpy` + `ml-dtypes`).

... back into range of the array.

…ands before folding to AVG Pulled out of llvm#92096 - ensure we have completed a topological simplification of the SRA/SRL shift operands before we try to combine to an AVG node, as it's difficult to later simplify through AVG nodes.

Look through SExt with a precondition that the operand is signed positive. https://alive2.llvm.org/ce/z/zvVVHj
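A plausible reading of why the precondition matters is that sign extension and zero extension agree exactly when the operand is non-negative. The sketch below checks that arithmetic fact exhaustively for 8-bit values; it is an illustration under that assumption, not the ConstraintElimination implementation, and the function names are hypothetical.

```cpp
#include <cstdint>

// Returns true if sign-extending and zero-extending an 8-bit value to
// 32 bits give the same bit pattern for every non-negative input.
bool sextMatchesZextForNonNegative() {
  for (int V = 0; V <= 127; ++V) {
    int8_t X = static_cast<int8_t>(V);
    uint32_t SExt = static_cast<uint32_t>(static_cast<int32_t>(X));
    uint32_t ZExt = static_cast<uint32_t>(static_cast<uint8_t>(X));
    if (SExt != ZExt)
      return false;
  }
  return true;
}

// Returns true if the two extensions disagree for a negative input,
// which is why the "operand >= 0" precondition is needed.
bool sextDiffersForNegative() {
  int8_t X = -1;
  uint32_t SExt = static_cast<uint32_t>(static_cast<int32_t>(X)); // 0xFFFFFFFF
  uint32_t ZExt = static_cast<uint32_t>(static_cast<uint8_t>(X)); // 0x000000FF
  return SExt != ZExt;
}
```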

…mandedOp` (llvm#92753) In `TargetLowering::ShrinkDemandedOp`, types of lhs and rhs may differ before legalization. In the original case, `VT` is `i64` and `SmallVT` is `i32`, but the type of rhs is `i8`. Then invalid truncate nodes will be created. See the description of ISD::SHL for further information: > After legalization, the type of the shift amount is known to be TLI.getShiftAmountTy(). Before legalization, the shift amount can be any type, but care must be taken to ensure it is large enough. https://github.com/llvm/llvm-project/blob/605ae4e93be8976095c7eedf5c08bfdb9ff71257/llvm/include/llvm/CodeGen/ISDOpcodes.h#L691-L712 This patch stops handling ISD::SHL in `TargetLowering::ShrinkDemandedOp` and duplicates the logic in `TargetLowering::SimplifyDemandedBits`. Additionally, it adds some additional checks like `isNarrowingProfitable` and `isTypeDesirableForOp` to improve the codegen on AArch64. Fixes llvm#92720.

Emit diagnostic messages for invalid modifiers in "reduction" clause. Fixes llvm#92397

…aintenence (llvm#92976) I had some trouble understanding why `removeReady` removed nodes from the Pending queue, since my intuition told me that the Pending queue did not represent a node that was ready. I took a deeper look and found that pickOnlyNode and pickNodeFromQueue only picked nodes from the Available queue too. I found that we need to remove nodes from the Available and Pending queues that correspond to the opposite direction from the one we ended up choosing (IsTopNode vs !IsTopNode). It took me a little longer than I would have liked to understand this fact, so I figured that I would add a comment in the code that makes it clear for future readers.

…vm#91459) OpenMP loop transformation did not work on for-loops using an iterator or on range-based for-loops. The first reason is that it combined the iterator's type for generated loops with the type of `NumIterations` as generated for any `OMPLoopBasedDirective` which is an integer. Fixed by basing all generated loop variables on `NumIterations`. Second, C++11 range-based for-loops include syntactic sugar that needs to be executed before the loop. This additional code is now added to the construct's Pre-Init lists. Third, C++20 added an initializer statement to range-based for-loops which is also added to the pre-init statement. PreInits used to be a `DeclStmt` which made it difficult to add arbitrary statements from `CXXRangeForStmt`'s syntactic sugar, especially the for-loop's init statement which does not need to be a declaration. Change it to be a general `Stmt` that can be a `CompoundStmt` to hold arbitrary Stmts, including DeclStmts. This also avoids the `PointerUnion` workaround used by `checkTransformableLoopNest`. End-to-end tests are added to verify the expected number and order of loop execution and evaluations of expressions (such as iterator dereference). The order and number of evaluations of expressions in canonical loops is explicitly undefined by OpenMP but checked here for clarification and for changes to be noticed.

This function will return nullptr instead of returning a constant expression now, so be sure to handle that. Fixes llvm#93017.

…. NFC.

Redefines the amd_kernel_code_t struct with MCExprs for members that would be derived from SIProgramInfo MCExpr members.

This commit eliminates a redundant matcher subexpression from the implementation of the "sizeof-pointer-to-aggregate" part of the clang-tidy check `bugprone-sizeof-expression`. I'm fairly certain that anything that was previously matched by the deleted matcher `StructAddrOfExpr` is also covered by the more general `PointerToStructExpr` (which remains in the same `anyOf`). This commit is made to "prepare the ground" for a followup change that would merge the functionality of the Clang Static Analyzer checker `alpha.core.SizeofPtr` into this clang-tidy check.

I believe these were forgotten when copying the clang in llvm#86816. This was flagged because the CHECK lines for CHECK-LD-ANY* had no associated RUN line. See llvm#92387 (comment)

Pulled from llvm#92096

This resolves an older FIXME comment.

…m#92548) This patch overrides the clearsSuperRegisters method defined in MCInstrAnalysis to identify register writes that clear the upper portion of all super-registers on AArch64 architecture. On AArch64, a write to a general-purpose register of 32-bit data size is defined to use the lower 32-bits of the register and zero extend the upper 32-bits. Similarly, SIMD and FP instructions operating on scalar data only access the lower bits of the SIMD&FP register. The unused upper bits are cleared to zero on a write. This also applies to SIMD vector registers when the element size in bits multiplied by the number of lanes is lower than 128. The upper 64 bits of the vector register are cleared to zero on a write.
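The register behaviour described above can be modelled in a few lines. This is a toy model for illustration, not the `MCInstrAnalysis` override from the patch: `writeW` is a hypothetical helper that treats a 64-bit integer as an X register and shows what a write to its 32-bit W sub-register leaves behind.

```cpp
#include <cstdint>

// Models a write to the 32-bit W view of a 64-bit X register on AArch64.
// The previous contents of the X register are ignored: the low 32 bits
// take the new value and the upper 32 bits are cleared to zero.
constexpr uint64_t writeW(uint64_t /*OldX*/, uint32_t NewW) {
  return static_cast<uint64_t>(NewW);
}

// Even if every bit of the old register was set, nothing of it survives.
static_assert(writeW(0xFFFFFFFFFFFFFFFFULL, 0x1234u) == 0x1234ULL,
              "upper 32 bits are cleared by a W-register write");
```

Since the result never depends on the old value, a write like this carries no dependency on the previous contents of the super-register, which is presumably the property an analysis tool wants to know about.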

davemgreen and others added 30 commits May 20, 2024 18:27

[VectorCombine] Some more tests for different cmp's and fp consts. NFC

285f139

[mlir] Remove redundant include in Passes.h header (NFC)

a0e3e76

[gn build] Port 4f5bc4b

e246105

[AMDGPU] Refactor int_amdgcn_mov_dpp8 patterns. NFC. (llvm#92764)

549fdda

I still don't see why we need to select to different Real instructions on different targets, but at least this is less verbose.

Fix test for non-Itanium ABIs.

3591da9

This amends 702a2b6 to hopefully get the test passing for Windows again.

[LAA] Move logic to compute start and end of a pointer to helper (NFC).

bce3680

This allows use at other places, in particular an updated version of llvm#92307.

[Flang][OpenMP] Disable all OpenMP semantics tests on Windows (llvm#9…

a91d5c0

…2739) Removes two XFAILed tests, the other tests are marked UNSUPPORTED only on windows.

[HLSL][CMake] Cache files don't have generator vars (llvm#92793)

6430939

Doh! CMake cache scripts don't have generator variables set yet, so the script can't depend on the generator variables. Instead I've added a variable that a user can specify to enable the distribution settings.

CodeGen: Fix libcall names for exp10 on the various darwins (llvm#92520)

1eb7f05

It's really great that we have the same information duplicated in TargetLibraryInfo and RuntimeLibcalls which both assume everything by default. Should fix issue reported after llvm#92287

[AMDGPU] Fix error in llvm#88512. (llvm#92770)

e1c06c3

Fixes error in GlobalISel CTLZ lowering caused by [llvm#88512](llvm#88512). --------- Co-authored-by: Leon Clark <leoclark@amd.com>

[mlir][polynomial] split polynomial types tablegen (llvm#92805)

0da1a6c

Similar to llvm#92613, but for types. Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>

AMDGPU: Don't fold rootn(x, 1) to input for strictfp functions (llvm#…

3cb1fe6

…92595) We need to insert a constrained canonicalize. Depends llvm#92594

[ValueTracking] Add tests for isKnowNonZero of X op (X != 0); NFC

2a45f89

[ValueTracking] Recognize X op (X != 0) as non-zero

2232843

The ops supported are: `add`, `sub`, `xor`, `or`, `umax`, `uadd.sat` Proofs: https://alive2.llvm.org/ce/z/8ZMSRg The `add` case actually comes up in SPECInt, the rest are here mostly for completeness. Closes llvm#88579

[libc][errno] remove mips+sparc specific errnos (llvm#92798)

dce197a

These are untested and unsupported platforms. The pattern used makes sense for platform specific error numbers, but these are platforms we do not support. Excise this code. Link: llvm#91150

[MLIR][Linalg] Add more specialize patterns (llvm#91153)

33b7833

Currently only linalg.copy is recognized when trying to specialize linalg.generics back to named op. This diff enables recognition of more generic to named op e.g. linalg.fill, elemwise unary/binary.

[SelectionDAG] Add getVPZeroExtendInReg. NFC (llvm#92792)

110f6a7

Use it for 2 places in LegalizeIntegerTypes that created a VP_AND.

[LegalizeTypes] Use SelectionDAG::SplitVector to simplify some code. …

8018e4c

…NFC (llvm#92816)

CarlosAlbertoEnciso and others added 24 commits May 22, 2024 11:36

[NFC][LLVM] Autogenerate check lines for some Analysis/LoopAccessAnal…

5bd210a

…ysis tests.

[NFC][LLVM] Fix typos in llvm/test/MC/AArch64/SVE

9051fc7

[NFC] Fix typo in llvm/test/Transforms/Util/add-TLI-mappings.ll

25c021a

[clang][Interp] Allow stepping back from a one-past-the-end pointer

9604e5c

... back into range of the array.

[gn build] Port 11b97da

8619054

[clang][Interp] Fix checking unions for initialization

f685481

[ConstraintElim] Look through SExt with precond Op sge 0.

ba0e871

Look through SExt with a precondition that the operand is signed positive. https://alive2.llvm.org/ce/z/zvVVHj

[flang][OpenMP] Diagnose invalid reduction modifiers (llvm#92406)

2aa218c

Emit diagnostic messages for invalid modifiers in "reduction" clause. Fixes llvm#92397

[InstCombine] Handle ConstantFoldCompareInstOperands() failure

0748a98

This function will return nullptr instead of returning a constant expression now, so be sure to handle that. Fixes llvm#93017.

[X86] combineBitcast - merge isa<>/cast<> into single dyn_cast<> call…

cdcd653

…. NFC.

MCExpr-ify amd_kernel_code_t (llvm#91587)

a699ccb

Redefines the amd_kernel_code_t struct with MCExprs for members that would be derived from SIProgramInfo MCExpr members.

[flang][Driver][test] add missing run lines to fopenmp test (llvm#92784)

b99b6b7

I believe these were forgotten when copying the clang in llvm#86816. This was flagged because the CHECK lines for CHECK-LD-ANY* had no associated RUN line. See llvm#92387 (comment)

[DAG] ComputeNumSignBits - add AVGCEILS/AVGFLOORS handling (llvm#93021)

f78febf

Pulled from llvm#92096

[clang][Interp][NFC] Retrieve active union field in Pointer::toRValue()

e3bd627

[clang][Interp][NFC] Propagate IsActive state in unions properly

7d9634e

This resolves an older FIXME comment.

[AutoBump] Merge with 267de85 (May 22)

1815baa

mgehre-amd requested a review from cferry-AMD August 26, 2024 09:08

cferry-AMD approved these changes Aug 26, 2024


Base automatically changed from bump_to_de483ad5 to feature/fused-ops September 4, 2024 05:02


mgehre-amd merged commit efc7b5a into feature/fused-ops Sep 4, 2024
5 checks passed

mgehre-amd deleted the bump_to_267de854 branch September 4, 2024 05:02
