Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 267de854 (May 22) (50) #309

Merged
merged 577 commits into from
Sep 4, 2024

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

davemgreen and others added 30 commits May 20, 2024 18:27
…y implicit class member access expressions (llvm#92318)

According to [expr.prim.id.general] p2:
> If an _id-expression_ `E` denotes a non-static non-type member of some
class `C` at a point where the current class is `X` and
> - `E` is potentially evaluated or `C` is `X` or a base class of `X`,
and
> - `E` is not the _id-expression_ of a class member access expression,
and
> - if `E` is a _qualified-id_, `E` is not the un-parenthesized operand
of the unary `&` operator,
>
> the _id-expression_ is transformed into a class member access
expression using `(*this)` as the object expression.

Consider the following:
```
struct A
{
    void f0();

    template<typename T>
    void f1();
};

template<typename T>
struct B : T
{
    auto g0() -> decltype(T::f0()); // ok

    auto g1() -> decltype(T::template f1<int>()); // error: call to non-static member function without an object argument
};

template struct B<A>;
```

Clang incorrectly rejects the call to `f1` in the _trailing-return-type_
of `g1`. Furthermore, the following snippet results in a crash during
codegen:
```
struct A
{
    void f();
};

template<typename T>
struct B : T
{
    template<typename U>
    static void g();
    
    template<>
    void g<int>()
    {
        return T::f(); // crash here
    }
};

template struct B<A>;
```
This happens because we unconditionally build a
`CXXDependentScopeMemberExpr` (with an implicit object expression) for
`T::f` when parsing the template definition, even though we don't know
whether `g` is an implicit object member function yet.

This patch fixes these issues by instead building
`DependentScopeDeclRefExpr`s for such expressions, and only transforming
them into implicit class member access expressions during instantiation.
Since we implemented the MS "unqualified lookup into dependent bases"
extension by building an implicit class member access (and relying on
the first component name of the _nested-name-specifier_ to be looked up
in the context of the object expression during instantiation), we
instead pre-append a fake _nested-name-specifier_ that refers to the
injected-class-name of the enclosing class. This patch also refactors
`Sema::BuildQualifiedDeclarationNameExpr` and
`Sema::BuildQualifiedTemplateIdExpr`, streamlining their implementation
and removing any redundant checks.
…m#92742)

Previously `report_fatal_error` is used for reporting something goes
wrong in the backend, but this is confusing because `report_fatal_error`
basically means there are something unexpected & crashed in the backend.

So, turn this "crash" into an elegant error reporting. After this patch,
clang can diagnose it:

    bpf-crash.c:4:30: error: Invalid usage of the XADD return value
4 | u32 next_event_id() { return __sync_fetch_and_add(&GLOBAL_EVENT_ID,
1); }
        |                              ^
    1 error generated.
I still don't see why we need to select to different Real instructions
on different targets, but at least this is less verbose.
This amends 702a2b6 to hopefully get
the test passing for Windows again.
Related to the poor performance of MCAssembler based constant folding
(see `bool MCExpr::evaluateAsAbsolute(int64_t &Res, const MCAssembler *Asm) const` and
`AttemptToFoldSymbolOffsetDifference`),
commit 9500a5d (llvm#91082) caused -O0 -g
compile time regression.

9500a5d special cased .eh_frame FDE
emitting. This patch adds a special case to .debug_* emitting as well to
mitigate the rest regression.

The MCAssembler based constant folding strategy should be improved to
remove the two special cases.
This allows use at other places, in particular an updated version of
llvm#92307.
…se class (llvm#92597)

Consider the following:
```
template<typename T>
struct A
{
    struct B : A { };
};
```
According to [class.derived.general] p2:
> [...] A _class-or-decltype_ shall denote a (possibly cv-qualified)
class type that is not an incompletely defined class; any cv-qualifiers
are ignored. [...]

Although GCC and EDG rejects this, Clang accepts it. This is incorrect,
as `A` is incomplete within its own definition (outside of a
complete-class context). This patch correctly diagnoses instances where
the current instantiation is used as a base class before it is complete.

Conversely, Clang erroneously rejects the following:
```
template<typename T>
struct A 
{
    struct B;

    struct C : B { };

    struct B : C { }; // error: circular inheritance between 'C' and 'A::B'
};
```
Though it may seem like no valid specialization of this template can be
instantiated, an explicit specialization of either member classes for an
implicit instantiated specialization of `A` would permit the definition
of the other member class to be instantiated, e.g.:
```
template<>
struct A<int>::B { };

A<int>::C c; // ok
```
So this patch also does away with this error. This means that circular
inheritance is diagnosed during instantiation of the definition as a
consequence of requiring the base class type to be complete (matching
the behavior of GCC and EDG).
…2739)

Removes two XFAILed tests, the other tests are marked UNSUPPORTED only
on windows.
…ction template explicit specializations after C++14 (llvm#92449)

Clang incorrectly accepts the following when using C++14 or later:
```
struct A {
  template<typename T>
  void f() const;

  template<>
  constexpr void f<int>();
};
```
Non-static member functions declared `constexpr` are only implicitly
`const` in C++11. This patch makes clang reject the explicit
specialization of `f` in language modes after C++11.
Doh! CMake cache scripts don't have generator variables set yet, so the
script can't depend on the generator variables. Instead I've added a
variable that a user can specify to enable the distribution settings.
It's really great that we have the same information duplicated in
TargetLibraryInfo and RuntimeLibcalls which both assume everything by
default.

Should fix issue reported after llvm#92287
Fixes error in GlobalISel CTLZ lowering caused by
[llvm#88512](llvm#88512).

---------

Co-authored-by: Leon Clark <leoclark@amd.com>
…ewritePattern (llvm#91987)

* Implements `TransferWritePermutationLowering`,
`TransferReadPermutationLowering` and
`TransferWriteNonPermutationLowering` as a MaskableOpRewritePattern.
Allowing to exit gracefully when such use of a xferOp is inside a
`vector::MaskOp`
* Updates MaskableOpRewritePattern to handle MemRefs and buffer
semantics providing empty `Value()` as a return value for
`matchAndRewriteMaskableOp` now represents successful rewriting without
value to replace the original op.

Split of llvm#90835
…92619)

The current definition is a bit fuzzy... replace it with something
that's somewhat rigorous.

For functions, the definition is pretty narrow; as a consequence of
language-level non-determinism, it's impossible to tell whether two
functions are equivalent, so just embrace the non-determinism. For
constants, we're pretty strict; otherwise you end up concluding
constants can actually change value, which is bad for alias analysis. I
think C++ standard don't allow any non-deterministic operations in
constants, so we should be okay there? Poison is per-byte to allow some
ambiguity in the way padding is defined.
Similar to llvm#92613, but for
types.

Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
The ops supported are: `add`, `sub`, `xor`, `or`, `umax`, `uadd.sat`

Proofs: https://alive2.llvm.org/ce/z/8ZMSRg

The `add` case actually comes up in SPECInt, the rest are here mostly
for completeness.

Closes llvm#88579
…2738)

This avoids the following build time warning, when building with the
latest nightly Clang:

warning: cast from 'FARPROC' (aka 'int (*)() __attribute__((stdcall))')
to
'GetSystemTimeAsFileTimePtr' (aka 'void (*)(_FILETIME *)
__attribute__((stdcall))')
converts to incompatible function type [-Wcast-function-type-mismatch]

This warning seems to have appeared since Clang commit
999d4f8, which restructured.

The GetProcAddress function returns a `FARPROC` type, which is `int
(WINAPI *)()`. Directly casting this to another function pointer type
triggers this warning, but casting to a `void*` inbetween avoids this
issue. (On Unix-like platforms, dlsym returns a `void*`, which doesn't
exhibit this casting problem.)
…ps (llvm#90814)

Implement folding and rewrite logic to eliminate no-op tensor and memref
operations. This handles two specific cases:

1. tensor.insert_slice operations where the size of the inserted slice
is known to be 0.
2. memref.copy operations where either the source or target memrefs are
known to be emtpy.

Co-authored-by: Spenser Bauman <sabauma@fastmail>
In building AddrSpaceQualType
(llvm#90048), there is a bug in
removeAddrSpaceQualType() for arrays. Arrays are weird because
qualifiers on the element type also count as qualifiers on the type, so
getSingleStepDesugaredType() can't remove the sugar on arrays. This
results in an infinite loop in removeAddrSpaceQualType. To fix the
issue, we use ASTContext::getUnqualifiedArrayType instead, which strips
the qualifier off the element type, then reconstruct the array type.
…92796)

This would consistently fail for me locally, to the point where I could not run
ninja libc-unit-tests without ninja libc_setjmp_unittests failing.

Turns out that since I enabled -ftrivial-auto-var-init=pattern in
commit 1d5c16d ("[libc] default enable -ftrivial-auto-var-init=pattern (llvm#78776)")
this has been a problem. Our x86_64 setjmp definition disabled -Wuninitialized,
so we wound up clobbering these registers and instead backing up
0xAAAAAAAAAAAAAAAA rather than the actual register value.

The implemenation should be rewritten entirely. I've proposed three different
ways to do so (linked below). Until we decide which way to go, at least disable
this hardening feature for this function for now so that the unit tests go back
to green.

Link: llvm#87837
Link: llvm#88054
Link: llvm#88157
Fixes: llvm#91164
These are untested and unsupported platforms. The pattern used makes sense for
platform specific error numbers, but these are platforms we do not support.
Excise this code.

Link: llvm#91150
)

This patch changes uses of llvm::function_ref for std::function when
storing the callback inside of a class. The LLVM Programmer's manual
mentions that llvm::function_ref is not safe to store as it contains
pointers to external memory that are not guaranteed to exist in the
future when it is stored.

This causes issues when setting callbacks inside of a class that manages
MCA state. Passing a lambda directly to the set callback functions will
end up causing UB/segfaults when the lambda is called as some external
memory is now invalid. This is easy to work around (create a separate
std::function, pass that into the function setting the callback), but
isn't ideal.
Currently only linalg.copy is recognized when trying to specialize
linalg.generics back to named op. This diff enables recognition of more
generic to named op e.g. linalg.fill, elemwise unary/binary.
Use it for 2 places in LegalizeIntegerTypes that created a VP_AND.
CarlosAlbertoEnciso and others added 24 commits May 22, 2024 11:36
This reverts commit 89e1f77.

llvm#88270 (comment)
llvm#88270 (comment)

Main concerns from @nikic are the interaction between the
'IndVars' and 'LoopDeletion' passes, increasing build times
and adding extra complexity.
This change adds bindings for `mlirDenseElementsAttrGet` which accepts a
list of MLIR attributes and constructs a DenseElementsAttr. This allows
for creating `DenseElementsAttr`s of types not natively supported by
Python (e.g. BF16) without requiring other dependencies (e.g. `numpy` +
`ml-dtypes`).
…ands before folding to AVG

Pulled out of llvm#92096 - ensure we have completed a topological simplification of the SRA/SRL shift operands before we try to combine to a AVG node, as its difficult to later simplify through AVG nodes.
Look through SExt with a precondition that the operand is signed
positive.

https://alive2.llvm.org/ce/z/zvVVHj
…mandedOp` (llvm#92753)

In `TargetLowering::ShrinkDemandedOp`, types of lhs and rhs may differ
before legalization.
In the original case, `VT` is `i64` and `SmallVT` is `i32`, but the type
of rhs is `i8`. Then invalid truncate nodes will be created.

See the description of ISD::SHL for further information:
> After legalization, the type of the shift amount is known to be
TLI.getShiftAmountTy(). Before legalization, the shift amount can be any
type, but care must be taken to ensure it is large enough.


https://github.com/llvm/llvm-project/blob/605ae4e93be8976095c7eedf5c08bfdb9ff71257/llvm/include/llvm/CodeGen/ISDOpcodes.h#L691-L712

This patch stops handling ISD::SHL in `TargetLowering::ShrinkDemandedOp`
and duplicates the logic in `TargetLowering::SimplifyDemandedBits`.
Additionally, it adds some additional checks like
`isNarrowingProfitable` and `isTypeDesirableForOp` to improve the
codegen on AArch64.

Fixes llvm#92720.
Emit diagnostic messages for invalid modifiers in "reduction" clause.

Fixes llvm#92397
…aintenence (llvm#92976)

I had some trouble understanding why `removeReady` removed nodes from
the Pending queue, since my intuition told me that the Pending queue did
not represent a node that was ready. I took a deeper look and found that
pickOnlyNode and pickNodeFromQueue only picked nodes from the Available
queue too.

I found that need to nodes from the Available and Pending queues that
correspond to the opposite direction that we ended up choosing from
(IsTopNode vs !IsTopNode).

It took me a little longer than I would have liked to understand this
fact, so I figured that I would add a comment in the code that makes it
clear for future readers.
…vm#91459)

OpenMP loop transformation did not work on a for-loop using an iterator
or range-based for-loops. The first reason is that it combined the
iterator's type for generated loops with the type of `NumIterations` as
generated for any `OMPLoopBasedDirective` which is an integer. Fixed by
basing all generated loop variables on `NumIterations`.

Second, C++11 range-based for-loops include syntactic sugar that needs
to be executed before the loop. This additional code is now added to the
construct's Pre-Init lists.

Third, C++20 added an initializer statement to range-based for-loops
which is also added to the pre-init statement. PreInits used to be a
`DeclStmt` which made it difficult to add arbitrary statements from
`CXXRangeForStmt`'s syntactic sugar, especially the for-loops init
statement which does not need to be a declaration. Change it to be a
general `Stmt` that can be a `CompoundStmt` to hold arbitrary Stmts,
including DeclStmts. This also avoids the `PointerUnion` workaround used
by `checkTransformableLoopNest`.

End-to-end tests are added to verify the expected number and order of
loop execution and evaluations of expressions (such as iterator
dereference). The order and number of evaluations of expressions in
canonical loops is explicitly undefined by OpenMP but checked here for
clarification and for changes to be noticed.
This function will return nullptr instead of returning a constant
expression now, so be sure to handle that.

Fixes llvm#93017.
Redefines the amd_kernel_code_t struct with MCExprs for members that would be
derived from SIProgramInfo MCExpr members.
This commit eliminates a redundant matcher subexpression from the
implementation of the "sizeof-pointer-to-aggregate" part of the
clang-tidy check `bugprone-sizeof-expression`.

I'm fairly certain that anything that was previously matched by the
deleted matcher `StructAddrOfExpr` is also covered by the more general
`PointerToStructExpr` (which remains in the same `anyOf`).

This commit is made to "prepare the ground" for a followup change that
would merge the functionality of the Clang Static Analyzer checker
`alpha.core.SizeofPtr` into this clang-tidy check.
I believe these were forgotten when copying the clang in llvm#86816.

This was flagged because the CHECK lines for CHECK-LD-ANY* had no
associated RUN line. See
llvm#92387 (comment)
…m#92548)

This patch overrides the clearsSuperRegisters method defined in
MCInstrAnalysis to identify register writes that clear the upper portion
of all super-registers on AArch64 architecture.

On AArch64, a write to a general-purpose register of 32-bit data size is
defined to use the lower 32-bits of the register and zero extend the
upper 32-bits.
Similarly, SIMD and FP instructions operating on scalar data only access
the lower bits of the SIMD&FP register. The unused upper bits are
cleared to zero on a write.
This also applies to SIMD vector registers when the element size in bits
multiplied by the number of lanes is lower than 128. The upper 64 bits
of the vector register are cleared to zero on a write.
@mgehre-amd mgehre-amd requested a review from cferry-AMD August 26, 2024 09:08
Base automatically changed from bump_to_de483ad5 to feature/fused-ops September 4, 2024 05:02
An error occurred while trying to automatically change base from bump_to_de483ad5 to feature/fused-ops September 4, 2024 05:02
@mgehre-amd mgehre-amd merged commit efc7b5a into feature/fused-ops Sep 4, 2024
5 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_267de854 branch September 4, 2024 05:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.