Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with d99bb014 (3) #250

Merged
merged 69 commits into from
Aug 14, 2024
Merged

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

AMP999 and others added 30 commits March 11, 2024 04:23
…lvm#77092)

Consistent with `__is_trivially_copyable(volatile int) == true` and
`__is_trivially_relocatable(volatile Trivial) == true`,
`__is_trivially_relocatable(volatile int)` should also be `true`.

Fixes llvm#77091

[clang] [test] New tests for __is_trivially_relocatable(cv-qualified
type)
…g constraint expression

Previously we disabled to compute ODR hash for declarations from the
global module fragment. However, we missed the case that the functions
lives in the concept requiments (see the attached the test files for
example). And the mismatch causes the potential crashment.

Due to we will set the function body as lazy after we deserialize it and
we will only take its body when needed. However, we don't allow to take
the body during deserializing. So it is actually potentially problematic
if we set the body as lazy first and computing the hash value of the
function, which requires to deserialize its body. So we will meet a
crash here.

This patch tries to solve the issue by not taking the body of the
function from GMF. Note that we can't skip comparing the constraint
expression from the GMF directly since it is an key part of the
function selecting and it may be the reason why we can't return 0
directly for `FunctionDecl::getODRHash()` from the GMF.
Previously, we tried to create an integer extending load. We need to a
non-extending FP load instead.

Fixes llvm#84541.
On converting an instruction to an early-clobber definition in
convertToThreeAddress, we must also update live intervals for the
register to start at the early-clobber index.
…vm#82214)

Small mergeable read only data was place on the sdata before, but it
also means it lose the mergeable property, which means lose some code
size optimization opportunity during link time.
…e. NFC

This removes lots of unneeded `template getFile<ELFT>()`.
…#78564)

Extend repairIntervalsInRange to completely recompute the interva for a
register if subregister defs exist without precise subrange matches
(LaneMask exactly matching subregister).
This occurs when register sequences are lowered to copies such that the
size of the copies do not match any uses of the subregisters formed
(i.e. during twoaddressinstruction).

The subranges without this change are probably legal, but do not match
those generated by live interval computation. This creates problems with
other code that assumes subranges precisely cover all subregisters
defined, e.g. shrinkToUses().
If we're in a blocking call, we need to run the signal immediately, as
the call may not return for a very long time (if ever). Not running the
handler can cause deadlocks if the rest of the program waits (in one way
or another) for the signal handler to execute.

I've gone through the list of functions in
sanitizer_common_interceptors and marked as blocking those that I know
can block, but I don't claim the list to be exhaustive. In particular, I
did not mark libc FILE* functions as blocking, because these can end up
calling user functions. To do that correctly, /I think/ it would be
necessary to clear the "is in blocking call" flag inside the fopencookie
wrappers.

The test for the bug (deadlock) uses the read call (which is the one
that I ran into originally), but the same kind of test could be written
for any other blocking syscall.
…ble (llvm#84133)

This avoids a known libFormat bug where the heuristic can OOM on certain
large files (particularly single-header libraries such as miniaudio.h).

The OOM will still happen on affected files if you actually try to
format them (this is harder to avoid since the underlyting issue affects
the actual formatting logic, not just the language-guessing heuristic),
but at least it's avoided during non-modifying operations like hover,
and modifying operations that do local formatting like code completion.

Fixes clangd/clangd#719
Fixes clangd/clangd#1384
Fixes llvm#70945
If all variables in the module are absolute, this means we're running
the pass again on an already lowered module, and that works.
If none of them are absolute, lowering can proceed as usual.
Only diagnose cases where we have a mix of absolute/non-absolute GVs,
which means we added LDS GVs after lowering, which is broken.

See llvm#81491
Split from llvm#75333
llvm#82629 added additional overloads to `replaceAllUsesWith` and
`replaceUsesWithIf`. This caused a build breakage with MSVC when called
with ops that can implicitly convert to `Value`.

```
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(881): error C2666: 'mlir::RewriterBase::replaceAllUsesWith': 2 overloads have similar conversions
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(631): note: could be 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Operation *,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(626): note: or       'void mlir::RewriterBase::replaceAllUsesWith(mlir::ValueRange,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(616): note: or       'void mlir::RewriterBase::replaceAllUsesWith(mlir::Value,mlir::Value)'
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(882): note: while trying to match the argument list '(mlir::tensor::ExtractSliceOp, T)'
        with
        [
            T=mlir::Value
        ]
```

Note: The LLVM build bots (Linux and Windows) did not break, this seems
to be an issue with `Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`.

This change renames the newly added overloads to `replaceAllOpUsesWith`
and `replaceOpUsesWithIf`.
A potentially erroneous code construction with the work we've done to
remove debug intrinsics, is inserting PHIs into blocks when the position
hasn't been "sourced correctly". Specifically, if you have:

    %foo = PHI
    #dbg_value
    %bar = add i32...

And plan on inserting a new PHI, you have to use the iterator form of
`getFirstNonPHI` or getFirstInsertionPt (or begin()) to acquire an
iterator that tells the debug-info maintenance code "this is supposed to
be at the start of the block, put it in front of #dbg_value". We can
detect call-sites that aren't doing this at runtime, and should do with
this assertion. It might invalidate code that's doing something very
unexpected, like walking backwards to find a PHI, then going forwards,
then inserting: however that's just an inefficient way of calling
`getFirstNonPHI`.
…mise_type (llvm#84193)"

This reverts commit 35d3b33.

See the comments in llvm#84193 for
details
Modifies the privatization logic so that the emitted code only used the
HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before
that, that emitted privatization logic was a mix of using `#0` and `#1`
which leads to some difficulties trying to move to delayed privatization
(see the discussion on llvm#84033).
This commit provides better cost estimates for
the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all
vector lengths and integer types up to i128. For integer types larger
than i128, we fall back to the default cost estimate.

This has the effect of lowering the estimated costs of most common
instances of the intrinsic. The expected performance impact of this is
minimal with a tendency to slightly improve performance of some
benchmarks.

This commit also provides a test to check the proper computation of the
new estimates, as well as the fallback for types larger than i128.
…pes. NFCI (llvm#84125)

I noticed this from a discrepancy in fillUpExtensionSupport between how
we apparently need to check for legal types for ISD::{ZERO,SIGN}_EXTEND,
but we don't need to for RISCVISD::V{Z,S}EXT_VL.

Prior to llvm#72340, combineBinOp_VLToVWBinOp_VL only ran after type
legalization because it only operated on _VL nodes. _VL nodes are only
emitted during op legalization, which takes place **after** type
legalization, which is presumably why the existing code didn't need to
check for legal types.

After llvm#72340 we now handle generic ops like ISD::ADD that exist before
op legalization and thus **before** type legalization. This meant that
we needed to add extra checks that the narrow type was legal in llvm#76785.

I think the easiest thing to do here is to just maintain the invariant
that the types are legal and only run the combine after type
legalization.
Move the narrow types assert from the ZERO_EXTEND/SIGN_EXTEND case in
fillUpExtensionSupport to getOrCreateExtendedOp so we check the other nodes
too.
This allows relying on VPBasicBlock::insert to make sure insertion is
well formed, i.e. by updating the recipe's parent as well as other
potential invariants in the future.
…lvm#83126)

RuntimeInterfaceBuilder wires up JITed expressions with the hardcoded
Interpreter runtime. It's used only for value printing right now, but it
is not limited to that. The default implementation focuses on an
evaluation process where the Interpreter has direct access to the memory
of JITed expressions (in-process execution or shared memory).

We need a different approach to support out-of-process evaluation or
variations of the runtime. It seems reasonable to expose a minimal
interface for it. The new RuntimeInterfaceBuilder is an abstract base
class in the public header. For that, the TypeVisitor had to become a
component (instead of inheriting from it). FindRuntimeInterface() was
adjusted to return an instance of the RuntimeInterfaceBuilder and it can
be overridden from derived classes.
Select POSIX 2008 standard to avoid including Darwin extensions.
Otherwise, Darwin's math.h header defines HUGE, which conflicts
with Flang's HUGE function.

This started happening after 4762c65 (llvm#82443), that added the
"utility" include, which seems to include "math.h".
The RelocationEntry's fields are poorly ordered when considering
padding. This reordering reduces the size from 56 bytes to 40 bytes (on
LP64).
This fixes tests that are going to be upstreamed in the near future.
Currently they are failing downstream in the Apple open source fork.

Failing tests
  Clang :: APINotes/retain-count-convention.m
  Clang :: APINotes/types.m
  Clang :: APINotes/versioned-multi.c
  Clang :: APINotes/versioned.m

Since 2e5af56 got merged, Clang now enables `LangOpts.APINotesModules`
when reading a precompiled module that was built with API Notes enabled.
This is correct. The logic in APINotesManager needs to be adjusted to
handle this.

rdar://123526142
ldionne and others added 25 commits March 11, 2024 09:51
That test is using std::toupper.
…lvm#84664)

Summary:
Currently we have a conditional that turns the full build on by default
if it is a default target. This used to work fine when the GPU was the
only target that was ever present. However, we've recently changed to
allow building multiple of these at the same time. That means we should
have the ability to build overlay mode in the CPU mode and full build in
the GPU mode. This patch makes some simple adjustments to pass the
arguments per-triple. This slightly extends the existing `-DRUNTIMES_`
argument support to also transform any extra CMake inputs rather than
just the passed CMake variables.
…#84667)

Summary:
The libc build has a few utilties that need to be built before we can do
everything in the full build. The one requirement currently is the
`libc-hdrgen` binary. If we are doing a full build runtimes mode we
first add `libc` to the projects list and then only use the `projects`
portion to buld the `libc` portion. We also use utilities for the GPU
build, namely the loader utilities. Previously we would build these
tools on-demand inside of the cross-build, which tool some hacky
workarounds for the dependency finding and target triple. This patch
instead just builds them similarly to libc-hdrgen and then passses them
in. We now either pass it manually it it was built, or just look it up
like we do with the other `clang` tools.

Depends on llvm#84664
Re-land 634b024.

T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
Someone pointed out a typo (Value* RsrcRes = RsrcRes = ...) in PR the
address space 7 lowering, this commit fixes it.
…at-pointers

Fixes failing tests after llvm#84308

LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll
LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces.ll
LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-calls.ll
LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-constants.ll
LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
LLVM :: CodeGen/AMDGPU/pal-metadata-3.0.ll

Buildbots: https://lab.llvm.org/buildbot/#/builders/121/builds/39855
Previously, we always used the wave64 encodings for EH registers
regardless of whether we were compiling for wave32, which seems wrong.
We don't seem to use the EH registers, so this commit is mostly just
about papering over code that converts from non-EH dwarf registers to
LLVM registers while claiming they are EH dwarf registers. That kind of
code should be okay on any non-darwin target (since darwin is the only
target that uses a different encoding for EH registers).
…fmv.f.s [nfc] (llvm#84563)

The prior naming scheme is incredibly hard to make sense out of. I
suspect the usage was actually backwards from intent - though that
didn't matter for any in tree schedule model.
…pp` (llvm#83734)

The `parse.pass.cpp` tests doen't need to call
`test_format_context_create` to create a `basic_format_context`, so they
shouldn't include `test_format_context.h`.

The `to_address` mechanism works around the iterator debugging
mechanisms of MSVC STL. Related to
[LWG3989](https://cplusplus.github.io/LWG/issue3989).

Discovered when implementing `formatter<tuple>` in MSVC STL. With the
inclusion removed, `std/utilities/format/format.tuple/parse.pass.cpp`
when using enhanced MSVC STL (and `/utf-8` option for MSVC).
…y are meant to be used (llvm#84707)

This patch fixes the unconditional forward-declarations of ABI-functions
in exception_ptr.h, and makes it dependent on the availability macro, as
it should've been from the beginning.

The declarations being unconditional break the build with libcxxrt
before 045c52ce8 [1], now they are opt-out.

[1]: libcxxrt/libcxxrt@045c52c
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.

Other AMDGPU chipsets continue to require inline assembly to implement
this barrier, as, by the default, the LLVM backend will insert waits on
global memory (s_waintcnt vmcnt(0)) before barriers in order to ensure
memory watchpoints set by debuggers work correctly.

Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff
between debugability and performance. The documentation, as well as the
generated inline assembly, have been updated to explicitly call
attention to this fact.

For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
llvm#84380)

Runtime unit tests used `new[]` to allocate memory, which then was
released using `free`.

This was detected by address sanitizer.
…#84770)

MachineRegisterInfo already knows the MF so there is no need to pass it
in as an argument.
…ing it to performance_testing. (llvm#84646)

Removing all the diff tests.
Base automatically changed from bump_to_fab2bb8b to feature/fused-ops August 13, 2024 11:11
An error occurred while trying to automatically change base from bump_to_fab2bb8b to feature/fused-ops August 13, 2024 11:11
@mgehre-amd mgehre-amd merged commit 94924fc into feature/fused-ops Aug 14, 2024
10 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_d99bb014 branch August 14, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.