Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge 24.07.2024 #126

Merged
merged 973 commits into from
Jul 25, 2024
Merged

Merge 24.07.2024 #126

merged 973 commits into from
Jul 25, 2024

Conversation

ergawy
Copy link

@ergawy ergawy commented Jul 25, 2024

No description provided.

AreaZR and others added 30 commits July 22, 2024 22:53
This commit is to enable 128 vector feature by default, in order to be
consistent with gcc.
The newly added strings `la64v1.0` and `la64v1.1` in `-march` are as
described in LoongArch toolchains conventions (see [1]).

The target-cpu/feature attributes are forwarded to compiler when
specifying particular `-march` parameter. The default cpu `loongarch64`
is returned when archname is `la64v1.0` or `la64v1.1`.

In addition, this commit adds `la64v1.0`/`la64v1.1` to
"__loongarch_arch" and adds definition for macro "__loongarch_frecipe".

[1]: https://github.com/loongson/la-toolchain-conventions
All of `MCAsmBackend`, `MCCodeEmitter`, and `MCObjectWriter` must be
non-null.
Similar to llvm#99836 for AArch64.

Non-unique names save .strtab space and match GNU assembler.

Pull Request: llvm#99903
Similar to llvm#99836 for AArch64.

Non-unique names save .strtab space and match GNU assembler.
…cks (llvm#99925)

Make the clang-tidy check misc-const-correctness work with
function-try-blocks.

Fixes llvm#99860.
This PR adds `f8E4M3` type to mlir.

`f8E4M3` type  follows IEEE 754 convention

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa), 
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

Related PRs:
- [PR-97179](llvm#97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
This commit updates the LLVM dialect CallOp and InvokeOp to always print
the variadic callee type (previously callee type) if present. An
additional verifier checks that only variadic calls have a non-null
variadic callee type, and the builders are adapted accordingly to set
the variadic callee type for variadic calls only. Finally, the CallOp
and InvokeOp verifiers are strengthened to check that the variadic
callee type matches the call argument and result types.

The motivation of this change is that CallOp and InvokeOp don't have
hidden state that is not pretty printed, but used during the export to
LLVM IR. Previously, it could happen that a call looked correct in MLIR,
but the return type changed after exporting to LLVM IR (since it has
been taken from the hidden callee type attribute). After landing this
change, this is not possible anymore since the variadic callee type is
always printed if present.
)

Some template function instantiations don't have a body, even though
their templates did have a body.
Examples are: `std::move`, `std::forward`, `std::addressof` etc.

They had bodies before
llvm@72315d0

After that change, the sentiment was that these special functions should
be considered and treated as builtin functions.

Fixes llvm#94193

CPP-5358
…esent (llvm#99281)

After changes in PR llvm#87144 and llvm#93923 regressions appeared in some
cases. The problem was that if multiple anonymous enums are present in a
class and are imported as new the import of the second enum can fail
because it is detected as different from the first and causes ODR error.

Now in case of enums without name an existing similar enum is searched,
if not found the enum is imported. ODR error is not detected. This may
be incorrect if non-matching structures are imported, but this is the
less important case (import of matching classes is more important to
work).
The test file ignore_free_hooks.cpp (added in
https://github.com/llvm/llvm-project/pull/96749/files) fails on mac
because `|&` doesn't work on mac. Replace with `2>&1`.
…lvm#99898)

As discussed at the last sync-up call, mark Zacas as experimental until
this ABI issue is resolved
<riscv-non-isa/riscv-elf-psabi-doc#444>.

Don't return Zacas in getHostCPUFeatures (leaving a TODO there) as even if requesting detection of "native" features, the user likely doesn't want to automatically opt in to experimental codegen.
A new ProcessorModel called `la664` is defined in LoongArch.td to
support `-march/-mtune=la664`.
Currently, automatic vectorization will be enabled with `-mlsx/-mlasx`
enabled.
…cks (llvm#99925)

Make the clang-tidy check misc-const-correctness work with
function-try-blocks.

Fixes llvm#99860.
…m#100040)

A FieldDecl that's an empty struct may not show up in CGRecordLayout. Go
ahead and ignore such a field as it shouldn't make a difference to these
calculations.

Fixes: 1f6f97e ("[Clang] Loop over FieldDecls instead of all Decls (llvm#99574)")
Co-authored-by: Eli Friedman <efriedma@quicinc.com>
On GFX11.5 shaders having completed exports need to execute/wait at a
lower priority than shaders still executing exports.
Add code to maintain normal priority of 2 for shaders that export and
drop to priority 0 after exports.
This tool shouldn't be used in the driver build until it is converted to
use `OptTable` for option parsing, otherwise the `cl::opt` options might
conflict with options in other tools resulting in link failures.
john-brawn-arm and others added 27 commits July 24, 2024 10:49
PR llvm#91843 changed the algorithm used to find the next unplaced block so
that it iterates through the blocks in BlockFilter instead of iterating
through the blocks in the function and checking if they are in the block
filter. Unfortunately this sometimes results in a different block
ordering being chosen, as the order of blocks in BlockFilter comes from
the order in MachineLoopInfo, and in some cases this differs from the
order they are in the function. This can also give an end result that
has worse performance.

Fix this by making collectLoopBlockSet place blocks in its output in the
order that they are in the function.
…peForFunctionPointerAuth` (llvm#99763)

This prevent the warning from compiler.
Currently, the `mlir-tblgen -verify-openmp-ops` pseudo-backend, which
only performs an OpenMP dialect-specific set of checks and produces no
output, is prevented from being added as a dependency to the
`MLIROpenMPOpsIncGen` tablegen target.

However, a consequence of this is that it is not triggered with every
modification of the OpenMPOps.td file it's intended to check, although
it should. This patch fixes the issue by letting the empty output file
to be added to the `TABLEGEN_OUTPUT` CMake variable used by the
`add_public_tablegen_target` command below to set up dependencies.
A lot of cases have differing AVLs which aren't foldable, update them so the peephole triggers on them and add explicit cases for non-foldable AVLs.

Also rename it to vmv.v.v-peephole.ll since it's not actually a DAG combine.

And remove a TODO, it's correct to fold if the two passthrus are the same.
…m#98941)

Add support for checking mismatched ownership_returns/ownership_takes attributes.

Closes llvm#76861
…lvm#93350)"

This reverts commit 9628777.

More details in llvm#93350, but
this broke the PowerPC sanitizer bots.
…00317)

Adds proper mapping of common block elements to block arguments in
parallel regions when delayed privatization is enabled.
…gObjectAggressive` (llvm#100102)"

Added handling for `AllocaInst`.

This reverts commit 1ee686a.
…es (llvm#100316)

See the following case:
```
define i16 @pr100298() {
entry:
  br label %for.inc

for.inc:
  %indvar = phi i32 [ -15, %entry ], [ %mask, %for.inc ]
  %add = add nsw i32 %indvar, 9
  %mask = and i32 %add, 65535
  %cmp1 = icmp ugt i32 %mask, 5
  br i1 %cmp1, label %for.inc, label %for.end

for.end:
  %conv = trunc i32 %add to i16
  %cmp2 = icmp ugt i32 %mask, 3
  %shl = shl nuw i16 %conv, 14
  %res = select i1 %cmp2, i16 %conv, i16 %shl
  ret i16 %res
}
```

When computing knownbits of `%shl` with `%cmp2=false`, we cannot use
this condition in the analysis of `%mask (%for.inc -> %for.inc)`.
 
Fixes llvm#100298.
Summary:
Previously, the GPU built the `libc` in a fat binary version that was
used to pass this to the link job in offloading languages like CUDA or
OpenMP. This was mostly required because NVIDIA couldn't consume the
standard static library version. Recent patches have now created the
`clang-nvlink-wrapper` which lets us do that. Now, the C library is just
included implicitly by the toolchain (or passed with -Xoffload-linker
-lc).

This code can be fully removed, which will heavily simplify the build
(and removed some bugs and garbage files I've encoutnered).
llvm#98863 merged AMDGPUAsanInstrumentation module which missed
TransformUtils to be linked to AMDGPUUtils.
This PR moves AMDGPUAsanInstrumentation files outside utils folder and
adds them to AMDGPUCodegen lib.
Summary:
We can enable the `sscanf` function on the GPU now.
This patch preserves `undef` SDNodes that are `volatile` qualified.
Previously, these nodes would be discarded. The motivation behind this
change is to adhere to the
[LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses),
even though that doc is mostly in terms of LLVM-IR, it seems reasonable
to imply that the volatile constraints also imply to SDNodes.

> Certain memory accesses, such as
[load](https://llvm.org/docs/LangRef.html#i-load)’s,
[store](https://llvm.org/docs/LangRef.html#i-store)’s, and
[llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be
marked volatile. The optimizers must not change the number of volatile
operations or change their order of execution relative to other volatile
operations. The optimizers may change the order of volatile operations
relative to non-volatile operations. This is not Java’s “volatile” and
has no cross-thread synchronization behavior.

Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses
This is not a hidden bug, it's just a very slow test under emulation.
Summary:
This fails tests in some situations, revert until it can be fixed.
This reverts commit 445bb35.
Summary:
I forgot that the OpenMP tests still look for this, reverting for now
until I can make a fix.

This reverts commit c1c6ed8.
In PowerPC ABI, a few initial arguments are passed through registers,
but their places in parameter save area are reserved, arguments passed
by memory goes after the reserved location.

For debugging purpose, we may want to save copy of the pass-by-reg
arguments into correct places on stack. The new option achieves by
adding new function level attribute and make argument lowering part
aware of it.
A couple of previous commits leaded to wrong endif placement inside the
source that caused build problem in
https://lab.llvm.org/buildbot/#/builders/13/builds/1020

See llvm#99613 llvm#99049
@ergawy ergawy merged commit b3b35ea into ROCm:amd-trunk-dev Jul 25, 2024
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.