Merge 24.07.2024 #126

ergawy · 2024-07-25T04:15:40Z

No description provided.

This commit enables the 128-bit vector feature by default, for consistency with GCC.

The newly added strings `la64v1.0` and `la64v1.1` in `-march` are as described in the LoongArch toolchain conventions (see [1]). The target-cpu/feature attributes are forwarded to the compiler when a particular `-march` value is specified. The default CPU `loongarch64` is returned when archname is `la64v1.0` or `la64v1.1`. In addition, this commit adds `la64v1.0`/`la64v1.1` to "__loongarch_arch" and adds a definition for the macro "__loongarch_frecipe".

[1]: https://github.com/loongson/la-toolchain-conventions
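The arch-name handling described above can be pictured with a small sketch. The function name `loongarch_target_cpu` is hypothetical, and the pass-through for other values is an assumption made for illustration; only the `la64v1.0`/`la64v1.1` to `loongarch64` mapping comes from the commit message.

```python
def loongarch_target_cpu(arch_name: str) -> str:
    """Map a -march value to a CPU name (illustrative sketch, not the driver code)."""
    # From the commit message: both ISA-version strings select the default CPU.
    if arch_name in ("la64v1.0", "la64v1.1"):
        return "loongarch64"
    # Assumption: any other value is already a CPU name and passes through unchanged.
    return arch_name
```

For example, `loongarch_target_cpu("la64v1.1")` yields `loongarch64` under this sketch.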

All of `MCAsmBackend`, `MCCodeEmitter`, and `MCObjectWriter` must be non-null.

Similar to llvm#99836 for AArch64. Non-unique names save .strtab space and match GNU assembler. Pull Request: llvm#99903

Similar to llvm#99836 for AArch64. Non-unique names save .strtab space and match GNU assembler.

…cks (llvm#99925) Make the clang-tidy check misc-const-correctness work with function-try-blocks. Fixes llvm#99860.

This PR adds the `f8E4M3` type to MLIR. The `f8E4M3` type follows the IEEE 754 convention:

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

Related PRs:
- [PR-97179](llvm#97179) [APFloat] Add support for f8E4M3 IEEE 754 type
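The bit layout listed for `f8E4M3` can be checked with a small decoder. This is a minimal sketch written for illustration: the function name `decode_f8e4m3` is hypothetical and the code is not taken from APFloat or MLIR.

```python
def decode_f8e4m3(byte: int) -> float:
    """Decode one f8E4M3 byte: 1 sign bit, 4 exponent bits, 3 mantissa bits."""
    bias = 7
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF:
        # All-ones exponent: infinity when the mantissa is zero, NaN otherwise.
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias = -6.
        return sign * (man / 8) * 2.0 ** (1 - bias)
    # Normal number: implicit leading 1.
    return sign * (1 + man / 8) * 2.0 ** (exp - bias)
```

Decoding `0b0_1110_111` reproduces the maximum normal value 240, and `0b0_0000_001` reproduces the minimum subnormal 2^(-9), matching the table above.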

This commit updates the LLVM dialect CallOp and InvokeOp to always print the variadic callee type (previously callee type) if present. An additional verifier checks that only variadic calls have a non-null variadic callee type, and the builders are adapted accordingly to set the variadic callee type for variadic calls only. Finally, the CallOp and InvokeOp verifiers are strengthened to check that the variadic callee type matches the call argument and result types. The motivation for this change is to ensure that CallOp and InvokeOp have no hidden state that is not pretty-printed but is used during the export to LLVM IR. Previously, a call could look correct in MLIR while its return type changed after exporting to LLVM IR (since it was taken from the hidden callee type attribute). After this change, that is no longer possible because the variadic callee type is always printed if present.

) Some template function instantiations don't have a body, even though their templates did have a body. Examples are `std::move`, `std::forward`, `std::addressof`, etc. They had bodies before llvm@72315d0. After that change, the sentiment was that these special functions should be considered and treated as builtin functions. Fixes llvm#94193 CPP-5358

…esent (llvm#99281) After the changes in PRs llvm#87144 and llvm#93923, regressions appeared in some cases. The problem was that if multiple anonymous enums are present in a class and are imported as new, the import of the second enum can fail because it is detected as different from the first, which causes an ODR error. Now, for enums without a name, an existing similar enum is searched for; if none is found, the enum is imported. In this case no ODR error is detected. This may be incorrect if non-matching structures are imported, but that is the less important case (it is more important that the import of matching classes works).
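The lookup-then-import behaviour for unnamed enums can be sketched as follows. This is an illustration only: the helper name `import_anonymous_enum` and the similarity test (equal enumerator tuples) are assumptions, not the ASTImporter's actual structural-equivalence check.

```python
def import_anonymous_enum(existing_enums, incoming):
    """Reuse a similar existing anonymous enum, otherwise import the new one."""
    for candidate in existing_enums:
        # Hypothetical similarity test: same enumerators in the same order.
        if candidate == incoming:
            return candidate
    # No similar enum found: import as new instead of reporting an ODR error.
    existing_enums.append(incoming)
    return incoming
```

With this shape, a second, different anonymous enum in the same class is simply added rather than being compared against the first and rejected.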

…-try-blocks" (llvm#100069) Reverts llvm#99925

The test file ignore_free_hooks.cpp (added in https://github.com/llvm/llvm-project/pull/96749/files) fails on mac because `|&` doesn't work on mac. Replace with `2>&1`.

…lvm#99898) As discussed at the last sync-up call, mark Zacas as experimental until this ABI issue is resolved <riscv-non-isa/riscv-elf-psabi-doc#444>. Don't return Zacas in getHostCPUFeatures (leaving a TODO there) as even if requesting detection of "native" features, the user likely doesn't want to automatically opt in to experimental codegen.

A new ProcessorModel called `la664` is defined in LoongArch.td to support `-march/-mtune=la664`.

Automatic vectorization is now enabled whenever `-mlsx`/`-mlasx` is enabled.

…cks (llvm#99925) Make the clang-tidy check misc-const-correctness work with function-try-blocks. Fixes llvm#99860.

Related to llvm#99925.

…m#100040) A FieldDecl that's an empty struct may not show up in CGRecordLayout. Go ahead and ignore such a field as it shouldn't make a difference to these calculations. Fixes: 1f6f97e ("[Clang] Loop over FieldDecls instead of all Decls (llvm#99574)") Co-authored-by: Eli Friedman <efriedma@quicinc.com>

On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add code to maintain normal priority of 2 for shaders that export and drop to priority 0 after exports.

This tool shouldn't be used in the driver build until it is converted to use `OptTable` for option parsing, otherwise the `cl::opt` options might conflict with options in other tools resulting in link failures.

PR llvm#91843 changed the algorithm used to find the next unplaced block so that it iterates through the blocks in BlockFilter instead of iterating through the blocks in the function and checking if they are in the block filter. Unfortunately this sometimes results in a different block ordering being chosen, as the order of blocks in BlockFilter comes from the order in MachineLoopInfo, and in some cases this differs from the order they are in the function. This can also give an end result that has worse performance. Fix this by making collectLoopBlockSet place blocks in its output in the order that they are in the function.
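The ordering fix can be illustrated with a short sketch. It is written in Python for brevity and is not the MachineBlockPlacement code; the parameter names `function_blocks` and `loop_blocks` are hypothetical.

```python
def collect_loop_block_set(function_blocks, loop_blocks):
    """Return the loop's blocks in the order they appear in the function."""
    members = set(loop_blocks)
    # Iterate in function order so the result does not depend on the
    # order in which the loop analysis happened to list the blocks.
    return [block for block in function_blocks if block in members]
```

If the loop analysis lists `header, latch, body` while the function lays the blocks out as `header, body, latch`, the result follows the function layout.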

…peForFunctionPointerAuth` (llvm#99763) This prevents a compiler warning.

Currently, the `mlir-tblgen -verify-openmp-ops` pseudo-backend, which only performs an OpenMP dialect-specific set of checks and produces no output, is prevented from being added as a dependency to the `MLIROpenMPOpsIncGen` tablegen target. As a consequence, it is not triggered with every modification of the OpenMPOps.td file it's intended to check, although it should be. This patch fixes the issue by allowing the empty output file to be added to the `TABLEGEN_OUTPUT` CMake variable used by the `add_public_tablegen_target` command below to set up dependencies.

Many cases have differing AVLs which aren't foldable. Update them so the peephole triggers on them, and add explicit cases for non-foldable AVLs. Also rename it to vmv.v.v-peephole.ll since it's not actually a DAG combine. And remove a TODO: it's correct to fold if the two passthrus are the same.

…m#98941) Add support for checking mismatched ownership_returns/ownership_takes attributes. Closes llvm#76861

…lvm#93350)" This reverts commit 9628777. More details in llvm#93350, but this broke the PowerPC sanitizer bots.

…00317) Adds proper mapping of common block elements to block arguments in parallel regions when delayed privatization is enabled.

…gObjectAggressive` (llvm#100102)" Added handling for `AllocaInst`. This reverts commit 1ee686a.

…es (llvm#100316) See the following case:

```
define i16 @pr100298() {
entry:
  br label %for.inc

for.inc:
  %indvar = phi i32 [ -15, %entry ], [ %mask, %for.inc ]
  %add = add nsw i32 %indvar, 9
  %mask = and i32 %add, 65535
  %cmp1 = icmp ugt i32 %mask, 5
  br i1 %cmp1, label %for.inc, label %for.end

for.end:
  %conv = trunc i32 %add to i16
  %cmp2 = icmp ugt i32 %mask, 3
  %shl = shl nuw i16 %conv, 14
  %res = select i1 %cmp2, i16 %conv, i16 %shl
  ret i16 %res
}
```

When computing knownbits of `%shl` with `%cmp2=false`, we cannot use this condition in the analysis of `%mask (%for.inc -> %for.inc)`. Fixes llvm#100298.
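To see why the `%cmp2=false` condition cannot be applied along the backedge, the IR can be stepped through with plain integers. The sketch below is a hand-written interpretation of this one function (the name `simulate_pr100298` is hypothetical), not output from any LLVM tool.

```python
def simulate_pr100298():
    """Step through @pr100298; return the %mask values seen and the result as an unsigned i16."""
    indvar = -15                      # phi value coming from %entry
    masks = []
    while True:
        add = indvar + 9              # add nsw i32 %indvar, 9
        mask = add & 65535            # and i32 %add, 65535 (always non-negative)
        masks.append(mask)
        if not mask > 5:              # icmp ugt i32 %mask, 5
            break
        indvar = mask                 # phi value coming over the backedge
    conv = add & 0xFFFF               # trunc i32 %add to i16
    cmp2 = mask > 3                   # icmp ugt i32 %mask, 3
    shl = (conv << 14) & 0xFFFF       # shl nuw i16 %conv, 14
    res = conv if cmp2 else shl       # select i1 %cmp2, i16 %conv, i16 %shl
    return masks, res
```

`%mask` is 65530 on the first trip through `%for.inc` and 3 on the second, so `%mask <= 3` holds only at `%for.end` and says nothing about the value that flows back into the phi.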

Summary: Previously, the GPU build produced the `libc` as a fat binary that was passed to the link job in offloading languages like CUDA or OpenMP. This was mostly required because NVIDIA couldn't consume the standard static library version. Recent patches have now created the `clang-nvlink-wrapper` which lets us do that. Now, the C library is just included implicitly by the toolchain (or passed with -Xoffload-linker -lc). This code can be fully removed, which will heavily simplify the build (and removes some bugs and garbage files I've encountered).

llvm#98863 merged the AMDGPUAsanInstrumentation module but did not link TransformUtils into AMDGPUUtils. This PR moves the AMDGPUAsanInstrumentation files out of the utils folder and adds them to the AMDGPUCodegen lib.

…lvm#100339) Those are not needed now that <llvm#98400> is submitted.

Summary: We can enable the `sscanf` function on the GPU now.

This patch preserves `undef` SDNodes that are `volatile` qualified. Previously, these nodes would be discarded. The motivation behind this change is to adhere to the [LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses). Even though that document is written mostly in terms of LLVM IR, it seems reasonable to infer that the volatile constraints also apply to SDNodes.

> Certain memory accesses, such as [load](https://llvm.org/docs/LangRef.html#i-load)’s, [store](https://llvm.org/docs/LangRef.html#i-store)’s, and [llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be marked volatile. The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior.

Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses

This is not a hidden bug, it's just a very slow test under emulation.

Summary: This fails tests in some situations, revert until it can be fixed. This reverts commit 445bb35.

Summary: I forgot that the OpenMP tests still look for this, reverting for now until I can make a fix. This reverts commit c1c6ed8.

In the PowerPC ABI, the first few arguments are passed in registers, but their slots in the parameter save area are still reserved; arguments passed in memory go after the reserved locations. For debugging purposes, we may want to save copies of the register-passed arguments into their correct places on the stack. The new option achieves this by adding a new function-level attribute and making the argument lowering aware of it.

A couple of previous commits led to a wrong endif placement inside the source, which caused a build problem in https://lab.llvm.org/buildbot/#/builders/13/builds/1020 See llvm#99613 llvm#99049

AreaZR and others added 30 commits July 22, 2024 22:53

[Utils] Fix clang-tidy warning: Use boolean false, not 0 (NFC) (llvm#…

786b491

…99828)

[NFC] changes all run lines

b830790

Fix llvm#99888

[LoongArch] Enable 128-bits vector by default (llvm#100056)

b4ef0ba

This commit enables the 128-bit vector feature by default, for consistency with GCC.

MCObjectStreamer: Remove an unneeded getBackendPtr test

2114947

All of `MCAsmBackend`, `MCCodeEmitter`, and `MCObjectWriter` must be non-null.

[RISCV] Create mapping symbols with non-unique names

f9c349f

Similar to llvm#99836 for AArch64. Non-unique names save .strtab space and match GNU assembler. Pull Request: llvm#99903

[CSKY] Create mapping symbols with non-unique names

de2bfe0

Similar to llvm#99836 for AArch64. Non-unique names save .strtab space and match GNU assembler.
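The `.strtab` saving from non-unique mapping symbol names can be estimated with a rough model. The sketch assumes a string table that stores each distinct name once, NUL-terminated, after a leading NUL byte, and it assumes the earlier unique names looked like `$x.0`, `$d.1`; both points are assumptions for illustration, and real linkers may also merge suffixes.

```python
def strtab_size(names):
    """Bytes used by a simple string table: leading NUL plus each distinct name with its NUL."""
    return 1 + sum(len(name) + 1 for name in set(names))

# 1000 mapping symbols alternating between code ($x) and data ($d).
unique_names = [("$x." if i % 2 == 0 else "$d.") + str(i) for i in range(1000)]
shared_names = ["$x" if i % 2 == 0 else "$d" for i in range(1000)]

unique_bytes = strtab_size(unique_names)   # every name needs its own entry
shared_bytes = strtab_size(shared_names)   # all symbols share two entries
```

Under this model the shared-name table needs only the two strings `$x` and `$d`, no matter how many mapping symbols the object contains.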

[clang-tidy] fix misc-const-correctness to work with function-try-blo…

cd9e42c

…cks (llvm#99925) Make the clang-tidy check misc-const-correctness work with function-try-blocks. Fixes llvm#99860.

[clang][Interp] Calculate APValue offsets for base classes

ea48629

MCAssembler: Move SubsectionsViaSymbols; to MCObjectWriter

f017d89

MCAssembler: Remove unused functions

b2f5ac6

Revert "[clang-tidy] fix misc-const-correctness to work with function…

f0fad9f

…-try-blocks" (llvm#100069) Reverts llvm#99925

Replace |& with 2>&1 in ignore_free_hooks test. (llvm#100004)

404ca22

The test file ignore_free_hooks.cpp (added in https://github.com/llvm/llvm-project/pull/96749/files) fails on mac because `|&` doesn't work on mac. Replace with `2>&1`.

[LoongArch] Support la664 (llvm#100068)

fcec298

A new ProcessorModel called `la664` is defined in LoongArch.td to support `-march/-mtune=la664`.

ELFObjectWriter: Remove unneeded subclasses

2db576c

[LoongArch] Remove experimental auto-vec feature. (llvm#100070)

89d1eb6

Automatic vectorization is now enabled whenever `-mlsx`/`-mlasx` is enabled.

[LoongArch] Summary the release notes for LLVM 19

8a615bc

[LoongArch] Fix test issue of init-loongarch.c

d59925c

[clang][test][RISCV] Add missing test change from llvm#99898

2de1333

[clang-tidy] fix misc-const-correctness to work with function-try-blo…

26c99c4

…cks (llvm#99925) Make the clang-tidy check misc-const-correctness work with function-try-blocks. Fixes llvm#99860.

[clang-tidy][NFC] Added -fexceptions to const-correctness-values.cp

2dd82c5

Related to llvm#99925.

[AMDGPU] Define constrained multi-dword scalar load instructions. (ll…

eeb7feb

…vm#96161)

[llvm-cgdata] Remove GENERATE_DRIVER option (llvm#100066)

96d4121

This tool shouldn't be used in the driver build until it is converted to use `OptTable` for option parsing, otherwise the `cl::opt` options might conflict with options in other tools resulting in link failures.

john-brawn-arm and others added 27 commits July 24, 2024 10:49

[clang][Interp] Bail out on value dependent variable initializers

d36edf8

[ASTContext] Make the end of the switch case unreachable in `encodeTy…

aa53f0d

…peForFunctionPointerAuth` (llvm#99763) This prevents a compiler warning.

[clang][analyzer] Support ownership_{returns,takes} attributes (llv…

893a303

…m#98941) Add support for checking mismatched ownership_returns/ownership_takes attributes. Closes llvm#76861

Revert "[libc++][math] Fix undue overflowing of std::hypot(x,y,z) (l…

1031335

…lvm#93350)" This reverts commit 9628777. More details in llvm#93350, but this broke the PowerPC sanitizer bots.

ARM: Avoid using MachineFunction::getMMI

2ce865d

[flang][OpenMP] Handle common blocks in delayed privatization (llvm#1…

68a0d0c

…00317) Adds proper mapping of common block elements to block arguments in parallel regions when delayed privatization is enabled.

[gn] port 73ac953

a5bc549

Reapply "[FunctionAttrs] Determine underlying object by `getUnderlyin…

559be8e

…gObjectAggressive` (llvm#100102)" Added handling for `AllocaInst`. This reverts commit 1ee686a.

[gn build] Port 2ca300f

99c5140

[gn build] Port ddb75ca

48e1eb4

[AMDGPU] Move AMDGPUAsanInstrumentation outside of utils (llvm#100323)

e6c20e1

llvm#98863 merged the AMDGPUAsanInstrumentation module but did not link TransformUtils into AMDGPUUtils. This PR moves the AMDGPUAsanInstrumentation files out of the utils folder and adds them to the AMDGPUCodegen lib.

[AMDGPU][MC][NFC] Drop remaining -wavesize32/64 attributes in tests. (l…

e1052fa

…lvm#100339) Those are not needed now that <llvm#98400> is submitted.

[libc] Enable 'sscanf' on the GPU (llvm#100211)

445bb35

Summary: We can enable the `sscanf` function on the GPU now.

[libcxx][test] Explain picolib unsupported in sort.pass.cpp

929b474

This is not a hidden bug, it's just a very slow test under emulation.

Revert "[libc] Enable 'sscanf' on the GPU (llvm#100211)"

9914609

Summary: This fails tests in some situations, revert until it can be fixed. This reverts commit 445bb35.

Revert "[libc] Remove 'packaged' GPU build support (llvm#100208)"

550b83d

Summary: I forgot that the OpenMP tests still look for this, reverting for now until I can make a fix. This reverts commit c1c6ed8.

[clang] Define ATOMIC_FLAG_INIT correctly for C++. (llvm#97534)

4bb3a1e

Merge remote-tracking branch 'upstream/main' into amd-trunk-dev

1a9dfc9

[compiler-rt] Move endif to correct place (llvm#100342)

558a895

A couple of previous commits led to a wrong endif placement inside the source, which caused a build problem in https://lab.llvm.org/buildbot/#/builders/13/builds/1020 See llvm#99613 llvm#99049

Merge remote-tracking branch 'upstream/main' into amd-trunk-dev

69a15f5

ergawy requested review from antiagainst and kuhar as code owners July 25, 2024 04:15

ergawy merged commit b3b35ea into ROCm:amd-trunk-dev Jul 25, 2024
4 of 6 checks passed
