forked from llvm/llvm-project
Merge 24.07.2024 #126
This commit enables the 128-bit vector feature by default, for consistency with GCC.
The newly added `-march` values `la64v1.0` and `la64v1.1` are described in the LoongArch toolchain conventions (see [1]). The target-cpu/feature attributes are forwarded to the compiler when a particular `-march` parameter is specified. The default CPU `loongarch64` is returned when the arch name is `la64v1.0` or `la64v1.1`. In addition, this commit adds `la64v1.0`/`la64v1.1` to `__loongarch_arch` and adds a definition for the macro `__loongarch_frecipe`.

[1]: https://github.com/loongson/la-toolchain-conventions
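A minimal sketch of how user code might observe the two macros mentioned above. `archName` and `hasFrecipe` are hypothetical helper names, and the sketch assumes `__loongarch_arch` expands to a string literal, as the toolchain conventions document describes; on a non-LoongArch target both macros are simply undefined.

```cpp
// Sketch only: reports what the LoongArch macros named above expand to.
// archName() and hasFrecipe() are hypothetical helpers, not part of any toolchain.
constexpr const char *archName() {
#if defined(__loongarch_arch)
  // Assumption: the macro is a string literal holding the processed -march value.
  return __loongarch_arch;
#else
  return "(not a LoongArch target)";
#endif
}

constexpr bool hasFrecipe() {
#if defined(__loongarch_frecipe)
  return true;
#else
  return false;
#endif
}
```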
All of `MCAsmBackend`, `MCCodeEmitter`, and `MCObjectWriter` must be non-null.
Similar to llvm#99836 for AArch64. Non-unique names save .strtab space and match GNU assembler. Pull Request: llvm#99903
Similar to llvm#99836 for AArch64. Non-unique names save .strtab space and match GNU assembler.
…cks (llvm#99925) Make the clang-tidy check misc-const-correctness work with function-try-blocks. Fixes llvm#99860.
This PR adds `f8E4M3` type to mlir.

`f8E4M3` type follows IEEE 754 convention

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

Related PRs:
- [PR-97179](llvm#97179) [APFloat] Add support for f8E4M3 IEEE 754 type
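The bit layout listed above (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7) can be decoded by hand. The following is a minimal sketch under that layout; `decodeF8E4M3` is a hypothetical helper for illustration and is not the MLIR or APFloat API.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decodes one f8E4M3 (IEEE 754 style) byte into a float.
inline float decodeF8E4M3(std::uint8_t bits) {
  const int sign = (bits >> 7) & 0x1;
  const int exponent = (bits >> 3) & 0xF;  // stored (biased) exponent
  const int mantissa = bits & 0x7;         // 3 explicit fraction bits

  float magnitude;
  if (exponent == 0xF) {
    // All-ones exponent: infinity if the fraction is zero, NaN otherwise.
    magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else if (exponent == 0) {
    // Zero or subnormal: 2^(-6) * (mantissa / 8) = mantissa * 2^(-9).
    magnitude = std::ldexp(static_cast<float>(mantissa), -9);
  } else {
    // Normal: 2^(exponent - 7) * (1 + mantissa / 8) = (8 + mantissa) * 2^(exponent - 10).
    magnitude = std::ldexp(static_cast<float>(8 + mantissa), exponent - 10);
  }
  return sign ? -magnitude : magnitude;
}
```

With this sketch, `0x77` (S.1110.111 with a clear sign bit) decodes to 240 and `0x01` decodes to 2^(-9), matching the extremes in the list.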
This commit updates the LLVM dialect CallOp and InvokeOp to always print the variadic callee type (previously callee type) if present. An additional verifier checks that only variadic calls have a non-null variadic callee type, and the builders are adapted accordingly to set the variadic callee type for variadic calls only. Finally, the CallOp and InvokeOp verifiers are strengthened to check that the variadic callee type matches the call argument and result types. The motivation of this change is that CallOp and InvokeOp don't have hidden state that is not pretty printed, but used during the export to LLVM IR. Previously, it could happen that a call looked correct in MLIR, but the return type changed after exporting to LLVM IR (since it has been taken from the hidden callee type attribute). After landing this change, this is not possible anymore since the variadic callee type is always printed if present.
) Some template function instantiations don't have a body, even though their templates did have a body. Examples are `std::move`, `std::forward`, `std::addressof`, etc. They had bodies before llvm@72315d0. After that change, the sentiment was that these special functions should be considered and treated as builtin functions. Fixes llvm#94193 CPP-5358
…esent (llvm#99281) After the changes in PR llvm#87144 and llvm#93923, regressions appeared in some cases. If multiple anonymous enums are present in a class and are imported as new, the import of the second enum can fail because it is detected as different from the first, which causes an ODR error. Now, for enums without a name, an existing similar enum is searched for; if none is found, the enum is imported. An ODR error is not detected in this case. This may be incorrect if non-matching structures are imported, but that is the less important case (it is more important that the import of matching classes works).
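For illustration, a class of the shape described above might look like the following. `Widget` and its enumerators are hypothetical names; the point is only that the two enums have no name to distinguish them, so they can be told apart only by their contents.

```cpp
// Hypothetical class with two anonymous enums, the situation described above.
struct Widget {
  enum { Small, Medium, Large };    // first unnamed enum
  enum { Hidden = 1, Locked = 2 };  // second unnamed enum, structurally different
};
```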
…-try-blocks" (llvm#100069) Reverts llvm#99925
The test file ignore_free_hooks.cpp (added in https://github.com/llvm/llvm-project/pull/96749/files) fails on mac because `|&` doesn't work on mac. Replace with `2>&1`.
…lvm#99898) As discussed at the last sync-up call, mark Zacas as experimental until this ABI issue is resolved <riscv-non-isa/riscv-elf-psabi-doc#444>. Don't return Zacas in getHostCPUFeatures (leaving a TODO there) as even if requesting detection of "native" features, the user likely doesn't want to automatically opt in to experimental codegen.
A new ProcessorModel called `la664` is defined in LoongArch.td to support `-march/-mtune=la664`.
Automatic vectorization is now enabled when `-mlsx` or `-mlasx` is specified.
…cks (llvm#99925) Make the clang-tidy check misc-const-correctness work with function-try-blocks. Fixes llvm#99860.
…m#100040) A FieldDecl that's an empty struct may not show up in CGRecordLayout. Go ahead and ignore such a field as it shouldn't make a difference to these calculations. Fixes: 1f6f97e ("[Clang] Loop over FieldDecls instead of all Decls (llvm#99574)") Co-authored-by: Eli Friedman <efriedma@quicinc.com>
On GFX11.5 shaders having completed exports need to execute/wait at a lower priority than shaders still executing exports. Add code to maintain normal priority of 2 for shaders that export and drop to priority 0 after exports.
This tool shouldn't be used in the driver build until it is converted to use `OptTable` for option parsing, otherwise the `cl::opt` options might conflict with options in other tools resulting in link failures.
PR llvm#91843 changed the algorithm used to find the next unplaced block so that it iterates through the blocks in BlockFilter instead of iterating through the blocks in the function and checking if they are in the block filter. Unfortunately this sometimes results in a different block ordering being chosen, as the order of blocks in BlockFilter comes from the order in MachineLoopInfo, and in some cases this differs from the order they are in the function. This can also give an end result that has worse performance. Fix this by making collectLoopBlockSet place blocks in its output in the order that they are in the function.
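The ordering fix described above can be sketched outside of LLVM. The example below is a simplified stand-in, not the actual `collectLoopBlockSet` code: blocks are plain strings and `orderByFunctionLayout` is a hypothetical helper that emits the loop's blocks in function order regardless of the order in which loop info lists them.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the blocks of a loop in function layout order.
// functionOrder: every block of the function, in layout order.
// loopInfoOrder: the loop's blocks, in whatever order loop info reports them.
inline std::vector<std::string>
orderByFunctionLayout(const std::vector<std::string> &functionOrder,
                      const std::vector<std::string> &loopInfoOrder) {
  const std::unordered_set<std::string> inLoop(loopInfoOrder.begin(),
                                               loopInfoOrder.end());
  std::vector<std::string> result;
  result.reserve(loopInfoOrder.size());
  // Walk the function once and keep only the blocks that belong to the loop.
  for (const std::string &block : functionOrder)
    if (inLoop.count(block) != 0)
      result.push_back(block);
  return result;
}
```

Walking the function rather than the loop-info list makes the result independent of how loop info happens to order its blocks, which is the property the fix restores.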
…peForFunctionPointerAuth` (llvm#99763) This prevents a compiler warning.
Currently, the `mlir-tblgen -verify-openmp-ops` pseudo-backend, which only performs an OpenMP dialect-specific set of checks and produces no output, is prevented from being added as a dependency to the `MLIROpenMPOpsIncGen` tablegen target. As a consequence, it is not triggered on every modification of the OpenMPOps.td file it is intended to check, although it should be. This patch fixes the issue by allowing the empty output file to be added to the `TABLEGEN_OUTPUT` CMake variable used by the `add_public_tablegen_target` command below to set up dependencies.
Many cases have differing AVLs, which aren't foldable. Update them so the peephole triggers on them, and add explicit cases for non-foldable AVLs. Also rename the test to vmv.v.v-peephole.ll since it's not actually a DAG combine, and remove a TODO: it's correct to fold if the two passthrus are the same.
…m#98941) Add support for checking mismatched ownership_returns/ownership_takes attributes. Closes llvm#76861
…lvm#93350)" This reverts commit 9628777. More details in llvm#93350, but this broke the PowerPC sanitizer bots.
…00317) Adds proper mapping of common block elements to block arguments in parallel regions when delayed privatization is enabled.
…gObjectAggressive` (llvm#100102)" Added handling for `AllocaInst`. This reverts commit 1ee686a.
…es (llvm#100316)

See the following case:
```
define i16 @pr100298() {
entry:
  br label %for.inc

for.inc:
  %indvar = phi i32 [ -15, %entry ], [ %mask, %for.inc ]
  %add = add nsw i32 %indvar, 9
  %mask = and i32 %add, 65535
  %cmp1 = icmp ugt i32 %mask, 5
  br i1 %cmp1, label %for.inc, label %for.end

for.end:
  %conv = trunc i32 %add to i16
  %cmp2 = icmp ugt i32 %mask, 3
  %shl = shl nuw i16 %conv, 14
  %res = select i1 %cmp2, i16 %conv, i16 %shl
  ret i16 %res
}
```
When computing knownbits of `%shl` with `%cmp2=false`, we cannot use this condition in the analysis of `%mask (%for.inc -> %for.inc)`.

Fixes llvm#100298.
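To see what the function should return, the IR can be transliterated by hand. The sketch below is a hypothetical C++ rendering of the same control flow, not code from the patch. Tracing it: the first iteration gives `%add = -6` and `%mask = 65530`, so the loop repeats; the second gives `%add = 65539` and `%mask = 3`, so the loop exits with `%cmp2` false and the `shl` result is selected.

```cpp
#include <cstdint>

// Hand-written C++ rendering of @pr100298 (illustration only).
inline std::uint16_t pr100298() {
  std::int32_t indvar = -15;
  std::int32_t add = 0;
  std::uint32_t mask = 0;
  do {
    add = indvar + 9;                                  // %add
    mask = static_cast<std::uint32_t>(add) & 65535u;   // %mask
    indvar = static_cast<std::int32_t>(mask);          // phi back edge
  } while (mask > 5u);                                 // %cmp1

  const auto conv = static_cast<std::uint16_t>(add);       // %conv (trunc)
  const auto shl = static_cast<std::uint16_t>(conv << 14); // %shl
  return mask > 3u ? conv : shl;                            // %res
}
```

The expected result is `0xC000`. The first loop iteration has `%mask = 65530`, which is why the `%cmp2 = false` condition (`%mask <= 3`) cannot be applied to the value of `%mask` flowing around the back edge.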
Summary: Previously, the GPU built the `libc` in a fat binary version that was used to pass this to the link job in offloading languages like CUDA or OpenMP. This was mostly required because NVIDIA couldn't consume the standard static library version. Recent patches have now created the `clang-nvlink-wrapper` which lets us do that. Now, the C library is just included implicitly by the toolchain (or passed with -Xoffload-linker -lc). This code can be fully removed, which will heavily simplify the build (and removes some bugs and garbage files I've encountered).
llvm#98863 merged the AMDGPUAsanInstrumentation module but did not link TransformUtils into AMDGPUUtils. This PR moves the AMDGPUAsanInstrumentation files out of the utils folder and adds them to the AMDGPUCodegen lib.
…lvm#100339) Those are not needed now that <llvm#98400> is submitted.
Summary: We can enable the `sscanf` function on the GPU now.
This patch preserves `undef` SDNodes that are `volatile` qualified. Previously, these nodes would be discarded. The motivation behind this change is to adhere to the [LangRef](https://llvm.org/docs/LangRef.html#volatile-memory-accesses). Even though that document is mostly written in terms of LLVM IR, it seems reasonable to infer that the volatile constraints also apply to SDNodes.

> Certain memory accesses, such as [load](https://llvm.org/docs/LangRef.html#i-load)’s, [store](https://llvm.org/docs/LangRef.html#i-store)’s, and [llvm.memcpy](https://llvm.org/docs/LangRef.html#int-memcpy)’s may be marked volatile. The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior.

Source: https://llvm.org/docs/LangRef.html#volatile-memory-accesses
This is not a hidden bug, it's just a very slow test under emulation.
Summary: This fails tests in some situations, revert until it can be fixed. This reverts commit 445bb35.
Summary: I forgot that the OpenMP tests still look for this, reverting for now until I can make a fix. This reverts commit c1c6ed8.
In the PowerPC ABI, the first few arguments are passed in registers, but their slots in the parameter save area are still reserved; arguments passed in memory go after the reserved locations. For debugging purposes, we may want to save a copy of the register-passed arguments into the correct places on the stack. The new option achieves this by adding a new function-level attribute and making argument lowering aware of it.
A couple of previous commits led to incorrect `#endif` placement in the source, which caused a build failure in https://lab.llvm.org/buildbot/#/builders/13/builds/1020 See llvm#99613 llvm#99049
No description provided.