[AutoBump] Merge with d99bb014 (3) #250

mgehre-amd · 2024-08-12T16:11:21Z

No description provided.

…lvm#77092) Consistent with `__is_trivially_copyable(volatile int) == true` and `__is_trivially_relocatable(volatile Trivial) == true`, `__is_trivially_relocatable(volatile int)` should also be `true`. Fixes llvm#77091.

[clang] [test] New tests for `__is_trivially_relocatable(cv-qualified type)`

…g constraint expression. Previously we disabled computing the ODR hash for declarations from the global module fragment (GMF). However, we missed the case where the function lives in a concept requirement (see the attached test files for an example), and the resulting mismatch can cause a crash. After deserializing a function, we mark its body as lazy and only load the body when it is needed; however, loading a body during deserialization is not allowed. So it is problematic to mark the body as lazy first and then compute the function's hash value, which requires deserializing its body: that is where we crash. This patch solves the issue by not loading the bodies of functions from the GMF. Note that we can't skip comparing the constraint expression from the GMF directly, since it is a key part of function selection; this may be why we can't return 0 directly for `FunctionDecl::getODRHash()` from the GMF.

…#78564) Extend repairIntervalsInRange to completely recompute the interval for a register if subregister defs exist without precise subrange matches (a LaneMask exactly matching the subregister). This occurs when register sequences are lowered to copies such that the sizes of the copies do not match any uses of the subregisters formed (i.e. during twoaddressinstruction). Without this change, the subranges are probably legal, but they do not match those generated by live interval computation. This creates problems for other code that assumes subranges precisely cover all defined subregisters, e.g. shrinkToUses().

If we're in a blocking call, we need to run the signal handler immediately, as the call may not return for a very long time (if ever). Not running the handler can cause deadlocks if the rest of the program waits (in one way or another) for the signal handler to execute. I've gone through the list of functions in sanitizer_common_interceptors and marked as blocking those that I know can block, but I don't claim the list is exhaustive. In particular, I did not mark libc FILE* functions as blocking, because these can end up calling user functions. To do that correctly, /I think/ it would be necessary to clear the "is in blocking call" flag inside the fopencookie wrappers. The test for the bug (deadlock) uses the `read` call (which is the one I ran into originally), but the same kind of test could be written for any other blocking syscall.
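
To make the policy above concrete, here is a minimal single-threaded model in C++. It is not the sanitizer runtime's code: `SignalModel`, `deliver`, `blocking_call`, and `run_pending` are hypothetical names, and the model assumes (as the description implies) that signals arriving outside a blocking call are queued and handled later at a safe point.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of signal delivery, not sanitizer runtime code.
class SignalModel {
 public:
  using Handler = std::function<void()>;

  // Called when a signal arrives.
  void deliver(Handler handler) {
    if (in_blocking_call_) {
      // The blocking call may never return, so run the handler right away.
      handler();
    } else {
      // Assumption: outside blocking calls, handlers are deferred.
      pending_.push_back(std::move(handler));
    }
  }

  // Stand-in for an interceptor marked as blocking (e.g. read()).
  template <typename Body>
  void blocking_call(Body&& body) {
    in_blocking_call_ = true;
    std::forward<Body>(body)();
    in_blocking_call_ = false;
    run_pending();  // Leaving the call is a safe point to drain the queue.
  }

  // Runs every deferred handler.
  void run_pending() {
    std::vector<Handler> ready;
    ready.swap(pending_);
    for (Handler& handler : ready) handler();
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  bool in_blocking_call_ = false;
  std::vector<Handler> pending_;
};
```

If `deliver` only ever queued, a handler for a signal that arrives while `body` is stuck in a call that never returns would never run, which is the deadlock the change addresses.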

…ble (llvm#84133) This avoids a known libFormat bug where the heuristic can OOM on certain large files (particularly single-header libraries such as miniaudio.h). The OOM will still happen on affected files if you actually try to format them (this is harder to avoid since the underlying issue affects the actual formatting logic, not just the language-guessing heuristic), but at least it's avoided during non-modifying operations like hover, and modifying operations that do local formatting like code completion. Fixes clangd/clangd#719 Fixes clangd/clangd#1384 Fixes llvm#70945

If all variables in the module are absolute, this means we're running the pass again on an already-lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See llvm#81491. Split from llvm#75333.
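
The rule above is a three-way classification. A small C++ sketch of it follows; `LdsGlobal`, `LdsState`, and `classify` are hypothetical stand-ins, not the AMDGPU pass's actual types or functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an LDS global variable in the module.
struct LdsGlobal {
  bool isAbsolute;  // true if the variable already has an absolute address
};

enum class LdsState { NoneAbsolute, AllAbsolute, Mixed };

LdsState classify(const std::vector<LdsGlobal>& globals) {
  bool sawAbsolute = false;
  bool sawNonAbsolute = false;
  for (const LdsGlobal& g : globals) {
    if (g.isAbsolute)
      sawAbsolute = true;
    else
      sawNonAbsolute = true;
  }
  if (sawAbsolute && sawNonAbsolute)
    return LdsState::Mixed;        // LDS GVs added after lowering: diagnose
  if (sawAbsolute)
    return LdsState::AllAbsolute;  // pass re-run on a lowered module: fine
  return LdsState::NoneAbsolute;   // not lowered yet: proceed as usual
}
```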

llvm#82629 added additional overloads to `replaceAllUsesWith` and `replaceUsesWithIf`. This caused a build breakage with MSVC when called with ops that can implicitly convert to `Value`.

```
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(881): error C2666: 'mlir::RewriterBase::replaceAllUsesWith': 2 overloads have similar conversions
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(631): note: could be 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Operation *,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(626): note: or 'void mlir::RewriterBase::replaceAllUsesWith(mlir::ValueRange,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(616): note: or 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Value,mlir::Value)'
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(882): note: while trying to match the argument list '(mlir::tensor::ExtractSliceOp, T)'
        with
        [
            T=mlir::Value
        ]
```

Note: The LLVM build bots (Linux and Windows) did not break; this seems to be an issue with `Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`.

This change renames the newly added overloads to `replaceAllOpUsesWith` and `replaceOpUsesWithIf`.

A potentially erroneous code construction, given the work we've done to remove debug intrinsics, is inserting PHIs into blocks when the position hasn't been "sourced correctly". Specifically, if you have `%foo = PHI`, followed by `#dbg_value`, followed by `%bar = add i32...`, and plan on inserting a new PHI, you have to use the iterator form of `getFirstNonPHI` or `getFirstInsertionPt` (or `begin()`) to acquire an iterator that tells the debug-info maintenance code "this is supposed to be at the start of the block, put it in front of #dbg_value". We can detect call sites that aren't doing this at runtime, and should do so with this assertion. It might invalidate code that's doing something very unexpected, like walking backwards to find a PHI, then going forwards, then inserting; however, that's just an inefficient way of calling `getFirstNonPHI`.

Modifies the privatization logic so that the emitted code only uses the HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before this change, the emitted privatization logic mixed `#0` and `#1`, which caused some difficulties when moving to delayed privatization (see the discussion on llvm#84033).

This commit provides better cost estimates for the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all vector lengths and integer types up to i128. For integer types larger than i128, we fall back to the default cost estimate. This lowers the estimated costs of most common instances of the intrinsic. The expected performance impact is minimal, with a tendency to slightly improve the performance of some benchmarks. This commit also provides a test to check the proper computation of the new estimates, as well as the fallback for types larger than i128.

…pes. NFCI (llvm#84125) I noticed this from a discrepancy in fillUpExtensionSupport: we apparently need to check for legal types for ISD::{ZERO,SIGN}_EXTEND, but not for RISCVISD::V{Z,S}EXT_VL. Prior to llvm#72340, combineBinOp_VLToVWBinOp_VL only ran after type legalization because it only operated on _VL nodes. _VL nodes are only emitted during op legalization, which takes place **after** type legalization, which is presumably why the existing code didn't need to check for legal types. After llvm#72340 we now handle generic ops like ISD::ADD that exist before op legalization and thus **before** type legalization. This meant that we needed to add extra checks that the narrow type was legal in llvm#76785. I think the easiest thing to do here is to just maintain the invariant that the types are legal and only run the combine after type legalization.

…lvm#83126) RuntimeInterfaceBuilder wires up JITed expressions with the hardcoded Interpreter runtime. It's used only for value printing right now, but it is not limited to that. The default implementation focuses on an evaluation process where the Interpreter has direct access to the memory of JITed expressions (in-process execution or shared memory). We need a different approach to support out-of-process evaluation or variations of the runtime. It seems reasonable to expose a minimal interface for it. The new RuntimeInterfaceBuilder is an abstract base class in the public header. For that, the TypeVisitor had to become a component (instead of inheriting from it). FindRuntimeInterface() was adjusted to return an instance of the RuntimeInterfaceBuilder and it can be overridden from derived classes.
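
A rough C++ sketch of the shape described above: an abstract `RuntimeInterfaceBuilder`, a default implementation that holds a `TypeVisitor` as a member instead of inheriting from it, and a virtual `FindRuntimeInterface()` that derived classes can override. Only those three names come from the description; `InProcessBuilder`, `buildValuePrinting`, the `Interpreter` stand-in, and every signature here are hypothetical, not Clang's actual API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the type visitor used to pick runtime calls.
class TypeVisitor {
 public:
  std::string visit(const std::string& typeName) const {
    return "print<" + typeName + ">";
  }
};

// Abstract interface: how JITed expressions are wired to the runtime.
class RuntimeInterfaceBuilder {
 public:
  virtual ~RuntimeInterfaceBuilder() = default;
  virtual std::string buildValuePrinting(const std::string& typeName) = 0;
};

// Default in-process flavor: the visitor is a component (a member),
// so the builder's public interface stays minimal.
class InProcessBuilder : public RuntimeInterfaceBuilder {
 public:
  std::string buildValuePrinting(const std::string& typeName) override {
    return visitor_.visit(typeName);
  }

 private:
  TypeVisitor visitor_;
};

// Hypothetical interpreter stand-in with an overridable factory.
class Interpreter {
 public:
  virtual ~Interpreter() = default;
  virtual std::unique_ptr<RuntimeInterfaceBuilder> FindRuntimeInterface() {
    return std::make_unique<InProcessBuilder>();
  }
};
```

In this sketch, an out-of-process variant would derive from `Interpreter`, override `FindRuntimeInterface()`, and return its own builder without touching the default path.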

Select the POSIX 2008 standard to avoid including Darwin extensions. Otherwise, Darwin's math.h header defines HUGE, which conflicts with Flang's HUGE function. This started happening after 4762c65 (llvm#82443), which added the "utility" include, which seems to include "math.h".

This fixes tests that are going to be upstreamed in the near future. They are currently failing downstream in the Apple open-source fork.

Failing tests:

- Clang :: APINotes/retain-count-convention.m
- Clang :: APINotes/types.m
- Clang :: APINotes/versioned-multi.c
- Clang :: APINotes/versioned.m

Since 2e5af56 was merged, Clang enables `LangOpts.APINotesModules` when reading a precompiled module that was built with API Notes enabled. This is correct, but the logic in APINotesManager needs to be adjusted to handle it.

rdar://123526142

…lvm#84664) Summary: Currently we have a conditional that turns the full build on by default if it is a default target. This used to work fine when the GPU was the only target that was ever present. However, we've recently changed to allow building multiple of these at the same time. That means we should be able to build in overlay mode for the CPU and in full-build mode for the GPU. This patch makes some simple adjustments to pass the arguments per-triple. This slightly extends the existing `-DRUNTIMES_` argument support to also transform any extra CMake inputs rather than just the passed CMake variables.

…#84667) Summary: The libc build has a few utilities that need to be built before we can do everything in the full build. The one requirement currently is the `libc-hdrgen` binary. If we are doing a full build in runtimes mode, we first add `libc` to the projects list and then only use the `projects` portion to build the `libc` portion. We also use utilities for the GPU build, namely the loader utilities. Previously we would build these tools on-demand inside of the cross-build, which took some hacky workarounds for the dependency finding and target triple. This patch instead just builds them similarly to libc-hdrgen and then passes them in. We now either pass the tool manually if it was built, or just look it up like we do with the other `clang` tools. Depends on llvm#84664

Re-land 634b024.

T1 allows an optional register list; if present, it must be {d0-d15}. T2 defines a mandatory register list, which must be {d0-d31}. The requirements for T1/T2 are as follows:

| | T1 | T2 |
|---|---|---|
| Require | v8-M.Main, secure state | v8.1-M.Main, secure state |
| 16 D Regs | valid | valid |
| 32 D Regs | UNDEFINED | valid |
| No D Regs | NOP | NOP |

…at-pointers

Fixes failing tests after llvm#84308:

- LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll
- LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces.ll
- LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-calls.ll
- LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-constants.ll
- LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
- LLVM :: CodeGen/AMDGPU/pal-metadata-3.0.ll

Buildbots: https://lab.llvm.org/buildbot/#/builders/121/builds/39855

Previously, we always used the wave64 encodings for EH registers regardless of whether we were compiling for wave32, which seems wrong. We don't seem to use the EH registers, so this commit is mostly just about papering over code that converts from non-EH DWARF registers to LLVM registers while claiming they are EH DWARF registers. That kind of code should be okay on any non-Darwin target (since Darwin is the only target that uses a different encoding for EH registers).

…pp` (llvm#83734) The `parse.pass.cpp` tests don't need to call `test_format_context_create` to create a `basic_format_context`, so they shouldn't include `test_format_context.h`. The `to_address` mechanism works around the iterator debugging mechanisms of MSVC STL. Related to [LWG3989](https://cplusplus.github.io/LWG/issue3989). Discovered when implementing `formatter<tuple>` in MSVC STL. With the inclusion removed, `std/utilities/format/format.tuple/parse.pass.cpp` passes when using enhanced MSVC STL (and the `/utf-8` option for MSVC).

…y are meant to be used (llvm#84707) This patch fixes the unconditional forward declarations of ABI functions in exception_ptr.h and makes them dependent on the availability macro, as they should have been from the beginning. Because the declarations were unconditional, they broke the build with libcxxrt versions before 045c52ce8 [1]; now they are opt-out.

[1]: libcxxrt/libcxxrt@045c52c

On some architectures (currently gfx90a, gfx94*, and gfx10**), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, improving optimization possibilities and decreasing the fragility of the underlying code. Other AMDGPU chipsets continue to require inline assembly to implement this barrier, because by default the LLVM backend inserts waits on global memory (s_waitcnt vmcnt(0)) before barriers to ensure that memory watchpoints set by debuggers work correctly. Use of amdgpu.lds_barrier on these architectures imposes a tradeoff between debuggability and performance. The documentation, as well as the generated inline assembly, has been updated to explicitly call attention to this fact. For chipsets that did not require the inline assembly hack, we move to the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect. The magic constants used as an argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp.

AMP999 and others added 30 commits March 11, 2024 04:23

Fix broken build after llvm#84678 (sorry).

099be86

[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (llvm#84576)

d8d2dea

Previously, we tried to create an integer extending load. We need to use a non-extending FP load instead. Fixes llvm#84541.

[AMDGPU] Update LiveInterval def index for early-clobber (llvm#79285)

d9e6aa7

On converting an instruction to an early-clobber definition in convertToThreeAddress, we must also update live intervals for the register to start at the early-clobber index.

[RISCV] Place mergeable small read only data into srodata section (ll…

b7f97d3

…vm#82214) Small mergeable read-only data was previously placed in sdata, but that meant it lost the mergeable property, and with it some code-size optimization opportunities at link time.

[ELF] Move getSymbol/getRelocTargetSym from ObjFile<ELFT> to InputFil…

f645560

…e. NFC This removes lots of unneeded `template getFile<ELFT>()`.

[SelectionDAG] Switch to LiveRegUnits (llvm#84197)

4e0e9b1

Revert "[TypePromotion] Support positive addition amounts in isSafeWr…

561ddb1

…ap. (llvm#81690)" This reverts commit 0813b90. Fixes miscompile reported in llvm#84718.

Revert "[C++20][Coroutines] Lambda-coroutine with operator new in pro…

0f501c3

…mise_type (llvm#84193)" This reverts commit 35d3b33. See the comments in llvm#84193 for details

Typo: ponit

d3ec8c2

[RISCV] Move NodeExtensionHelper assert to getOrCreateExtendedOp. NFC

0ef61ed

Move the narrow types assert from the ZERO_EXTEND/SIGN_EXTEND case in fillUpExtensionSupport to getOrCreateExtendedOp so we check the other nodes too.

[VPlan] Funnel recipe insert* through VPBasicBlock::insert (NFCI).

9277a32

This allows relying on VPBasicBlock::insert to make sure insertion is well formed, i.e. by updating the recipe's parent as well as other potential invariants in the future.

[gn build] Port ec2875c

483c336

[X86] Assert that the supportedVectorShift* helpers are only called w…

7b90a67

…ith generic shift opcodes. NFC.

Reorder fields for better packing (llvm#77998)

66f0984

The RelocationEntry's fields are poorly ordered when considering padding. This reordering reduces the size from 56 bytes to 40 bytes (on LP64).
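
The savings come from alignment padding. The actual RelocationEntry members are not listed here, so the C++ sketch below uses hypothetical fields chosen to reproduce the same 56-to-40-byte change, assuming a typical LP64 target where `std::uint64_t` is 8-byte aligned.

```cpp
#include <cstdint>

// Hypothetical fields; interleaving 4-, 8-, and 1-byte members forces padding.
struct PoorlyOrdered {
  std::uint32_t a;  // offset 0, followed by 4 bytes of padding
  std::uint64_t b;  // offset 8
  std::uint32_t c;  // offset 16, followed by 4 bytes of padding
  std::uint64_t d;  // offset 24
  bool e;           // offset 32, followed by 7 bytes of padding
  std::uint64_t f;  // offset 40
  std::uint32_t g;  // offset 48, followed by 4 bytes of tail padding
};                  // 56 bytes on a typical LP64 target

// Same members, sorted by decreasing alignment.
struct Reordered {
  std::uint64_t b, d, f;  // offsets 0, 8, 16
  std::uint32_t a, c, g;  // offsets 24, 28, 32
  bool e;                 // offset 36, followed by 3 bytes of tail padding
};                        // 40 bytes on a typical LP64 target
```

Sorting members by decreasing alignment is a common heuristic: it leaves padding only at the tail of the struct.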

[lldb][Docs] Add libxml2 to apt install command

e77f5fe

[RemoveDIs] Add additional debug-mode verifier checks (llvm#84308)

a84eb24

Separated from llvm#83251

ldionne and others added 25 commits March 11, 2024 09:51

[libc++] Add missing include in test (llvm#84579)

02e0b7d

That test is using std::toupper.

[NFC] Remove duplicate 'see' in CMake.rst (llvm#84680)

b1be69f

[InstCombine] Fold usub_sat((sub nuw C1, A), C2) to usub_sat(C1 - C2,…

3f302ea

… A) or 0 (llvm#82280)

- Fixes: llvm#82177
- Alive2: https://alive2.llvm.org/ce/z/Q7mMC3
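
The title above is truncated, so this sketch assumes the fold reads as: `usub_sat((sub nuw C1, A), C2)` becomes `usub_sat(C1 - C2, A)` when `C1 >= C2`, and `0` otherwise. The reasoning: `nuw` guarantees `A <= C1`, so `(C1 - A) - C2` equals `(C1 - C2) - A`; and if `C1 < C2`, then `C1 - A < C2` and the saturating subtraction clamps to 0. The C++ check below brute-forces that reading over all 8-bit values; it illustrates the identity and is not a substitute for the linked Alive2 proof.

```cpp
#include <cassert>

// usub_sat semantics: unsigned subtraction that clamps at 0 instead of wrapping.
constexpr unsigned usub_sat(unsigned x, unsigned y) { return x > y ? x - y : 0; }

// Compares the original pattern with the folded form for all 8-bit inputs.
// "sub nuw C1, A" yields poison when A > C1, so only A <= C1 is checked.
bool foldHoldsFor8Bit() {
  for (unsigned c1 = 0; c1 < 256; ++c1)
    for (unsigned c2 = 0; c2 < 256; ++c2)
      for (unsigned a = 0; a <= c1; ++a) {
        unsigned original = usub_sat(c1 - a, c2);
        unsigned folded = c1 >= c2 ? usub_sat(c1 - c2, a) : 0;
        if (original != folded) return false;
      }
  return true;
}
```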

[NFC][AMDGPU] Fix redundant assignment from llvm#77952 (llvm#84586)

769eab4

Someone pointed out a typo (`Value* RsrcRes = RsrcRes = ...`) in the PR for the address space 7 lowering; this commit fixes it.

Reapply "[RemoveDIs] Add additional debug-mode verifier checks" (llvm…

2953d9c

…#84757) Test failures fixed in d0117b7

[RISCV] Rename schedule classes for vmv.s.x, vmv.x.s, vfmv.s.f, and v…

f14224d

…fmv.f.s [nfc] (llvm#84563) The prior naming scheme is incredibly hard to make sense of. I suspect the usage was actually backwards from the intent, though that didn't matter for any in-tree schedule model.

[AMDGPU] Make generic versioning docs easier to find (llvm#84761)

63c77d8

[OpenMP] Remove dead code of checking int > INT_MAX (llvm#83305)

b4e39ad

[OpenMP] Make sure ptr is used after NULL check (llvm#83304)

1ed463d

[OpenMP] Remove unnecessary check of ap (llvm#83303)

de4d701

[OpenMP] Fixup while loops to avoid bad NULL check (llvm#83302)

9b1c496

[flang][unittests] Use malloc when memory will be deallocated with free (

cd55046

llvm#84380) Runtime unit tests used `new[]` to allocate memory, which then was released using `free`. This was detected by address sanitizer.
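
The fix pairs each deallocation with the matching allocator. A minimal C++ sketch of the pattern follows; `makeBuffer`, `FreeDeleter`, and `MallocBuffer` are hypothetical helpers, not Flang runtime APIs.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Memory that will be released with std::free must come from std::malloc
// (or calloc/realloc). Pairing new[] with free is undefined behavior, which
// AddressSanitizer reports as an alloc-dealloc mismatch.
char* makeBuffer(std::size_t n) {
  return static_cast<char*>(std::malloc(n));
}

// Deleter that releases memory with std::free.
struct FreeDeleter {
  void operator()(void* p) const { std::free(p); }
};

// Owning handle that calls std::free when it goes out of scope.
using MallocBuffer = std::unique_ptr<char, FreeDeleter>;
```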

[CodeGen] Do not pass MF into MachineRegisterInfo methods. NFC. (llvm…

63a5dc4

…#84770) MachineRegisterInfo already knows the MF so there is no need to pass it in as an argument.

[libc][NFC] Clean up test/src/math/differential_testing folder, renam…

d99bb01

…ing it to performance_testing. (llvm#84646) Removing all the diff tests.

[AutoBump] Merge with de4d701

437d00a

[AutoBump] Merge with 63a5dc4

6b90052

[AutoBump] Merge with d99bb01

65cf2cd

Base automatically changed from bump_to_fab2bb8b to feature/fused-ops August 13, 2024 11:11

cferry-AMD approved these changes Aug 13, 2024

mgehre-amd merged commit 94924fc into feature/fused-ops Aug 14, 2024
10 checks passed

mgehre-amd deleted the bump_to_d99bb014 branch August 14, 2024 13:47
