forked from llvm/llvm-project
[AutoBump] Merge with d99bb014 (3) #250
Merged
…lvm#77092) Consistent with `__is_trivially_copyable(volatile int) == true` and `__is_trivially_relocatable(volatile Trivial) == true`, `__is_trivially_relocatable(volatile int)` should also be `true`. Fixes llvm#77091 [clang] [test] New tests for __is_trivially_relocatable(cv-qualified type)
…g constraint expression Previously we disabled computing the ODR hash for declarations from the global module fragment. However, we missed the case where the function lives in the concept requirements (see the attached test files for an example), and the mismatch can cause a crash. This is because we set the function body as lazy after we deserialize it, and we only take its body when needed. However, we don't allow taking the body during deserialization. So it is problematic if we first set the body as lazy and then compute the hash value of the function, which requires deserializing its body; this is where the crash happens. This patch solves the issue by not taking the body of the function from the GMF. Note that we can't skip comparing the constraint expression from the GMF directly, since it is a key part of function selection, and this may be why we can't return 0 directly from `FunctionDecl::getODRHash()` for declarations from the GMF.
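To make the failure mode concrete, here is a minimal model of a declaration whose body is deserialized lazily. It is not Clang's code: `Decl`, `loadBody`, and `odrHash` are hypothetical names. The hash always covers the signature and the constraint expression, but skips the body for declarations from the global module fragment, so computing the hash never forces the lazy body to load while deserialization is still in progress.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical model of a function declaration with a lazily loaded body.
struct Decl {
  std::string signature;
  std::string constraint;                  // e.g. the requires-clause text
  bool fromGlobalModuleFragment = false;
  bool deserializing = false;              // true while the reader is still busy
  std::function<std::string()> bodyLoader; // lazy body source
  std::optional<std::string> body;         // filled in on first use

  const std::string &loadBody() {
    // Mirrors the rule in the text: the body may not be taken while
    // deserialization is still in progress.
    assert(!deserializing && "cannot take a lazy body during deserialization");
    if (!body)
      body = bodyLoader();
    return *body;
  }
};

// Mix one more string hash into the running seed.
static std::size_t combine(std::size_t seed, const std::string &s) {
  return seed ^ (std::hash<std::string>{}(s) + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hypothetical ODR hash: constraints always count; the body only counts
// for declarations outside the GMF.
std::size_t odrHash(Decl &d) {
  std::size_t seed = combine(0, d.signature);
  seed = combine(seed, d.constraint);
  if (!d.fromGlobalModuleFragment)
    seed = combine(seed, d.loadBody());
  return seed;
}
```

Keeping the constraint in the hash reflects the note above that constraints take part in function selection, so they cannot simply be dropped for GMF declarations.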
Previously, we tried to create an integer extending load. We need to use a non-extending FP load instead. Fixes llvm#84541.
On converting an instruction to an early-clobber definition in convertToThreeAddress, we must also update live intervals for the register to start at the early-clobber index.
…vm#82214) Small mergeable read-only data was previously placed in the sdata section, but this means it loses the mergeable property, which in turn loses some code-size optimization opportunities at link time.
…e. NFC This removes lots of unneeded `template getFile<ELFT>()`.
…#78564) Extend repairIntervalsInRange to completely recompute the interval for a register if subregister defs exist without precise subrange matches (LaneMask exactly matching subregister). This occurs when register sequences are lowered to copies such that the size of the copies does not match any uses of the subregisters formed (i.e. during twoaddressinstruction). Without this change, the subranges are probably legal, but they do not match those generated by live interval computation. This creates problems with other code that assumes subranges precisely cover all subregisters defined, e.g. shrinkToUses().
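The "precise match" condition can be sketched with plain bitmasks. This is an illustration only: `LaneMask` is a hypothetical stand-in for LLVM's `LaneBitmask`, and `needsFullRecompute` is not an LLVM function. A def whose lane mask equals some subrange's mask is covered precisely; a def covering lanes that are only tracked by several smaller subranges is not, and in this sketch that triggers a full recompute.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lane mask: one bit per register lane.
using LaneMask = std::uint32_t;

// True when every subregister def has a subrange with exactly the same mask.
bool subrangesMatchDefsExactly(const std::vector<LaneMask> &defMasks,
                               const std::vector<LaneMask> &subrangeMasks) {
  for (LaneMask def : defMasks) {
    bool found = false;
    for (LaneMask sr : subrangeMasks) {
      if (sr == def) {
        found = true;
        break;
      }
    }
    if (!found)
      return false;
  }
  return true;
}

// Policy from the text: without precise matches, recompute from scratch
// rather than trying to repair the existing subranges in place.
bool needsFullRecompute(const std::vector<LaneMask> &defMasks,
                        const std::vector<LaneMask> &subrangeMasks) {
  return !subrangesMatchDefsExactly(defMasks, subrangeMasks);
}
```

For example, a copy that writes lanes `0b11` while the only subranges are `0b01` and `0b10` is legal but imprecise, which is the case the commit describes.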
If we're in a blocking call, we need to run the signal handler immediately, as the call may not return for a very long time (if ever). Not running the handler can cause deadlocks if the rest of the program waits (in one way or another) for the signal handler to execute. I've gone through the list of functions in sanitizer_common_interceptors and marked as blocking those that I know can block, but I don't claim the list to be exhaustive. In particular, I did not mark libc FILE* functions as blocking, because these can end up calling user functions. To do that correctly, /I think/ it would be necessary to clear the "is in blocking call" flag inside the fopencookie wrappers. The test for the bug (deadlock) uses the read call (which is the one that I ran into originally), but the same kind of test could be written for any other blocking syscall.
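The decision logic can be sketched as a small state machine. This is a hypothetical model, not the sanitizer runtime: `SignalContext`, `BlockingCallScope`, and `flushDeferred` are illustrative names, and the code is not async-signal-safe. Outside a blocking call, signals are queued and delivered at the next safe point; inside one, the handler runs at once because the call may never return.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of a deferral policy for signals.
struct SignalContext {
  bool inBlockingCall = false;
  std::vector<int> deferred;            // signals waiting for a safe point
  std::function<void(int)> userHandler; // the program's handler

  // Called when a signal arrives.
  void onSignal(int sig) {
    if (inBlockingCall)
      userHandler(sig);         // the call may never return: run it now
    else
      deferred.push_back(sig);  // deliver later at a safe point
  }

  // Called at a safe point, e.g. when an intercepted function returns.
  void flushDeferred() {
    std::vector<int> pending;
    pending.swap(deferred);
    for (int sig : pending)
      userHandler(sig);
  }
};

// RAII marker that an interceptor for a blocking function such as read()
// could place around the real call. Restores the previous flag on exit.
class BlockingCallScope {
public:
  explicit BlockingCallScope(SignalContext &c) : ctx(c), prev(c.inBlockingCall) {
    ctx.inBlockingCall = true;
  }
  ~BlockingCallScope() { ctx.inBlockingCall = prev; }
  BlockingCallScope(const BlockingCallScope &) = delete;
  BlockingCallScope &operator=(const BlockingCallScope &) = delete;

private:
  SignalContext &ctx;
  bool prev;
};
```

Restoring the previous flag value, rather than always clearing it, keeps the sketch correct if blocking scopes ever nest.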
…ap. (llvm#81690)" This reverts commit 0813b90. Fixes miscompile reported in llvm#84718.
…ble (llvm#84133) This avoids a known libFormat bug where the heuristic can OOM on certain large files (particularly single-header libraries such as miniaudio.h). The OOM will still happen on affected files if you actually try to format them (this is harder to avoid since the underlying issue affects the actual formatting logic, not just the language-guessing heuristic), but at least it's avoided during non-modifying operations like hover, and modifying operations that do local formatting like code completion. Fixes clangd/clangd#719 Fixes clangd/clangd#1384 Fixes llvm#70945
If all variables in the module are absolute, this means we're running the pass again on an already lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See llvm#81491 Split from llvm#75333
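The three cases above reduce to a small classification. The sketch below is hypothetical (`classifyLdsGlobals` and `LdsLoweringAction` are not LLVM names); it takes one flag per LDS global saying whether it already has an absolute address.

```cpp
#include <cassert>
#include <vector>

enum class LdsLoweringAction { Lower, AlreadyLowered, Diagnose };

// isAbsolute[i] is true if the i-th LDS global already has an absolute address.
LdsLoweringAction classifyLdsGlobals(const std::vector<bool> &isAbsolute) {
  bool anyAbsolute = false;
  bool anyNonAbsolute = false;
  for (bool abs : isAbsolute) {
    if (abs)
      anyAbsolute = true;
    else
      anyNonAbsolute = true;
  }
  if (anyAbsolute && anyNonAbsolute)
    return LdsLoweringAction::Diagnose;       // LDS GVs were added after lowering
  if (anyAbsolute)
    return LdsLoweringAction::AlreadyLowered; // re-running the pass is fine
  return LdsLoweringAction::Lower;            // normal lowering (also no LDS GVs)
}
```

Only the mixed case is an error; a module with no LDS globals at all simply falls through to normal lowering in this sketch.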
llvm#82629 added additional overloads to `replaceAllUsesWith` and `replaceUsesWithIf`. This caused a build breakage with MSVC when called with ops that can implicitly convert to `Value`.

```
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(881): error C2666: 'mlir::RewriterBase::replaceAllUsesWith': 2 overloads have similar conversions
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(631): note: could be 'void mlir::RewriterBase::replaceAllUsesWith(mlir::Operation *,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(626): note: or       'void mlir::RewriterBase::replaceAllUsesWith(mlir::ValueRange,mlir::ValueRange)'
external/llvm-project/mlir/include\mlir/IR/PatternMatch.h(616): note: or       'void mlir::RewriterBase::replaceAllUsesWith(mlir::Value,mlir::Value)'
external/llvm-project/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp(882): note: while trying to match the argument list '(mlir::tensor::ExtractSliceOp, T)'
        with
        [
            T=mlir::Value
        ]
```

Note: The LLVM build bots (Linux and Windows) did not break, this seems to be an issue with `Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`.

This change renames the newly added overloads to `replaceAllOpUsesWith` and `replaceOpUsesWithIf`.
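Why the rename helps can be shown with stand-in types. Everything below is hypothetical (these are not the real MLIR classes): an op type that converts implicitly to both `Value` and `Operation *`, and a `ValueRange` constructible from a `Value`. With distinct function names, each call site has exactly one candidate, so no compiler has to rank competing implicit conversions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for mlir::Value, mlir::ValueRange and an op class.
struct Value { int id; };
struct ValueRange {
  std::vector<Value> values;
  ValueRange(Value v) : values{v} {} // implicit, like a range built from one value
};
struct Operation { int id; };
struct ExtractSliceLikeOp {
  Operation *op;
  Value result;
  operator Value() const { return result; }   // single-result op -> Value
  operator Operation *() const { return op; } // op -> Operation*
};

// Distinct names: each call resolves against a single candidate.
std::string replaceAllUsesWith(Value from, Value to) {
  return "value:" + std::to_string(from.id) + "->" + std::to_string(to.id);
}
std::string replaceAllOpUsesWith(Operation *from, ValueRange to) {
  return "op:" + std::to_string(from->id) + "->" + std::to_string(to.values.size());
}
```

Usage: given `ExtractSliceLikeOp slice` and `Value v`, `replaceAllUsesWith(slice, v)` can only mean the value form and `replaceAllOpUsesWith(slice, v)` can only mean the op form, regardless of how a particular compiler ranks the conversions.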
One potentially erroneous code construction, given the work we've done to remove debug intrinsics, is inserting PHIs into blocks when the position hasn't been "sourced correctly". Specifically, if you have `%foo = PHI`, then `#dbg_value`, then `%bar = add i32...`, and plan on inserting a new PHI, you have to use the iterator form of `getFirstNonPHI` or `getFirstInsertionPt` (or `begin()`) to acquire an iterator that tells the debug-info maintenance code "this is supposed to be at the start of the block, put it in front of #dbg_value". We can detect call sites that aren't doing this at runtime, and should do so; this assertion does exactly that. It might invalidate code that's doing something very unexpected, like walking backwards to find a PHI, then going forwards, then inserting; however, that's just an inefficient way of calling `getFirstNonPHI`.
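A simplified model of the rule: a block is a list of PHIs, debug records, and ordinary instructions, and a new PHI may only go where everything before it is a PHI. This is not LLVM's representation (in LLVM the debug records are attached to instructions and the iterator carries the "head of block" information); `Block`, `firstNonPhi`, and `insertPhi` are hypothetical names used only to show the check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical block model: each entry is a PHI, a debug record, or an instruction.
enum class Kind { Phi, DbgRecord, Inst };
struct Entry {
  Kind kind;
  std::string text;
};
using Block = std::vector<Entry>;

// Index just after the last PHI, i.e. in front of any leading debug records.
std::size_t firstNonPhi(const Block &bb) {
  std::size_t i = 0;
  while (i < bb.size() && bb[i].kind == Kind::Phi)
    ++i;
  return i;
}

// A PHI insertion is well formed only if every entry before `pos` is a PHI.
void insertPhi(Block &bb, std::size_t pos, std::string text) {
  for (std::size_t i = 0; i < pos; ++i)
    assert(bb[i].kind == Kind::Phi && "PHI inserted after a non-PHI entry");
  bb.insert(bb.begin() + static_cast<std::ptrdiff_t>(pos),
            Entry{Kind::Phi, std::move(text)});
}
```

In the example above, inserting in front of `%bar` (index 2) would trip the assertion because `#dbg_value` precedes it, while `firstNonPhi` yields index 1, in front of the debug record.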
…mise_type (llvm#84193)" This reverts commit 35d3b33. See the comments in llvm#84193 for details
Modifies the privatization logic so that the emitted code only uses the HLFIR base (i.e. SSA value `#0` returned from `hlfir.declare`). Before this, the emitted privatization logic was a mix of using `#0` and `#1`, which led to some difficulties when trying to move to delayed privatization (see the discussion on llvm#84033).
This commit provides better cost estimates for the llvm.vector.reduce.add intrinsic on SystemZ. These apply to all vector lengths and integer types up to i128. For integer types larger than i128, we fall back to the default cost estimate. This has the effect of lowering the estimated costs of most common instances of the intrinsic. The expected performance impact of this is minimal with a tendency to slightly improve performance of some benchmarks. This commit also provides a test to check the proper computation of the new estimates, as well as the fallback for types larger than i128.
…pes. NFCI (llvm#84125) I noticed this from a discrepancy in fillUpExtensionSupport between how we apparently need to check for legal types for ISD::{ZERO,SIGN}_EXTEND, but we don't need to for RISCVISD::V{Z,S}EXT_VL. Prior to llvm#72340, combineBinOp_VLToVWBinOp_VL only ran after type legalization because it only operated on _VL nodes. _VL nodes are only emitted during op legalization, which takes place **after** type legalization, which is presumably why the existing code didn't need to check for legal types. After llvm#72340 we now handle generic ops like ISD::ADD that exist before op legalization and thus **before** type legalization. This meant that we needed to add extra checks that the narrow type was legal in llvm#76785. I think the easiest thing to do here is to just maintain the invariant that the types are legal and only run the combine after type legalization.
Move the narrow types assert from the ZERO_EXTEND/SIGN_EXTEND case in fillUpExtensionSupport to getOrCreateExtendedOp so we check the other nodes too.
This allows relying on VPBasicBlock::insert to make sure insertion is well formed, i.e. by updating the recipe's parent as well as other potential invariants in the future.
…lvm#83126) RuntimeInterfaceBuilder wires up JITed expressions with the hardcoded Interpreter runtime. It's used only for value printing right now, but it is not limited to that. The default implementation focuses on an evaluation process where the Interpreter has direct access to the memory of JITed expressions (in-process execution or shared memory). We need a different approach to support out-of-process evaluation or variations of the runtime. It seems reasonable to expose a minimal interface for it. The new RuntimeInterfaceBuilder is an abstract base class in the public header. For that, the TypeVisitor had to become a component (instead of inheriting from it). FindRuntimeInterface() was adjusted to return an instance of the RuntimeInterfaceBuilder and it can be overridden from derived classes.
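The shape described above (an abstract builder, a default in-process implementation that holds a type visitor as a member, and an overridable factory on the interpreter) can be sketched as follows. The class names echo the text, but every signature here is hypothetical and does not reproduce `clang::Interpreter`'s real API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Used as a component rather than as a base class.
class TypeVisitor {
public:
  std::string describe(const std::string &type) const { return "visit(" + type + ")"; }
};

// Abstract base exposed in the public header (hypothetical signature).
class RuntimeInterfaceBuilder {
public:
  virtual ~RuntimeInterfaceBuilder() = default;
  virtual std::string buildValuePrinterCall(const std::string &type) = 0;
};

// Default: the interpreter can access JITed memory directly.
class InProcessRuntimeInterfaceBuilder : public RuntimeInterfaceBuilder {
  TypeVisitor visitor; // composition instead of inheritance
public:
  std::string buildValuePrinterCall(const std::string &type) override {
    return "in-process " + visitor.describe(type);
  }
};

class Interpreter {
public:
  virtual ~Interpreter() = default;
  // Derived interpreters may override this to plug in another runtime.
  virtual std::unique_ptr<RuntimeInterfaceBuilder> FindRuntimeInterface() {
    return std::make_unique<InProcessRuntimeInterfaceBuilder>();
  }
};

// Example of an out-of-process variant supplied by a derived class.
class OutOfProcessRuntimeInterfaceBuilder : public RuntimeInterfaceBuilder {
public:
  std::string buildValuePrinterCall(const std::string &type) override {
    return "remote visit(" + type + ")";
  }
};

class RemoteInterpreter : public Interpreter {
public:
  std::unique_ptr<RuntimeInterfaceBuilder> FindRuntimeInterface() override {
    return std::make_unique<OutOfProcessRuntimeInterfaceBuilder>();
  }
};
```

Returning the builder by `std::unique_ptr` lets callers use any implementation through the base-class interface without knowing which runtime is behind it.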
…ith generic shift opcodes. NFC.
Select the POSIX 2008 standard to avoid including Darwin extensions. Otherwise, Darwin's math.h header defines HUGE, which conflicts with Flang's HUGE function. This started happening after 4762c65 (llvm#82443), which added the "utility" include; that header seems to include "math.h".
The RelocationEntry's fields are poorly ordered when considering padding. This reordering reduces the size from 56 bytes to 40 bytes (on LP64).
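The effect of field order on padding can be shown with two hypothetical structs that hold the same fields (they are not the real `RelocationEntry`). On a typical LP64 ABI, where `uint64_t` is 8-byte aligned, interleaving small and large fields forces padding, while grouping the 8-byte fields first removes all but the tail padding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field set, poorly ordered.
struct PoorlyOrdered {
  std::uint32_t a; // 4 bytes + 4 bytes padding before b
  std::uint64_t b; // 8
  bool c;          // 1 byte + 7 bytes padding before d
  std::uint64_t d; // 8
  std::uint32_t e; // 4 bytes + 4 bytes tail padding
};                 // 40 bytes on typical LP64

// Same fields, largest alignment first.
struct WellOrdered {
  std::uint64_t b; // 8
  std::uint64_t d; // 8
  std::uint32_t a; // 4
  std::uint32_t e; // 4
  bool c;          // 1 byte + 7 bytes tail padding
};                 // 32 bytes on typical LP64
```

The exact sizes depend on the ABI (for example, 32-bit x86 aligns `uint64_t` members to 4 bytes), so only the LP64 numbers are stated here.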
This fixes tests that are going to be upstreamed in the near future. Currently they are failing downstream in the Apple open source fork.

Failing tests:
- Clang :: APINotes/retain-count-convention.m
- Clang :: APINotes/types.m
- Clang :: APINotes/versioned-multi.c
- Clang :: APINotes/versioned.m

Since 2e5af56 got merged, Clang now enables `LangOpts.APINotesModules` when reading a precompiled module that was built with API Notes enabled. This is correct. The logic in APINotesManager needs to be adjusted to handle this. rdar://123526142
That test is using std::toupper.
…lvm#84664) Summary: Currently we have a conditional that turns the full build on by default if it is a default target. This used to work fine when the GPU was the only target that was ever present. However, we've recently changed to allow building multiple of these at the same time. That means we should have the ability to build overlay mode in the CPU mode and full build in the GPU mode. This patch makes some simple adjustments to pass the arguments per-triple. This slightly extends the existing `-DRUNTIMES_` argument support to also transform any extra CMake inputs rather than just the passed CMake variables.
… A) or 0 (llvm#82280)

- Fixes: llvm#82177
- Alive2: https://alive2.llvm.org/ce/z/Q7mMC3
…#84667) Summary: The libc build has a few utilities that need to be built before we can do everything in the full build. The one requirement currently is the `libc-hdrgen` binary. If we are doing a full build in runtimes mode, we first add `libc` to the projects list and then only use the `projects` portion to build the `libc` portion. We also use utilities for the GPU build, namely the loader utilities. Previously we would build these tools on demand inside of the cross-build, which took some hacky workarounds for the dependency finding and target triple. This patch instead just builds them similarly to libc-hdrgen and then passes them in. We now either pass it manually if it was built, or just look it up like we do with the other `clang` tools. Depends on llvm#84664
Re-land 634b024. T1 allows an optional register list; if present, the register list must be {d0-d15}. T2 defines a mandatory register list, which must be {d0-d31}. The requirements for T1/T2 are as follows:

| | T1 | T2 |
|---|---|---|
| Require: | v8-M.Main, secure state | v8.1-M.Main, secure state |
| 16 D Regs | valid | valid |
| 32 D Regs | UNDEFINED | valid |
| No D Regs | NOP | NOP |
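The register-list rules and the requirement table can be transcribed into two small functions. The enumerators and parameter names are hypothetical (they are not LLVM identifiers), and the architecture and security-state preconditions in the "Require:" row are left to the caller.

```cpp
#include <cassert>

// Hypothetical names for the table's columns and rows.
enum class Encoding { T1, T2 };
enum class DRegConfig { Sixteen, ThirtyTwo, None };
enum class Outcome { Valid, Undefined, Nop };

// Direct transcription of the table rows.
Outcome lookup(Encoding enc, DRegConfig regs) {
  switch (regs) {
  case DRegConfig::None:
    return Outcome::Nop;   // both encodings
  case DRegConfig::Sixteen:
    return Outcome::Valid; // both encodings
  case DRegConfig::ThirtyTwo:
    return enc == Encoding::T1 ? Outcome::Undefined : Outcome::Valid;
  }
  return Outcome::Undefined; // not reached with the enumerators above
}

// Register-list rule: T1's list is optional but must be {d0-d15} if present;
// T2's list is mandatory and must be {d0-d31}. `lastReg` is the hypothetical
// index of the last register in a list starting at d0.
bool registerListOk(Encoding enc, bool hasList, int lastReg) {
  if (enc == Encoding::T1)
    return !hasList || lastReg == 15;
  return hasList && lastReg == 31;
}
```

The only asymmetric table entry is the 32 D Regs row, which is why `lookup` needs to inspect the encoding only in that case.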
Someone pointed out a typo (`Value* RsrcRes = RsrcRes = ...`) in the PR for the address space 7 lowering; this commit fixes it.
…at-pointers Fixes failing tests after llvm#84308:

- LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces-vectors.ll
- LLVM :: CodeGen/AMDGPU/GlobalISel/irtranslator-non-integral-address-spaces.ll
- LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-calls.ll
- LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-constants.ll
- LLVM :: CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
- LLVM :: CodeGen/AMDGPU/pal-metadata-3.0.ll

Buildbots: https://lab.llvm.org/buildbot/#/builders/121/builds/39855
Previously, we always used the wave64 encodings for EH registers regardless of whether we were compiling for wave32, which seems wrong. We don't seem to use the EH registers, so this commit is mostly just about papering over code that converts from non-EH dwarf registers to LLVM registers while claiming they are EH dwarf registers. That kind of code should be okay on any non-darwin target (since darwin is the only target that uses a different encoding for EH registers).
…fmv.f.s [nfc] (llvm#84563) The prior naming scheme is incredibly hard to make sense out of. I suspect the usage was actually backwards from intent - though that didn't matter for any in tree schedule model.
…pp` (llvm#83734) The `parse.pass.cpp` tests don't need to call `test_format_context_create` to create a `basic_format_context`, so they shouldn't include `test_format_context.h`. The `to_address` mechanism works around the iterator debugging mechanisms of MSVC STL. Related to [LWG3989](https://cplusplus.github.io/LWG/issue3989). Discovered when implementing `formatter<tuple>` in MSVC STL. With the inclusion removed, `std/utilities/format/format.tuple/parse.pass.cpp` passes when using enhanced MSVC STL (and the `/utf-8` option for MSVC).
…y are meant to be used (llvm#84707) This patch fixes the unconditional forward declarations of ABI functions in exception_ptr.h by making them dependent on the availability macro, as they should have been from the beginning. Because the declarations were unconditional, they broke the build with libcxxrt before 045c52ce8 [1]; now they are opt-out. [1]: libcxxrt/libcxxrt@045c52c
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, improving optimization possibilities and decreasing the fragility of the underlying code. Other AMDGPU chipsets continue to require inline assembly to implement this barrier because, by default, the LLVM backend will insert waits on global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure memory watchpoints set by debuggers work correctly. Use of amdgpu.lds_barrier on these architectures imposes a tradeoff between debuggability and performance. The documentation, as well as the generated inline assembly, has been updated to explicitly call attention to this fact. For chipsets that do not require the inline assembly hack, we move to the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect. The magic constants used as an argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp.
llvm#84380) Runtime unit tests used `new[]` to allocate memory, which then was released using `free`. This was detected by address sanitizer.
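The underlying rule is that allocation and deallocation must come from the same family: `new[]` pairs with `delete[]`, `malloc` pairs with `free`; mixing them is undefined behavior, which AddressSanitizer reports as an alloc-dealloc mismatch. The commit text does not say which of the two matched options below the tests switched to; both are shown as a sketch, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Option 1: a matched malloc/free pair, e.g. when C code will call free().
char *makeCBuffer(std::size_t n) {
  char *p = static_cast<char *>(std::malloc(n));
  if (p)
    std::memset(p, 'x', n);
  return p; // caller releases with std::free
}

// Option 2: stay in C++ and let an owning type call delete[].
std::unique_ptr<char[]> makeOwnedBuffer(std::size_t n) {
  auto p = std::make_unique<char[]>(n); // value-initialized (zeroed)
  std::memset(p.get(), 'y', n);
  return p; // released with delete[] automatically
}
```

The owning-pointer option removes the manual release entirely, so a mismatched deallocation cannot be written in the first place.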
…#84770) MachineRegisterInfo already knows the MF so there is no need to pass it in as an argument.
…ing it to performance_testing. (llvm#84646) Removing all the diff tests.
An error occurred while trying to automatically change base from `bump_to_fab2bb8b` to `feature/fused-ops`.
August 13, 2024 11:11
cferry-AMD approved these changes on Aug 13, 2024.
No description provided.