forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoBump] Merge with 6cf3e7d0 (Aug 14) (2) #355
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ADS is not defined When LLVM_ENABLE_THREADS is not defined, llvm::get_threadid returns 0 which makes this test case fail. This is a pretty niche setting, Linaro uses it to stop lld crashing our 32 bit containers. So the test will get plenty of runs elsewhere. In lldb's code it's not getting the current thread ID anyway, it's using a value it got from ptrace. So even if that copy of lldb was built with LLVM_ENABLE_THREADS off, it should still be able to debug threads.
On PlayStation, allow users to supply -static to the linker, via the driver. An initial step. Later changes will have the PS5 driver supply additional options to the linker, if and when -static is passed. SIE tracker: TOOLCHAIN-16704
We only need to see that 1 frame of the stack is in user code. No need to carry on looking. Doing so actually caused a test failure on Armv8 Ubuntu Jammy where a libc function does not have a display name. I'm sure I'm going to get stung by this elsewhere, but for this test, breaking early sidesteps the problem.
The Mul factor was zero-extended here, resulting in incorrect results for integers larger than 64-bit. As we currently only multiply by 1 or -1, just split this into two cases -- there's no need for a full multiplication here. Fixes llvm#102597.
Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME BFloat16 instructions. Remove the now redundant FEAT_B16B16 which has been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit), see llvm#101480 for the details and reasoning of this change to LLVM. FEAT_SME_B16B16 is documented under the latest Armv9.4 feature documentation: https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio - Changes to Clang AArch64 frontend - Change target guard of SME2 ZA-targeting non-widening BFloat16 intrinsics to 'sme-b16b16' - Changes to LLVM AArch64 backend - llvm/lib/Target/AArch64/AArch64Features.td - Create FeatureSMEB16B16, which implies FeatureSME2 and FeatureSVEB16B16 - Remove FeatureB16B16 - Fix description of FeatureSVEB16B16 - llvm/lib/Target/AArch64/AArch64InstrInfo.td - Create HasSMEB16B16 predicate - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td - Change predictication of SME2 ZA-targeting non-widening BFloat16 instructions to new HasSMEB16B16 - llvm/lib/Target/AArch64/AArch64.td - Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies FEAT_SME2) - llvm/lib/AArch64/AsmParser/AArch64AsmParser.cpp - Remove flag 'b16b16' mapping to removed FeatureB16B16 - Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16 - Changes to LLVM unit tests - llvm/unittests/TargetParser/TargetParserTest.cpp - Add new sme-b16b16 flag to existing target parser tests - Add tests for the sme-b16b16 dependencies: - 'sme-b16b16' should enable 'sme2', 'sve-b16b16'. - Remove 'b16b16' from bf16 dependency test - Added MC tests - llvm/test/MC/AArch64/SME2p1 - To ensure that ZA-targeting multi-vector non-widening BFloat16 instructions are enabled by +sme-b16b16, and that this feature is removed by +nosme-b61b6. - Modidified tests - All CodeGen, Semantic, and MC tests that are effected by the removal of 'b16b16', have been modified to supply and/or expect 'sme-b16b16' where appropriate.
Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes llvm#101337.
This PR fixes emission of valid OpLifestart/OpLifestop instructions. According to https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLifetimeStart: "Size must be 0 if Pointer is a pointer to a non-void type or the Addresses [capability](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Capability) is not declared.". The `Size` argument is set the corresponding intrinsics arguments, so Size is not zero we must ensure that Pointer has the required type by inserting a bitcast if needed.
…01732) This PR contains changes in virtual register processing aimed to improve correctness of emitted MIR between passes from the perspective of MachineVerifier. This potentially helps to detect previously missed flaws in code emission and harden the test suite. As a measure of correctness and usefulness of this PR we may use a mode with expensive checks set on, and MachineVerifier reports problems in the test suite. In order to satisfy Machine Verifier requirements to MIR correctness not only a rework of usage of virtual registers' types and classes is required, but also corrections into pre-legalizer and instruction selection logics. Namely, the following changes are introduced: * scalar virtual registers have proper bit width, * detect register class by SPIR-V type, * add a superclass for id virtual register classes, * fix Tablegen rules used for instruction selection, * fixes of minor existed issues (missed flag for proper representation of a null constant for OpenCL vs. HLSL, wrong usage of integer virtual registers as a synonym of any non-type virtual register).
…ble. Currently SLP vectorizer tries to keep only GEPs as scalar, if they are vectorized but used externally. Same approach can be used for all scalar values. This patch tries to keep original scalars if all its operands remain scalar or externally used, the cost of the original scalar is lower than the cost of the extractelement instruction, or if the number of externally used scalars in the same entry is power of 2. Last criterion allows better revectorization for multiply used scalars. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#100904
Now that the branches to the scalar epilogue are modeled in VPlan directly, check the VPlan to see if a scalar epilogue is required. Preparation for llvm#100658.
…te unitAttr. (llvm#102340) Adds a new ComposableOpInterface for OpenMP operations that can represent a single leaf of a composite OpenMP construct. This is patch 1/2 in a series of patches. Patch 2 - llvm#102341.
… add verifier checks (llvm#102341) This patch sets the omp.composite unit attr for composite wrapper ops and also add appropriate checks to the verifiers of supported ops for the presence/absence of the attribute. This is patch 2/2 in a series of patches. Patch 1 - llvm#102340.
Just a simple check to ignore Inline asm fwait insertion Fixes llvm#101613
Split out from llvm#80309.
Split out from llvm#80309.
Split out from llvm#80309.
Split out from llvm#80309.
Split out from llvm#80309.
llvm#101283) `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fiх llvm#97537.
…ss (llvm#102633) Add missing math.atan to spirv.CL.atan and math.atan2 to spirv.CL.atan2 in MathToSPIRV. Add math.atan to spirv.GL.atan too.
This has been flaky on our Windows on Arm bot: https://lab.llvm.org/buildbot/#/builders/141/builds/1497 Despite passing when first landed.
…m#102824) We were calling initialize() unconditionally when copying the union.
… boolean argument (llvm#102902) This PR resolves a TODO in `generateGroupInst()` (`lib/Target/SPIRV/SPIRVBuiltins.cpp`) and Issues llvm#97311 and llvm#97312 by implementing support for non-const arguments in a Group builtin that requires a boolean argument.
…lvm#101353) This specifically handles the case of a transpose from a vector type like `vector<8x[4]xf32>` to `vector<[4]x8xf32>`. Such transposes occur fairly frequently when scalably vectorizing `linalg.generic`s. There is no direct lowering for these (as types like `vector<[4]x8xf32>` cannot be represented in LLVM-IR). However, if the only use of the transpose is a write, then it is possible to lower the `transfer_write(transpose)` as a VLA loop. Example: ```mlir %transpose = vector.transpose %vec, [1, 0] : vector<4x[4]xf32> to vector<[4]x4xf32> vector.transfer_write %transpose, %dest[%i, %j] {in_bounds = [true, true]} : vector<[4]x4xf32>, memref<?x?xf32> ``` Becomes: ```mlir %c1 = arith.constant 1 : index %c4 = arith.constant 4 : index %c0 = arith.constant 0 : index %0 = vector.extract %arg0[0] : vector<[4]xf32> from vector<4x[4]xf32> %1 = vector.extract %arg0[1] : vector<[4]xf32> from vector<4x[4]xf32> %2 = vector.extract %arg0[2] : vector<[4]xf32> from vector<4x[4]xf32> %3 = vector.extract %arg0[3] : vector<[4]xf32> from vector<4x[4]xf32> %vscale = vector.vscale %c4_vscale = arith.muli %vscale, %c4 : index scf.for %idx = %c0 to %c4_vscale step %c1 { %4 = vector.extract %0[%idx] : f32 from vector<[4]xf32> %5 = vector.extract %1[%idx] : f32 from vector<[4]xf32> %6 = vector.extract %2[%idx] : f32 from vector<[4]xf32> %7 = vector.extract %3[%idx] : f32 from vector<[4]xf32> %slice_i = affine.apply #map(%idx)[%i] %slice = vector.from_elements %4, %5, %6, %7 : vector<4xf32> vector.transfer_write %slice, %arg1[%slice_i, %j] {in_bounds = [true]} : vector<4xf32>, memref<?x?xf32> } ```
Add small test that I missed adding to llvm#102341.
…m#102686) The extra field in the descriptor carries multiple information and cannot be deducted anymore when doing a reboxing. This patch updates the codegen to retrieve the extra field value from the inboc and set it in the new box.
This sorts DWARF op descriptions in `DWARFExpression.cpp` by opcode and version, packing the standardised ops together. A few ops also had the wrong version listed, so this fixes those versions as well. (The version does not appear to actually be used currently.)
There's only a single user (MCMachOStreamer), so it makes more sense to move the version emission to the source of the data.
In order to guarantee that extracting 64 bits doesn't require more than 2 words, the word size would need to be 64 bits or more. If the word size was smaller than 64, like 32, you may need to read 3 words to get 64 bits.
…lvm#102482) This PR adds a field to the pass builder options struct, `AAPipeline`, exposed through a C API `LLVMPassBuilderOptionsSetAAPipeline`, that is used to set an alias analysis pipeline to be used in stead of the default one. x-ref https://discourse.llvm.org/t/newpm-c-api-questions/80598
) Inside computeConstantDifference(), handle the case where both sides are of the form `C * %x`, in which case we can strip off the common multiplication (as long as we remember to multiply by it for the following difference calculation). There is an obvious alternative implementation here, which would be to directly decompose multiplies inside the "Multiplicity" accumulation. This does work, but I've found this to be both significantly slower (because everything has to work on APInt) and more complex in implementation (e.g. because we now need to match back the new More/Less with an arbitrary factor) without providing more power in practice. As such, I went for the simpler variant here. This is the last step to make computeConstantDifference() sufficiently powerful to replace existing uses of `cast<SCEVConstant>(getMinusSCEV())` with it.
Avoid one heap allocation per function per constructed TLI. The BitVector is never resized, so a bitset is sufficient. Pull Request: llvm#103411
This metadata is queried quite often, so avoiding frequent lookups in the hash map is beneficial. Therefore, cache the metadata node directly in the module. Pull Request: llvm#103410
This only has a single use and is equally well served by the existing constructor -- blocks of a loop are already an array. Pull Request: llvm#103399
Aggregate type specification doesn't have the size component. Don't abuse LayoutAlignElem to avoid confusion.
This is a reland of llvm#96287. This change makes tests in logf128.ll ignore the sign of NaNs for negative value tests and moves an #include <cmath> to be blocked behind #ifndef _GLIBCXX_MATH_H.
Syntacore SCR4 is a microcontroller-class processor core that has much in common with SCR3, but also supports F and D extensions. Overview: https://syntacore.com/products/scr4 Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V processor core which scheduling model almost match SCR4. Overview: https://syntacore.com/products/scr5 Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com> Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes llvm#85903
This patch removes the `ClauseProcessor::processDefault` method due to it having been implemented in `DataSharingProcessor::collectDefaultSymbols` instead. Also, some `genXyzClauses` functions are updated to avoid triggering TODO errors for clauses not supported by the corresponding construct and to keep alphabetical sorting on the order in which clauses are processed.
Add a test where diff checks are generated initial and then re-generated when re-trying with runtime checks. At the moment, the order doesn't match the order they are created in, as the DiffChecks field in LAI isn't cleared as other fields holding runtime checks.
DiffChecks will get populated twice when re-trying with runtime checks. Without clearing it like the regular Checks vector, it will contain some duplicates and the order the checks are created may not match the order the checks have been queued when re-trying.
…#101472) This patch add `void* PlatformArgs` parameter to `__init_riscv_feature_bits`. `PlatformArgs` allows the platform to provide pre-computed data and access it without extra effort. For example, Linux could pass the vDSO object to avoid an extra system call. ``` __init_riscv_feature_bits() -> __init_riscv_feature_bits(void *PlatformArgs) ```
…mission of the OpGroupBroadcast instruction (llvm#103050) This PR addresses a TODO in lib/Target/SPIRV/SPIRVInstructionSelector.cpp by adding implementation of the non-const G_BUILD_VECTOR, and fix emission of the OpGroupBroadcast instruction for the case when the `..._group_broadcast` builtin has more than one `local_id` argument and `OpGroupBroadcast` requires a newly constructed vector with 2 or 3 components instead of originally passed series of `local_id` arguments. This PR may resolve llvm#97310 if the reason for the reported fail is an incorrectly generated OpGroupBroadcast instruction that was definitely a case. Existing test is hardened and a new test is added to cover this special case of the OpGroupBroadcast instruction emission.
llvm#101449) …ImplID This patch 1. remove the vendorId from `__riscv_vendor_feature_bits` 2. Define a new structure for vendorID, ArchID and ImplID 3. Update the relate init code
This reverts commit 3cab7c5. The modified test fails on ppc64le buildbots.
Split out from llvm#98608.
…lvm#101827) Adding a pass that is expected to run after the deallocation pipeline and will move buffer deallocations right after their last user or dependency, thus optimizing the allocation liveness.
This also adds a default constructor and a few uses of it.
cferry-AMD
approved these changes
Sep 30, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.