[AutoBump] Merge with 6cf3e7d0 (Aug 14) (2) #355

mgehre-amd · 2024-09-19T22:35:36Z

No description provided.

…ADS is not defined When LLVM_ENABLE_THREADS is not defined, llvm::get_threadid returns 0 which makes this test case fail. This is a pretty niche setting, Linaro uses it to stop lld crashing our 32 bit containers. So the test will get plenty of runs elsewhere. In lldb's code it's not getting the current thread ID anyway, it's using a value it got from ptrace. So even if that copy of lldb was built with LLVM_ENABLE_THREADS off, it should still be able to debug threads.

…uced in llvm#99732 (llvm#102716)

On PlayStation, allow users to supply -static to the linker, via the driver. An initial step. Later changes will have the PS5 driver supply additional options to the linker, if and when -static is passed. SIE tracker: TOOLCHAIN-16704

We only need to see that 1 frame of the stack is in user code. No need to carry on looking. Doing so actually caused a test failure on Armv8 Ubuntu Jammy where a libc function does not have a display name. I'm sure I'm going to get stung by this elsewhere, but for this test, breaking early sidesteps the problem.

The Mul factor was zero-extended here, resulting in incorrect results for integers larger than 64-bit. As we currently only multiply by 1 or -1, just split this into two cases -- there's no need for a full multiplication here. Fixes llvm#102597.
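To illustrate why a zero-extended factor goes wrong only above 64 bits, here is a minimal sketch that models fixed-width arithmetic with Python integers. The helper names (`zext`, `mul_wrapped`, `apply_sign`) are hypothetical and are not LLVM APIs; the 64-bit factor width and 128-bit value width are assumptions chosen for illustration.

```python
def mask(bits):
    """All-ones mask for an unsigned integer of the given width."""
    return (1 << bits) - 1


def zext(value, from_bits):
    """Zero-extend: keep the low from_bits, leave all higher bits zero."""
    return value & mask(from_bits)


def mul_wrapped(a, b, bits):
    """Multiplication with wrap-around at the given width."""
    return (a * b) & mask(bits)


def apply_sign(x, negative, bits):
    """Two-case alternative: either negate or pass the value through."""
    return (-x) & mask(bits) if negative else x & mask(bits)


WIDTH = 128                # assumed width of the value being scaled
x = 5
factor = zext(-1, 64)      # a 64-bit -1, zero-extended: 0xFFFFFFFFFFFFFFFF
buggy = mul_wrapped(x, factor, WIDTH)   # multiplies by 2**64 - 1, not by -1
fixed = apply_sign(x, True, WIDTH)      # the 128-bit encoding of -5
```

At a width of 64 bits the two approaches agree, which is consistent with the bug only showing up for wider integers.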

Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME BFloat16 instructions. Remove the now redundant FEAT_B16B16, which has been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit); see llvm#101480 for the details and reasoning of this change to LLVM.

FEAT_SME_B16B16 is documented under the latest Armv9.4 feature documentation: https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio

- Changes to Clang AArch64 frontend
  - Change target guard of SME2 ZA-targeting non-widening BFloat16 intrinsics to 'sme-b16b16'
- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
    - Create FeatureSMEB16B16, which implies FeatureSME2 and FeatureSVEB16B16
    - Remove FeatureB16B16
    - Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
    - Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
    - Change predication of SME2 ZA-targeting non-widening BFloat16 instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
    - Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies FEAT_SME2)
  - llvm/lib/AArch64/AsmParser/AArch64AsmParser.cpp
    - Remove flag 'b16b16' mapping to removed FeatureB16B16
    - Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16
- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
    - Add new sme-b16b16 flag to existing target parser tests
    - Add tests for the sme-b16b16 dependencies:
      - 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
    - Remove 'b16b16' from bf16 dependency test
- Added MC tests
  - llvm/test/MC/AArch64/SME2p1
    - To ensure that ZA-targeting multi-vector non-widening BFloat16 instructions are enabled by +sme-b16b16, and that this feature is removed by +nosme-b16b16.
- Modified tests
  - All CodeGen, Semantic, and MC tests that are affected by the removal of 'b16b16' have been modified to supply and/or expect 'sme-b16b16' where appropriate.

Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes llvm#101337.

This PR fixes emission of valid OpLifetimeStart/OpLifetimeStop instructions. According to https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLifetimeStart: "Size must be 0 if Pointer is a pointer to a non-void type or the Addresses [capability](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Capability) is not declared.". The `Size` argument is set from the corresponding intrinsic's argument, so when Size is not zero we must ensure that Pointer has the required type by inserting a bitcast if needed.
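The quoted rule can be read as a small predicate. The sketch below is only a reading of the quoted sentence, not the SPIR-V backend's code: the function and parameter names are hypothetical, and `needs_bitcast` assumes that "the required type" means a pointer type for which a non-zero Size is permitted.

```python
def lifetime_size_is_valid(size, pointee_is_void, has_addresses_capability):
    """Check the quoted rule: Size must be 0 if Pointer points to a non-void
    type or the Addresses capability is not declared."""
    if not pointee_is_void or not has_addresses_capability:
        return size == 0
    return True


def needs_bitcast(size, pointee_is_void):
    """Assumed trigger for the fix: a non-zero Size on a non-void pointer
    would violate the rule, so the pointer has to be cast first."""
    return size != 0 and not pointee_is_void
```

A zero Size is always acceptable under this reading, so no cast is needed in that case.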

…01732) This PR contains changes to virtual register processing aimed at improving the correctness of emitted MIR between passes from the perspective of MachineVerifier. This potentially helps to detect previously missed flaws in code emission and harden the test suite. As a measure of the correctness and usefulness of this PR, we may use a mode with expensive checks turned on, in which MachineVerifier reports problems in the test suite.

Satisfying MachineVerifier's requirements for MIR correctness requires not only reworking how virtual registers' types and classes are used, but also corrections to the pre-legalizer and instruction selection logic. Namely, the following changes are introduced:

* scalar virtual registers have proper bit width,
* detect register class by SPIR-V type,
* add a superclass for id virtual register classes,
* fix Tablegen rules used for instruction selection,
* fixes for minor existing issues (missed flag for proper representation of a null constant for OpenCL vs. HLSL, wrong usage of integer virtual registers as a synonym of any non-type virtual register).

…ble. Currently the SLP vectorizer tries to keep only GEPs as scalar, if they are vectorized but used externally. The same approach can be used for all scalar values. This patch tries to keep an original scalar if all its operands remain scalar or externally used, the cost of the original scalar is lower than the cost of the extractelement instruction, or if the number of externally used scalars in the same entry is a power of 2. The last criterion allows better revectorization for multiply used scalars. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#100904

Now that the branches to the scalar epilogue are modeled in VPlan directly, check the VPlan to see if a scalar epilogue is required. Preparation for llvm#100658.

…te unitAttr. (llvm#102340) Adds a new ComposableOpInterface for OpenMP operations that can represent a single leaf of a composite OpenMP construct. This is patch 1/2 in a series of patches. Patch 2 - llvm#102341.

… add verifier checks (llvm#102341) This patch sets the omp.composite unit attr for composite wrapper ops and also adds appropriate checks to the verifiers of supported ops for the presence/absence of the attribute. This is patch 2/2 in a series of patches. Patch 1 - llvm#102340.

Just a simple check to ignore inline asm fwait insertion. Fixes llvm#101613.

Split out from llvm#80309.

llvm#101283) `lldb-server platform --server` now works on Windows without multithreading. The rest of the functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is part 1 of the replacement of llvm#100670. In part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen on a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.

…ss (llvm#102633) Add missing math.atan to spirv.CL.atan and math.atan2 to spirv.CL.atan2 in MathToSPIRV. Add math.atan to spirv.GL.atan too.

This has been flaky on our Windows on Arm bot, despite passing when first landed: https://lab.llvm.org/buildbot/#/builders/141/builds/1497

…m#102824) We were calling initialize() unconditionally when copying the union.

… boolean argument (llvm#102902) This PR resolves a TODO in `generateGroupInst()` (`lib/Target/SPIRV/SPIRVBuiltins.cpp`) and Issues llvm#97311 and llvm#97312 by implementing support for non-const arguments in a Group builtin that requires a boolean argument.

…lvm#101353) This specifically handles the case of a transpose from a vector type like `vector<8x[4]xf32>` to `vector<[4]x8xf32>`. Such transposes occur fairly frequently when scalably vectorizing `linalg.generic`s. There is no direct lowering for these (as types like `vector<[4]x8xf32>` cannot be represented in LLVM-IR). However, if the only use of the transpose is a write, then it is possible to lower the `transfer_write(transpose)` as a VLA loop.

Example:

```mlir
%transpose = vector.transpose %vec, [1, 0]
  : vector<4x[4]xf32> to vector<[4]x4xf32>
vector.transfer_write %transpose, %dest[%i, %j] {in_bounds = [true, true]}
  : vector<[4]x4xf32>, memref<?x?xf32>
```

Becomes:

```mlir
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%0 = vector.extract %arg0[0] : vector<[4]xf32> from vector<4x[4]xf32>
%1 = vector.extract %arg0[1] : vector<[4]xf32> from vector<4x[4]xf32>
%2 = vector.extract %arg0[2] : vector<[4]xf32> from vector<4x[4]xf32>
%3 = vector.extract %arg0[3] : vector<[4]xf32> from vector<4x[4]xf32>
%vscale = vector.vscale
%c4_vscale = arith.muli %vscale, %c4 : index
scf.for %idx = %c0 to %c4_vscale step %c1 {
  %4 = vector.extract %0[%idx] : f32 from vector<[4]xf32>
  %5 = vector.extract %1[%idx] : f32 from vector<[4]xf32>
  %6 = vector.extract %2[%idx] : f32 from vector<[4]xf32>
  %7 = vector.extract %3[%idx] : f32 from vector<[4]xf32>
  %slice_i = affine.apply #map(%idx)[%i]
  %slice = vector.from_elements %4, %5, %6, %7 : vector<4xf32>
  vector.transfer_write %slice, %arg1[%slice_i, %j] {in_bounds = [true]}
    : vector<4xf32>, memref<?x?xf32>
}
```
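As a rough model of what the loop form computes, the sketch below mirrors the lowered IR with plain Python lists and compares it against a direct transposed write. It is not MLIR code: the function names are hypothetical, the scalable dimension is stood in for by the inner list length, and `#map(%idx)[%i]` is assumed to compute `%i + %idx`.

```python
def transpose_write_reference(vec, dest, i, j):
    """Direct semantics: write transpose(vec) into dest starting at (i, j)."""
    rows, cols = len(vec), len(vec[0])
    for c in range(cols):
        for r in range(rows):
            dest[i + c][j + r] = vec[r][c]


def transpose_write_loop(vec, dest, i, j):
    """Loop form: one fixed-size slice is written per scalable index."""
    fixed_rows = list(vec)          # the per-row vector.extract ops
    scalable_len = len(vec[0])      # stands in for 4 * vscale
    for idx in range(scalable_len):
        # Gather element idx of every row (vector.from_elements).
        slice_ = [row[idx] for row in fixed_rows]
        # Write the slice at row i + idx (assumed meaning of #map).
        dest[i + idx][j:j + len(slice_)] = slice_
```

Both functions should fill `dest` identically for any in-bounds `(i, j)`, which is the property the lowering relies on.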

Add small test that I missed adding to llvm#102341.

…m#102686) The extra field in the descriptor carries several pieces of information and can no longer be deduced when doing a reboxing. This patch updates the codegen to retrieve the extra field value from the input box and set it in the new box.

This sorts DWARF op descriptions in `DWARFExpression.cpp` by opcode and version, packing the standardised ops together. A few ops also had the wrong version listed, so this fixes those versions as well. (The version does not appear to actually be used currently.)

There's only a single user (MCMachOStreamer), so it makes more sense to move the version emission to the source of the data.

In order to guarantee that extracting 64 bits doesn't require more than 2 words, the word size would need to be 64 bits or more. If the word size was smaller than 64, like 32, you may need to read 3 words to get 64 bits.
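The word-count argument can be checked with a little arithmetic. The following sketch computes the worst-case number of words that a bit field can span when it may start at any bit offset within a word; the function name is hypothetical and is not part of APInt.

```python
def max_words_touched(field_bits, word_bits):
    """Worst-case number of words spanned by a field_bits-wide field.

    The worst case starts at the last bit of a word (offset word_bits - 1),
    so the field covers offset + field_bits bits counted from that word's start.
    """
    worst_offset = word_bits - 1
    covered_bits = worst_offset + field_bits
    return (covered_bits + word_bits - 1) // word_bits  # ceiling division
```

With 64-bit words a 64-bit extraction touches at most 2 words, while with 32-bit words it can touch 3, matching the reasoning above.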

…lvm#88091)

…lvm#102482) This PR adds a field to the pass builder options struct, `AAPipeline`, exposed through a C API `LLVMPassBuilderOptionsSetAAPipeline`, that is used to set an alias analysis pipeline to be used instead of the default one. x-ref https://discourse.llvm.org/t/newpm-c-api-questions/80598

) Inside computeConstantDifference(), handle the case where both sides are of the form `C * %x`, in which case we can strip off the common multiplication (as long as we remember to multiply by it for the following difference calculation). There is an obvious alternative implementation here, which would be to directly decompose multiplies inside the "Multiplicity" accumulation. This does work, but I've found this to be both significantly slower (because everything has to work on APInt) and more complex in implementation (e.g. because we now need to match back the new More/Less with an arbitrary factor) without providing more power in practice. As such, I went for the simpler variant here. This is the last step to make computeConstantDifference() sufficiently powerful to replace existing uses of `cast<SCEVConstant>(getMinusSCEV())` with it.

Avoid one heap allocation per function per constructed TLI. The BitVector is never resized, so a bitset is sufficient. Pull Request: llvm#103411

This metadata is queried quite often, so avoiding frequent lookups in the hash map is beneficial. Therefore, cache the metadata node directly in the module. Pull Request: llvm#103410

This only has a single use and is equally well served by the existing constructor -- blocks of a loop are already an array. Pull Request: llvm#103399

Aggregate type specification doesn't have the size component. Don't abuse LayoutAlignElem to avoid confusion.

This is a reland of llvm#96287. This change makes tests in logf128.ll ignore the sign of NaNs for negative value tests and moves an #include <cmath> to be blocked behind #ifndef _GLIBCXX_MATH_H.

…m#103015)

Syntacore SCR4 is a microcontroller-class processor core that has much in common with SCR3, but also supports F and D extensions. Overview: https://syntacore.com/products/scr4 Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V processor core whose scheduling model almost matches that of SCR4. Overview: https://syntacore.com/products/scr5 Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com> Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>

A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes llvm#85903
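One way to read the three cases is as clamping to the destination range before truncating. The sketch below models that reading on Python integers; it is an illustration under that assumption, not the SelectionDAG implementation, and the function names merely echo the node names.

```python
def clamp(x, lo, hi):
    """Limit x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def truncate_ssat_s(x, dst_bits):
    """Signed operand, clamped to the signed range of the destination."""
    return clamp(x, -(1 << (dst_bits - 1)), (1 << (dst_bits - 1)) - 1)


def truncate_ssat_u(x, dst_bits):
    """Signed operand, clamped to the unsigned range of the destination."""
    return clamp(x, 0, (1 << dst_bits) - 1)


def truncate_usat_u(x, dst_bits):
    """Unsigned operand, clamped to the unsigned range of the destination."""
    if x < 0:
        raise ValueError("operand is expected to be unsigned")
    return min(x, (1 << dst_bits) - 1)
```

Values already inside the destination range pass through unchanged in all three helpers.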

This patch removes the `ClauseProcessor::processDefault` method due to it having been implemented in `DataSharingProcessor::collectDefaultSymbols` instead. Also, some `genXyzClauses` functions are updated to avoid triggering TODO errors for clauses not supported by the corresponding construct and to keep alphabetical sorting on the order in which clauses are processed.

Add a test where diff checks are generated initially and then re-generated when re-trying with runtime checks. At the moment, the order doesn't match the order they are created in, as the DiffChecks field in LAI isn't cleared like the other fields holding runtime checks.

DiffChecks will get populated twice when re-trying with runtime checks. Without clearing it like the regular Checks vector, it will contain some duplicates and the order the checks are created may not match the order the checks have been queued when re-trying.
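A toy model of the stale-state problem, assuming two lists that are both repopulated on a retry but only one of which is cleared first. The class and its members are hypothetical stand-ins and do not reflect the real LoopAccessInfo data structures.

```python
class RuntimeCheckState:
    """Holds two collections of queued checks, loosely mirroring the text."""

    def __init__(self):
        self.checks = []
        self.diff_checks = []

    def generate(self, pointer_pairs, clear_diff_checks):
        # The regular checks are always reset before being regenerated.
        self.checks.clear()
        # Without this reset, entries from the earlier attempt survive.
        if clear_diff_checks:
            self.diff_checks.clear()
        for pair in pointer_pairs:
            self.checks.append(pair)
            self.diff_checks.append(pair)
```

Calling `generate` twice without clearing leaves duplicates in `diff_checks` in an order that no longer matches the second attempt, while `checks` stays consistent.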

…#101472) This patch adds a `void* PlatformArgs` parameter to `__init_riscv_feature_bits`. `PlatformArgs` allows the platform to provide pre-computed data and access it without extra effort. For example, Linux could pass the vDSO object to avoid an extra system call.

```
__init_riscv_feature_bits() -> __init_riscv_feature_bits(void *PlatformArgs)
```

…mission of the OpGroupBroadcast instruction (llvm#103050) This PR addresses a TODO in lib/Target/SPIRV/SPIRVInstructionSelector.cpp by adding an implementation of the non-const G_BUILD_VECTOR, and fixes emission of the OpGroupBroadcast instruction for the case when the `..._group_broadcast` builtin has more than one `local_id` argument and `OpGroupBroadcast` requires a newly constructed vector with 2 or 3 components instead of the originally passed series of `local_id` arguments. This PR may resolve llvm#97310 if the reason for the reported failure is an incorrectly generated OpGroupBroadcast instruction, which was definitely happening. The existing test is hardened and a new test is added to cover this special case of OpGroupBroadcast instruction emission.

llvm#101449) …ImplID This patch:

1. Removes the vendorId from `__riscv_vendor_feature_bits`
2. Defines a new structure for vendorID, ArchID and ImplID
3. Updates the related init code

This reverts commit 3cab7c5. The modified test fails on ppc64le buildbots.

Split out from llvm#98608.

…llvm#103713) Fixes llvm#103500

…orize.

…lvm#101827) Adding a pass that is expected to run after the deallocation pipeline and will move buffer deallocations right after their last user or dependency, thus optimizing the allocation liveness.
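The idea of moving each deallocation to just after the buffer's last user can be sketched on a flat list of operations. This is a simplified model and not the MLIR pass: ops are hypothetical `(name, operands)` tuples, aliasing and other dependencies are ignored, and regions or control flow are not handled.

```python
def move_deallocs_after_last_use(ops):
    """Return a new op list with each dealloc placed after its buffer's last use."""
    others = [op for op in ops if op[0] != "dealloc"]
    freed = [op[1][0] for op in ops if op[0] == "dealloc"]

    # Record the index of the last non-dealloc op that mentions each buffer.
    last_use = {}
    for index, (_, operands) in enumerate(others):
        for buf in operands:
            last_use[buf] = index

    result = []
    for index, op in enumerate(others):
        result.append(op)
        for buf in freed:
            if last_use.get(buf) == index:
                result.append(("dealloc", [buf]))

    # Buffers with no recorded use keep their dealloc at the end.
    for buf in freed:
        if buf not in last_use:
            result.append(("dealloc", [buf]))
    return result
```

In this model, a buffer that is last used early in the list is freed immediately afterwards instead of staying live until the end.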

This also adds a default constructor and a few uses of it.

DavidSpickett and others added 30 commits August 12, 2024 12:41

[Clang][OpenMP] Fix the wrong transform of num_teams clause introd…

aa86e5b

…uced in llvm#99732 (llvm#102716)

[IndVars] Add test for llvm#102597 (NFC)

c876761

[SLP][NFC]Use local getShuffleCost function across the code, NFC.

34514ce

[VPlan] Check successors in VPlan to check if scalar epi required (NFC)

c7a44ec

Now that the branches to the scalar epilogue are modeled in VPlan directly, check the VPlan to see if a scalar epilogue is required. Preparation for llvm#100658.

Simple check to ignore Inline asm fwait insertion (llvm#101686)

db3c3fc

Just a simple check to ignore inline asm fwait insertion. Fixes llvm#101613.

[OMPIRBuilder] Use getAllOnesValue()

dc831e8

Split out from llvm#80309.

[ConstantFolding] Use getSigned()

246c236

Split out from llvm#80309.

[CGP] Use getAllOnesValue()

3b27fce

Split out from llvm#80309.

[ExpandMemCmp] Use getAllOnesValue()

06f64e8

Split out from llvm#80309.

[ConstraintElimination] Use getAllOnesValue()

73d835c

Split out from llvm#80309.

[mlir][spirv] Add atan and atan2 pattern to MathToSPIRV Conversion pa…

49777d7

…ss (llvm#102633) Add missing math.atan to spirv.CL.atan and math.atan2 to spirv.CL.atan2 in MathToSPIRV. Add math.atan to spirv.GL.atan too.

[lldb][test] Disable dwp hash collision test on Windows

7027cc6

This has been flaky on our Windows on Arm bot, despite passing when first landed: https://lab.llvm.org/buildbot/#/builders/141/builds/1497

[clang][Interp] Fix diagnosing uninitialized nested union fields (llv…

3825a7c

…m#102824) We were calling initialize() unconditionally when copying the union.

[OpenMP][MLIR] Add test missed by llvm#102341

05d85ec

Add small test that I missed adding to llvm#102341.

aengelke and others added 26 commits August 14, 2024 09:24

[MC] Remove Darwin SDK/Version from ObjFileInfo (llvm#103025)

f1cb64b

There's only a single user (MCMachOStreamer), so it makes more sense to move the version emission to the source of the data.

[APInt] Correct backwards static_assert condition. (llvm#103641)

cbd3068

In order to guarantee that extracting 64 bits doesn't require more than 2 words, the word size would need to be 64 bits or more. If the word size was smaller than 64, like 32, you may need to read 3 words to get 64 bits.

[libc++] <experimental/simd> Add ++/-- operators for simd reference (l…

812ae91

…lvm#88091)

[Analysis][NFC] Don't use BitVector for nobuiltin overrides

ce8cb7c

Avoid one heap allocation per function per constructed TLI. The BitVector is never resized, so a bitset is sufficient. Pull Request: llvm#103411

[IR] Cache llvm.module.flags metadata in Module

dfa488b

This metadata is queried quite often, so avoiding frequent lookups in the hash map is beneficial. Therefore, cache the metadata node directly in the module. Pull Request: llvm#103410

[Transforms][NFC] Remove second CodeExtractor constructor

fa658ac

This only has a single use and is equally well served by the existing constructor -- blocks of a loop are already an array. Pull Request: llvm#103399

[DataLayout] Split StructAlignment into two fields (NFC) (llvm#103700)

8eadf21

Aggregate type specification doesn't have the size component. Don't abuse LayoutAlignElem to avoid confusion.

Reland logf128 constant folding (llvm#103217)

3cab7c5

This is a reland of llvm#96287. This change makes tests in logf128.ll ignore the sign of NaNs for negative value tests and moves an #include <cmath> to be blocked behind #ifndef _GLIBCXX_MATH_H.

[clang][Interp] Diagnose pointer subtraction on zero-size arrays (llv…

13008aa

…m#103015)

[RISCV][compiler-rt] create __riscv__cpu_model for vendorID, ArchID, … (

8bf298f

llvm#101449) …ImplID This patch:

1. Removes the vendorId from `__riscv_vendor_feature_bits`
2. Defines a new structure for vendorID, ArchID and ImplID
3. Updates the related init code

Revert "Reland logf128 constant folding (llvm#103217)"

6300233

This reverts commit 3cab7c5. The modified test fails on ppc64le buildbots.

[BasicAA] Remove unused variables (NFC)

2287532

Split out from llvm#98608.

BasicAA: Fix assert when indexing address spaces with different sizes (…

9c7c3f9

…llvm#103713) Fixes llvm#103500

[LLVM] Regenerate some test outputs for llvm/test/Transforms/LoopVect…

9e318ba

…orize.

[mlir][bufferization] Adding the optimize-allocation-liveness pass (l…

6de04e6

…lvm#101827) Adding a pass that is expected to run after the deallocation pipeline and will move buffer deallocations right after their last user or dependency, thus optimizing the allocation liveness.

[DataLayout] Use member initialization (NFC) (llvm#103712)

6cf3e7d

This also adds a default constructor and a few uses of it.

[AutoBump] Merge with 6cf3e7d0 (Aug 14)

41be2c7

cferry-AMD approved these changes Sep 30, 2024


Base automatically changed from bump_to_5855237 to feature/fused-ops October 2, 2024 07:14

mgehre-amd merged commit 79d2891 into feature/fused-ops Oct 4, 2024
6 checks passed

mgehre-amd deleted the bump_to_6cf3e7d0 branch October 4, 2024 14:33
