Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with a853d799 (25) #283

Merged
merged 52 commits into from
Aug 21, 2024
Merged

Conversation

cferry-AMD
Copy link
Collaborator

No description provided.

kazutakahirata and others added 30 commits April 3, 2024 09:55
This patch fixes:

  clang/lib/CodeGen/CGExpr.cpp:5607:11: error: variable 'Result' is
  used uninitialized whenever 'if' condition is false
  [-Werror,-Wsometimes-uninitialized]
The readme only states the goal and has links to further information,
e.g., our meetings.

---------

Co-authored-by: Shilei Tian <i@tianshilei.me>
Have to compare actual type size to pick up proper cast operation
opcode.
Straightforward computation of `A − FLOOR (A / P) * P` should
produce NaN, when P is infinity. The -menable-no-infs lowering
can still use the relaxed operations sequence.
This paper did not add any normative changes for us to check
conformance against. It added a note describing a potential behavioral
difference between compile-time and runtime evaluation of negative
floating-point values in the presence of rounding modes.
This was accidentally removed in
https://reviews.llvm.org/D137799#4657404 /
https://reviews.llvm.org/D137799#C3933303OL44, and downstream projects
are forced to add it back. For example,
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=4e26331a5ee87928a16888c36d51e270f0f10f90

Fix this, by re-adding it.

Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
…r reordering.

If the node has cmp instruction with 3 or more different but swappable
predicates, need to keep same kind of main/alternate opcodes to avoid
incorrect detection of opcodes after reordering. Reordering changes the
order and we may erroneously consider swappable opcodes as
non-compatible/alternate, which may lead to a later compiler crash.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#87267
This addition catches common cases of malformed `tosa.reshape` ops. This
prevents the `--tosa-to-tensor` pass from asserting when fed invalid
operations, as these will be caught ahead of time by the verifier.

Closes llvm#87396
Justifications:
- LWG3950: Done in llvm#66206
- LWG3975: Wording changes only
- LWG4011: Wording changes only
- LWG4030: Wording changes only
- LWG4043: Wording changes only
- LWG3036 and P2875R4: We implemented neither, but the latter reverts
the former, so now we implement both without doing anything!
`DefaultTimingManager::clear()` uses `out` to initialize `TimerImpl`,
but the `out` is `nullptr` by default. This means if
`DefaultTimingManager::setOutput()` is never called,
`DefaultTimingManager` destructor may generate SIGSEGV.
…r available_externally functions (llvm#87279)

This is to fix an assertion error. Apparently, `pseudo_probe_desc` could
still be available for import functions, and its checksum mismatch state
can be different from import function's `profile-checksum-mismatch`
attr. This happens when unstable IR or ODR violation issue occurs, the
definitions of the same function across different translation units
could be different and result in different checksums. During link time
deduplication, the internal function definition (the checksum in desc is
computed based on) is substituted by the `available_externally`
definition, which cause the inconsistency. Hence, we fix it to by always
checking the state for the new `available_externally` definition, which
is saved in the function attribute.
…ields"" (llvm#87529)

Reverts llvm#87518

Revert is not needed as the regression was fixed with
1189e87.

I assumed the crash and warning are different issues, but according to
https://lab.llvm.org/buildbot/#/builders/240/builds/26629
fixing warning resolves the crash.
…lvm#87467)

By generic intrinsics this mean things like dup, ext, zip and bsl that
can always be executed with integer s16 operations and do not require
fullfp16. This makes them always available, and brings them inline with
GCC.
https://godbolt.org/z/azs8eMv54

The relevant test cases have been moved into their own files, to allow
them to be tested with armv8-a and armv8.2-a+fp16.
Before all the call probe ids are after block ids, in this change, it
mixed the call probe and block probe by reordering them in
lexical(line-number) order. For example:
```
main():
BB1
if(...) 
  BB2 foo(..);   
else 
  BB3 bar(...);
BB4
```
Before the profile is
```
main
 1: ..
 2: ..
 3: ...
 4: ...
 5: foo ...
 6: bar ...
 ```
 Now the new order is
```
 main
 1: ..
 2: ..
 3: foo ...
 4: ...
 5: bar ...
 6: ...
```
This can potentially make it more tolerant of profile mismatch, either from stale profile or frontend change. e.g. before if we add one block, even the block is the last one, all the call probes are shifted and mismatched. Moreover, this makes better use of call-anchor based stale profile matching. Blocks are matched based on the closest anchor, there would be more anchors used for the matching, reduce the mismatch scope.
Implemented long-standing TODO to support commutative intrinsics.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#86316
Add zext nneg tests and check we don't fold casts with different src types
…vm#87537)

We should consistently use PseudoInstr instead of Mnemonic to name
SIMCInstr, even though they may be the same in most cases
Allows src1 of VOP3 encoded VOPC to be an SGPR or inline immediate on
GFX1150Plus

The w32 and w64 _e64_dpp assembler only real instructions were unused,
and erroneously constructed in a way that bugged parsing of the new
instructions. They are removed.

This patch is a follow up to PR
llvm#87382
The previous diff (and it's subsequent fix) were reverted as the tests
didn't work properly on the AArch64 & ARM LLDB buildbots. I made a
couple more minor changes to tests (from @clayborg's feedback) and
disabled them for non Linux-x86(_64) builds, as I don't have the ability
do anything about an ARM64 Linux failure. If I had to guess, I'd say the
toolchain on the buildbots isn't respecting the `-Wl,--build-id` flag.
Maybe, one day, when I have a Linux AArch64 system I'll dig in to it.

From the reverted PR:

I've migrated the tests in my
llvm#79181 from shell to API (at
@JDevlieghere's suggestion) and addressed a couple issues that were
exposed during testing.

The tests first test the "normal" situation (no DebugInfoD involvement,
just normal debug files sitting around), then the "no debug info"
situation (to make sure the test is seeing failure properly), then it
tests to validate that when DebugInfoD returns the symbols, things work
properly. This is duplicated for DWP/split-dwarf scenarios.

---------

Co-authored-by: Kevin Frei <freik@meta.com>
…ructions.

Compiler can improve analysis for operands of UIToFP/SIToFP instructions
and operands of ICmp instruction.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#85966
Some TUs apparently end up with an ambiguity between `::llvm::detail`
and `support::detail`, so we close that gap at the source.
Adding OffTType to fcntl.h and stdio.h 's Macro lists in libc/spec/posix.td as
mentioned here: llvm#87266
…ructions.

Compiler can improve analysis for operands of UIToFP/SIToFP instructions
and operands of ICmp instruction.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#85966
…improve costs for testing

Improves SSE vs AVX test results for llvm#87510
vzakhari and others added 22 commits April 3, 2024 14:49
…hange_{weak,strong}` (llvm#87135)

Spotted this minor mistake in the tests as I was looking into testing
more thoroughly `atomic_ref`.

The two argument overloads are tested just above. The names of the
lambda clearly indicates that the intent was to test the one argument
overload.
… and G_ICMP for scalable vector types

This patch legalizes G_ZEXT, G_SEXT, and G_ANYEXT. If the type is a
legal mask type, then the instruction is legalized as the element-wise
select, where the condition on the select is the mask typed source
operand, and the true and false values are 1 or -1 (for
zero/any-extension and sign extension) and zero. If the type is a legal integer
or vector integer type, then the instruction is marked as legal.

The legalization of the extends may introduce a G_SPLAT_VECTOR, which
needs to be legalized in this patch for the extend test cases to pass.

A G_SPLAT_VECTOR is legal if the vector type is a legal integer or
floating point vector type and the source operand is sXLen type. This is
because the SelectionDAG patterns only support sXLen typed
ISD::SPLAT_VECTORS, and we'd like to reuse those patterns. A
G_SPLAT_VECTOR is cutom legalized if it has a legal s1 element vector
type and s1 scalar operand. It is legalized to G_VMSET_VL or G_VMCLR_VL
if the splat is all ones or all zeros respectivley. In the case of a
non-constant mask splat, we legalize by promoting the scalar value to
s8.

In order to get the s8 element vector back into s1 vector, we use a
G_ICMP. In order for the splat vector and extend tests to pass, we also
need to legalize G_ICMP in this patch.

A G_ICMP is legal if the destination type is a legal bool vector and the LHS and
RHS are legal integer vector types.
…roll patterns (llvm#86005)

Updates smmla unrolling patterns to handle vecmat contracts where `dimM=1`. This includes explicit vecmats in the form: `<1x8xi8> x <8x8xi8> --> <1x8xi32>` or implied with the leading dim folded: `<8xi8> x <8x8xi8> --> <8xi32>` 

Since the smmla operates on two `<2x8xi8>` input vectors to produce `<2x2xi8>` accumulators, half of each 2x2 accumulator tile is dummy data not pertinent to the computation, resulting in half throughput.
…sposes solely on leading unit dims. (llvm#85694)

Updates `castAwayContractionLeadingOneDim` to check for leading unit
dimensions before inserting `vector.transpose` ops.

Currently `castAwayContractionLeadingOneDim` removes all leading unit
dims based on the accumulator and transpose any subsequent operands to
match the accumulator indexing. This does not take into account if the
transpose is strictly necessary, for instance when given this
vector-matrix contract:
```mlir
  %result = vector.contract {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"], kind = #vector.kind<add>} %lhs, %rhs, %acc : vector<1x1x8xi32>, vector<1x8x8xi32> into vector<1x8xi32>
```
Passing this through `castAwayContractionLeadingOneDim` pattern produces
the following:
```mlir
    %0 = vector.transpose %arg0, [1, 0, 2] : vector<1x1x8xi32> to vector<1x1x8xi32>
    %1 = vector.extract %0[0] : vector<1x8xi32> from vector<1x1x8xi32>
    %2 = vector.extract %arg2[0] : vector<8xi32> from vector<1x8xi32>
    %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %1, %arg1, %2 : vector<1x8xi32>, vector<1x8x8xi32> into vector<8xi32>
    %4 = vector.broadcast %3 : vector<8xi32> to vector<1x8xi32>
```
The `vector.transpose` introduced does not affect the underlying data
layout (effectively a no op), but it cannot be folded automatically.
This change avoids inserting transposes when only leading unit
dimensions are involved.

Fixes llvm#85691
…tedType (llvm#87582)

Previously the leading space was added in each string constant. This
patch moves the leading space out of the string constants and is instead
explicitly added to add clarity to the code.
DeclRef to field must be marked as LValue to be consistent with how the
field decl will be evaluated.

T->desugar() is unnecessary to call ->isArrayType().
This is a followup to
llvm#86359
"[lldb] [ObjectFileMachO] LLVM_COV is not mapped into firmware memory
(llvm#86359)"

where I treat LLVM_COV segments in a Mach-O binary as non-loadable.
There is another codepath in
`DynamicLoaderStatic::LoadAllImagesAtFileAddresses` which is called to
set the load addresses for a Module to the file addresses. It has no
logic to detect a segment that is not loaded in virtual memory
(ObjectFileMachO::SectionIsLoadable), so it would set the load address
for this LLVM_COV segment to the file address and shadow actual code,
breaking lldb behavior.

This method currently sets the load address for any section that doesn't
have a load address set already. This presumes that a Module was added
to the Target, some mechanism set the correct load address for SOME
segments, and then this method is going to set the other segments to a
no-slide value, assuming they were forgotten.

ObjectFile base class doesn't, today, vend a SectionIsLoadable method,
but we do have ObjectFile::SetLoadAddress and at a higher level,
Module::SetLoadAddress, when we're setting the same slide to all
segments.

That's the behavior we want in this method. If any section has a load
address, we don't touch this Module. Otherwise we set all sections to
have a load address that is the same as the file address.

I also audited the other parts of lldb that are calling
SectionList::SectionLoadAddress and looked if they should be more
correctly using Module::SetLoadAddress for the entire binary. But in
most cases, we have the potential for different slides for different
sections so this section-by-section approach must be taken.

rdar://125800290
…VRegisterBankInfo::getInstrMapping.

This removes the special case for vectors. The default case in the
second switch can handle GPR in addition to vectors. We just won't
use the static ValueMapping entry.
…7301)

Use the return type to measure the LMUL size for latency/throughput cost
…rhs`

- When both operands are constant, the matcher runs into an infinite
  loop as the commutation should be applied only when LHS is a constant
  and RHS is not.

Reviewers: arsenm

Reviewed By: arsenm

Pull Request: llvm#87426
… is vector. NFC

If the type is vector, we can immediately know to use vector mapping.
Previously we searched for FP uses, but then replaced it if the type
was vector.
Base automatically changed from bump_to_5b702be1 to feature/fused-ops August 21, 2024 20:23
@mgehre-amd mgehre-amd merged commit 8b08487 into feature/fused-ops Aug 21, 2024
10 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_a853d799 branch August 21, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.