
Rebasing the rust patches onto llvm HEAD #3

Merged Sep 29, 2012

Conversation

@erickt (Collaborator) commented Sep 29, 2012

Heya Brson! I was getting annoyed that rust-lang/rust#3142 kept popping up, and it turns out this has been fixed in llvm HEAD. We need a couple patches in rust before we can use this though. I'll link this issue with the rust one once I upload it.

@brson (Owner) commented Sep 29, 2012

Thanks!

brson pushed a commit that referenced this pull request Mar 28, 2013
std::string to a StringRef. Moreover, the method being called accepts
a Twine to simplify these patterns.

Fixes this ASan failure:
==6312== ERROR: AddressSanitizer: heap-use-after-free on address 0x7fd558b1af58 at pc 0xcb7529 bp 0x7fffff572080 sp 0x7fffff572078
READ of size 1 at 0x7fd558b1af58 thread T0
    #0 0xcb7528 .../llvm/include/llvm/ADT/StringRef.h:192 llvm::StringRef::operator[]()
    #1 0x1d53c0a .../llvm/include/llvm/ADT/StringExtras.h:128 llvm::HashString()
    #2 0x1d53878 .../llvm/lib/Support/StringMap.cpp:64 llvm::StringMapImpl::LookupBucketFor()
    #3 0x1b6872f .../llvm/include/llvm/ADT/StringMap.h:352 llvm::StringMap<>::GetOrCreateValue<>()
    #4 0x1b61836 .../llvm/lib/MC/MCContext.cpp:109 llvm::MCContext::GetOrCreateSymbol()
    #5 0xe9fd47 .../llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp:154 (anonymous namespace)::ARMELFStreamer::EmitMappingSymbol()
    #6 0xea01dd .../llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp:133 (anonymous namespace)::ARMELFStreamer::EmitDataMappingSymbol()
    #7 0xe9f78b .../llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp:91 (anonymous namespace)::ARMELFStreamer::EmitBytes()
    #8 0x1b15d82 .../llvm/lib/MC/MCStreamer.cpp:89 llvm::MCStreamer::EmitIntValue()
    #9 0xcc0f9b .../llvm/lib/Target/ARM/ARMAsmPrinter.cpp:713 llvm::ARMAsmPrinter::emitAttributes()
    #10 0xcc0d44 .../llvm/lib/Target/ARM/ARMAsmPrinter.cpp:632 llvm::ARMAsmPrinter::EmitStartOfAsmFile()
    #11 0x14692ad .../llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:162 llvm::AsmPrinter::doInitialization()
    #12 0x1bc4677 .../llvm/lib/VMCore/PassManager.cpp:1561 llvm::FPPassManager::doInitialization()
    #13 0x1bc4990 .../llvm/lib/VMCore/PassManager.cpp:1595 llvm::MPPassManager::runOnModule()
    #14 0x1bc55e5 .../llvm/lib/VMCore/PassManager.cpp:1705 llvm::PassManagerImpl::run()
    #15 0x1bc5878 .../llvm/lib/VMCore/PassManager.cpp:1740 llvm::PassManager::run()
    #16 0xc3954d .../llvm/tools/llc/llc.cpp:378 compileModule()
    #17 0xc38001 .../llvm/tools/llc/llc.cpp:194 main
    #18 0x7fd557d6a11c __libc_start_main
0x7fd558b1af58 is located 24 bytes inside of 29-byte region [0x7fd558b1af40,0x7fd558b1af5d)
freed by thread T0 here:
    #0 0xc337da .../llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:56 operator delete()
    #1 0x1ee9cef .../libstdc++-v3/include/bits/basic_string.h:535 std::string::~string()
    #2 0xea01dd .../llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp:133 (anonymous namespace)::ARMELFStreamer::EmitDataMappingSymbol()
    #3 0xe9f78b .../llvm/lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp:91 (anonymous namespace)::ARMELFStreamer::EmitBytes()
    #4 0x1b15d82 .../llvm/lib/MC/MCStreamer.cpp:89 llvm::MCStreamer::EmitIntValue()
    #5 0xcc0f9b .../llvm/lib/Target/ARM/ARMAsmPrinter.cpp:713 llvm::ARMAsmPrinter::emitAttributes()
    #6 0xcc0d44 .../llvm/lib/Target/ARM/ARMAsmPrinter.cpp:632 llvm::ARMAsmPrinter::EmitStartOfAsmFile()
    #7 0x14692ad .../llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:162 llvm::AsmPrinter::doInitialization()
    #8 0x1bc4677 .../llvm/lib/VMCore/PassManager.cpp:1561 llvm::FPPassManager::doInitialization()
    #9 0x1bc4990 .../llvm/lib/VMCore/PassManager.cpp:1595 llvm::MPPassManager::runOnModule()
    #10 0x1bc55e5 .../llvm/lib/VMCore/PassManager.cpp:1705 llvm::PassManagerImpl::run()
    #11 0x1bc5878 .../llvm/lib/VMCore/PassManager.cpp:1740 llvm::PassManager::run()
    #12 0xc3954d .../llvm/tools/llc/llc.cpp:378 compileModule()
    #13 0xc38001 .../llvm/tools/llc/llc.cpp:194 main
    #14 0x7fd557d6a11c __libc_start_main
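
A minimal sketch of the lifetime bug behind this report (hypothetical helper names; the real code is in the ARM ELF streamer's mapping-symbol emission): binding a StringRef to a temporary std::string leaves it dangling once the temporary is destroyed, whereas building the name as a Twine lets the consumer materialize the characters while they are still alive.

  // C++ sketch, assuming LLVM's ADT headers are available.
  #include "llvm/ADT/StringRef.h"
  #include "llvm/ADT/Twine.h"
  #include <string>

  // Buggy pattern: the StringRef points into a temporary std::string that is
  // destroyed at the end of the full expression, so any later read of the
  // returned StringRef is a heap-use-after-free (as in the ASan trace above).
  llvm::StringRef makeMappingName(unsigned Counter) {
    return llvm::StringRef("$d." + std::to_string(Counter));
  }

  // Safer pattern: build the name as a Twine and let the callee (for example
  // an API such as MCContext::GetOrCreateSymbol, which accepts a Twine) copy
  // or materialize it while all of its pieces are still alive.
  std::string makeMappingNameSafe(unsigned Counter) {
    return (llvm::Twine("$d.") + llvm::Twine(Counter)).str();
  }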

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169668 91177308-0d34-0410-b5e6-96231b3b80d8
brson pushed a commit that referenced this pull request Mar 28, 2013
Before: the function name was stored by the compiler as a constant string
and the run-time was printing it.
Now: the PC is stored instead and the run-time prints the full symbolized frame.
This adds a couple of instructions into every function with a non-empty stack frame,
but also reduces the binary size because we store fewer strings (I saw a 2% size reduction).
This change bumps the asan ABI version to v3.

llvm part.

Example of report (now):
==31711==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffa77cf1c5 at pc 0x41feb0 bp 0x7fffa77cefb0 sp 0x7fffa77cefa8
READ of size 1 at 0x7fffa77cf1c5 thread T0
    #0 0x41feaf in Frame0(int, char*, char*, char*) stack-oob-frames.cc:20
    #1 0x41f7ff in Frame1(int, char*, char*) stack-oob-frames.cc:24
    #2 0x41f477 in Frame2(int, char*) stack-oob-frames.cc:28
    #3 0x41f194 in Frame3(int) stack-oob-frames.cc:32
    #4 0x41eee0 in main stack-oob-frames.cc:38
    #5 0x7f0c5566f76c (/lib/x86_64-linux-gnu/libc.so.6+0x2176c)
    #6 0x41eb1c (/usr/local/google/kcc/llvm_cmake/a.out+0x41eb1c)
Address 0x7fffa77cf1c5 is located in stack of thread T0 at offset 293 in frame
    #0 0x41f87f in Frame0(int, char*, char*, char*) stack-oob-frames.cc:12  <<<<<<<<<<<<<< this is new
  This frame has 6 object(s):
    [32, 36) 'frame.addr'
    [96, 104) 'a.addr'
    [160, 168) 'b.addr'
    [224, 232) 'c.addr'
    [288, 292) 's'
    [352, 360) 'd'




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177724 91177308-0d34-0410-b5e6-96231b3b80d8
brson pushed a commit that referenced this pull request May 25, 2013
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result type, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized but while the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-of-two vectors. Specifically, we
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example, on ARM, to
perform "%res = v8i8 trunc v8i32 %in" we transform it to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequence of moves, stores, and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Dec 5, 2013
We currently error in clang with:
"error: thread-local storage is unsupported for the current target", but we
can start to get the llvm level ready.

When compiling

template<typename T>
struct foo {
  static __declspec(thread) int bar;
};
template<typename T>
__declspec(thread) int foo<T>::bar;
template struct foo<int>;

msvc produces

SECTION HEADER #3
   .tls$ name
       0 physical address
       0 virtual address
       4 size of raw data
     12F file pointer to raw data (0000012F to 00000132)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
C0301040 flags
         Initialized Data
         COMDAT; sym= "public: static int foo<int>::bar" (?bar@?$foo@H@@2ha)
         4 byte align
         Read Write

gcc produces a ".data$__emutls_v.<symbol>" section for the testcase with
__declspec(thread) replaced with thread_local.
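
For reference, a sketch of that thread_local variant (the same testcase with only the storage-class keyword swapped):

  template<typename T>
  struct foo {
    static thread_local int bar;  // thread_local instead of __declspec(thread)
  };
  template<typename T>
  thread_local int foo<T>::bar;
  template struct foo<int>;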

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195849 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Mar 27, 2014
…ify-libcall

optimize a call to an llvm intrinsic into something that involves a call to a C
library function, make sure it sets the right calling convention on the call.

e.g.
extern double pow(double, double);
double t(double x) {
  return pow(10, x);
}

Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
  ret double %0
}

declare double @llvm.pow.f64(double, double) #1

Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
  %__exp10 = call double @__exp10(double %x) #1
  ret double %__exp10
}

declare double @__exp10(double)

The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will choose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.

I can think of 3 options to fix this.

1. Make "C" calling convention just work since the target should know what CC
   is being used.

   This doesn't work because each function can use different CC with the "pcs"
   attribute.

2. Have Clang add the right CC keyword on the calls to LLVM builtins.

   This will work but it doesn't match the LLVM IR specification which states
   these are "Standard C Library Intrinsics".

3. Fix simplify libcall so the resulting calls to the C routines will have the
   proper CC keyword. e.g.
   %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1

   This works and is the solution I implemented here.

Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.

1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.

There are a couple of potential downsides.
1. There could be other places in the optimizer that are broken in the same way
   and are not addressed by this.
2. There could be other calling conventions that need to be propagated by
   simplify-libcall that are not handled.

But for now, this is the fix that I'm most comfortable with.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203488 91177308-0d34-0410-b5e6-96231b3b80d8
luqmana pushed a commit to luqmana/llvm that referenced this pull request May 20, 2014
The canonical form for the extended addressing mode (e.g.,
"[x1, w2, uxtw #3]") is for the MCInst to have the second register be the
full 64-bit GPR64 register class. The instruction printer cleans up
the output for display to show the 32-bit register instead, per the
specification.

This simplifies 205893 now that the aliasing is handled in the printer
in 206495 so that the codegen path and the disassembler path give the
same MCInst form.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206797 91177308-0d34-0410-b5e6-96231b3b80d8
luqmana pushed a commit to luqmana/llvm that referenced this pull request May 20, 2014
For a pattern like ((x >> C1) & Mask) << C2, the DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
  %shr = lshr i64 %x, 4
  %and = and i64 %shr, 15
  %arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and
  %0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
  lsr x8, x0, #1
  and x8, x8, #0x78
  add x8, x9, x8

If using ubfx, it only needs 2 instrs:
  ubfx  x8, x0, #4, #4
  add x8, x9, x8, lsl #3

This fixes bug 19589


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207702 91177308-0d34-0410-b5e6-96231b3b80d8
luqmana pushed a commit to luqmana/llvm that referenced this pull request Sep 9, 2014
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes, which allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215597 91177308-0d34-0410-b5e6-96231b3b80d8
luqmana pushed a commit to luqmana/llvm that referenced this pull request Sep 9, 2014
…15597).

Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes, which allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216013 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Apr 27, 2018
Revert r330413: "[SSAUpdaterBulk] Use SmallVector instead of DenseMap for storing rewrites."
Revert r330403 "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time."

r330403 commit seems to crash clang during our integrate while doing PGO build with the following stacktrace:
      #2 llvm::SSAUpdaterBulk::RewriteAllUses(llvm::DominatorTree*, llvm::SmallVectorImpl<llvm::PHINode*>*)
      #3 llvm::JumpThreadingPass::ThreadEdge(llvm::BasicBlock*, llvm::SmallVectorImpl<llvm::BasicBlock*> const&, llvm::BasicBlock*)
      #4 llvm::JumpThreadingPass::ProcessThreadableEdges(llvm::Value*, llvm::BasicBlock*, llvm::jumpthreading::ConstantPreference, llvm::Instruction*)
      #5 llvm::JumpThreadingPass::ProcessBlock(llvm::BasicBlock*)
The crash happens while compiling 'lib/Analysis/CallGraph.cpp'.

r330413 is reverted due to conflicting changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330416 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request May 22, 2018
Patch #3 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Expected to be NFC for the current inner loop vectorization path. It
introduces the basic algorithm to build the VPlan plain CFG (single-level
CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization
path using VPInstructions. It includes:
  - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
  - VPlanVerifier: Main class with utilities to check the consistency of an H-CFG.
  - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.

Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl.

Differential Revision: https://reviews.llvm.org/D44338



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332654 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Jun 29, 2018
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and the default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectors. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjusts the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335735 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Aug 15, 2018
…d VPlan for tests."

Memory leaks in tests.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/6289/steps/check-llvm%20asan/logs/stdio

Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x554ea8 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106
    #1 0x56cef1 in llvm::VPlanTestBase::doAnalysis(llvm::Function&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h:53:14
    #2 0x56bec4 in llvm::VPlanTestBase::buildHCFG(llvm::BasicBlock*) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h:57:3
    #3 0x571f1e in llvm::(anonymous namespace)::VPlanHCFGTest_testVPInstructionToVPRecipesInner_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/Transforms/Vectorize/VPlanHCFGTest.cpp:119:15
    #4 0xed2291 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
    #5 0xed44c8 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11
    #6 0xed5890 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28
    #7 0xef3634 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43
    #8 0xef27e0 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc
    #9 0xebbc23 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46
    #10 0xebbc23 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:51
    #11 0x7f65569592e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

and more.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336718 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Aug 15, 2018
…ering"

This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
    #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
    #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
    #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
    #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
    #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
    #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
    #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
    #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
    #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337079 91177308-0d34-0410-b5e6-96231b3b80d8
alexcrichton pushed a commit to alexcrichton/llvm that referenced this pull request Aug 15, 2018
A DAG-NOT-DAG is a CHECK-DAG group, X, followed by a CHECK-NOT group,
N, followed by a CHECK-DAG group, Y.  Let y be the initial directive
of Y.  This patch makes the following changes to the behavior:

    1. Directives in N can no longer match within part of Y's match
       range just because y happens not to be the earliest match from
       Y.  Specifically, this patch withdraws N's search range end
       from y's match range start to Y's match range start.

    2. y can no longer match within X's match range, where a y match
       produced a reordering complaint, which is thus no longer
       possible.  Specifically, this patch withdraws y's search range
       start from X's permitted range start to X's match range end,
       which was already the search range start for other members of
       Y.

Both of these changes can only increase the number of test passes: #1
constrains the ability of CHECK-NOTs to match, and #2 expands the
ability of CHECK-DAGs to match without complaints.

These changes are based on discussions at:

   <http://lists.llvm.org/pipermail/llvm-dev/2018-May/123550.html>
   <https://reviews.llvm.org/D47106>

which conclude that:

    1. These changes simplify the FileCheck conceptual model.  First,
       it makes search ranges for DAG-NOT-DAG more consistent with
       other cases.  Second, it was confusing that y was treated
       differently from the rest of Y.

    2. These changes add theoretical use cases for DAG-NOT-DAG that
       had no obvious means to be expressed otherwise.  We can justify
       the first half of this assertion with the observation that
       these changes can only increase the number of test passes.

    3. Reordering detection for DAG-NOT-DAG had no obvious real
       benefit.

We don't have evidence from real use cases to help us debate
conclusions #2 and #3, but #1 at least seems intuitive.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D48986

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337605 91177308-0d34-0410-b5e6-96231b3b80d8