Vectorize `@ inbounds for x in A ...` #13866

simonster · 2015-11-03T23:36:04Z

This would previously have been an infinite loop if `length(A) == typemax(Int)`, so the loop vectorizer couldn't compute a trip count. Ref #13860 (comment)
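To make the `typemax` case concrete, here is a minimal sketch that uses `Int8` in place of `Int` so the wraparound is cheap to observe. `old_done` mirrors a comparison-based `i > length(a)` check. `new_done` is an assumed equality-based replacement of the form `i == length(a) + 1`. Both names are hypothetical helpers for illustration, not the definitions in Base.

```julia
# Hypothetical stand-ins for the termination check; `n` plays the role of length(A).
old_done(i::Int8, n::Int8) = i > n              # comparison-based check
new_done(i::Int8, n::Int8) = i == n + Int8(1)   # equality-based check (assumed form)

n = typemax(Int8)

# No Int8 value is greater than typemax(Int8), so the comparison-based check
# never fires: this is the infinite-loop case described above.
old_fires = any(old_done(Int8(i), n) for i in -128:127)

# n + 1 wraps around to typemin(Int8), so the equality-based check fires
# exactly once per full cycle of the counter.
new_fires = count(new_done(Int8(i), n) for i in -128:127)

# For an ordinary length the two checks agree on every index the loop visits.
m = Int8(5)
agree = all(old_done(Int8(i), m) == new_done(Int8(i), m) for i in 1:6)
```

With the equality form the loop always terminates after a bounded number of steps, which is what the vectorizer needs in order to reason about the trip count.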


simonster · 2015-11-04T01:48:50Z

Actually, maybe we should start `i` at zero here instead of one and adjust everything else to match? This doesn't seem to make much of a difference if the loop gets vectorized, but I get a ~10% perf boost on `count1` from #13860 (comment) without `@inbounds`. With `i` starting at zero, on LLVM 3.3, the ASM is:

L31:    cmpq    %r8, %rcx
        jae     L82
        movq    (%rdi), %rdx
Source line: 4
Source line: [inline] float.jl:269
        subq    %rsi, %rdx
        vcmpneqsd       (%rdx), %xmm0, %xmm1
        vmovd   %xmm1, %edx
        andl    $1, %edx
        addq    $-8, %rsi
Source line: 3
        incq    %rcx
Source line: 4
Source line: [inline] float.jl:269
        addq    %rdx, %rax
        cmpq    %rcx, %r8
        jne     L31

vs.

L37:    cmpq    %r8, %rcx
        jae     L95
Source line: 4
Source line: [inline] float.jl:269
        leaq    (,%rsi,8), %r10
Source line: 3
        movq    (%rdi), %rdx
Source line: 4
Source line: [inline] float.jl:269
        subq    %r10, %rdx
        vcmpneqsd       (%rdx), %xmm0, %xmm1
        vmovd   %xmm1, %edx
        andl    $1, %edx
        incq    %rcx
        decq    %rsi
        addq    %rdx, %rax
        cmpq    %rsi, %r9
        jne     L37

OTOH, LLVM 3.6 appears to be smarter and this doesn't make a difference there, so maybe this isn't worth it?

timholy · 2015-11-04T08:43:04Z

Possibly related to #9182.


nalimilan · 2015-11-05T12:40:45Z

This deserves a comment explaining why the function isn't written in the most natural way, especially since this isn't covered by the tests, which means anybody might break it by rewriting it in an apparently better form.

This would previously have been an infinite loop if `length(A) == typemax(Int)` so the loop vectorizer couldn't compute a trip count. (cherry picked from commit fa89a6e) ref #13866

@inbounds

Currently, if a vector is resized in the midst of iteration, then `done` might "miss" the end of iteration. This trivially changes the definition to catch such a case. I am not sure what guarantees we make about mutating iterables during iteration, but this seems simple and easy to support.

Note, though, that it is somewhat tricky: until #13866 we used `i > length(a)`, but that foils vectorization due to the `typemax` case. This definition seems to get the best of both worlds. For a definition like `f` below, this new definition just requires one extra `add i64` operation in the preamble (before the loop). Everything else is identical to master.

```julia
function f(A)
    r = 0
    @inbounds for x in A
        r += x
    end
    r
end
```
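As a rough illustration of how one check can handle both the resize case and the `typemax` case, here is a sketch built on an unsigned range comparison. The helper names `done_eq` and `in_range` are hypothetical, and the exact expressions are assumptions chosen to show the idea. They are not quoted from the change itself.

```julia
# Hypothetical helpers; `n` plays the role of length(A) at the time of the check.
done_eq(i::Int, n::Int) = i == n + 1             # equality-based check (assumed earlier form)
in_range(i::Int, n::Int) = (i % UInt) - 1 < n    # unsigned range check (assumed new form)

# Normal iteration over 3 elements: the two checks agree on indices 1 through 4.
normal_agree = all(done_eq(i, 3) == !in_range(i, 3) for i in 1:4)

# The vector shrank from 3 to 2 elements while the state was already 4:
# the equality check never sees i == 3, so it misses the end.
shrunk_missed = !done_eq(4, 2)
shrunk_caught = !in_range(4, 2)

# length == typemax(Int): a counter that wrapped to typemin(Int) is rejected,
# so the loop still has a finite trip count.
wrapped_caught = !in_range(typemin(Int), typemax(Int))
last_ok = in_range(typemax(Int), typemax(Int))
```

The reinterpretation `i % UInt` maps zero and negative states to very large unsigned values after the subtraction, so a single `<` covers both the lower and upper bound.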


Vectorize @ inbounds for x in A ...

fa89a6e


simonster added the backport pending 0.4 label Nov 3, 2015

timholy mentioned this pull request Nov 4, 2015

Poor performance for linspace #13401

Closed

simonster added a commit that referenced this pull request Nov 4, 2015

Merge pull request #13866 from JuliaLang/sjk/array-vectorize

1d270c6

Vectorize `@ inbounds for x in A ...`

simonster merged commit 1d270c6 into master Nov 4, 2015

simonster deleted the sjk/array-vectorize branch November 4, 2015 23:24

tkelman removed the backport pending 0.4 label Nov 9, 2015

mbauman mentioned this pull request May 11, 2018

More robust iteration over Vectors #27079

Merged
