
improve clone_from_slice performance #17960

Merged 1 commit into rust-lang:master on Oct 24, 2014
Conversation

@mahkoh (Contributor) commented Oct 11, 2014

Old vs. New vs. Vec::push_all

```
test slice     ... bench:   3091942 ns/iter (+/- 54460)
test slice_new ... bench:   1800065 ns/iter (+/- 69513)
test vec       ... bench:   1804805 ns/iter (+/- 75609)
```

@thestinger (Contributor)

Using clone rather than clone_from is less efficient for non-trivial types. Why not work on making zip optimize here if that's the real problem? At the very least, it should avoid zip while continuing to use clone_from because right now it's a micro-optimization at the expense of being a macro-pessimization.

@mahkoh (Contributor, Author) commented Oct 11, 2014

Updated with clone_from.

@mahkoh (Contributor, Author) commented Oct 11, 2014

This will be optimized to a memcpy for Copy types. I don't see how Zip and Chain can be optimized for this case because Rust doesn't allow you to write a specialized implementation for ExactSize iterators.

@thestinger (Contributor)

The lack of a specialized implementation isn't a blocker for having it optimize. Attempting to fix the underlying issue is far better than just working around it across the standard library before it's deemed to be infeasible.

@zwarich worked on fixing missed optimizations for iterators and probably has a good idea about what would be necessary here.

@zwarich commented Oct 14, 2014

If we want the zip version to compile to a memcpy or a memmove, then LLVM would first have to compile a function like this into a library call:

```c
void f(int *a1, int *a2, size_t len1, size_t len2) {
    for (size_t i = 0, j = 0; i < len1 && j < len2; ++i, ++j) {
        a1[i] = a2[j];
    }
}
```

It is able to vectorize it, albeit with an aliasing check for the two arrays outside of the vectorized loop. It seems like it really should be able to realize that it can call memmove. It doesn't do any better with one induction variable, which at least says that induction variable analysis is working:

```c
void g(int *a1, int *a2, size_t len1, size_t len2) {
    for (size_t i = 0; i < len1 && i < len2; ++i) {
        a1[i] = a2[i];
    }
}
```

If I add restrict to this, then it actually generates a call to memcpy:

```c
void f(int * restrict a1, int * restrict a2, size_t len1, size_t len2) {
    for (size_t i = 0, j = 0; i < len1 && j < len2; ++i, ++j) {
        a1[i] = a2[j];
    }
}
```

Turning the zip code into something like that shouldn't be a problem, but I believe there is still an extra null check for None that isn't being eliminated. I know how to add it to the null check elimination pass, just haven't gotten around to it.

If we can get the Rust IR to look like the IR from the C code, then we have two options for getting a library call out of it:

  1. Teach LLVM to be able to generate a memmove instead of just a memcpy. It seems that this should be possible and would benefit a lot of people using C/C++ code. On many platforms memmove and memcpy are aliases anyways.
  2. Use the new scoped aliasing metadata to express that the output can't alias the input here. This is more difficult, and the scoping of the aliasing regions doesn't match our impending switch to single-entry / multiple-exit regions, although it could hopefully be modified to do so.

@zwarich commented Oct 14, 2014

Actually, shouldn't we be able to say that an &mut slice as an argument is noalias?

@thestinger (Contributor)

@zwarich: Yes, but we'd need to mark it via the new scoped noalias metadata as parameter attributes only work for the outermost pointer.

@zwarich commented Oct 14, 2014

@thestinger We would have to split slices into their components at the top of functions, and then use scoped aliasing metadata with a scope corresponding to the whole function. This is morally equivalent to changing the calling convention to pass slices as separate components. It isn't sound to use scoped metadata with a scope of the entire function for every &mut that appears in the function, since there may be nonoverlapping borrows of the same path.

@thestinger (Contributor)

@zwarich: I don't think non-overlapping borrows violate LLVM's NoAlias guarantee. There is no memory dependency between them, so it doesn't matter.

> The NoAlias response may be used when there is never an immediate dependence between any memory reference based on one pointer and any memory reference based on the other. The most obvious example is when the two pointers point to non-overlapping memory ranges. Another is when the two pointers are only ever used for reading memory. Another is when the memory is freed and reallocated between accesses through one pointer and accesses through the other — in this case, there is a dependence, but it's mediated by the free and reallocation.

@thestinger (Contributor)

LLVM converts our usage of noalias in function parameters into scoped noalias metadata anyway.

@zwarich commented Oct 14, 2014

@thestinger Consider a program fragment like the following:

```rust
let mut i = 0i;
{
    let p = &mut i;
    *p = 1;
}
{
    let q = &mut i;
    *q = 2;
}
```

Ignoring the reality that the LLVM optimizer will just convert this to SSA form and ignore any alias info we give it, saying that both `p` and `q` are noalias in the same scope would mean that LLVM is free to move `*p = 1` after `*q = 2`, changing the behavior.

@thestinger (Contributor)

@zwarich: Those borrows don't overlap in the same scope though. The scoped noalias metadata isn't just a function scope thing.

@zwarich commented Oct 14, 2014

@thestinger My original comment that I thought you were disagreeing with mentioned that we can't put them all in the same scope, unless we unpack the slice once at the beginning of the function:

> It isn't sound to use scoped metadata with a scope of the entire function for every &mut that appears in the function

bors added a commit that referenced this pull request Oct 24, 2014
Old vs. New vs. Vec::push_all

```
test slice     ... bench:   3091942 ns/iter (+/- 54460)
test slice_new ... bench:   1800065 ns/iter (+/- 69513)
test vec       ... bench:   1804805 ns/iter (+/- 75609)
```
@bors bors closed this Oct 24, 2014
@bors bors merged commit cea171b into rust-lang:master Oct 24, 2014
lnicola pushed a commit to lnicola/rust that referenced this pull request Aug 29, 2024
fix: add extra_test_bin_args to test explorer test runner

`@HKalbasi` I thought I included this in rust-lang#17470 but it appears not so I have created a new issue rust-lang#17959 for this fix.