
improve clone_from_slice performance #17960

Merged 1 commit into rust-lang:master on Oct 24, 2014
Conversation

@mahkoh (Contributor) commented Oct 11, 2014

Old vs. New vs. Vec::push_all

```
test slice     ... bench:   3091942 ns/iter (+/- 54460)
test slice_new ... bench:   1800065 ns/iter (+/- 69513)
test vec       ... bench:   1804805 ns/iter (+/- 75609)
```

@thestinger (Contributor)

Using clone rather than clone_from is less efficient for non-trivial types. Why not work on making zip optimize here if that's the real problem? At the very least, it should avoid zip while continuing to use clone_from because right now it's a micro-optimization at the expense of being a macro-pessimization.

@mahkoh (Contributor, Author) commented Oct 11, 2014

Updated with clone_from.

@mahkoh (Contributor, Author) commented Oct 11, 2014

This will be optimized to a memcpy for Copy types. I don't see how Zip and Chain can be optimized for this case because Rust doesn't allow you to write a specialized implementation for ExactSize iterators.

@thestinger (Contributor)

The lack of a specialized implementation isn't a blocker for having it optimize. Attempting to fix the underlying issue is far better than just working around it across the standard library before it's deemed to be infeasible.

@zwarich worked on fixing missed optimizations for iterators and probably has a good idea about what would be necessary here.

@zwarich commented Oct 14, 2014

If we want the zip version to compile to a memcpy or a memmove, then LLVM would first have to compile a function like this into a library call:

```c
void f(int *a1, int *a2, size_t len1, size_t len2) {
    for (size_t i = 0, j = 0; i < len1 && j < len2; ++i, ++j) {
        a1[i] = a2[j];
    }
}
```

It is able to vectorize it, albeit with an aliasing check for the two arrays outside of the vectorized loop. It seems like it really should be able to realize that it can call memmove. It doesn't do any better with one induction variable, which at least says that induction variable analysis is working:

```c
void g(int *a1, int *a2, size_t len1, size_t len2) {
    for (size_t i = 0; i < len1 && i < len2; ++i) {
        a1[i] = a2[i];
    }
}
```

If I add restrict to this, then it actually generates a call to memcpy:

```c
void f(int * restrict a1, int * restrict a2, size_t len1, size_t len2) {
    for (size_t i = 0, j = 0; i < len1 && j < len2; ++i, ++j) {
        a1[i] = a2[j];
    }
}
```

Turning the zip code into something like that shouldn't be a problem, but I believe there is still an extra null check for None that isn't being eliminated. I know how to add it to the null check elimination pass, just haven't gotten around to it.

If we can get the Rust IR to look like the IR from the C code, then we have two options for getting a library call out of it:

  1. Teach LLVM to be able to generate a memmove instead of just a memcpy. It seems that this should be possible and would benefit a lot of people using C/C++ code. On many platforms memmove and memcpy are aliases anyways.
  2. Use the new scoped aliasing metadata to express that the output can't alias the input here. This is more difficult, and the scoping of the aliasing regions doesn't match our impending switch to single-entry / multiple-exit regions, although it could hopefully be modified to do so.

@zwarich commented Oct 14, 2014

Actually, shouldn't we be able to say that an &mut slice as an argument is noalias?

@thestinger (Contributor)

@zwarich: Yes, but we'd need to mark it via the new scoped noalias metadata as parameter attributes only work for the outermost pointer.

@zwarich commented Oct 14, 2014

@thestinger We would have to split slices into their components at the top of functions, and then use scoped aliasing metadata with a scope corresponding to the whole function. This is morally equivalent to changing the calling convention to pass slices as separate components. It isn't sound to use scoped metadata with a scope of the entire function for every &mut that appears in the function, since there may be nonoverlapping borrows of the same path.

@thestinger (Contributor)

@zwarich: I don't think non-overlapping borrows violate LLVM's NoAlias guarantee. There is no memory dependency between them, so it doesn't matter.

> The NoAlias response may be used when there is never an immediate dependence between any memory reference based on one pointer and any memory reference based on the other. The most obvious example is when the two pointers point to non-overlapping memory ranges. Another is when the two pointers are only ever used for reading memory. Another is when the memory is freed and reallocated between accesses through one pointer and accesses through the other — in this case, there is a dependence, but it's mediated by the free and reallocation.

@thestinger (Contributor)

LLVM converts our usage of noalias in function parameters into scoped noalias metadata anyway.

@zwarich commented Oct 14, 2014

@thestinger Consider a program fragment like the following:

```rust
let mut i = 0i;
{
    let p = &mut i;
    *p = 1;
}
{
    let q = &mut i;
    *q = 2;
}
```

Ignoring the reality that the LLVM optimizer will just convert this to SSA form and ignore any alias info we give it, saying that both `p` and `q` are noalias in the same scope would mean that LLVM is free to move `*p = 1` after `*q = 2`, changing the behavior.

@thestinger (Contributor)

@zwarich: Those borrows don't overlap in the same scope though. The scoped noalias metadata isn't just a function scope thing.

@zwarich commented Oct 14, 2014

@thestinger My original comment that I thought you were disagreeing with mentioned that we can't put them all in the same scope, unless we unpack the slice once at the beginning of the function:

> It isn't sound to use scoped metadata with a scope of the entire function for every &mut that appears in the function

bors added a commit that referenced this pull request Oct 24, 2014
Old vs. New vs. Vec::push_all

```
test slice     ... bench:   3091942 ns/iter (+/- 54460)
test slice_new ... bench:   1800065 ns/iter (+/- 69513)
test vec       ... bench:   1804805 ns/iter (+/- 75609)
```
@bors bors closed this Oct 24, 2014
@bors bors merged commit cea171b into rust-lang:master Oct 24, 2014
lnicola pushed a commit to lnicola/rust that referenced this pull request Aug 29, 2024
fix: add extra_test_bin_args to test explorer test runner

`@HKalbasi` I thought I included this in rust-lang#17470 but it appears not so I have created a new issue rust-lang#17959 for this fix.