Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unrolling operations on short tuples #28614

Closed
wants to merge 4 commits into from
Closed

Unrolling operations on short tuples #28614

wants to merge 4 commits into from

Conversation

bramtayl
Copy link
Contributor

See #28431

@bramtayl
Copy link
Contributor Author

Sorry for the multiple typos...

@KristofferC
Copy link
Member

KristofferC commented Aug 12, 2018

It is helpful if you put a bit more effort into justifying the PR. While there is some breadcrumbs in #28431 it would be very useful to have some introduction to the problem, analysis of benchmarks before and after, something showing that consideration to compilation time has been taken etc.

A two word PR description to a short discussion that is not clearly linked to the code here makes this hard to review and the changes of it getting merged likely reduced.

@bramtayl
Copy link
Contributor Author

I have no idea what the effects on compile time will be, and also, I'm not sure how to run benchmarks. But for some more information:

There are some operations that in tight loops where we need type constant operations on small tuples to be efficient. Most typically, this is dimension tuples, where we mix integers and axes. So for doing dimension surgery (for example, in the inner loops of something like broadcast or mapslices) you need recursive definitions. Several operations on tuples in Base rely on looping through tuples which won't be type constant. Notable exceptions include map, broadcast, etc. Now, for some operations, like boolean getindex, getting type stability also requires constant propagation of booleans. My understanding is to get this working in Base will require careful use of @inline and @pure. It's something I could go through and do but I'd rather wait until we've squeezed all the constant propagation out we can. In the mean time, instead of using true and false, you can use typed values like Val{true}() or True(). This isn't ideal but it makes type stability quite a bit more predictable.

If you have very long tuples with a single type, recursive unrolling seems to be slow compared to just looping through them. Therefore, to limit the use of unrolling definitions, I've restricted these methods to just ShortTuples. I have no idea what the size of ShortTuples that best maximizes performance would be. It might be the case that the best size will vary by function definition.

I'm very sure there are additional operations that could become faster on short tuples by unrolling. This is just a start, and I put it up mostly to see what people thought. I think the changes are mostly non-breaking. The biggest change would be returning tuples where previous operations (unoptimally) returned vectors.

Additional effort after this PR would be using these recursive definitions in tight inner loops, and I know for sure this would make mapslices much faster.

Let me know if there's anymore information that I can provide?

@KristofferC
Copy link
Member

So the purpose of this PR is performance but you cannot perform any benchmarks? Or is the purpose some sort of type stability, but then there are no tests for this. I'm just trying to figure out the concrete problem that this PR solves.

@bramtayl
Copy link
Contributor Author

Well, due to the problems with type stability, not all functions will be type stable. I could write up some @inferred tests for the type stable ones, though, and I could make some benchmarks for small tuples (type constant and not type constant) before and after these changes.

@KristofferC
Copy link
Member

Having some tests that used to fail and now pass would help clarify the purpose here.

@andyferris
Copy link
Member

@bramtayl First thank you for your work.

I do agree the way these operations are currently working is rather haphazard. Some things unroll. Some things iterate. Things like foreach are forgotten. As a user I forget what I can rely on and whatever and it's great to see some work cleaning this up. I'll note that NamedTuple uses the new generated functions with fallbacks (I have found the implementation there to be first-class and reliable).

However I have to reiterate what Kristoffer is saying. It should be expected that a PR comes with a justification, that performance PRs come with a benchmark, and that type-stability PRs come with a before-and-after example. It's my experience that writing the code is the easy part - communicating with humans is a much more challenging and involved issue 😄

Tuple{A, B, C, D, E, F, G, H, I, J, K, L, M} where {A, B, C, D, E, F, G, H, I, J, K, L, M},
Tuple{A, B, C, D, E, F, G, H, I, J, K, L, M, N} where {A, B, C, D, E, F, G, H, I, J, K, L, M, N},
Tuple{A, B, C, D, E, F, G, H, I, J, K, L, M, N, O} where {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O},
Tuple{A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P} where {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note you can write NTuple{16, Any} instead of all of that... it's equivalent since tuple parameters are covariant

@bramtayl
Copy link
Contributor Author

Thanks for the feedback. I dont' really have time for a full PR rn but most of the code currently lives in Keys.jl in case anyone is ever interested

@bramtayl bramtayl closed this Aug 21, 2018
@bramtayl bramtayl mentioned this pull request Dec 23, 2018
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants