Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Cartesian to base #5387

Merged
merged 8 commits into from
Jan 14, 2014
Merged

Move Cartesian to base #5387

merged 8 commits into from
Jan 14, 2014

Conversation

timholy
Copy link
Member

@timholy timholy commented Jan 13, 2014

This introduces a new-and-improved version of Cartesian, and rewrites certain functionality from base to use it.

Main points:

  • A new macro @ngenerate that takes away most of the "wrapper pain" from using Cartesian
  • Only a handful of functions have been migrated---I've focused this on the more difficult cases, namely getindex, setindex!, reducedim, and broadcast.
  • Despite adding cartesian.jl (a 400+ line file), this leads to a net reduction of the codebase by approximately 400 lines; almost 800 lines were deleted by reimplementing just those 4 core functions.
  • In the case of broadcast, the new version is more general in that it supports AbstractArrays and not just StridedArrays.
  • Many functions that have previously been written with linear indexing can be easily re-written. da8f1dd is included as an example of how to do this for a single function, fill!.
  • Performance for SubArrays is notably better for some the functions that have been migrated over; performance has been approximately maintained in other cases.

TODO:

  • From the documentation I didn't quite follow how broadcast_getindex and broadcast_setindex! are supposed to work (what are the inds arrays?); these function have been deleted from this PR, but can be restored if needed.
  • Initially I intended to migrate the BitArray functions, so that gen_cartesian_map could be eliminated. I chickened out in the face of their complexity, but if this PR makes it in we could do that at a later time.

All times are after suitable warmup.

On master:

X = zeros(100,100,30,30);
S = sub(X, :, :, :, :);
julia> @time fill!(S, 5);
elapsed time: 0.555467706 seconds (64 bytes allocated)

This PR:

julia> @time fill!(S, 5);
elapsed time: 0.07676407 seconds (48 bytes allocated)

Approximately 7-fold better.

Other times are shown below. Test code is in this gist, as are the results of running arrayperf in test/.

reducedim (first set of rows is for Arrays, 2nd for SubArrays; left column is the region, middle column is timing for master, right column is this PR):

1   0.1435  0.1423
2   0.1754  0.1608
3   0.1538  0.1649
4   0.3379  0.3591
(1,2)           0.1339  0.1445
(1,3)           0.1302  0.1361
(1,4)           0.1305  0.1418
(2,3)           0.1574  0.1562
(2,4)           0.1608  0.1572
(3,4)           0.1511  0.1595
(1,2,3)         0.1244  0.1418
(1,2,4)         0.1244  0.142
(1,3,4)         0.129   0.1366
(2,3,4)         0.1572  0.1553
(1,2,3,4)       0.1242  0.1423
1   0.4896  0.3994
2   1.0059  0.4507
3   1.1755  0.4613
4   0.9263  0.5376
(1,2)           0.4883  0.3808
(1,3)           0.7812  0.3811
(1,4)           0.5038  0.3813
(2,3)           1.2748  0.4469
(2,4)           1.0367  0.4485
(3,4)           1.81    0.4545
(1,2,3)         0.4832  0.3807
(1,2,4)         0.4891  0.3809
(1,3,4)         0.6274  0.3814
(2,3,4)         2.8611  0.4469
(1,2,3,4)       0.4828  0.3807
broadcast:
(10,)   (1,10)  0.5324  0.6126
(10,10) (10,10) 0.3802  0.4507
(1000,) (1,1000)    0.0778  0.0641
(10,1,10)   (1,10,10)   0.1142  0.1112
(10,10,1)   (1,10,10)   0.0948  0.0936
(100,1,100) (1,100,100) 0.0659  0.0566

arrayperf results are in the gist.

CC @lindahua, @toivoh.

@timholy
Copy link
Member Author

timholy commented Jan 13, 2014

Also CC @carlobaldassi

@toivoh
Copy link
Contributor

toivoh commented Jan 13, 2014

broadcast_getindex and broadcast_setindex! are an implementation of the pointwise indexing operations that I suggested in #2591. One reason that I put them in was to demonstrate the feasibility of that concept. But I'm not sure whether there is interest to keep them.

@timholy
Copy link
Member Author

timholy commented Jan 13, 2014

I should have added: in my view this material is quite reasonable for 0.3, but we shouldn't export any of Cartesian. (No exports have been added in this PR.) The advantages of merging sooner rather than later are that:

  • While Cartesian is a bit "ugly", I don't think anyone could call master's current reducedim.jl, or broadcast.jl, or gen_array_index_map easy bedtime reading. In both cases, the new core algorithms are approximately a dozen lines of code, and (once you get used to reading Cartesian code) pretty transparent.
  • The new algorithms are much closer in spirit to how we should write things when a prettier substitute for Cartesian gets implemented. Consequently, most of the porting work is already done.
  • The fact that we can be more systematic---often replacing 10 different implementations for different dimensionalities or argument types with a single function---means it's easier to maintain.

@JeffBezanson
Copy link
Member

Wow. A net -400 lines to get speedups puts this change in a distinguished category indeed.

I think we're all on the same page that needing to switch to an alternate syntactic universe of @ signs to do N-d stuff is not ideal, but a patch such as this can hardly be ignored. Our goal should be to keep these changes more-or-less in place, but to be able to replace the macros with better syntax and compiler optimizations. For example it is very clear that i is a tuple and the N should follow this tuple around via type inference.

@StefanKarpinski
Copy link
Member

Wow, this is super impressive. I think the last major change that was anywhere near this big of a net code deletion was the sort refactor. While I agree that we definitely want to move past doing all of this with macros, I think this is a good intermediate point and the macros tell us what kinds of things language primitives need to be able to do well. I'm all for merging it once various TODOs are taken care of.

@timholy
Copy link
Member Author

timholy commented Jan 14, 2014

broadcast_getindex and broadcast_setindex! are an implementation of the pointwise indexing operations that I suggested in #2591. One reason that I put them in was to demonstrate the feasibility of that concept. But I'm not sure whether there is interest to keep them.

Totally cool. I put them back. I also:

  • Generalized broadcast_setindex! so the assigned value can either be an array or a scalar
  • Allowed assigned values to be any shape consistent with Base.setindex_shape_check
  • Added some bounds-checks and tests for them, as recent additions of @inbounds had introduced some memory-unsafe operations.

If you disagree with any of these, feel free to make changes.

timholy added a commit that referenced this pull request Jan 14, 2014
@timholy timholy merged commit 626b45b into master Jan 14, 2014
@toivoh
Copy link
Contributor

toivoh commented Jan 14, 2014

I'm glad that you were able to make sense of #2591 :) and thanks for putting them back. This looks really impressive!

@timholy
Copy link
Member Author

timholy commented Jan 14, 2014

Woot! Now let's wait for the bug reports to start flowing in...

Next issue: with the exception of the fill! example, I deliberately confined this patch to cases where we were already using codegen to write the algorithms---using these macros simply introduced a more flexible and compact way to do that. Now that this is done, should we do more stuff like da8f1dd, cherry-picking items from abstractarrays.jl to reimplement using Cartesian? The advantage is that we finally have a way to easily write algorithms that perform quite well on SubArrays (although even cartesian SubArray indexing still might need a bit of optimization); the disadvantage is that codegen would spread into places where we didn't formerly use it.

@kmsquire
Copy link
Member

One of the things I love about julia is it's readability, so while I really appreciate the speed improvements and reduction in code, and at a high level I understand what you're doing, da8f1dd is somewhat opaque to me.

@carlobaldassi
Copy link
Member

Mmh, yes, I vaguely remember about a time when I used to know how those BitArray functions worked ;)

More seriously, nice job! I'll look into this issue of getting rid of gen_cartesian_map as soon as I have the time.

@timholy
Copy link
Member Author

timholy commented Jan 14, 2014

@kmsquire, @ngenerate is not present in the Cartesian package and therefore not documented, but with a little practice (hopefully) it's pretty simple. The syntax is @ngenerate itersymbol <function definition>; this causes methods to be created for any integer value of itersymbol. This example works exactly like what's shown in the first couple of lines of Cartesian.jl documentation.

For any function that has historically been implemented using linear indexing and gets migrated to Cartesian, all one really needs to know is one screenful's worth of documentation. From that perspective, it's perhaps not so bad. But I share your concern, and for that reason I am genuinely of two minds about whether more examples like fill! should be implemented. I'd even be totally fine with reverting that particular change; I included it precisely to provoke this kind of discussion. I think the less-questionable usage of Cartesian is for cases where our code was already unreadable because we relied on code generation; in such cases, I would assert that if you spend a little bit of time getting used to Cartesian, you'll find the new versions many times more transparent.

@timholy
Copy link
Member Author

timholy commented Jan 14, 2014

@carlobaldassi, no hurry. I may be able to help, too, but like you for now I should move on to other things. Thanks for your interest, though!

@timholy
Copy link
Member Author

timholy commented Jan 14, 2014

@kmsquire and others, to see the macros in operation just do something like this:

using Base.Cartesian
macroexpand(quote
<blob>
end)

where you just copy-paste the whole function definition (including the @ngenerate). You'll see it generates explicit versions for 1 and 2 dimensions (in this case with 1 and 2 loops, respectively), and uses a dictionary cache for anything higher. For starters, don't try to understand how the dictionary cache works, just focus on the explicit cases.

@kmsquire
Copy link
Member

Thanks, Tim. I really do find this work quote impressive!

@JeffBezanson
Copy link
Member

I'm not going to revert this, but it was merged slightly hastily. We have lost a lot of type information in the changed functions, due to calling a function taken from a dictionary. In many cases this can be fixed using the pattern:

A = alloc_result()
something_magical!(A)
return A

@kmsquire
Copy link
Member

Perhaps we should have type checks in the tests for those functions?

On Wednesday, January 15, 2014, Tim Holy wrote:

Sorry, I interpreted earlier comments as a go-ahead.

I was about to say that they already work that way:
https://github.com/JuliaLang/julia/blob/master/base/reducedim.jl#L49-L51
https://github.com/JuliaLang/julia/blob/master/base/broadcast.jl#L106
https://github.com/JuliaLang/julia/blob/master/base/multidimensional.jl#L39
https://github.com/JuliaLang/julia/blob/master/base/multidimensional.jl#L42

https://github.com/JuliaLang/julia/blob/master/base/multidimensional.jl#L122
but now I see what you mean.

Should be fixed by c6d359bhttps://github.com/JuliaLang/julia/commit/c6d359b27b4ca41dec782a8bab38d2cec9981855.
Let me know about other problems.


Reply to this email directly or view it on GitHubhttps://github.com//pull/5387#issuecomment-32358652
.

@kmsquire
Copy link
Member

Okay, I see that you are two steps ahead of me, Tim. :-)

sum(A::AbstractArray{Bool}, region) = sum!(reduction_init(A,region,0), A)
eval(ngenerate(:N, :(sum!{T,N}(R::AbstractArray, A::AbstractArray{T,N})), N->gen_reduction_body(N, +)))
prod{T}(A::AbstractArray{T}, region) = prod!(reduction_init(A,region,one(T)), A)
eval(ngenerate(:N, :(prod!{T,N}(R::AbstractArray, A::AbstractArray{T,N})), N->gen_reduction_body(N, *)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we bring back something like Dahua's @code_reducedim to make these easier to read?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's kind of what @ngenerate is, but I didn't want to use it here because I was re-using a body-generator. But I can create a macro for that, too---good idea.

@lindahua
Copy link
Contributor

@timholy I think you might want to add the document, so people can learn to use these?

@timholy
Copy link
Member Author

timholy commented Jan 17, 2014

I've not added anything to Julia's documentation because I think these should remain unexported---we all hope they'll go away ASAP.

Most of it is documented pretty thoroughly in Cartesian.jl (there were only small changes needed for the transition into base), but perhaps I should add some description of @ngenerate? (even though it doesn't have @ngenerate)

timholy referenced this pull request Feb 3, 2014
* Use setindex_shape_check and DimensionMismatch instead
  of generic errors; this (and a couple of extra minor fixes)
  makes BitArrays behave like Arrays (again)
* Move portions of the indexing code to multidimensional.jl
* Restrict signatures so as to avoid to_index and convert, then
  create very generic methods which do the conversion and the
  dispatch: this greatly simplifies the logic and removes most
  of the need for disambiguation. Should also improve code
  generation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants