Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

De-eval-ify vectorized unary functions over SparseMatrixCSCs, and then transition them to compact broadcast syntax #17265

Merged
merged 2 commits into from
Aug 27, 2016

Conversation

Sacha0
Copy link
Member

@Sacha0 Sacha0 commented Jul 4, 2016

Edit: Ref. #16285 and #11474.

This PR's first commit revises the implementation of vectorized unary functions over SparseMatrixCSCs, leveraging higher order functions to displace a set of macros and evals. Regarding performance, see below.

This PR's second commit then transitions those vectorized unary functions over SparseMatrixCSCs to compact broadcast syntax, accordingly revises the associated tests, expands those tests, and then adds deprecations for the vectorized calls.

Question: Both the existing code and that in this PR separate unary functions into three classes (with disjoint logic): (1) unary functions that map zeros to zeros and may map nonzeros to zeros; (2) unary functions that map zeros to zeros and nonzeros to nonzeros; and (3) unary functions that map both zeros and nonzeros to nonzeros. In the first class, when a nonzero in the original sparse matrix miraculously maps to exact zero under, e.g., sin, the implementation expunges that entry. This behavior seems like a holdover from the aggressive-stored-zero-expunging era, now inconsistent with behavior elsewhere. Perhaps classes one and two should be merged? Merging those two classes would enable some code reduction and might increase the performance of operations presently falling under class one (by eliminating a branch in an inner loop, and enabling @simd decoration of that inner loop). Thoughts?

Perf
With the exception of real and imag, for functions in the first class the existing and new implementations perform similarly or slightly favor the new implementation. The new implementation seems to resolve a type instability or so with real and imag, significantly improving performance: Where real is the existing implementation and ho_real this PR's, this bench

using BenchmarkTools

smallN = 10^1; smallA = sprand(smallN, smallN, 5/smallN); smallC = smallA + im*smallA;
largeN = 10^6; largeA = sprand(largeN, largeN, 10/largeN); largeC = largeA + im*largeA;

presmat = smallC;
benchpres = @benchmarkable real(Float64, presmat); tune!(benchpres);
benchnew = @benchmarkable ho_real(Float64, presmat); tune!(benchnew);
medianpres = median(run(benchpres));
mediannew = median(run(benchnew));
println(ratio(mediannew, medianpres))

for presmat = smallC yields

BenchmarkTools.TrialRatio:
  time:             0.07120253164556962
  gctime:           1.0
  memory:           0.5189873417721519
  allocs:           0.06557377049180328
  time tolerance:   5.00%
  memory tolerance: 1.00%

and for presmat = largeC yields

BenchmarkTools.TrialRatio:
  time:             0.16305659839888223
  gctime:           0.16709694727350574
  memory:           0.3442608362244807
  allocs:           3.4976803383995736e-7
  time tolerance:   5.00%
  memory tolerance: 1.00%

For functions in the second class, this PR's implementation exhibits significantly better performance on small matrices and somewhat better on large matrices, e.g. for the bench above with presmat = smallA, comparing expm1 and ho_expm1 yields

BenchmarkTools.TrialRatio:
  time:             0.18331303288672351
  gctime:           1.0
  memory:           0.7735849056603774
  allocs:           0.4
  time tolerance:   5.00%
  memory tolerance: 1.00%

and for presmat = largeA

BenchmarkTools.TrialRatio:
  time:             0.8084212063475197
  gctime:           0.9765840983181242
  memory:           0.9999974302672074
  allocs:           0.4375
  time tolerance:   5.00%
  memory tolerance: 1.00%

This PR's implementation also appears to resolve a type instability or so with abs and abs2, e.g. comparing abs2 and ho_abs2 for presmat = smallC the above bench yields

BenchmarkTools.TrialRatio:
  time:             0.05240027045300879
  gctime:           1.0
  memory:           0.5189873417721519
  allocs:           0.06557377049180328
  time tolerance:   5.00%
  memory tolerance: 1.00%

and for presmat = largeC

BenchmarkTools.TrialRatio:
  time:             0.11257829481710147
  gctime:           0.1572109282310488
  memory:           0.3442608362244807
  allocs:           3.4976803383995736e-7
  time tolerance:   5.00%
  memory tolerance: 1.00%

For functions in the third class, performance differences are within the noise. Best!

eval(Multimedia, quote
export @MIME
macro MIME(s)
Base.warn_once("@MIME(\"\") is deprecated, use MIME\"\" instead.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong conflict resolution here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed? Thanks!

@Sacha0 Sacha0 force-pushed the unarybcastspmat branch from b3b8350 to fafacd9 Compare July 4, 2016 02:20
@ViralBShah ViralBShah added the sparse Sparse arrays label Jul 4, 2016
@Sacha0 Sacha0 force-pushed the unarybcastspmat branch from fafacd9 to 57ea2f9 Compare July 5, 2016 18:18
@Sacha0
Copy link
Member Author

Sacha0 commented Jul 5, 2016

Messed things up a bit while addressing comments. Should be in order again now. Thanks!

@tkelman
Copy link
Contributor

tkelman commented Jul 18, 2016

needs rebase

@Sacha0 Sacha0 force-pushed the unarybcastspmat branch from 57ea2f9 to 08eb7b6 Compare July 19, 2016 20:59
@Sacha0
Copy link
Member Author

Sacha0 commented Jul 19, 2016

needs rebase

Thanks for the ping! Rebased.

@@ -793,6 +793,19 @@ function transpose(x)
return x
end

# Deprecate vectorized unary functions over sparse matrices in favor of compact broadcast syntax (#17265).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we won't be deprecating these in 0.5, move after the "to be deprecated in 0.6"

@Sacha0
Copy link
Member Author

Sacha0 commented Aug 5, 2016

Rebased for 0.6. Thanks!

# Operations that map both zeros and nonzeros to zeros, yielding a dense matrix
"""
Takes unary function `f` that maps both zeros and nonzeros to nonzeros, and returns a new
`Matrix{TvB}` `B` effectively generated by applying `f` to every entry in `A`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why "effectively" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the B = fill(f(zero(Tv)), size(A)), this method calls f only once rather than once for each zero in A. Would make a difference if e.g. f has side effects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth writing that down in the docstring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Clarified in docstring. Thanks!

@tkelman
Copy link
Contributor

tkelman commented Aug 6, 2016

ready to merge?

@ViralBShah
Copy link
Member

Looks like a good candidate for backporting as well for 0.5.x.

@tkelman
Copy link
Contributor

tkelman commented Aug 6, 2016

No it isn't, it's a whole bunch of deprecations.

@Sacha0
Copy link
Member Author

Sacha0 commented Aug 7, 2016

ready to merge?

I assume this query was not directed at me, but in case it was: Yes, this PR should be ready to merge, though it might interact with #17302 and this PR should be easier to rebase than #17302. Thanks and best!

@Sacha0
Copy link
Member Author

Sacha0 commented Aug 10, 2016

Rebased resolving deprecation conflict. Best!

@tkelman tkelman added this to the 0.6.0 milestone Aug 25, 2016
@tkelman
Copy link
Contributor

tkelman commented Aug 25, 2016

@Sacha0 sorry we frequently do a bad job of merging your (always impeccably well-done) PR's before they hit conflicts. Rebase?

@Sacha0
Copy link
Member Author

Sacha0 commented Aug 25, 2016

No worries! Your time is much better spent elsewhere at the moment. Also thanks for the kind words! Related question: Though master is technically 0.6-dev at this point, I imagine that prior to the actual 0.5 release it's advantageous to keep master as close as possible to 0.5.x? That is, I imagine holding noncritical PRs targeting 0.6 till 0.5 is out the door makes life easier? If so, apologies for bothering you with noncritical PRs targeting 0.6 recently, and I'll hold bumps / non-bugfix work till 0.5 is out. Thanks and best!

@tkelman
Copy link
Contributor

tkelman commented Aug 25, 2016

It's fine actually, as long as things aren't a huge risk of causing non-trivial conflicts with bugfixes that need backporting. 0.5 is most likely only one more RC away from final, and we need to get moving on 0.6 just as much as we need to get 0.5 final done.

Sacha0 added 2 commits August 25, 2016 14:25
…higher order functions and multiple dispatch to displace eval. Fixes some apparent type instabilities.
…act broadcast syntax, accordingly revise and expand the associated tests, and add deprecations for the vectorized syntax.
@tkelman tkelman merged commit 379c248 into JuliaLang:master Aug 27, 2016
@Sacha0
Copy link
Member Author

Sacha0 commented Aug 27, 2016

Thanks for reviewing / merging!

@Sacha0 Sacha0 deleted the unarybcastspmat branch August 27, 2016 16:24
@stevengj
Copy link
Member

stevengj commented Aug 31, 2016

I wonder if we should have a more general function that does broadcast on sparse matrices:

function broadcast{T<:Number}(f::Function, A::AbstractSparseArray{T})
    fzero = f(zero(T))
    if fzero == zero(fzero)
       ... broadcast to sparse array, operating f only on nonzero elements ...
    else
       .... broadcast to dense array, though as an optimization we could just use fzero for the zero elements ... or throw an error?
    end
end

This assumes f is pure, though.

@tkelman
Copy link
Contributor

tkelman commented Aug 31, 2016

@Sacha0
Copy link
Member Author

Sacha0 commented Aug 31, 2016

Would the sketch above be type unstable? Would a version parametric on the function type recover type stability? Best!

@stevengj
Copy link
Member

stevengj commented Aug 31, 2016

@Sacha0, the problem is that it assumes f is pure; I think it should be possible to make it type stable.

I guess we could use traits for this, but they wouldn't help much for anonymous functions that are constructed during loop fusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants