Deprecate broadcast behaviour of `+` in favour of adding identity matrices #22880

dalum · 2017-07-20T03:33:53Z

I understand this has been discussed and experimented with before and people found that it was really annoying when + didn't automatically broadcast. But @StefanKarpinski suggested I opened an issue for this to describe the rationale for bringing it up again.

Currently, you can write a generic polynomial function that "almost just works" by using I in the following pattern:

f(x, y) = x*y*I + x*I + y*I

The almost here refers to the fact that this will return a UniformScaling if all of the inputs are scalar. With the deprecation of +(::UniformScalar, ::Number) this pattern is broken. After the deprecation, you must now explicitly use one in defining the polynomial to ensure that x and y have similar dimensionality:

f(x, y) = x * one(y) + y * one(x)

But this becomes really messy as soon as you have more than two variables in an expression:

f(x, y, z) = x * one(y) * one(z) + y * one(x) * one(z) + z * one(x) * one(y)

which explodes as you go to higher order. The solution for this is to drop the broadcasting behaviour and define:

+(x::AbstractArray, y::Number) = x .+ y .* one(x)
+(x::Number, y::AbstractArray) = x .* one(y) .+ y

or possibly even:

+(x::AbstractArray{<:T}, y::T) where T = x .+ y .* one(x)
+(x::T, y::AbstractArray{<:T}) where T = x .* one(y) .+ y

I think the argument that automatic broadcasting happens in other languages is a really good argument for keeping it as it is. But I feel that with dot-notation, we can have the best of both worlds now, since you can define a polynomial like:

f(x, y) = x * y + x + y + 1

where you get the standard or broadcast behaviour by calling either f(A, B) or f.(A, B).

And if you are interested in a personal anecdote, then I spent a whole day (possibly two) debugging my code two months ago in frustration because I was inverting an array of the form: (H - E)^-1 and couldn't figure out why I got unphysical results. Spoiler: E was a scalar and I was expecting it to subtract from the diagonal.

The text was updated successfully, but these errors were encountered:

andyferris · 2017-07-20T04:44:59Z

So I'm not clear what problem we are solving with + as it currently stands, though I feel it does cause problems. What I see is that if I have:

f(x,y) = x*y + x + y + 1

with the current behavior I would call f(x,y) for scalars (though f.(x,y) happens to work), f.(x,y) for vectors would do an elementwise version, and for square matrices f(x,y) doesn't do what a polynomial equation should do (though f.(x,y) gives the element-wise version).

Basically I feel that allowing array + scalar do automatic broadcasting can be a bit dangerous because it's misleading in the square matrix case (such as the OP where E was scalar). It also happens to be the last auto-vectorized function that I'm aware of, which makes it a rather special case. Please note that the only reason for defining +(x::AbstractArray, y::AbstractArray) is because it makes sense for linear algebra (x and y live in a vector space) otherwise we would have to require x .+ y for element-wise addition. Similarly, * for arrays and array-scalar combinations is motivated by linear algebra. So I'm really not sure why matrix + scalar would do anything inconsistent with linear algebra... and thus in favor of this change (or else making scalar + array an error with a nice message to use .+).

andreasnoack · 2017-09-28T19:24:38Z

Half of this issue is already fixed with #22932.

With #23923, it would again be possible to use I for the second part of the proposal instead of using 1. I feel that it might be a reasonable compromise to require I here since it will be generic (work for Numbers as well) while some users might get tripped by A + 1 being the same as A + I instead of throwing.

andyferris · 2017-09-29T05:20:21Z

I feel we should aim to support the distributive law matrix * vector + scalar * vector == (matrix + scalar) * vector in v1.0 because AFAICT a lot of the confusion around the previous behaviour relates to this.

StefanKarpinski · 2017-09-29T16:44:44Z

While I agree with that in principle, I fear that the behavior of other systems (Python, R, Matlab) here will make this quite dangerous. People will write A + 1 expecting it to do what A .+ 1 does and it will silently do something very different – i.e. add one to the diagonal of A. The safe option here is to make matrix + scalar an error. The distributive law should hold for uniform scaling objects, however: matrix * vector + scalar * I * vector == (matrix + scalar * I) * vector.

dalum · 2017-09-30T08:58:17Z

I'm sorry if I'm being too greedy, but I'm not really sure that it is solved by the reintroduction of I + scalar. Generic polynomials of multiple variables now require all scalars to be multiplied by I before being input. I. e., f(x, y) = x + y will not work for f(matrix, scalar). It could be made to work if written as f(x, y) = x*I + y*I, but then f(scalar, scalar) returns a UniformScaling.

This gets even trickier when you have a function such as f(x, y) = exp(x) + y. Right now, exp(::UniformScaling) isn't defined. This is arguably a bug, but then every valid matrix function defined for scalars must also be defined for UniformScaling.

Note also that distributivity does not hold for UniformScaling: (matrix + scalar) * I != matrix * I + scalar * I.

Finally, if matrix + scalar is supported, then perhaps UniformScaling could be deprecated. 😼

mschauer · 2017-09-30T10:13:17Z

Edit: Rereading the issue I now see with @andreasnoack that UniformScalings have the right properties except x I + y I = x*y^0 + y*x^0.

StefanKarpinski · 2017-09-30T17:21:09Z

@eveydee, The direction you're moving this is very much towards scalars simply being what UniformScaling was introduced as. We should stop and consider if that's where we want to go. I'm not saying it's not, but we should certainly consider it. And if we do that, it should be one of the first warnings we give to people coming from other systems: in Julia, matrix + scalar adds scalar to the diagonal of matrix and only works for square matrices.

dalum · 2017-10-01T03:27:51Z

@StefanKarpinski You are right, and I am not sure it is the right way to go. The problem for me is: how to write code that works for generic linear algebra objects. That is, how do you translate from equations that you derive on a whiteboard to code implemented in Julia. As I see it, there are two possible routes to being able to write fully generic code, which has simple-to-follow rules, none of which are currently satisfied:

UniformScaling is the scaling operator of the linear algebra world. In this case, the fact that scalar * matrix and scalar * vector works is just for convenience in special cases, and the rule of thumb is that for linear algebra applications, all scalars should be multiplied by I.
Scalars are also scaling operators in the linear algebra world. In this case, UniformScaling serves no purpose and can be deprecated.

The first one is annoying for people like me, who primarily use Julia to do linear algebra (arguably a minority in the Julia community), but the inconvenience of suffixing every scalar literal constant by I is probably something that goes away, when it becomes a habit, and is infinitely preferable to manually figuring out the sizes of the identity matrices as is the case in most other languages. The bigger problem is ensuring that it is consistent across objects. For instance, all functions defined for AbstractMatrix that you might use in an equation must also be defined for UniformScaling (exp, log, sqrt, sin, etc.). Also, functions which are rooted in linear algebra, like norm, trace, etc. should probably be changed to return a UniformScaling object. In my opinion, the complexity here gets out of hand quite quickly.

The second one has the potential to be a source of confusion and bugs for people with Matlab or Python muscle-memory who use arrays for non-linear algebra applications. However, all it requires is defining scalar + matrix so the complexity is low, and UniformScaling becomes obsolete.

oscardssmith · 2017-10-01T05:26:15Z

What about having @Ior something thing similar that could be used like @. to add lots of multiplication by Identity matrices to make stuff work, while preventing people from hitting it by accident?

StefanKarpinski · 2017-10-02T17:26:55Z

However, all it requires is defining scalar + matrix so the complexity is low, and UniformScaling becomes obsolete.

This is not entirely true since one can also do neat things like this with I currently:

julia> A = rand(3, 2)
3×2 Array{Float64,2}:
 0.777236  0.80784
 0.46201   0.548122
 0.447051  0.407108

julia> [A I; I A]
6×5 Array{Float64,2}:
 0.777236  0.80784   1.0  0.0       0.0
 0.46201   0.548122  0.0  1.0       0.0
 0.447051  0.407108  0.0  0.0       1.0
 1.0       0.0       0.0  0.777236  0.80784
 0.0       1.0       0.0  0.46201   0.548122
 0.0       0.0       1.0  0.447051  0.407108

julia> [A I; I A']
5×5 Array{Float64,2}:
 0.777236  0.80784   1.0       0.0       0.0
 0.46201   0.548122  0.0       1.0       0.0
 0.447051  0.407108  0.0       0.0       1.0
 1.0       0.0       0.777236  0.46201   0.447051
 0.0       1.0       0.80784   0.548122  0.407108

julia> [A 0I; 0I A]
6×5 Array{Float64,2}:
 0.777236  0.80784   0.0  0.0       0.0
 0.46201   0.548122  0.0  0.0       0.0
 0.447051  0.407108  0.0  0.0       0.0
 0.0       0.0       0.0  0.777236  0.80784
 0.0       0.0       0.0  0.46201   0.548122
 0.0       0.0       0.0  0.447051  0.407108

If we deprecated UniformScaling then either this shorthand would go away or we would write it with scalars, like [A 1; 1 A] and [A 0; 0 A]. That's pretty handy but may be a step too far, especially since [A 1; 1 A] looks like it should, if anything, fill with 1s not identity blocks.

KristofferC · 2017-11-06T17:07:38Z

I'm having trouble distilling what exactly this issue is about.

Is it to define +(x::AbstractArray, y::Number) = x .+ y .* one(x)? It seems that this will not happen. So in that case, can this be closed?

andyferris · 2017-11-06T23:01:54Z

My take is that it is to allow things like Taylor expansions, e.g. exp_taylor3(x) = 1 + x + x*x/2 + x*x*x/6, work on square matrices as well as numbers. One could currently use exp_taylor3(x) = one(x) + x + x*x/2 + x*x*x/6.

We've already deprecated array + scalar for v0.7.

There's also the issue of the distributive law which should read (matrix + scalar)*vector = matrix*vector + scalar*vector - which implies matrix + scalar == matrix + scalar*I. Personally, I strongly feel that for our built-in math types we should strive support what are standard and correct associative and distributive laws, in order to allow for expressive and generic code.

StefanKarpinski · 2017-11-20T19:04:17Z

It seems that the breaking part of this change has already happened. Whether to allow square matrix + scalar to mean what square matrix + scalar*I currently means in the future is not a breaking change given that we've already disallowed that operation.

andyferris · 2017-11-20T23:24:12Z

Hmm? I plan to make a v1.0 PR as soon as a v1.0 branch opens and the v0.7 deprecations are removed. (Probably not a change that would block releasing v1.0, so perhaps this makes sense).

StefanKarpinski · 2017-11-21T14:50:42Z

Honestly, I would let things settle for a while before introducing square matrix + scalar doing the same thing as square matrix + scalar*I. Sure, we might end up with largely vestigial I but there's no need to rush this as long as we've got the necessary deprecations/deletions into 0.7/1.0.

andyferris · 2018-11-10T05:29:10Z

there's no need to rush this as long as we've got the necessary deprecations/deletions into 0.7

I suppose the same argument could be said to apply to v1.1... But at what point to you draw the line?

vtjnash · 2021-04-12T16:59:17Z

As @andreasnoack mentioned, with #23923 merged, this seems to work pretty well now. We might add #29951, but there seemed to be a consensus against taking that step.

ararslan added linear algebra Linear algebra speculative Whether the change will be implemented is speculative labels Jul 20, 2017

JeffBezanson added the broadcast Applying a function over a collection label Jul 20, 2017

StefanKarpinski added this to the 1.0 milestone Jul 20, 2017

fredrikekre mentioned this issue Jul 24, 2017

Deprecate +/- methods for array+scalar etc #22932

Merged

JeffBezanson assigned andyferris Aug 10, 2017

JeffBezanson mentioned this issue Aug 15, 2017

deprecate expm in favor of exp #23233

Merged

fredrikekre mentioned this issue Sep 28, 2017

Undeprecate +/- for UniformScaling and Number #23923

Merged

StefanKarpinski added triage This should be discussed on a triage call and removed triage This should be discussed on a triage call labels Nov 20, 2017

StefanKarpinski modified the milestones: 1.0, 1.x Nov 20, 2017

cormullion mentioned this issue Mar 18, 2018

NEWS.md is getting a bit untidy #26508

Closed

dalum mentioned this issue Nov 7, 2018

Add +/- for Number and AbstractMatrix to LinearAlgebra #29951

Closed

vtjnash closed this as completed Apr 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deprecate broadcast behaviour of `+` in favour of adding identity matrices #22880

Deprecate broadcast behaviour of `+` in favour of adding identity matrices #22880

dalum commented Jul 20, 2017 •

edited

Loading

andyferris commented Jul 20, 2017 •

edited

Loading

andreasnoack commented Sep 28, 2017

andyferris commented Sep 29, 2017

StefanKarpinski commented Sep 29, 2017 •

edited

Loading

dalum commented Sep 30, 2017 •

edited

Loading

mschauer commented Sep 30, 2017 •

edited

Loading

StefanKarpinski commented Sep 30, 2017

dalum commented Oct 1, 2017 •

edited

Loading

oscardssmith commented Oct 1, 2017

StefanKarpinski commented Oct 2, 2017

KristofferC commented Nov 6, 2017

andyferris commented Nov 6, 2017

StefanKarpinski commented Nov 20, 2017

andyferris commented Nov 20, 2017 •

edited

Loading

StefanKarpinski commented Nov 21, 2017

andyferris commented Nov 10, 2018

vtjnash commented Apr 12, 2021

Deprecate broadcast behaviour of + in favour of adding identity matrices #22880

Deprecate broadcast behaviour of + in favour of adding identity matrices #22880

Comments

dalum commented Jul 20, 2017 • edited Loading

andyferris commented Jul 20, 2017 • edited Loading

andreasnoack commented Sep 28, 2017

andyferris commented Sep 29, 2017

StefanKarpinski commented Sep 29, 2017 • edited Loading

dalum commented Sep 30, 2017 • edited Loading

mschauer commented Sep 30, 2017 • edited Loading

StefanKarpinski commented Sep 30, 2017

dalum commented Oct 1, 2017 • edited Loading

oscardssmith commented Oct 1, 2017

StefanKarpinski commented Oct 2, 2017

KristofferC commented Nov 6, 2017

andyferris commented Nov 6, 2017

StefanKarpinski commented Nov 20, 2017

andyferris commented Nov 20, 2017 • edited Loading

StefanKarpinski commented Nov 21, 2017

andyferris commented Nov 10, 2018

vtjnash commented Apr 12, 2021

Deprecate broadcast behaviour of `+` in favour of adding identity matrices #22880

Deprecate broadcast behaviour of `+` in favour of adding identity matrices #22880

dalum commented Jul 20, 2017 •

edited

Loading

andyferris commented Jul 20, 2017 •

edited

Loading

StefanKarpinski commented Sep 29, 2017 •

edited

Loading

dalum commented Sep 30, 2017 •

edited

Loading

mschauer commented Sep 30, 2017 •

edited

Loading

dalum commented Oct 1, 2017 •

edited

Loading

andyferris commented Nov 20, 2017 •

edited

Loading