Make var and std work for Vector{Vector{T}} #23897

Merged
merged 2 commits into from
Oct 2, 2017

Conversation

andreasnoack
Member

by removing the Number restriction from some signatures, as well as by using broadcasting in std.

I'd also like to do something similar for cov but hit the issue that broadcast is not doing the right thing for Vector{Vector{T}}. E.g.

julia> x = [[2,4,6],[4,6,8]];

julia> x .- [1,1,1]
ERROR: DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
 [1] _bcs1 at ./broadcast.jl:69 [inlined]
 [2] _bcs at ./broadcast.jl:63 [inlined]
 [3] broadcast_shape at ./broadcast.jl:57 [inlined]
 [4] broadcast_indices at ./broadcast.jl:49 [inlined]
 [5] broadcast_c at ./broadcast.jl:310 [inlined]
 [6] broadcast(::Function, ::Array{Array{Int64,1},1}, ::Array{Int64,1}) at ./broadcast.jl:434

Do we have an issue for that?

Fixes #23884
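As an aside, the intended elementwise result can be obtained by mapping over the outer vector; this is a workaround sketch, not part of the PR:

```julia
# Workaround: subtract a vector from every inner vector of a
# Vector{Vector} by mapping over the outer container, since
# broadcast tries to align the inner vectors with [1, 1, 1].
x = [[2, 4, 6], [4, 6, 8]]
y = map(v -> v .- [1, 1, 1], x)  # [[1, 3, 5], [3, 5, 7]]
```

On later Julia versions one can also write x .- Ref([1, 1, 1]) to shield the right-hand side from participating in broadcasting.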

@DNF2

DNF2 commented Sep 27, 2017

Does this not go in the opposite direction to the strategy of not auto-vectorizing functions?

Why not just use std.(x) and cov.(x)? It seems odd that std should support vectors of vectors, since v^2 and sqrt(v) make no sense for vectors.

@andreasnoack
Member Author

Why not just use: std.(x) and cov.(x)?

It is not the same thing.

julia> x = [[2,4,6],[4,6,8]];

julia> var(x)
3-element Array{Float64,1}:
 2.0
 2.0
 2.0

julia> var.(x)
2-element Array{Float64,1}:
 4.0
 4.0

You can compute the variance of a vector. In Julia, we have decided that it is just the diagonal of the covariance matrix. The standard deviation is a bit tricky, though. It could either be the diagonal of the matrix square root of the covariance matrix or the elementwise square root of the diagonal of the covariance matrix. This PR uses the latter definition.
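Concretely, the componentwise definition can be sketched as follows (vecvar is a hypothetical helper name; on later Julia versions mean lives in the Statistics stdlib, while it was in Base at the time of this PR):

```julia
using Statistics  # for mean (in Base at the time of this PR)

# Componentwise variance of a collection of equal-length vectors,
# i.e. the diagonal of the covariance matrix.
vecvar(x) = sum(v -> abs2.(v .- mean(x)), x) ./ (length(x) - 1)

vecvar([[2, 4, 6], [4, 6, 8]])  # [2.0, 2.0, 2.0], matching var(x) above
```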

@mschauer
Contributor

@DNF2 The empirical covariance is a function that maps a collection of equal-length vectors to a matrix (this involves the outer product v*v', not v^2). In order to work on arrays, it interprets the array as a collection of vectors (hence the vardim argument, as there are two possible interpretations of a matrix as a vector of vectors). std.( ) does something entirely different (it computes many standard deviations).
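The outer-product construction described above can be sketched like this (veccov is a hypothetical helper; the Statistics and LinearAlgebra imports are for later Julia versions):

```julia
using Statistics, LinearAlgebra  # mean, adjoint, diag

# Empirical covariance of a collection of equal-length vectors,
# accumulated from outer products (v - m) * (v - m)'.
function veccov(x)
    m = mean(x)
    sum(v -> (v - m) * (v - m)', x) / (length(x) - 1)
end

C = veccov([[2, 4, 6], [4, 6, 8]])  # a 3×3 matrix
diag(C)                             # [2.0, 2.0, 2.0], the componentwise var
```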

@DNF2

DNF2 commented Sep 27, 2017

Hm. In #23884 I guess I misread the answer to

What do you expect to get?

probably because the example vectors were so similar. I thought the request was to auto-broadcast.

Thanks for the explanation.

@mschauer
Contributor

broadcast is not doing the right thing

Cases like this could be caught in the signature, right? E.g.

julia> Vector{Vector} <: S where {S<:AbstractVector{T}} where {T<:AbstractVector}
true

but maybe it is better to fix that at the root.

@TotalVerb
Contributor

@andreasnoack We do have an issue for it: #18379

@andyferris
Member

I have to admit, I find this a bit surprising. I'm trying to put my finger on it, but I think it has something to do with the fact that the properties of Vectors as objects obey certain change-of-basis relationships. The examples above seem to be some kind of automatic elementwise thing (no, not var.(...), but something else) and are definitely basis-dependent.

If you were to ask me what I thought the "variance" of a set of vectors was, I would answer the covariance, its determinant, or the RMS L2 distance from the mean, not some elementwise calculation. The covariance works generically with numbers as sum(v*v' for v in vecs) and the last one as sum(v'*v for v in vecs) (ignoring the mean).

@mschauer
Contributor

Returning the "diagonal of the covariance matrix" is basically componentwise application of the var function. So this is a case of automatic broadcasting that is not easily expressed with dot operations for Vector{Vector}s (and not easily expressed in other implicit forms either; e.g. var.(collect(zip(vectors...))) is not so nice).
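For reference, the zip spelling does produce the componentwise result, even if it is not pretty:

```julia
using Statistics  # for var (in Base at the time of this PR)

x = [[2, 4, 6], [4, 6, 8]]
# zip groups the i-th components of every inner vector into a
# tuple; var of each tuple is the componentwise variance.
var.(collect(zip(x...)))  # [2.0, 2.0, 2.0]
```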

@andreasnoack
Member Author

andreasnoack commented Sep 29, 2017

I'd also prefer the covariance matrix, but we have kind of already made that decision, since

julia> X = randn(10,2);

julia> var(X, 1)
1×2 Array{Float64,2}:
 0.69424  0.894716

whereas

julia> cov(X, 1)
2×2 Array{Float64,2}:
  0.69424    -0.0762704
 -0.0762704   0.894716
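The relationship can be checked directly (on later Julia versions the dimension argument became a dims keyword; the identity itself is unchanged):

```julia
using Statistics, LinearAlgebra

# var along dims=1 is the diagonal of the covariance matrix
# computed along the same dimension.
X = randn(10, 2)
var(X, dims=1) ≈ diag(cov(X, dims=1))'  # true
```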

@bramtayl
Contributor

This seems like a job for a zip-like iterator.

@kshyatt kshyatt added the maths Mathematical functions label Sep 29, 2017
@andreasnoack
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@andreasnoack andreasnoack merged commit b10833b into master Oct 2, 2017
@andreasnoack andreasnoack deleted the anj/var branch October 2, 2017 14:59
@mbauman
Member

mbauman commented Feb 22, 2018

I just discovered this — it seems quite strange to me. How are we defining var? Is it not just sum(abs2, v - mean(v)) / (length(v) - 1)? Why are we (effectively) defining abs2(v::Vector) = abs2.(v)?
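For an ordinary numeric vector the two definitions agree, which is why the question only bites for vectors of vectors (v .- mean(v) is the broadcast spelling of the quoted formula):

```julia
using Statistics

v = [2.0, 4.0, 6.0]
# Textbook sample variance: sum of squared deviations with the
# n - 1 correction, identical to var(v) for numeric vectors.
sum(abs2, v .- mean(v)) / (length(v) - 1)  # 4.0, same as var(v)
```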

@mbauman
Member

mbauman commented Feb 23, 2018

Ok, after discussing this more with Andreas, I'm on board with the behaviors here… but we might want to look into updating the docstring… and possibly moving away from broadcasting. Basically, cov([[1,2,3],[4,5,6]]) is a well-defined operation based upon multiplication of the transpose — that works just fine for vectors of vectors. And the variance can be defined as the diagonal of the covariance matrix.

Moreover, this is an operation that is good to support because we implicitly do it with var(X, 1) — that's effectively var([X[1,:], X[2,:], …]), since with vardim 1 the rows are the observations.

Using broadcasting here still tastes a little funny to me, because it is effectively imposing diag(X*X') == abs2.(X) for all types. If a collection were to support broadcasting and addition/subtraction (but not multiplication), should we still use this same definition?
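The identity behind that discomfort is easy to state for a single vector: the diagonal of the outer product is the elementwise abs2.

```julia
using LinearAlgebra  # adjoint, diag

v = [2, 4, 6]
# Broadcasting abs2 reproduces the diagonal of the outer product;
# the broadcasting-based implementation assumes this holds for
# the element type.
diag(v * v') == abs2.(v)  # true
```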
