std() and var() do not work on array of arrays while mean() does #23884

stakaz · 2017-09-26T20:50:18Z

Hello, I wonder why I cannot use std() or var() on an array of arrays while I can do so for mean().

Consider this simple example:

julia> x = [[2,4,6],[4,6,8]]
2-element Array{Array{Int64,1},1}:
 [2, 4, 6]
 [4, 6, 8]

julia> mean(x)
3-element Array{Float64,1}:
 3.0
 5.0
 7.0

julia> std(x)
ERROR: MethodError: no method matching zero(::Type{Array{Int64,1}})
Closest candidates are:
  zero(::Type{Base.LibGit2.GitHash}) at libgit2/oid.jl:106
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuildItem}) at pkg/resolve/versionweight.jl:82
  zero(::Type{Base.Pkg.Resolve.VersionWeights.VWPreBuild}) at pkg/resolve/versionweight.jl:124
  ...
Stacktrace:
 [1] #var#533(::Bool, ::Void, ::Function, ::Array{Array{Int64,1},1}) at ./statistics.jl:184
 [2] (::Base.#kw##var)(::Array{Any,1}, ::Base.#var, ::Array{Array{Int64,1},1}) at ./<missing>:0
 [3] std(::Array{Array{Int64,1},1}) at ./statistics.jl:244
 [4] macro expansion at ./REPL.jl:97 [inlined]
 [5] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73

The version information is

Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Pentium(R) CPU  N3530  @ 2.16GHz
  WORD_SIZE: 64
  BLAS: libblas
  LAPACK: liblapack
  LIBM: libm
  LLVM: libLLVM-3.9.1 (ORCJIT, silvermont)

I find it strange that one function gives the intended outcome while the other ends up with an error. Is this the desired behavior? Have I missed something?

I would say that exactly the same interpretation as for the mean() function should be used for std() and var(). I am aware of the possibility of a multidimensional array and direction argument to std() but this does not explain why it works out so nicely for mean().

I came across this while using DataFrames with arrays as elements of the DataArray, but the problem seems to be much more general.

Maybe there are other ways to do this?


andreasnoack · 2017-09-26T21:13:10Z

I find it strange that one function gives the intended outcome while the other ends up with an error.

These functions have been written for Number types. mean is a much simpler function than var and therefore works by chance. It might not be too complicated to get var working though.
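A minimal sketch of the difference, based only on the MethodError in the report above: the stack trace shows var asking for zero of the element type, and an array type alone carries no length to build a zero from. The sketch uses sum and length instead of mean so that it does not depend on where mean lives in a given Julia version; the variable names are illustrative.

```julia
x = [[2, 4, 6], [4, 6, 8]]

# Addition and division by a scalar are defined for vectors,
# so a mean falls out without any special handling.
m = sum(x) / length(x)            # [3.0, 5.0, 7.0]

# An element *value* knows its size, so zero works on it ...
z = zero(x[1])                    # [0, 0, 0]

# ... but the element *type* does not, which is what the error above reports.
zero_on_type_fails = try
    zero(eltype(x))
    false
catch err
    err isa MethodError
end

println(m)
println(z)
println(zero_on_type_fails)
```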

stakaz · 2017-09-26T21:19:25Z

I understand that mean is much simpler. However, when summation and division are possible on the elements of the array to "mean" over, the variance should also work without any troubles, as it only uses summation and multiplication (which is possible when division is possible).

In general, all such "over an array" statistical functions should work in the same way and should work when the described operations are possible on the elements.

andreasnoack · 2017-09-26T21:55:20Z

the variance should also work without any troubles

I think you are underestimating the challenges in writing generic code. It might be possible to get this working though. It depends on what you expect to get when computing std([[2,4,6],[4,6,8]]). What do you expect to get?

stakaz · 2017-09-26T22:07:23Z

I think you are underestimating the challenges in writing generic code.

This is certainly true.

What do you expect to get?

std([[2,4,6],[4,6,8]]) = [std([2,4]), std([4,6]), std([6,8])]

So basically I would expect a pointwise calculation at every position of the inner elements (arrays in this case, but they could of course be matrices as well), "averaging" over all outer array positions. It sounds difficult, but I hope the idea is clear.

stakaz · 2017-09-26T22:25:21Z

Or in formulas and julia notation:

x = [x1, x2, ..., xN]
x1 = [2, 4, 6]
x2 = [4, 6, 8]
...
xN = ...

mean(x) = 1 / N * (x1 .+ x2 .+ ... .+ xN)
var(x) = 1 / N * (x1.^2 .+ x2.^2 .+ ... .+ xN.^2) .- mean(x).^2 # uncorrected; multiply by N / (N - 1) for the unbiased version
std(x) = sqrt.(var(x))
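The formulas above can be sketched as runnable code. This is only an illustration of the proposed semantics, not the implementation that ended up in Base; the names elementwise_mean, elementwise_var and elementwise_std are hypothetical, and the sketch assumes all inner arrays have the same shape.

```julia
# Hypothetical helper names; none of these exist in Base.
elementwise_mean(x) = sum(x) / length(x)

function elementwise_var(x; corrected=true)
    n = length(x)
    m = elementwise_mean(x)
    # Mean of the elementwise squares minus the square of the elementwise mean.
    v = sum(xi -> xi .^ 2, x) / n .- m .^ 2
    # Rescale by n / (n - 1) for the unbiased (corrected) estimate.
    return corrected ? v .* (n / (n - 1)) : v
end

elementwise_std(x; corrected=true) = sqrt.(elementwise_var(x; corrected=corrected))

x = [[2, 4, 6], [4, 6, 8]]
println(elementwise_var(x; corrected=false))  # [1.0, 1.0, 1.0]
println(elementwise_var(x))                   # [2.0, 2.0, 2.0]
println(elementwise_std(x))
```

The "mean of squares minus squared mean" form is convenient for matching the formulas, but it can lose precision through cancellation when the values are large relative to their spread; summing squared deviations from the mean is the more robust choice.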


stakaz · 2017-09-27T08:13:19Z

I have written a first "workaround" (it works well but of course does not handle any errors). Maybe this can help to develop generic code with this functionality.

It does not recognize DataArrays.DataArray as an AbstractArray, I don't know why. So for now a wrapper must be used as well where Array(dataframe[:somecolumn]) is passed instead of dataframe[:somecolumn] directly.

The code:

import Base.std
function std(v::AbstractArray{V,1};corrected=true) where {V<:AbstractArray{T,N} where {T,N}}
	return sqrt.(var(v;corrected=corrected))
end

import Base.var
function var(v::AbstractArray{V,1};corrected=true) where {V<:AbstractArray{T,N} where {T,N}}
	m = mean(v)  # compute the elementwise mean once instead of once per element
	return sum(xi -> abs2.(xi .- m), v) / (length(v) - Int(corrected))
end

## arrays for which the var/std should be calculated
x1_test = [1,3,5]
x2_test = [6,7,8]
x3_test = [3,4,7]
x4_test = [-1,0,1]

## but now ordered in an array of arrays/matrices
x = [[1,6,3,-1],[3,7,4,0],[5,8,7,1]]
M = [[1 6; 3 -1], [3 7; 4 0], [5 8; 7 1]]

## just some output

println("reference: \n")
println("\ntrue uncorrected variances")
println(var(x1_test;corrected = false))
println(var(x2_test;corrected = false))
println(var(x3_test;corrected = false))
println(var(x4_test;corrected = false))

println("\ntrue corrected variances")
println(var(x1_test;corrected = true))
println(var(x2_test;corrected = true))
println(var(x3_test;corrected = true))
println(var(x4_test;corrected = true))

println("\ntrue uncorrected standard deviations")
println(std(x1_test;corrected = false))
println(std(x2_test;corrected = false))
println(std(x3_test;corrected = false))
println(std(x4_test;corrected = false))

println("\ntrue corrected standard deviations")
println(std(x1_test;corrected = true))
println(std(x2_test;corrected = true))
println(std(x3_test;corrected = true))
println(std(x4_test;corrected = true))

println("\nnew function: \n")
println("\nuncorrected variances")
println(var(x;corrected=false))
println("\ncorrected variances")
println(var(x;corrected=true))
println("\nuncorrected standard deviations")
println(std(x;corrected=false))
println("\ncorrected standard deviations")
println(std(x;corrected=true))

println("\nnew function with array of matrices")
println("\nuncorrected variances")
println(var(M;corrected=false))
println("\ncorrected variances")
println(var(M;corrected=true))
println("\nuncorrected standard deviations")
println(std(M;corrected=false))
println("\ncorrected standard deviations")
println(std(M;corrected=true))

and the output:

reference: 


true uncorrected variances
2.6666666666666665
0.6666666666666666
2.888888888888889
0.6666666666666666

true corrected variances
4.0
1.0
4.333333333333333
1.0

true uncorrected standard deviations
1.632993161855452
0.816496580927726
1.699673171197595
0.816496580927726

true corrected standard deviations
2.0
1.0
2.0816659994661326
1.0

new function: 


uncorrected variances
[2.66667, 0.666667, 2.88889, 0.666667]

corrected variances
[4.0, 1.0, 4.33333, 1.0]

uncorrected standard deviations
[1.63299, 0.816497, 1.69967, 0.816497]

corrected standard deviations
[2.0, 1.0, 2.08167, 1.0]

new function with array of matrices

uncorrected variances
[2.66667 0.666667; 2.88889 0.666667]

corrected variances
[4.0 1.0; 4.33333 1.0]

uncorrected standard deviations
[1.63299 0.816497; 1.69967 0.816497]

corrected standard deviations
[2.0 1.0; 2.08167 1.0]

andreasnoack · 2017-09-27T08:15:56Z

See #23897 which I opened just a minute before you posted this.

stakaz · 2017-09-27T08:20:55Z

Nice ;) Well, here, just for completeness my wrapper for DataFrames.

function std(v::DataArrays.DataArray{V,1};corrected=true) where {V<:AbstractArray{T,N} where {T,N}}
	return std(Array(v);corrected=corrected)
end

function var(v::DataArrays.DataArray{V,1};corrected=true) where {V<:AbstractArray{T,N} where {T,N}}
	return var(Array(v);corrected=corrected)
end

stakaz · 2017-09-27T08:24:11Z

Thank you for the quick work. I hope that this commit will be merged soon ;)

* Make var and std work for Vector{Vector{T}} by removing Number restriction from some signatures as well as using broadcasting in std. Fixes #23884
* Make cov work for Vector{Vector}

andreasnoack added a commit that referenced this issue Sep 27, 2017

Make var and std work for Vector{Vector{T}} by removing Number restriction from some signatures as well as using broadcasting in std. Fixes #23884

db44347

andreasnoack mentioned this issue Sep 27, 2017

Make var and std work for Vector{Vector{T}} #23897

Merged

kshyatt added the maths Mathematical functions label Sep 29, 2017

andreasnoack closed this as completed in #23897 Oct 2, 2017
