-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fit mle weighted laplace #1676
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportBase: 83.60% // Head: 85.60% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #1676 +/- ##
==========================================
+ Coverage 83.60% 85.60% +1.99%
==========================================
Files 130 130
Lines 6642 8189 +1547
==========================================
+ Hits 5553 7010 +1457
- Misses 1089 1179 +90
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
function fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real}) | ||
n = length(x) | ||
if n != length(w) | ||
throw(DimensionMismatch("Inconsistent array lengths.")) | ||
end | ||
xc = similar(x) | ||
copyto!(xc, x) | ||
m = median(xc, weights(w)) # https://github.com/JuliaStats/StatsBase.jl/blob/0b64a9c8a16355da16469d0fe5016e0fdf099af5/src/weights.jl#L788 | ||
xc .= abs.(x .- m) | ||
return Laplace(m, mean(xc, weights(w))) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I've taken the opportunity to remove an unneeded allocation; there's no need to create an extra array for the centered variables.
function fit_mle(::Type{<:Laplace}, x::AbstractArray{<:Real}, w::AbstractArray{<:Real}) | |
n = length(x) | |
if n != length(w) | |
throw(DimensionMismatch("Inconsistent array lengths.")) | |
end | |
xc = similar(x) | |
copyto!(xc, x) | |
m = median(xc, weights(w)) # https://github.com/JuliaStats/StatsBase.jl/blob/0b64a9c8a16355da16469d0fe5016e0fdf099af5/src/weights.jl#L788 | |
xc .= abs.(x .- m) | |
return Laplace(m, mean(xc, weights(w))) | |
end | |
function fit_mle(::Type{<:Laplace}, data::AbstractArray{<:Real}) | |
med = median(data) | |
# TODO: is it possible to do a single-pass median and mean absolute deviation? | |
mean_abs_deviation = mean(data) do x | |
return abs(x - med) | |
end | |
return Laplace(med, mean_abs_dev) | |
end | |
function fit_mle(::Type{<:Laplace}, data::AbstractArray{<:Real}, w::AbstractWeights) | |
if length(data) != length(w) | |
throw(DimensionMismatch("Inconsistent array lengths.")) | |
end | |
# TODO: can this be done in a single-pass? | |
# code for weighted median can be found at: | |
# https://github.com/JuliaStats/StatsBase.jl/blob/0b64a9c8a16355da16469d0fe5016e0fdf099af5/src/weights.jl#L788 | |
med = median(data, weights(w)) | |
mean_abs_deviation = mean(zip(data, weights)) / sum(w) do x, weight | |
return weight * abs(x - med) | |
end | |
return Laplace(med, mean_abs_deviation) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion breaks the non-weighted implementation without improving its allocations (I don't think the non-weighted algorithm can be improved allocationwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion breaks the non-weighted implementation without improving its allocations
I made a mistake earlier. Not sure if it works now, but @dmetivie can double-check hopefully. (Is there an easy way to run tests on suggestions?)
I don't think the non-weighted algorithm can be improved allocationwise
The code for both algorithms is the same. But in any case, there's definitely no need to copy an array just to center it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in any case, there's definitely no need to copy an array just to center it.
The code on master seems to be as efficient as possible: The copy is used by the median calculations and then only reused by the centering operation. Note that the code uses median!
- if you switch to median
, a copy is created internally and discarded afterwards. So IMO the current code should not be changed.
dw = fit(dist, x, w) | ||
@test isa(dw, dist) | ||
@test isapprox(location(dw), 5.0, atol=0.02) | ||
@test isapprox(scale(dw), 3.0, atol=0.03) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test the weighted method gives the same answers as the unweighted method when w
is a set of unit weights?
@test isa(d, dist) | ||
@test isapprox(location(d), 5.0, atol=0.02) | ||
@test isapprox(scale(d) , 3.0, atol=0.03) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test for the case where the weights don't sum to 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And ideally one where the weights sum to 0, making sure we get either a NaN or an error?)
Thanks for the review, I'll do your suggestion and test, when I find some time.
I was going to write down the proof because I don't see it written explicitly anywhere. |
Yeah, I think it's a well-known enough result that we can use it. (The proof should be identical to the one for the unweighted case). |
I added
fit_mle(Laplace, x, w)
method using the weighted median plus a test in the test section of Laplace fit.