New distribution: Maximum #1655
Codecov Report
Base: 85.52% // Head: 85.46% // Decreases project coverage by -0.07%
Additional details and impacted files
@@ Coverage Diff @@
## master #1655 +/- ##
==========================================
- Coverage 85.52% 85.46% -0.07%
==========================================
Files 130 131 +1
Lines 8153 8165 +12
==========================================
+ Hits 6973 6978 +5
- Misses 1180 1187 +7
I like the idea of having such a distribution. I imagine though that users would find an analogous |
Thank you for your interest, @sethaxen! I agree that |
They seem complicated, but they're much simpler when written as a function of the PDF and CDF of the binomial distribution (Edit: probably need to double-check that the pdf works even when
function orderstat_pdf(dist, n, r, x)
    d = Binomial(n - 1, cdf(dist, x))
    return n * pdf(d, r - 1) * pdf(dist, x)
end
function orderstat_cdf(dist, n, r, x)
    d = Binomial(n, cdf(dist, x))
    return cdf(d, n) - cdf(d, r - 1)
end
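For reference, the two helpers above appear to encode the standard order-statistic formulas. Writing $F$ and $f$ for the CDF and PDF of `dist` (notation introduced here, and assuming `dist` is continuous so that $f$ is a density), the $r$-th smallest of $n$ iid draws satisfies:

$$
f_{(r)}(x) = n \binom{n-1}{r-1} F(x)^{r-1} \bigl(1 - F(x)\bigr)^{n-r} f(x)
$$

$$
F_{(r)}(x) = \sum_{k=r}^{n} \binom{n}{k} F(x)^{k} \bigl(1 - F(x)\bigr)^{n-k}
= 1 - \sum_{k=0}^{r-1} \binom{n}{k} F(x)^{k} \bigl(1 - F(x)\bigr)^{n-k}
$$

The first line is `n * pdf(Binomial(n - 1, F(x)), r - 1) * f(x)`, and the second is `cdf(d, n) - cdf(d, r - 1)` with `d = Binomial(n, F(x))`, since `cdf(d, n)` equals 1. Setting $r = n$ gives $f_{(n)}(x) = n F(x)^{n-1} f(x)$ and $F_{(n)}(x) = F(x)^n$, which matches the `Maximum` methods in this PR.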
Several other distributions in this package don't have closed-form expressions for the quantile function, e.g. Distributions.jl/src/mixtures/mixturemodel.jl Lines 444 to 448 in 432a7f9
There are several quantile algorithms in https://github.com/JuliaStats/Distributions.jl/blob/master/src/quantilealgs.jl that can be used here. There's likely a way to set absolute lower and upper bounds for the quantile for quantile_bisect (Edit: you could use the quantiles for the minimum and maximum as the extrema, but maybe tighter bounds are possible).
Hi @Vilin97 are you interested in tackling this more general |
I'm not, sorry! I understand the desire for more generality but in this case it will come at the expense of less certainty, e.g. with quantiles. I will leave the implementation of orderStatistics to someone else.
On further thought, even if we have an OrderStatistic distribution, it makes sense to have independent Maximum and Minimum distributions. I'll tackle the other two once this is finished, but I have some suggestions:
@@ -0,0 +1,25 @@
"""
The maximum of n iid random variables with continuous univariate distribution
I don't see any reason to limit this to continuous distributions, since it's useful for discrete ones as well. Could you then move maximum.jl to be in the univariate/ directory?
Also, could you expand this docstring similar to others in this package, e.g. Binomial?
"""
The maximum of n iid random variables with continuous univariate distribution
"""
struct Maximum{D<:ContinuousUnivariateDistribution} <: ContinuousUnivariateDistribution
Suggested change:
- struct Maximum{D<:ContinuousUnivariateDistribution} <: ContinuousUnivariateDistribution
+ struct Maximum{D<:UnivariateDistribution,S<:ValueSupport} <: UnivariateDistribution{S}
struct Maximum{D<:ContinuousUnivariateDistribution} <: ContinuousUnivariateDistribution
    dist::D
    n::Int
    Maximum{D}(dist, n) where {D<:ContinuousUnivariateDistribution} = new{D}(dist, n)
Suggested change:
- Maximum{D}(dist, n) where {D<:ContinuousUnivariateDistribution} = new{D}(dist, n)
+ function Maximum{D}(dist, n) where {D<:UnivariateDistribution}
+     new{D,value_support(D)}(dist, n)
+ end
    Maximum{D}(dist, n) where {D<:ContinuousUnivariateDistribution} = new{D}(dist, n)
end

function Maximum(dist::D, n::Integer; check_args::Bool=true) where {D <: ContinuousUnivariateDistribution}
Suggested change:
- function Maximum(dist::D, n::Integer; check_args::Bool=true) where {D <: ContinuousUnivariateDistribution}
+ function Maximum(dist::D, n::Integer; check_args::Bool=true) where {D <: UnivariateDistribution}
    return Maximum{D}(dist, n)
end

rand(rng::AbstractRNG, d::Maximum{D}) where {D} = maximum([rand(rng, d.dist) for _ in 1:d.n])
No need for the D type parameter since we don't use it. Also, for continuous d we can use the fact that Maximum(Uniform(0, 1), n) is the same as Beta(n, 1) to sample more efficiently. We might be able to do something similar in the discrete case, but we can at least use an iterator to avoid allocating a large array for large n:
Suggested change:
- rand(rng::AbstractRNG, d::Maximum{D}) where {D} = maximum([rand(rng, d.dist) for _ in 1:d.n])
+ rand(rng::AbstractRNG, d::Maximum) = maximum(rand(rng, d.dist) for _ in 1:d.n)
+ function rand(rng::AbstractRNG, d::Maximum{<:ContinuousUnivariateDistribution})
+     return quantile(d.dist, rand(rng, Beta(d.n, 1)))
+ end
Even better, since rand(Beta(n, 1)) is equivalent to rand()^(1//n):
Suggested change:
- rand(rng::AbstractRNG, d::Maximum{D}) where {D} = maximum([rand(rng, d.dist) for _ in 1:d.n])
+ rand(rng::AbstractRNG, d::Maximum) = maximum(rand(rng, d.dist) for _ in 1:d.n)
+ function rand(rng::AbstractRNG, d::Maximum{<:ContinuousUnivariateDistribution})
+     return quantile(d.dist, rand(rng)^(1//d.n))
+ end
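As a short justification of the two sampling shortcuts in this thread (a sketch, with $U \sim \mathrm{Uniform}(0,1)$, $F$ the CDF of `d.dist`, and $F^{-1}$ its quantile function, all notation introduced here):

$$
P\bigl(U^{1/n} \le u\bigr) = P\bigl(U \le u^{n}\bigr) = u^{n}, \qquad 0 \le u \le 1,
$$

which is the CDF of $\mathrm{Beta}(n, 1)$, so `rand(rng)^(1//d.n)` and `rand(rng, Beta(d.n, 1))` should have the same distribution. Pushing that draw through the quantile function then gives

$$
P\bigl(F^{-1}(U^{1/n}) \le x\bigr) = P\bigl(U^{1/n} \le F(x)\bigr) = F(x)^{n},
$$

which is the CDF of the maximum of $n$ iid draws from `d.dist`.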

#### Evaluation

cdf(d::Maximum{D}, x::Real) where {D} = cdf(d.dist, x)^d.n
No need for D:
Suggested change:
- cdf(d::Maximum{D}, x::Real) where {D} = cdf(d.dist, x)^d.n
+ cdf(d::Maximum, x::Real) = cdf(d.dist, x)^d.n
#### Evaluation

cdf(d::Maximum{D}, x::Real) where {D} = cdf(d.dist, x)^d.n
pdf(d::Maximum{D}, x::Real) where {D} = d.n*pdf(d.dist, x)*cdf(d.dist, x)^(d.n-1)
No need for D, and we can implement pdf for the discrete case as cdf(d, x) - cdf(d, xminus), where xminus is the largest element in the support less than x:
Suggested change:
- pdf(d::Maximum{D}, x::Real) where {D} = d.n*pdf(d.dist, x)*cdf(d.dist, x)^(d.n-1)
+ pdf(d::Maximum, x::Real) = d.n*pdf(d.dist, x)*cdf(d.dist, x)^(d.n-1)
+ function pdf(d::Maximum{<:DiscreteUnivariateDistribution}, x::Real)
+     p = cdf(d.dist, x)
+     n = d.n
+     return p^n - (p - pdf(d.dist, x))^n
+ end
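The discrete formula above can be sanity-checked without the package. The following is a minimal sketch in plain Julia, using a fair six-sided die as a hypothetical stand-in for `d.dist` (the names `die_pdf`, `die_cdf`, `max_pmf` and `brute` are illustrative and not part of Distributions.jl). It compares the closed form against a brute-force count for two dice.

```julia
# Hypothetical stand-in for a discrete base distribution: a fair six-sided die.
die_pdf(x) = x in 1:6 ? 1 / 6 : 0.0
die_cdf(x) = clamp(floor(x), 0, 6) / 6

# PMF of the maximum of n iid draws, written as in the suggested change:
# P(max = x) = F(x)^n - (F(x) - f(x))^n, because F(x) - f(x) = F(xminus).
function max_pmf(n, x)
    p = die_cdf(x)
    return p^n - (p - die_pdf(x))^n
end

# Brute-force reference for n = 2: fraction of the 36 outcomes whose maximum is x.
brute(x) = count(max(a, b) == x for a in 1:6, b in 1:6) / 36
```

Under these assumptions `max_pmf(2, x)` should agree with `brute(x)` for every face, and the PMF should sum to one over the support for any `n`.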

cdf(d::Maximum{D}, x::Real) where {D} = cdf(d.dist, x)^d.n
pdf(d::Maximum{D}, x::Real) where {D} = d.n*pdf(d.dist, x)*cdf(d.dist, x)^(d.n-1)
logpdf(d::Maximum{D}, x::Real) where {D} = log(d.n)+logpdf(d.dist, x)+(d.n-1)*logcdf(d.dist, x)
If x is e.g. a Float32, taking log(d.n) will promote it to a Float64. We can also support the discrete case as above:
Suggested change:
- logpdf(d::Maximum{D}, x::Real) where {D} = log(d.n)+logpdf(d.dist, x)+(d.n-1)*logcdf(d.dist, x)
+ function logpdf(d::Maximum, x::Real)
+     n = d.n
+     dist = d.dist
+     lp = logpdf(dist, x)+(n-1)*logcdf(dist, x)
+     return lp + log(oftype(lp, n))
+ end
+ function logpdf(d::Maximum{<:DiscreteUnivariateDistribution}, x::Real)
+     dist = d.dist
+     n = d.n
+     logp = logcdf(dist, x)
+     return n*logp + log1mexp(n*log1mexp(logpdf(dist, x) - logp))
+ end
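To see where the nested `log1mexp` calls in the discrete `logpdf` come from, here is a short derivation. It writes $p = F(x)$ and $f = f(x)$ for the CDF and PMF of `d.dist` at `x` (notation introduced here) and assumes $\operatorname{log1mexp}(a) = \log(1 - e^{a})$ for $a \le 0$, as in LogExpFunctions.jl:

$$
\log\bigl(p^{n} - (p - f)^{n}\bigr)
= n \log p + \log\Bigl(1 - \bigl(1 - \tfrac{f}{p}\bigr)^{n}\Bigr)
$$

$$
\log\bigl(1 - \tfrac{f}{p}\bigr) = \operatorname{log1mexp}(\log f - \log p)
\quad\Longrightarrow\quad
\log\Bigl(1 - \bigl(1 - \tfrac{f}{p}\bigr)^{n}\Bigr)
= \operatorname{log1mexp}\bigl(n \operatorname{log1mexp}(\log f - \log p)\bigr)
$$

Adding $n \log p$ to the right-hand side reproduces the returned expression term by term.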
minimum(d::Maximum{D}) where {D} = minimum(d.dist)
maximum(d::Maximum{D}) where {D} = maximum(d.dist)
insupport(d::Maximum{D}, x::Real) where {D} = insupport(d.dist, x)
No need for D:
Suggested change:
- minimum(d::Maximum{D}) where {D} = minimum(d.dist)
- maximum(d::Maximum{D}) where {D} = maximum(d.dist)
- insupport(d::Maximum{D}, x::Real) where {D} = insupport(d.dist, x)
+ minimum(d::Maximum) = minimum(d.dist)
+ maximum(d::Maximum) = maximum(d.dist)
+ insupport(d::Maximum, x::Real) = insupport(d.dist, x)
cdf(d::Maximum{D}, x::Real) where {D} = cdf(d.dist, x)^d.n
pdf(d::Maximum{D}, x::Real) where {D} = d.n*pdf(d.dist, x)*cdf(d.dist, x)^(d.n-1)
logpdf(d::Maximum{D}, x::Real) where {D} = log(d.n)+logpdf(d.dist, x)+(d.n-1)*logcdf(d.dist, x)
quantile(d::Maximum{D}, q::Real) where {D} = quantile(d.dist, q^(1/d.n))
No need for D, and we can use // to avoid unnecessarily promoting to a Float64. Also, I'd have to think more about whether this needs to be changed to handle the discrete case.
Suggested change:
- quantile(d::Maximum{D}, q::Real) where {D} = quantile(d.dist, q^(1/d.n))
+ quantile(d::Maximum, q::Real) = quantile(d.dist, q^(1//d.n))
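Both points in this comment can be illustrated in plain Julia. The sketch below uses Exponential(1) as a hypothetical continuous stand-in for `d.dist`, with hand-written `base_cdf` and `base_quantile` helpers (illustrative names, not package API). It round-trips a probability through the quantile and CDF of the maximum, and shows how the rational exponent interacts with a `Float32` input.

```julia
# Hypothetical stand-ins for a continuous base distribution: Exponential(1).
base_cdf(x) = x < 0 ? zero(x) : -expm1(-x)   # 1 - exp(-x)
base_quantile(q) = -log1p(-q)                # inverse of base_cdf on [0, 1)

# CDF and quantile of the maximum of n iid draws, mirroring the PR's methods.
max_cdf(n, x) = base_cdf(x)^n
max_quantile(n, q) = base_quantile(q^(1 // n))

# Round trip: the CDF evaluated at the q-quantile should recover q.
roundtrip = max_cdf(5, max_quantile(5, 0.3))

# Element types: a Rational exponent keeps a Float32 base as Float32,
# while a Float64 exponent promotes the result to Float64.
t_rational = typeof(0.3f0^(1 // 5))
t_float = typeof(0.3f0^(1 / 5))
```

Here `roundtrip` should be approximately `0.3`, and comparing `t_rational` with `t_float` shows the promotion the reviewer wants to avoid.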
It seems nothing special needs to be done, see #1655 (comment).
Interestingly, https://epubs.siam.org/doi/10.1137/1.9780898719062.ch4, they point out that the |
I'm not certain. I just benchmarked, and |
A bare-bones implementation of the distribution of the maximum of n iid random variables.