New distribution: Maximum #1655

Vilin97 · 2022-12-28T01:03:59Z

A bare-bones implementation of the distribution of the maximum of n iid random variables.
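For orientation, here is a minimal sketch of the quantities such a distribution needs, assuming a continuous univariate base distribution. The helper names (`max_cdf`, `max_pdf`, `max_quantile`, `max_rand`) are hypothetical and are not the code from this PR; they only illustrate that the maximum of n iid draws has cdf F(x)^n.

```julia
using Distributions

# Hypothetical helpers, not the PR's implementation.
# If X_1, ..., X_n are iid with cdf F, then P(max <= x) = F(x)^n.
max_cdf(dist, n, x) = cdf(dist, x)^n

# Differentiating F(x)^n gives n * F(x)^(n - 1) * f(x).
max_pdf(dist, n, x) = n * cdf(dist, x)^(n - 1) * pdf(dist, x)

# Solving F(x)^n = p for x gives the quantile in closed form.
max_quantile(dist, n, p) = quantile(dist, p^(1 / n))

# Sampling: draw n values and keep the largest.
max_rand(dist, n) = maximum(rand(dist, n))
```

For example, `max_cdf(Normal(), 3, 0.0)` is `0.5^3`, because each of the three draws falls below zero with probability one half.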

codecov-commenter · 2022-12-28T01:25:37Z

Codecov Report

Base: 85.52% // Head: 85.46% // Decreases project coverage by -0.06% ⚠️

Coverage data is based on head (71dd3cc) compared to base (432a7f9).
Patch coverage: 41.66% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1655      +/-   ##
==========================================
- Coverage   85.52%   85.46%   -0.07%     
==========================================
  Files         130      131       +1     
  Lines        8153     8165      +12     
==========================================
+ Hits         6973     6978       +5     
- Misses       1180     1187       +7

Impacted Files	Coverage Δ
src/Distributions.jl	`100.00% <ø> (ø)`
src/univariates.jl	`74.07% <ø> (ø)`
src/univariate/continuous/maximum.jl	`41.66% <41.66%> (ø)`


sethaxen · 2022-12-28T20:09:19Z

I like the idea of having such a distribution. I imagine, though, that users would also find an analogous Minimum or Median useful, which makes me wonder if it would be better to add a general OrderStatistic(dist, n, i) distribution that takes a parameter n (sample size) and a parameter i (the rank of the order statistic). Maximum would correspond to OrderStatistic(dist, n, n), Minimum would be OrderStatistic(dist, n, 1), and Median would be OrderStatistic(dist, n, cld(n, 2)) (the middle rank for odd n). One could even have a Quantile(dist, n, p) alias that is just OrderStatistic(dist, n, floor(Int, p*n)). Though in general I'm not a fan of things that look like constructors returning other objects.
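The mapping from named statistics to the rank i can be sketched as below. The helper names are hypothetical and not part of Distributions.jl. The median uses `cld(n, 2)`, which is the middle rank for odd n (rank 2 when n = 3), and the quantile rank is clamped to `1:n` as an added assumption so that a small p cannot produce rank 0.

```julia
# Hypothetical helpers: the rank i to pass to OrderStatistic(dist, n, i).
minimum_rank(n::Integer) = 1
maximum_rank(n::Integer) = n

# Middle rank for odd n; the lower of the two middle ranks for even n.
median_rank(n::Integer) = cld(n, 2)

# floor(Int, p * n) as proposed above, clamped to the valid ranks 1:n (assumption).
quantile_rank(n::Integer, p::Real) = clamp(floor(Int, p * n), 1, n)

# The rank simply indexes into the sorted sample.
sample = [0.7, -1.2, 3.4, 0.1, 2.5]
sorted = sort(sample)
n = length(sample)

smallest = sorted[minimum_rank(n)]
middle   = sorted[median_rank(n)]
largest  = sorted[maximum_rank(n)]
```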

Vilin97 · 2022-12-30T19:42:57Z

Thank you for your interest, @sethaxen! I agree that OrderStatistic would be a good generalization, but the formulas for its cdf and pdf are considerably more complicated. AFAIK, there is no closed-form formula for the inverse of the cdf. Therefore, I don't think it makes sense to implement OrderStatistic.
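To make the comparison concrete, these are the standard formulas for a continuous base distribution with cdf $F$ and pdf $f$. The notation $F_{(r)}$ for the cdf of the r-th order statistic out of n is an assumption of this note and is not used elsewhere in the thread.

```latex
% Maximum (r = n): the cdf is a power of F, so it inverts in closed form.
F_{(n)}(x) = F(x)^{n},
\qquad
F_{(n)}^{-1}(p) = F^{-1}\!\left(p^{1/n}\right)

% General rank r: at least r of the n draws must be <= x.
F_{(r)}(x) = \sum_{j=r}^{n} \binom{n}{j} F(x)^{j} \bigl(1 - F(x)\bigr)^{n-j}

f_{(r)}(x) = \frac{n!}{(r-1)!\,(n-r)!}\, F(x)^{r-1} \bigl(1 - F(x)\bigr)^{n-r} f(x)
```

For $1 < r < n$ the equation $F_{(r)}(x) = p$ is a polynomial equation in $F(x)$ with no simple general solution, which is the difficulty with the quantile function raised here.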

sethaxen · 2022-12-30T21:04:30Z

> Thank you for your interest, @sethaxen! I agree that OrderStatistic would be a good generalization, but the formulas for its cdf and pdf are considerably more complicated.

They seem complicated, but they're much simpler when written as a function of the PDF and CDF of the binomial distribution (Edit: probably need to double-check that the pdf works even when dist is discrete):

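using Distributions

# PDF of the r-th order statistic of n iid draws from `dist` (continuous case).
function orderstat_pdf(dist, n, r, x)
    d = Binomial(n - 1, cdf(dist, x))
    return n * pdf(d, r - 1) * pdf(dist, x)
end

# CDF of the r-th order statistic: P(at least r of the n draws are <= x).
function orderstat_cdf(dist, n, r, x)
    d = Binomial(n, cdf(dist, x))
    return cdf(d, n) - cdf(d, r - 1)  # equals ccdf(d, r - 1), since cdf(d, n) == 1
end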
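As a quick sanity check of the binomial formulation, the sketch below compares the two functions against the closed forms for the maximum (r = n) and the minimum (r = 1). It assumes a continuous `dist`; the values of `n` and `x` are arbitrary, and the discrete case flagged in the edit above is not covered.

```julia
using Distributions

# Same formulas as in the comment, written compactly.
orderstat_pdf(dist, n, r, x) = n * pdf(Binomial(n - 1, cdf(dist, x)), r - 1) * pdf(dist, x)
orderstat_cdf(dist, n, r, x) = ccdf(Binomial(n, cdf(dist, x)), r - 1)

dist, n, x = Normal(), 5, 0.3   # arbitrary illustrative values
F, f = cdf(dist, x), pdf(dist, x)

# r = n is the maximum: cdf F^n, pdf n * F^(n - 1) * f.
max_cdf_ok = orderstat_cdf(dist, n, n, x) ≈ F^n
max_pdf_ok = orderstat_pdf(dist, n, n, x) ≈ n * F^(n - 1) * f

# r = 1 is the minimum: cdf 1 - (1 - F)^n, pdf n * (1 - F)^(n - 1) * f.
min_cdf_ok = orderstat_cdf(dist, n, 1, x) ≈ 1 - (1 - F)^n
min_pdf_ok = orderstat_pdf(dist, n, 1, x) ≈ n * (1 - F)^(n - 1) * f
```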

> AFAIK, there is no closed-form formula for the inverse of the cdf. Therefore, I don't think it makes sense to implement OrderStatistic.

Several other distributions in this package don't have closed-form expressions for the quantile function either, e.g.:

Distributions.jl/src/mixtures/mixturemodel.jl

Lines 444 to 448 in 432a7f9

    
function quantile(d::UnivariateMixture{Continuous}, p::Real)
    ps = probs(d)
    min_q, max_q = extrema(quantile(component(d, i), p) for (i, pi) in enumerate(ps) if pi > 0)
    quantile_bisect(d, p, min_q, max_q)
end

There are several quantile algorithms in https://github.com/JuliaStats/Distributions.jl/blob/master/src/quantilealgs.jl that can be used here. There's likely a way to set absolute lower and upper bounds for the quantile for quantile_bisect (Edit: you could use the quantiles for the minimum and maximum as the extrema, but maybe tighter bounds are possible).
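A rough sketch of that idea follows, assuming a continuous `dist` and 0 < p < 1. It uses a hand-written bisection loop rather than the package's internal `quantile_bisect`, and `orderstat_quantile` is a hypothetical name. The bracket works because the minimum is never larger than the r-th order statistic, which is never larger than the maximum, so their p-quantiles are ordered the same way.

```julia
using Distributions

# CDF of the r-th order statistic of n iid draws (binomial formulation from above).
orderstat_cdf(dist, n, r, x) = ccdf(Binomial(n, cdf(dist, x)), r - 1)

# Hypothetical helper, not part of Distributions.jl: bisection on the CDF.
function orderstat_quantile(dist, n, r, p; tol=1e-10, maxiter=200)
    # Bracket the answer with the p-quantiles of the minimum and the maximum,
    # both of which are available in closed form.
    lo = quantile(dist, 1 - (1 - p)^(1 / n))  # p-quantile of the minimum
    hi = quantile(dist, p^(1 / n))            # p-quantile of the maximum
    for _ in 1:maxiter
        mid = (lo + hi) / 2
        if orderstat_cdf(dist, n, r, mid) < p
            lo = mid   # quantile lies to the right of mid
        else
            hi = mid   # quantile lies at or to the left of mid
        end
        hi - lo <= tol && break
    end
    return (lo + hi) / 2
end
```

For a symmetric base distribution such as `Normal()`, the median of the sample median should come out near zero, e.g. `orderstat_quantile(Normal(), 5, 3, 0.5)`.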

sethaxen · 2023-01-27T09:41:12Z

Hi @Vilin97, are you interested in tackling this more general OrderStatistic distribution?

Vilin97 · 2023-01-27T16:15:24Z

> Hi @Vilin97, are you interested in tackling this more general OrderStatistic distribution?

I'm not, sorry! I understand the desire for more generality, but in this case it would come at the expense of certainty, e.g. with quantiles. I will leave the implementation of OrderStatistic to someone else.

sethaxen

On further thought, even if we have an OrderStatistic distribution, it makes sense to have independent Maximum and Minimum distributions. I'll tackle the other two once this is finished, but I have some suggestions:

sethaxen · 2023-01-29T08:56:14Z