Add OrderStatistic and JointOrderStatistics distributions #1668
Codecov Report
Patch coverage:
@@             Coverage Diff             @@
##           master    #1668      +/-   ##
==========================================
- Coverage   85.92%   85.89%   -0.03%
==========================================
  Files         139      142       +3
  Lines        8376     8560     +184
==========================================
+ Hits         7197     7353     +156
- Misses       1179     1207      +28
I've only seen the expression of the density of joint order statistics …

Derivation

Given the density function … Suppose now we marginalize all … Let … We can repeat this to get the joint density of any subset of order statistics.
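For reference, the standard result that this derivation builds toward can be written out as below. The notation ($f$ and $F$ for the parent PDF and CDF, $y_1 < \dots < y_k$ for the retained values at ranks $r_1 < \dots < r_k$, and the boundary conventions) is our own assumption and may not match the original comment.

```math
\begin{aligned}
p(x_1, \ldots, x_n) &= n! \prod_{i=1}^{n} f(x_i), \qquad x_1 < \cdots < x_n, \\
\int_{x_a < x_{a+1} < \cdots < x_{b-1} < x_b} \prod_{j=a+1}^{b-1} f(x_j)\, \mathrm{d}x_{a+1} \cdots \mathrm{d}x_{b-1}
  &= \frac{\left[F(x_b) - F(x_a)\right]^{b-a-1}}{(b-a-1)!}, \\
p(y_1, \ldots, y_k) &= n! \prod_{j=1}^{k} f(y_j) \prod_{j=0}^{k} \frac{\left[F(y_{j+1}) - F(y_j)\right]^{r_{j+1}-r_j-1}}{(r_{j+1}-r_j-1)!}, \qquad y_1 < \cdots < y_k,
\end{aligned}
```

with $r_0 = 0$, $r_{k+1} = n + 1$, $F(y_0) = 0$ and $F(y_{k+1}) = 1$. The middle line is the marginalization step (substitute $u = F(x)$; the ordered region of $m$ points in an interval of length $L$ has volume $L^m/m!$), and applying it to every gap between consecutive retained ranks gives the last line. For $k = 1$ it reduces to the familiar density of a single order statistic.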
I haven't added them to the docs yet. Univariate distributions are currently separated into Continuous and Discrete categories, but `OrderStatistic` can be either. Plus, `JointOrderStatistics` and `OrderStatistic` are so closely related that it might make sense to give them their own docs page. There could even be a page devoted to distributions of statistics of samples, in case someone later adds asymptotic distributions of statistics like quantiles.
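As a minimal sketch of why one type can cover both cases: the CDF of the `i`th order statistic depends on the parent distribution only through the parent CDF, via a binomial tail sum, so the same formula works for continuous and discrete parents. The helper names below (`orderstat_cdf`, `uniform_cdf`, `die_cdf`, `die_orderstat_pmf`) are hypothetical illustrations in plain Julia, not the Distributions.jl API or the PR's implementation.

```julia
# Hypothetical helper, not the Distributions.jl API.
# P(X_(i) <= x) = P(at least i of the n IID draws are <= x)
#               = sum_{k=i}^{n} binomial(n, k) F(x)^k (1 - F(x))^(n - k),
# where Fx = F(x) is the parent CDF at x. Int binomials overflow (and throw)
# for large n, so this sketch is only meant for small n.
function orderstat_cdf(Fx::Real, n::Integer, i::Integer)
    1 <= i <= n || throw(ArgumentError("need 1 <= i <= n"))
    return sum(binomial(n, k) * Fx^k * (1 - Fx)^(n - k) for k in i:n)
end

# Continuous parent: Uniform(0, 1), whose CDF is F(x) = x on [0, 1].
uniform_cdf(x) = clamp(x, 0.0, 1.0)

# Discrete parent: a fair six-sided die, whose CDF is floor(x) / 6 on [0, 6].
die_cdf(x) = clamp(floor(x), 0, 6) / 6

# For an integer-valued parent, the PMF of X_(i) at k is a difference of CDFs.
die_orderstat_pmf(k, n, i) =
    orderstat_cdf(die_cdf(k), n, i) - orderstat_cdf(die_cdf(k - 1), n, i)

# Examples: CDF of the median of 3 uniform draws at 0.5, and P(max of 2 rolls == 6).
orderstat_cdf(uniform_cdf(0.5), 3, 2)   # 0.5
die_orderstat_pmf(6, 2, 2)              # 11/36
```

The discrete PMF falls out of the same CDF, which is one argument for a single `OrderStatistic` type rather than separate continuous and discrete ones.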
# this is slow if length(d.r) is close to n and quantile for d.dist is expensive,
# but this branch is probably taken when length(d.r) is small or much smaller than n.
It's probably worth creating samplers for these two cases.
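A rough sketch of what the two sampling strategies could look like, assuming only Julia's `Random` standard library. The function names `sample_direct` and `sample_inversion`, and the `draw`/`quantilefn` arguments, are hypothetical stand-ins for `rand(rng, d.dist)` and `quantile(d.dist, p)`; this is not the PR's actual `_rand!`. The inversion variant builds uniform order statistics from exponential spacings, which may differ from what the PR does.

```julia
using Random

# Direct method: draw all n values, sort them, keep the requested ranks.
# Cost: n draws plus an O(n log n) sort.
function sample_direct(rng::AbstractRNG, draw, n::Integer, r::AbstractVector{<:Integer})
    all(k -> 1 <= k <= n, r) || throw(ArgumentError("ranks must lie in 1:n"))
    x = sort!([draw(rng) for _ in 1:n])
    return x[r]
end

# Inversion method: with E_1, ..., E_{n+1} IID Exp(1) and S_k = E_1 + ... + E_k,
# the vector (S_1, ..., S_n) / S_{n+1} has the law of the uniform order statistics.
# Mapping through a (monotone) quantile function gives the order statistics of
# the parent, with only length(r) quantile evaluations.
function sample_inversion(rng::AbstractRNG, quantilefn, n::Integer, r::AbstractVector{<:Integer})
    all(k -> 1 <= k <= n, r) || throw(ArgumentError("ranks must lie in 1:n"))
    S = cumsum(randexp(rng, n + 1))
    return [quantilefn(S[k] / S[end]) for k in r]
end

# Example: extrema (ranks 1 and 10) of 10 draws from Exp(1).
rng = MersenneTwister(2024)
sample_direct(rng, randexp, 10, [1, 10])
sample_inversion(rng, p -> -log1p(-p), 10, [1, 10])
```

The direct method needs `n` draws and a sort; the inversion method needs `n + 1` exponential draws but only `length(r)` quantile evaluations, which is the trade-off the code comment above describes.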
JointOrderStatistics(Cauchy(), 10, [1, 10]) # joint distribution of the extrema
```
"""
struct JointOrderStatistics{D<:ContinuousUnivariateDistribution,R<:AbstractVector{Int}} <: |
For the continuous case, we can write down the PDF and not much else (except the mean and covariance for a handful of `dist`s, which could come in a future PR).
For the discrete case, the PMF is much more complicated and probably only tractable when `length(r) <= 2`. For a discrete `dist` whose support is a finite set, we can define a distribution of the order statistics for a sample without replacement, at least in the bivariate case. The PMF, mean, and covariance are known. I wonder if that should be a special case of this distribution.
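For the continuous case, a minimal sketch of the joint log-density for arbitrary strictly increasing ranks, in plain Julia with no packages. `orderstats_logpdf`, `logfact`, and the `f`/`F` arguments (parent PDF and CDF passed as plain functions) are hypothetical and not the Distributions.jl interface; the formula is the standard one for the joint density of a subset of order statistics.

```julia
# log(m!) as a sum of logs, so it does not overflow; returns 0.0 for m = 0.
logfact(m::Integer) = sum(log, 1:m; init = 0.0)

# log p(x) = log n! + sum_j log f(x_j)
#          + sum_{j=0}^{k} [g_j * log(F(x_{j+1}) - F(x_j)) - log(g_j!)],
# with g_j = r_{j+1} - r_j - 1, r_0 = 0, r_{k+1} = n + 1, F(x_0) = 0, F(x_{k+1}) = 1.
function orderstats_logpdf(f, F, n::Integer, r::AbstractVector{<:Integer}, x::AbstractVector{<:Real})
    length(r) == length(x) || throw(DimensionMismatch("r and x must have the same length"))
    (all(diff(r) .> 0) && 1 <= first(r) && last(r) <= n) ||
        throw(ArgumentError("ranks must be strictly increasing within 1:n"))
    all(diff(x) .> 0) || return -Inf        # density is zero off the ordered region
    ranks = [0; r; n + 1]                   # pad with r_0 and r_{k+1}
    Fs = [0.0; F.(x); 1.0]                  # pad with F(x_0) = 0 and F(x_{k+1}) = 1
    lp = logfact(n) + sum(log ∘ f, x)
    for j in 1:(length(ranks) - 1)
        g = ranks[j + 1] - ranks[j] - 1     # number of draws marginalized in this gap
        if g > 0                            # skip empty gaps to avoid 0 * log(0)
            lp += g * log(Fs[j + 1] - Fs[j]) - logfact(g)
        end
    end
    return lp
end

# Example mirroring the docstring: joint density of the extrema of 10 Cauchy draws.
cauchy_pdf(x) = 1 / (π * (1 + x^2))
cauchy_cdf(x) = 0.5 + atan(x) / π
orderstats_logpdf(cauchy_pdf, cauchy_cdf, 10, [1, 10], [-3.0, 2.0])
```

Working on the log scale with `logfact` keeps the `n!` and gap factorials from overflowing for moderately large `n`.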
> I wonder if that should be a special case of this distribution.
Nah, seems complicated and not all that useful.
Bump @devmotion
Sorry for the embarrassingly long delay; I had completely forgotten about this PR (again).
I made a few comments. I also wonder whether we could add a comparison with existing R implementations, to additionally exploit the existing test infrastructure and run standardized tests for evaluation and sampling?
> Plus, JointOrderStatistics and OrderStatistic are so closely related, it might make sense to give them their own docs page
I agree, I think a separate page in the docs could be useful.
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
No problem!
Unfortunately I don't think there are any standard R implementations of distributions of order statistics against which we can compare.
Done!
function _rand!(rng::AbstractRNG, d::JointOrderStatistics, x::AbstractVector{<:Real})
    n = d.n
    if n == length(d.r)  # r == 1:n
        # direct method, slower than inversion method for large `n` and distributions with
Hmm, no, I'm not aware of such an example. It might be good enough for now.
Great PR, looks good to me @sethaxen!
I spotted only a few final things that, IMO, should be addressed before merging. But I'm already approving now to indicate that everything else looks fine to me.
Great, thanks for the review!
I realized that when we evaluated
Great! I'll merge it within the next few days if there are no objections.
This PR will add two new distributions:

- `OrderStatistic`: a univariate distribution representing the distribution of the rank-`i` draw in an IID sample of length `n` from some (continuous or discrete) univariate distribution `dist`.
- `JointOrderStatistics`: a continuous multivariate distribution representing the joint distribution of the draws at ranks `r` in the same sample.

These have a number of uses. For example, they can represent transformations that have been performed on data (e.g. outlier removal and calculation of summary statistics). They can also be used to draw confidence intervals, e.g. for ECDF plots. For `JointOrderStatistics`, in the extreme case where `r=1:n`, i.e. when the distribution is over all ranks, it can be used in a PPL like Turing to restrict an IID array of parameters to the subset of arrays that are ordered.

Relates #1643 #1655, closes #1284
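One way to get the ECDF confidence intervals mentioned above, sketched under our own assumptions: if the sample really comes from a continuous distribution with CDF `F`, then `F(X_(i))` is distributed like the `i`th of `n` uniform order statistics, i.e. `Beta(i, n - i + 1)`, so pointwise intervals on the probability scale follow from its quantiles. The helpers below (`unif_orderstat_cdf`, `unif_orderstat_quantile`, `ecdf_pointwise_band`) are hypothetical, compute quantiles by bisection instead of calling a library Beta quantile, and give pointwise rather than simultaneous bands.

```julia
# CDF of the i-th of n uniform order statistics (the Beta(i, n - i + 1) CDF):
# P(U_(i) <= u) = sum_{k=i}^{n} binomial(n, k) u^k (1 - u)^(n - k).
# Uses Int binomials, so keep n modest in this sketch.
unif_orderstat_cdf(u, n, i) = sum(binomial(n, k) * u^k * (1 - u)^(n - k) for k in i:n)

# Quantile by bisection: the CDF is continuous and increasing on [0, 1].
function unif_orderstat_quantile(p, n, i; tol = 1e-12)
    lo, hi = 0.0, 1.0
    while hi - lo > tol
        mid = (lo + hi) / 2
        if unif_orderstat_cdf(mid, n, i) < p
            lo = mid
        else
            hi = mid
        end
    end
    return (lo + hi) / 2
end

# Pointwise (1 - α) band for F(X_(i)), i = 1, ..., n.
function ecdf_pointwise_band(n; α = 0.05)
    lower = [unif_orderstat_quantile(α / 2, n, i) for i in 1:n]
    upper = [unif_orderstat_quantile(1 - α / 2, n, i) for i in 1:n]
    return lower, upper
end

lower, upper = ecdf_pointwise_band(20)
```

Plotting `F` evaluated at the sorted data against these bands gives a pointwise check; a simultaneous band would need a wider, adjusted level.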