New type hierarchy for ValueSupport #945

Closed
wants to merge 18 commits into from

@richardreeve (Contributor) commented Jul 30, 2019

Allows non-Int, non-Float64 eltypes for countable and continuous support respectively, and discontinuous distributions in general.
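A minimal sketch of what an eltype-parameterised support hierarchy could look like. ValueSupport and ContinuousSupport are names used later in this thread; CountableSupport and the helper support_eltype are hypothetical, and the PR's actual definitions may differ.

```julia
# Hypothetical sketch: support types carry the element type T of the values they describe.
abstract type ValueSupport{T} end

# Countable support: T may be Int, a smaller integer type, or even a non-numeric type.
struct CountableSupport{T} <: ValueSupport{T} end

# Continuous support: restricted to numeric element types (Float64, Float32, ...).
struct ContinuousSupport{T<:Number} <: ValueSupport{T} end

# Recover the element type from any support type by dispatching on the supertype parameter.
support_eltype(::Type{<:ValueSupport{T}}) where {T} = T

println(support_eltype(CountableSupport{Int8}))      # Int8
println(support_eltype(ContinuousSupport{Float32}))  # Float32
println(support_eltype(CountableSupport{Symbol}))    # Symbol
```

Carrying T in the support type is what lets generic code ask for the value type without hard-coding Int or Float64.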

@richardreeve (Contributor, Author)

I've just stripped the type hierarchy out of #941, without any of the improvements to the distributions, as requested by @matbesancon. Do feel free to stick with the original PR if it's manageable on its own! This may also be of interest to @mschauer.

@codecov-io commented Jul 31, 2019

Codecov Report

Merging #945 into master will decrease coverage by 0.42%.
The diff coverage is 80%.

@@            Coverage Diff             @@
##           master     #945      +/-   ##
==========================================
- Coverage   77.96%   77.53%   -0.43%     
==========================================
  Files         112      112              
  Lines        5363     5409      +46     
==========================================
+ Hits         4181     4194      +13     
- Misses       1182     1215      +33

Impacted Files    Coverage Δ
src/multivariate/mvlognormal.jl 96.77% <ø> (-0.06%) ⬇️
src/multivariate/dirichlet.jl 59.35% <ø> (-0.22%) ⬇️
src/multivariate/mvnormal.jl 71.08% <ø> (-0.52%) ⬇️
src/Distributions.jl 100% <ø> (ø) ⬆️
src/multivariate/product.jl 80% <ø> (-6.67%) ⬇️
src/univariate/continuous/normal.jl 95.1% <ø> (-3.61%) ⬇️
src/multivariate/mvnormalcanon.jl 80.43% <ø> (+1.71%) ⬆️
src/multivariate/mvtdist.jl 59.34% <ø> (-0.45%) ⬇️
src/functionals.jl 78.57% <100%> (+3.57%) ⬆️
src/mixtures/mixturemodel.jl 78.37% <100%> (+0.9%) ⬆️
... and 67 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 78b5d96...69bd2cc.

src/common.jl: three review threads (outdated, resolved)
@matbesancon (Member)

minor comments, otherwise this will be good with some tests

@matbesancon (Member)

alright, we'll freeze this one and wait for #945 to be merged, then the diff will be much smaller

richardreeve and others added 3 commits July 31, 2019 09:29
Co-Authored-By: Mathieu Besançon <mathieu.besancon@gmail.com>
Co-Authored-By: Mathieu Besançon <mathieu.besancon@gmail.com>
…amework.

Minor bugfix for MixtureModel with non-fully parameterised types and add testing.
@richardreeve (Contributor, Author)

Was that last message intended for #941 @matbesancon? In the meantime I've added some very minor fixes to distributions (and tests) that currently override the default behaviour because they use non-Float64s in continuous distributions.

@matbesancon (Member)

> Was that last message intended for #941 @matbesancon?

Yes, absolutely, sorry!

@matbesancon (Member) commented Aug 1, 2019

Thanks for the good catch on AppVeyor. Maybe we should put this in a separate PR and merge it right away?

@matbesancon (Member)

@richardreeve I just did, so you can merge master into your branch :)

@richardreeve (Contributor, Author) commented Aug 1, 2019

I'm now pretty sure that this is complete. The only other thing I could add to this branch is switching pdf() -> pmf() for all of the discrete distributions.

@matbesancon (Member) commented Aug 1, 2019

Cool, let's see what the coverage says. Also, let's keep pmf for the next PR; this one is big enough, I think.

@richardreeve (Contributor, Author)

Aside from the comment you made above (that I still don't understand!), is this good to go?

@matbesancon (Member)

Sorry for the misunderstanding, I was thinking pdf -> pmf can go in the next PR, like #941

I'm going to have a last review and let others voice their opinion, but looks good yup :)

@richardreeve (Contributor, Author) commented Aug 1, 2019

Ah, no - I understood the comment about pmf() - it was the (review) comment about the docs above that I didn't understand.

@matbesancon (Member)

Ah ok, sorry, it's about not forcing it to Float64, but allowing other "continuous" types (other floats and other continuous-like real types).

@richardreeve (Contributor, Author)

Okay, right. The new "default" is to use ContinuousSupport{T <: Number}, but just for backward compatibility I have defined Continuous = ContinuousSupport{Float64}, since most distributions are hard-coded with Float64s.

At some point it would be nice to generalise this, but it's non-trivial. In particular, being able to extend many continuous distributions to allow Unitful numbers, which is the first thing I'd like to do, requires detailed type checking since, for instance, typeof(μ) ≠ typeof(σ²).

@matbesancon (Member)

Having this alias may create a barrier to these generic rewrites (which we very much want to achieve). I would use ContinuousSupport{Float64} only where it needs hard-coding; otherwise people will tend to use Continuous, since shorter names tend to be preferred.
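A small sketch of the barrier described here, assuming simplified stand-in definitions (the helpers alias_only and any_continuous are hypothetical): a method written against the short alias silently excludes every eltype except Float64, while one written against the parametric type does not.

```julia
abstract type ValueSupport{T} end
struct ContinuousSupport{T<:Number} <: ValueSupport{T} end

# Backward-compatible alias: it names exactly one instantiation of the parametric type.
const Continuous = ContinuousSupport{Float64}

# Hypothetical helpers: one written against the alias, one against the parametric type.
alias_only(::Type{Continuous}) = "Float64 continuous support"
any_continuous(::Type{<:ContinuousSupport}) = "continuous support of any numeric eltype"

println(alias_only(ContinuousSupport{Float64}))        # accepted
println(any_continuous(ContinuousSupport{Float32}))    # accepted
println(hasmethod(alias_only, Tuple{Type{ContinuousSupport{Float32}}}))  # false
```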

@mschauer (Member)

I would like to react to this first; can we agree on a time-line when this is merged?

@matbesancon (Member)

> can we agree on a time-line when this is merged?

Let us say in one week from now?

@mschauer (Member)

That would be great, thank you a lot.

@richardreeve (Contributor, Author) commented Aug 15, 2019

Why? We’ve already had a week (three weeks since the first PR, and over two since this one started) and we’ve achieved nothing except that I now have very little time or energy left to finish this work. I introduced this PR to improve the package, as with the several thousand other lines of code that I’ve contributed to this and Distributions.jl, which have added functionality, fixed bugs and added improved testing. Why are we waiting any longer?

@matbesancon (Member)

There will likely be no release within a week from now. I still need to fix the "no-argument-check" thing; the PR I created introduces too much complexity, and fixing it will not happen before then.

@richardreeve (Contributor, Author)

That’s fine, but there are two more PRs before this is all done - #941 and maybe #951...

@matbesancon (Member)

Since #951 is breaking, it would preferably go in a separate, later release. #941 introduces new features, so likewise it should perhaps come with a new release.

@matbesancon (Member)

This PR or #941?

@richardreeve (Contributor, Author)

I can do #951 in a non-breaking way if you think it’s desirable. It would basically involve removing things like ContinuousUnivariateDistribution and Continuous from the package code and using UnivariateDistribution{ContinuousSupport{T}} everywhere instead, then keeping Continuous and ContinuousUnivariateDistribution but relegating them to src/deprecates.jl (even though they can’t actually be deprecated) and removing them from the documentation. Then it could be included immediately.

I actually think that’s desirable as there are too many const aliases knocking around, but it’s not currently implemented...
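A rough sketch of why keeping the old names as plain aliases stays non-breaking, using simplified hypothetical stand-ins for the package types (OldStyle, NewStyle and is_continuous_univariate are illustrative only): a type declared against the legacy ContinuousUnivariateDistribution name is still caught by code written against the explicit UnivariateDistribution{<:ContinuousSupport} form.

```julia
abstract type VariateForm end
struct Univariate <: VariateForm end

abstract type ValueSupport{T} end
struct ContinuousSupport{T<:Number} <: ValueSupport{T} end

# Hypothetical two-parameter root, mirroring Distribution{VariateForm, ValueSupport}.
abstract type Distribution{F<:VariateForm, S<:ValueSupport} end
const UnivariateDistribution{S<:ValueSupport} = Distribution{Univariate, S}

# Legacy name kept only for backward compatibility (e.g. in a deprecations file).
const ContinuousUnivariateDistribution = UnivariateDistribution{ContinuousSupport{Float64}}

# Old-style user type written against the legacy name...
struct OldStyle <: ContinuousUnivariateDistribution end
# ...and a new-style type using the explicit parametric form with any numeric eltype.
struct NewStyle{T<:Number} <: UnivariateDistribution{ContinuousSupport{T}} end

# Package code written against the explicit form accepts both.
is_continuous_univariate(::UnivariateDistribution{<:ContinuousSupport}) = true
is_continuous_univariate(::Distribution) = false

println(is_continuous_univariate(OldStyle()))          # true
println(is_continuous_univariate(NewStyle{Float32}())) # true
```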

@mschauer (Member)

> This PR or #941?

This one, because subtyping <: ValueSupport, which is now the parametric ValueSupport{T}, gives an "invalid subtyping in definition of ..." error:

julia> using Phylo
[ Info: Precompiling Phylo [aea672f4-3940-5932-aa44-993d1c3ff149]
ERROR: LoadError: LoadError: invalid subtyping in definition of Phylogenetics
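A minimal reproduction of this error, assuming a stand-in parametric ValueSupport{T} (PhyloSupport is a hypothetical name). Once the type is parametric, the bare name is a UnionAll rather than a concrete DataType, and Julia only allows subtyping the latter; the exact error wording may vary between Julia versions.

```julia
# ValueSupport made parametric: the bare name is now a UnionAll, not a DataType.
abstract type ValueSupport{T} end

# Downstream code that still writes `<: ValueSupport` fails when the definition is evaluated.
msg = try
    @eval abstract type Phylogenetics <: ValueSupport end
    "defined without error"
catch err
    sprint(showerror, err)
end
println(msg)   # an "invalid subtyping in definition of Phylogenetics" message

# Subtyping a specific instantiation is accepted.
abstract type PhyloSupport <: ValueSupport{String} end
println(PhyloSupport <: ValueSupport{String})   # true
```

This is why renaming the parametric type (and keeping the old name for something downstream packages can still subtype) avoids the breakage.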

@richardreeve (Contributor, Author)

Fair enough. I hadn’t spotted this. Phylo is my package so easily fixed, but in any event I can easily fix this by changing ValueSupport to a different name - SupportType probably - and adding another const alias.

@matbesancon (Member)

> I can easily fix this by changing ValueSupport to a different name - SupportType probably - and adding another const alias.

That sounds reasonable, yes. Maybe in that case we could deprecate ValueSupport so that people use SupportType{T} instead? (Maybe Support is better than SupportType here.)

@mschauer (Member)

Well, it is a PR which substantially changes the internals, so some breakage is to be expected. This is just what I found when looking. That's all from me for now.

@richardreeve (Contributor, Author) commented Aug 15, 2019

Yes, I was thinking about Support - I was only hesitating because short exported names always make me a bit nervous in case there are name clashes with other packages...

Edit: actually, I don’t really believe this in this situation, so I’ll go with Support. Thanks for spotting it anyway, @mschauer.

@matbesancon (Member)

@richardreeve did you have time to do the changes?

@richardreeve (Contributor, Author) commented Aug 20, 2019

Hi @matbesancon - I was actually walking in the Alps last week (must stop checking GitHub when I get a signal!), so not yet, but I'm back now and I'll get it done tonight!

Edit: I've (somewhat!) unexpectedly got corrections to two theses to finish going through, so it'll have to be Thursday, sorry.

@@ -67,10 +80,11 @@ into an array, depending on the variate form.
nsamples(t::Type{Sampleable}, x::Any)
nsamples(::Type{D}, x::Number) where {D<:Sampleable{Univariate}} = 1
nsamples(::Type{D}, x::AbstractArray) where {D<:Sampleable{Univariate}} = length(x)
nsamples(::Type{D}, x::AbstractVector) where {D<:Sampleable{Multivariate}} = 1
nsamples(::Type{D}, x::AbstractArray{<:AbstractVector}) where {D<:Sampleable{Multivariate}} = length(x)

Member

Is this an unrelated change?

Contributor, Author

I was fixing the handling of numbers throughout the code and spotted that the multivariate case was missing this function. This isn't a deletion; the edited code is on the next line:

nsamples(::Type{D}, x::AbstractVector{<:Number}) where {D<:Sampleable{Multivariate}} = 1
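A self-contained sketch of the dispatch this change relies on, using simplified hypothetical stand-ins (a one-parameter Sampleable and toy samplers ToyUv/ToyMv; the real Distributions.jl types differ). With the narrower AbstractVector{<:Number} signature, a vector of numbers counts as one multivariate sample, while a vector of vectors only matches the AbstractArray{<:AbstractVector} method.

```julia
# Hypothetical stand-ins for the Distributions.jl variate forms (illustration only).
abstract type VariateForm end
struct Univariate <: VariateForm end
struct Multivariate <: VariateForm end
abstract type Sampleable{F<:VariateForm} end

# Univariate: a number is one sample, an array holds one sample per element.
nsamples(::Type{D}, x::Number) where {D<:Sampleable{Univariate}} = 1
nsamples(::Type{D}, x::AbstractArray) where {D<:Sampleable{Univariate}} = length(x)

# Multivariate: a vector of numbers is one sample, an array of vectors holds many.
nsamples(::Type{D}, x::AbstractVector{<:Number}) where {D<:Sampleable{Multivariate}} = 1
nsamples(::Type{D}, x::AbstractArray{<:AbstractVector}) where {D<:Sampleable{Multivariate}} = length(x)

struct ToyUv <: Sampleable{Univariate} end    # hypothetical samplers
struct ToyMv <: Sampleable{Multivariate} end

println(nsamples(ToyMv, [1.0, 2.0, 3.0]))           # 1
println(nsamples(ToyMv, [[1.0, 2.0], [3.0, 4.0]]))  # 2
```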


function expectation(distr::CountableUnivariateDistribution,
g::Function, epsilon::Real)
sum(support(distr)) do x

Member

Does this make sense? This is a loop over typically 2^64 elements.

Contributor, Author

This is just a slight editing of the original code:

    f = x->pdf(distr,x)
    (leftEnd, rightEnd) = getEndpoints(distr, epsilon)
    sum(x -> f(x)*g(x), leftEnd:rightEnd)

I don't have a strong opinion about whether it's sensible.

Member

You are dropping the epsilon

Contributor, Author

Ah, I see. That's because this is for non-ContiguousSupport types: they are not necessarily ordered, so we can't use getEndpoints(). At the moment, the only type that falls into this category is DiscreteNonParametric, but it will also include Dirac, and neither is expected to have anything like 2^64 elements.
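A minimal sketch of the idea in this thread: for a finite support that need not be ordered (or even numeric), E[g(X)] = Σ p(x) g(x) can be summed directly over the support points, so no endpoints and no epsilon are needed. ToyDiscrete is a hypothetical stand-in, not DiscreteNonParametric itself.

```julia
# Hypothetical finite discrete distribution whose support points need not be ordered
# or even numeric (a stand-in for DiscreteNonParametric-like types).
struct ToyDiscrete{T}
    xs::Vector{T}        # support points
    ps::Vector{Float64}  # probabilities, one per support point
end

# E[g(X)] = Σₓ p(x) g(x), summed directly over the support: no endpoints, no epsilon.
expectation(d::ToyDiscrete, g) = sum(p * g(x) for (x, p) in zip(d.xs, d.ps))

d = ToyDiscrete([3, 1, 2], [0.2, 0.5, 0.3])
println(expectation(d, identity))   # ≈ 1.7

s = ToyDiscrete([:heads, :tails], [0.4, 0.6])
println(expectation(s, x -> x == :heads ? 1.0 : 0.0))   # ≈ 0.4, i.e. P(X = :heads)
```

The cost is linear in the number of support points, which is fine for the small explicit supports discussed here but would not be for a range like the full Int64 domain.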

length(searchsorted(support(d), x)) > 0
insupport(d::DiscreteNonParametric{T}, x::Number) where {T<:Number} =
length(searchsorted(support(d), convert(T, x))) > 0

Member

I would prefer support to do the right thing here, rather than having insupport second-guess the needed transformation.

Contributor, Author

I'm not sure what you expect support to do here? It's just returning the support, and then we're checking for presence of the value in that set.
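A hedged sketch related to this exchange: searchsorted on a sorted support vector can compare mixed numeric types directly, so a membership test need not convert x first, whereas convert(T, x) throws an InexactError when x is not representable in T (for example 2.5 against an Int support). The helper in_sorted_support and the values are hypothetical.

```julia
# Membership test on a sorted support vector, sketched without converting x to eltype(xs).
in_sorted_support(xs::AbstractVector, x) = !isempty(searchsorted(xs, x))

xs = [1, 3, 5]                        # sorted Int support (illustrative values)
println(in_sorted_support(xs, 3))     # true
println(in_sorted_support(xs, 3.0))   # true: isless compares Int and Float64 directly
println(in_sorted_support(xs, 2.5))   # false

# By contrast, converting first throws for values not representable in the eltype.
result = try
    convert(Int, 2.5)
catch err
    err
end
println(result isa InexactError)      # true
```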

@mschauer (Member)

FYI: I still have the "postponed by a week" in mind and will come back later today.

@mschauer (Member)

One thing, to be clear: what set of features would I have to reproduce if I wanted to replace this PR with my own attempt, besides making the additional tests pass?

@richardreeve (Contributor, Author) commented Aug 22, 2019

The problem with having split this off from #941 is that the new tests, which will emerge from the new distributions, are not included. The tests changed here mostly just change the test type hierarchy to match the new changes.

This PR provides a mechanism for supporting non-Int, non-Float64 eltypes for countable and continuous support respectively, and discontinuous distributions in general, in a consistent and comprehensible way. It does this whilst keeping consistency with the old code base so there are no - or very few! - breaking changes, and without increasing the number of things that people will forget to do when implementing their own distributions. That's what it tries to do - it's an abstract thing, not a specific set of features per se.

@richardreeve (Contributor, Author) commented Aug 22, 2019

Note: This PR is now largely superseded by #951, which includes all of these changes, but also totally removes all references to the old const aliases from the codebase, and in so doing removes several new bugs that weren't identified here. This means that anyone referring to the code to understand how to write a new distribution will see the new API in action. I would strongly recommend merging that rather than this. Although it adds a few hundred lines of code, they are nearly all just replacing old aliases with more explicit types.

@richardreeve closed this by deleting the head repository on Dec 6, 2023