Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store nothing bounds in Truncated #1720

Merged
merged 1 commit into from
May 15, 2023
Merged

Store nothing bounds in Truncated #1720

merged 1 commit into from
May 15, 2023

Conversation

devmotion
Copy link
Member

This PR adds support for nothing bounds in Truncated: While they are supported (and in fact encouraged) by truncated for specifying left- or right-truncated distributions, currently they are not stored in Truncated as given but instead converted to and saved as +Inf/-Inf.

This loss of information is not only surprising but also problematic whenever specializations with specialized return types for left- or right-truncated distributions are needed. For instance, one application (in a private repo) is Bayesian inference in which different transformations of the support are required depending on whether the support of a truncated normal distribution is lower- and/or upper-bounded.

The PR uses the same approach (and also some of the code) that is already used for censored/Censored.

@codecov-commenter
Copy link

codecov-commenter commented May 9, 2023

Codecov Report

Patch coverage: 87.09% and project coverage change: -0.03 ⚠️

Comparison is base (4fd8be8) 85.85% compared to head (95777ac) 85.83%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1720      +/-   ##
==========================================
- Coverage   85.85%   85.83%   -0.03%     
==========================================
  Files         138      138              
  Lines        8337     8350      +13     
==========================================
+ Hits         7158     7167       +9     
- Misses       1179     1183       +4     
Impacted Files Coverage Δ
src/truncate.jl 88.28% <83.33%> (-2.72%) ⬇️
src/truncated/normal.jl 85.60% <95.00%> (+0.22%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -1,7 +1,7 @@
name = "Distributions"
uuid = "31c24e10-a181-5473-b8eb-7969acd0382f"
authors = ["JuliaStats"]
version = "0.25.90"
version = "0.25.91"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a JuliaHub package search to see whether any packages might be relying on Truncated{D,S} being concrete and/or using the bounds directly and assuming they're real-valued. Looks like we're in the clear for the former but there are a few registered packages that will break due to the latter: ModelConstructors, DiffEqBayes, and DiffEqBayesStan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that this is an internal change and things may break for downstream packages that access fields of (to be internal) structs that are not part of the API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, I don't think code such as in ModelConstructors is a reason to move this change to a breaking release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

access fields of (to be internal) structs

? I'm not sure what you mean by "to be internal" here.

I am generally of the opinion that if you touch the fields/properties of an object then you should expect to lose a finger at any moment. That said, it doesn't seem to be documented that these fields aren't public, though the fields themselves also aren't documented so perhaps it can be reasonably assumed that they aren't public.

I don't think code such as in ModelConstructors is a reason to move this change to a breaking release.

That's fair, but it could be worthwhile to submit PRs to the packages that will be affected by this change to ensure they use extrema (or similar) instead to avoid any issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? I'm not sure what you mean by "to be internal" here.

I meant that generally I think these fields do not belong to any official API, so there are no guarantees at all and code that accesses them may break at any time, but that in this case even the whole struct is intended to be an internal implementation detail in the next breaking release. For backward compatibility, Truncated is still exported but all outer constructors are already deprecated since users are supposed to call truncated instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probably a middle ground would be good: Not exporting Truncated anymore (since as long as it is exported, users will construct truncated distributions with available constructors of Truncated even though it is deprecated and not recommended) but mentioning it in the docs as a fallback that can be dispatched on without any guarantees on the fields or type parameters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your initial comment, I submitted PRs to the three affected packages: SciML/DiffEqBayes.jl#304, StanJulia/DiffEqBayesStan.jl#18, FRBNY-DSGE/ModelConstructors.jl#18

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the deprecated constructor methods are removed, the only remaining method for Truncated is its inner constructor. If a user really wants to go through the trouble of computing all of the requisite quantities that get stored in Truncated in order to pass them to the inner constructor, they either have a very good reason to do so or all hope is lost and they're simply an agent of chaos who will do as they please regardless of whether it's exported.

I'd recommend we continue to export Truncated but add a note to its docstring that it isn't intended to be constructed directly and to use truncated for that purpose (currently truncated links to Truncated but not vice versa), and that the fields are an implementation detail. The one guarantee it would be nice to make is that the first type parameter is a Distribution but there may be more so don't assume Truncated{D} is concrete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why export it if you should not use it? To me it seems more reasonable to not export it and only clearly state and document the guarantees on the fallback type (and hence make changes to them require a breaking release).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why export it if you should not use it?

IMO it's not that you shouldn't use it, just that you shouldn't assume that its constructor is The Way™ to obtain a truncated distribution.

@devmotion
Copy link
Member Author

The PR to ModelConstructors should not block this PR IMO since the latest tagged release of this package does not even support Distributions 0.25 it seems. It is also a bit unclear to me if it is still actively maintained given that there are no recent commits on the master branch.

@devmotion
Copy link
Member Author

@ararslan Since the PRs to DiffEqBayes and DiffEqBayesStan are merged, do you think this PR is ready to be merged?

Comment on lines +102 to +103
const LeftTruncated{D<:UnivariateDistribution,S<:ValueSupport,T<:Real} = Truncated{D,S,T,T,Nothing}
const RightTruncated{D<:UnivariateDistribution,S<:ValueSupport,T<:Real} = Truncated{D,S,T,Nothing,T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps export these to accompany Truncated? They seem generally useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should be part of the PR, mainly for the following three reasons: 1) LeftCensored/RightCensored are not exported either, 2) It is easy to export them in a future non-breaking release but they can't be un-exported in a non-breaking release once they are exported, 3) Noy every left/right Truncated distribution is of type LeftTruncated/RightTruncated.

@devmotion devmotion merged commit 32cea9b into master May 15, 2023
@devmotion devmotion deleted the dw/truncated branch May 15, 2023 16:53
@ararslan
Copy link
Member

Seems that 32cea9b#diff-0c632d946a50f0251329e7ba1bc19ab4e19bd0f1815607122fc8ef30efe4dadbL15-L16 is causing the type inference tests added by #1717 to fail. With those more specific definitions, the extrema of a truncated normal with Float32 parameters were also Float32s, but when falling back to calling extrema on the underlying untruncated distribution, you get Float64s. I believe this is because the support for Normal is specified with Float64 literals.

@devmotion
Copy link
Member Author

Unfortunate that this was not discovered before merging the PR.

I think removing the specialization in this PR is correct - why should Truncated{<:Normal} handled differently than e.g. Truncated{<:Laplace} or Truncated{<:Beta}?

Another question is, of course, whether we want extrema to return Float64 or Float32 in this case. To me it seems reasonable to expect that typeof(minimum(dist)) === eltype(dist) for univariate distributions (and analogously for maximum(dist)), i.e., that the type of rand(dist) and minimum(dist)/maximum(dist) match. Generally, the current design of Distributions defines eltype(dist) === Float64 for continuous distributions, regardless of the parameter type partype(dist) (there are many discussions and issues - and an open PR - regarding this design). Thus returning Float64 would be correct. Unfortunately, some distributions such as Normal are defined incorrectly in so far that rand returns a number of type partype for them.

@ararslan
Copy link
Member

ararslan commented May 16, 2023

I think removing the specialization in this PR is correct

Yeah I agree, I only mention it because it seems to be the cause of the issue.

I'm mostly indifferent on what extrema should return, and in some sense it's tangential to this; it's relevant in that its use in the computation of truncated normal moments means that the output doesn't match the type of the parameters, which is inconsistent with untruncated distributions. That's of course easy enough to address internally but it does mean that downstream users may run into similar situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants