
Release Distributions.jl v1.0 #880

Open · non-Jedi opened this issue May 7, 2019 · 36 comments

@non-Jedi (Contributor) commented May 7, 2019

I'm eager to have the guarantees of semver for packages I use as dependencies. What needs to happen before this package could be considered 1.0 and released as stable? From the outside it seems like it might already be ready for that, but I don't have a good view on any breaking changes that may still need to be made or essential functionality that's missing.

@simonbyrne (Member)

#823 is definitely a big breaking change.

@matbesancon (Member)

@non-Jedi thanks for this issue.

  • As mentioned, Release StatsBase.jl v1.0 (StatsBase.jl#493) is a big one. Once that's done, we have to migrate all the StatsBase functions.
  • WIP: Add keyword constructors (#823) made us realize the tests are in a bad state; we should remove the JSON-based tests and replace them with generated Julia files, which will be much faster.
  • Improved documentation for the sampler and distribution interfaces. In the current state, it's very hard to know what is required and what is optional when implementing new distributions.
  • If done at all, Remove UnivariateGMM (#844) needs to happen before freezing the API.
  • There are some design issues with mixtures.
  • Removing the dependency on PDMats would make the library much more lightweight (cc @andreasnoack; see MvNormal Matrix Types and Speed, #688).

Given how much work there already is in the plain items that would make Distributions 1.0-ready, I tend to think #823 is optional here; if it isn't included, it should be a 2.0 milestone.

@datnamer commented May 8, 2019

What's needed for GPU support?

@nalimilan (Member)


Given how much work there already is in the plain items that would make Distributions 1.0-ready, I tend to think #823 is optional here; if it isn't included, it should be a 2.0 milestone.

It would be weird to tag 1.0 while a large PR that everybody is looking forward to is still pending. At the very least, deprecations could be added so that the PR can be merged without breaking anything in 1.x.

On the other hand, refactoring the tests doesn't affect the API, so I don't see why it should be on the 1.0 roadmap. Same for documentation.

@matbesancon (Member) commented May 8, 2019

@datnamer

What's needed for GPU support?

I am not sure what you mean. If in doubt, I'd say generic containers everywhere instead of hardcoded Float64.
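For illustration, here is a minimal sketch (a hypothetical type, not the Distributions.jl API) of what "generic containers instead of hardcoded Float64" means in practice: the element and container types become type parameters, so Float32 or GPU arrays can flow through unchanged.

```julia
using LinearAlgebra  # for the identity matrix I

# Hypothetical sketch: parameterize on the container and element type
# instead of hardcoding Vector{Float64}/Matrix{Float64}.
struct MyMvNormal{T<:Real,V<:AbstractVector{T},M<:AbstractMatrix{T}}
    mu::V
    sigma::M
end

# Plain Float64 arrays work...
d64 = MyMvNormal(zeros(3), Matrix{Float64}(I, 3, 3))

# ...and so do Float32 arrays; in principle GPU arrays (e.g. CuArray) would too,
# since nothing is pinned to Float64.
d32 = MyMvNormal(zeros(Float32, 3), Matrix{Float32}(I, 3, 3))
```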

@matbesancon (Member)

Refactoring the tests is a major point blocking #823.
I was considering 1.0 not only from the semver point of view, but also as a signal of the readiness and robustness of the library. Documentation here also means documenting clearly what is required in the interface for a distribution; otherwise any removal afterwards is breaking.

@simonbyrne (Member)

Refactoring the tests is a major point blocking #823.

Not really: we could merge #823, it's just that the keyword constructors aren't really being tested.

@matbesancon (Member)

Not really: we could merge #823, it's just that the keyword constructors aren't really being tested.

Not testing the keyword constructors would be problematic; we would be merging something we are not able to test against moving forward.

OK, yes to #823 before 1.0, even if we don't need to see 2.0 as a super-distant milestone, unlike the Julia project itself. DifferentialEquations has been moving along with major releases.

@matbesancon (Member)

@non-Jedi if you want to pick up some of these issues, feel free :)
I tagged a bunch of issues as up-for-grabs.

@matbesancon (Member)

Also: @cscherrer, this one is still open and could potentially lead to breaking changes: #771

matbesancon added this to the 1.0 milestone on Jun 5, 2019
@cossio (Contributor) commented Jun 10, 2019

Please consider also #911, which I think could also lead to breaking changes.

@matbesancon (Member)

It's not major; it could lead to a breaking change, but nothing awful, and it's always fixable by specifying types.

@matbesancon (Member)

Not if you add a deprecation fallback; but anyway, yes, this is better done before 1.0 so that we are free to break it.

@richardreeve (Contributor)

As far as breaking changes before 1.0 are concerned, can I also put in a vote for the design issues with mixture models? Specifically, but not only, combining discrete and continuous distributions to get discontinuous distributions - I'm gathering thoughts in #925 (and on Slack). Also, fixing discrete distributions having to be Ints rather than any countable set (see #916 for example) will be mildly breaking.

@DilumAluthge

#916 would be breaking, since it modifies the existing DiscreteNonParametric distribution. But we could add a new distribution, separate from the existing DiscreteNonParametric, that allows the support to be any countable set. That wouldn't be breaking, since it would be a new distribution.
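For context, a short sketch of current DiscreteNonParametric usage, whose support is restricted to numeric values; the symbolic support shown in the comment is just an illustration of what #916 asks for, not a proposed API.

```julia
using Distributions

# Current behavior: the support must be a vector of real numbers.
d = DiscreteNonParametric([1, 2, 3], [0.2, 0.3, 0.5])
rand(d)    # returns 1, 2, or 3
pdf(d, 2)  # 0.3

# A support of arbitrary countable values, e.g. [:a, :b, :c], is not accepted
# by the current type; a separate new distribution type could provide it
# without breaking DiscreteNonParametric.
```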

@richardreeve (Contributor)

But in any event, extending Discrete to allow any countable set, rather than just Ints (which is the current spec) and Reals rounded or floored to Ints, would be breaking (albeit in ways that hopefully no one is relying on), wouldn't it? I know you're only doing part of that at the moment (just non-Numbers), but it seems a logical thing to do too, and any code that depends on discrete distributions being integer-based will break anyway...

@xukai92 (Contributor) commented Jul 15, 2019

@datnamer

What's needed for GPU support?

I am not sure what you mean. If in doubt, I'd say generic containers everywhere instead of hardcoded Float64.

Also, it seems that a common use case in deep learning is not efficiently handled by the current design. Say we have a vector of isotropic Gaussians, vg = [MvNormal(mu1, sig1), MvNormal(mu2, sig2), ...], and I'd like to draw one sample from each Gaussian with rand.(vg). This operation is much faster if all the mus and sigs are concatenated into matrices, so that sampling only needs to be done once via matrix operations, which are fast on a GPU. I think this is a general pattern in deep learning and could be accelerated by GPUs. Any idea whether this kind of "GPU batch operation" could somehow be supported?
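For illustration, a minimal sketch (plain Julia, not an existing Distributions.jl API) of the batched pattern described above, assuming diagonal/isotropic covariances so that one vectorized draw covers all the Gaussians at once:

```julia
# Stack the k means and standard deviations as columns of d×k matrices and
# draw one sample per column with a single broadcasted operation.
d, k = 4, 1000
mus  = randn(d, k)          # column j = mean of the j-th Gaussian
sigs = rand(d, k) .+ 0.1    # column j = elementwise std of the j-th Gaussian

samples = mus .+ sigs .* randn(d, k)   # samples[:, j] ~ N(mus[:, j], diag(sigs[:, j].^2))

# On a GPU the same expression should work with CuArray inputs (assumption),
# whereas rand.(vg) over a Vector of MvNormal objects samples one distribution at a time.
```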

@matbesancon (Member) commented Aug 27, 2019

@xukai92 would this imply a large sparse covariance matrix of the stacked random variables?

@xukai92 (Contributor) commented Sep 16, 2019

Yes, I believe that's one way to put it.

@davidanthoff (Contributor)

My vote would be to just release 1.0.0 as is, so that we get the full richness of semver going forward; it just makes the life of downstream packages so much easier.

I wouldn't mind at all if there is a breaking 2.0.0 release soon after, and if there is a need for 3.0.0, well, then that can come as well. I would not treat 1.0.0 as anything special, other than unlocking more features from the package manager.

@darsnack (Contributor)

I opened a new issue for GPU-related support ( #1067 ) to coalesce the steps it would take to get the package there. Happy to start contributing towards this goal once the community discusses a bit.

@nickrobinson251 (Contributor)

Where are we on this?

If we tagged v1.0, I think the only things likely to make us release v2.0 shortly after are #823 and #951

Those are both large PRs with very little progress on them in the last 6 months (which is fair enough -- they'll be nice when we do get them over the line). I work on a couple of (private) packages that have compat entries like Distributions = "0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22", so I think we're in a good place regarding stability.

@matbesancon what do you think about tagging v1? What would be your main concerns?

@matbesancon (Member)

@nickrobinson251 thanks for keeping up on this. I tend to agree with your point; the two PRs mentioned have stalled, and I'd be fine with keeping them for a 2.0.
I think one last breaking PR is #1079.
I would be for releasing a 0.23 with this last PR incorporated, along with anything else that is imminently mergeable, and then a 1.0 with all the deprecations removed.

What do you think? Also, my timetable is a bit tight these days, and I feel like that's the case for many other maintainers, so I'd appreciate any help checking what has to be done before 1.0 :)

@richardreeve (Contributor)

To be honest, I stopped working on #951 a while ago, after several months, because it wasn't clear it was going to be merged (and I had too much teaching at the time). I think it's the right thing to do - ValueSupport needs to be extended to handle more distributions, and there are several PRs that address that kind of issue in specific cases - but I spent a lot of time on it and it felt (to me) like it wasn't going to be merged, just drift on taking time forever. I'd be very happy to be wrong. If the repository owners are still interested and will merge it, I can work it up again - even for 1.0 if that's desirable, as it is lightly breaking...

@cossio (Contributor) commented Mar 10, 2020

Regarding 1.0: is there a decision on what to do about issues like #1071? In general, how should we allow for different return types of random numbers generated from a distribution?

@richardreeve (Contributor)

This was one of the things solved by #951: it inferred the desired return types from the parameter types. If that's the right thing to do, of course!
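As a hypothetical illustration of what inferring return types from parameter types would mean (the behavior described in the comments is the proposal, not necessarily what the released package does):

```julia
using Distributions

d = Normal(0.0f0, 1.0f0)  # Float32 parameters

x = rand(d)
# Proposal (#951 / #1071): x would be a Float32, matching the parameter type.
# Without it, rand here may promote to Float64 regardless of the parameters.
```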

@StefanKarpinski

Bump. Lack of a 1.0 release is causing real pains for people: JuliaRegistries/General#33273.

@DilumAluthge

It might be worth making two separate lists:

  1. Breaking changes that are actually blocking a 1.0 release.
  2. Breaking changes that could wait until a 2.0 or 3.0 release.

@StefanKarpinski

Remember: integers are free. There's no harm in releasing 1.0 now and releasing 2.0 whenever it's ready, be it next year or whatever. Given how long we've been waiting for 1.0, it seems reasonable to make a 1.0 release sooner rather than later.

@nalimilan (Member)

I agree it's probably a good idea to tag 1.0. But often the problem is that no breaking changes have been merged, so moving to 1.0 would require all dependent packages to bump their version bounds even though nothing will break. That's the 1.0 paradox. :-)

@nickrobinson251 (Contributor)

If we release v1.0 now, such that it's basically identical to v0.24.15 (or whatever the latest release is), then that makes it very easy for users to switch to v1.0; there's no need to update code for breaking changes. Of course, users will need to update to v1.0 to get any new features/bugfixes, but if/when they want those things the update will be trivial. Given how widely used Distributions.jl is, I think it's no bad thing to make it as trivial as possible for packages to update.

@StefanKarpinski

It would be easy to automatically bump all the registry compat bounds if there's no breaking change. The alternative is to continue with 0.24.x releases until a breaking change is made, and then release 0.25.0. The main problem here is using the leading non-zero version number for non-breaking releases. See https://discourse.julialang.org/t/psa-if-your-packages-version-is-of-the-form-0-y-z-please-do-not-bump-the-minor-version-for-new-features/46570.

@simsurace (Contributor)

Are there any updates on this? Is https://github.com/JuliaStats/Distributions.jl/milestone/1 still reflecting the current view of what's required to tag v1?

@devmotion (Member)

Personally, I think it would be good for the next breaking release to resolve at least the rand inconsistencies (there's an open PR; it's stuck due to ForwardDiff weirdness and breaking changes), the more general product distributions (there's a PR, but some breaking changes had to be reverted), and the more general affine transformations (also breaking). There are a bunch of other PRs with additional breaking changes that might be good to include as well.

In general, it might be too much to aim to include all breaking changes in the next breaking release. That makes me wonder, though, how useful a 1.0 release is at this point if additional breaking changes are already planned.

@matbesancon (Member)

I haven't followed all the open work streams closely, but +1 on @devmotion's point about the rand inconsistencies.

After that, I think we could switch to a major-version system and accept that 1.0 will not be set in stone, but rather a first stable release before we hammer out the other breaking changes.

@StefanKarpinski

If there are more breaking changes already planned, I agree: put 1.0 off until they're done. But I agree with @matbesancon that it's fine to make a 1.0 release even if there are more conceivable breaking changes; after all, making a 2.0 release is not the end of the world.
