Auto block registration of packages pre-1.0 #111019

Open
oxinabox opened this issue Jul 13, 2024 · 21 comments

Comments

@oxinabox
Contributor

oxinabox commented Jul 13, 2024

From SemVer 2.0

How do I know when to release 1.0.0?

If your software is being used in production, it should probably already be 1.0.0. If you have a stable API on which users have come to depend, you should be 1.0.0. If you’re worrying a lot about backward compatibility, you should probably already be 1.0.0.

I am going to argue that this is normally actually the case when you register in general.
If you don't intend for people to use it you shouldn't be registering it.
Perhaps by exception people are registering with intent to iterate rapidly in a breaking way.
But it is exceptional and we can always use the functionality to bypass that rule.

It's problematic when people register pre-1.0, because:

  1. People struggle to draw the line on when to tag 1.0, but registration is a good time.
  2. Releasing 1.0 once "stability is achieved" is breaking for no reason.
  3. They can't use the full range of the three SemVer digits, which sometimes makes backporting impossible.

I propose we only enforce this on new package registration. And allow old packages to be grandfathered.

We have talked about this on and off for years

@tecosaur
Contributor

I don't think it would be a good hard rule to have, but I think it would be a good guideline/soft check.

I have one or two packages that are 0.x myself at the moment, because they're just about useful enough to be registered but still very much having their API/design worked out.

In most cases though, I think it's very useful to have the three channels of information in the version number:

  • Breaking changes
  • New features
  • Bugfixes

So perhaps a way this could be done is as an automerge check, allowing for manual intervention if the registrant insists that they've thought about it and really do want 0.x?

@oxinabox
Contributor Author

oxinabox commented Aug 19, 2024

So perhaps a way this could be done is as an automerge check, allowing for manual intervention if the registrant insists that they've thought about it and really do want 0.x?

I agree.
We make it an automerge check, and are generous with exceptions.
And since I propose doing that check only at registration time, it will not be much of a barrier or inconvenience,
since packages that are already registered would be allowed to continue.
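
As a rough illustration of the check being discussed, here is a minimal Python sketch. It is not how the registry's automerge tooling is implemented; the `Registration` type, the `automerge_decision` function, and the two result strings are all hypothetical names made up for this example. It only encodes the rule as stated above: a brand-new package below 1.0 is routed to a human, and everything else is left alone.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    """Hypothetical description of a registration request."""
    name: str              # package name (made up in the usage below)
    version: str           # version being registered, e.g. "0.1.0"
    is_new_package: bool   # True for a first registration, False for a new release


def automerge_decision(reg: Registration) -> str:
    """Return "manual-review" for new pre-1.0 packages, otherwise "automerge"."""
    major = int(reg.version.split(".")[0])
    if reg.is_new_package and major == 0:
        # New package registering at 0.x: a maintainer can still merge it by hand.
        return "manual-review"
    # New packages at >= 1.0 and all releases of existing packages are unaffected.
    return "automerge"


print(automerge_decision(Registration("NewPkg", "0.1.0", is_new_package=True)))   # manual-review
print(automerge_decision(Registration("NewPkg", "1.0.0", is_new_package=True)))   # automerge
print(automerge_decision(Registration("OldPkg", "0.4.2", is_new_package=False)))  # automerge
```

The third call shows the grandfathering: an already-registered 0.x package keeps automerging.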

@Tortar
Contributor

Tortar commented Sep 11, 2024

I'm not sure if following the SemVer prescription would be beneficial in this case. I think this would encourage people to release a 1.0 version when the API is not ready. Considering the number of packages which are still in the <1.0 stage in the Julia ecosystem, it seems that many maintainers feel like their packages are still not ready for a 1.0 release, which is maybe more a lack of time than anything else. This seems to me an artificial solution to a different underlying problem.

@oxinabox
Contributor Author

oxinabox commented Sep 12, 2024

I would argue it's actually the opposite.
There are so, so many packages with very stable APIs that haven't changed in years that are pre-1.0.
There is no cost to initially releasing at 1.0 and then releasing 2.0 later.

@tecosaur
Contributor

To me the motivating issue here is that we have a bunch of 0.x packages that release versions with new features, don't break API, but are considered a semver breaking release because they don't have access to that crucial third channel of semver information.

This results in needless compat bound churn across the ecosystem. This is the problem that I see this proposal helping with.
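
To make the missing third channel concrete, here is a minimal Python sketch of the caret-style compat rule as Julia's Pkg documents it, where the upper bound is set by the left-most non-zero version component. The function names `caret_range` and `is_compatible` are made up for illustration, and this is a sketch of the rule, not Pkg's actual implementation.

```python
def caret_range(version: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Return the half-open interval [lower, upper) allowed by a caret-style compat entry."""
    major, minor, patch = (int(part) for part in version.split("."))
    lower = (major, minor, patch)
    if major > 0:
        upper = (major + 1, 0, 0)    # 1.2.3 allows everything below 2.0.0
    elif minor > 0:
        upper = (0, minor + 1, 0)    # 0.2.3 allows everything below 0.3.0
    else:
        upper = (0, 0, patch + 1)    # 0.0.3 allows only 0.0.3 itself
    return lower, upper


def is_compatible(compat: str, candidate: str) -> bool:
    """Check whether `candidate` falls inside the caret range of `compat`."""
    lower, upper = caret_range(compat)
    cand = tuple(int(part) for part in candidate.split("."))
    return lower <= cand < upper


# A feature release of a 1.x dependency stays inside the existing bound...
print(is_compatible("1.2.3", "1.3.0"))  # True
# ...while the same kind of release of a 0.x dependency falls outside it.
print(is_compatible("0.2.3", "0.3.0"))  # False
```

In the second case every dependent has to widen its compat entry, even if nothing in the API broke.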

I don't quite understand this view that occasionally seems to pop up that 1.0 is for the "finished"/"fully worked out" version of the software: it's really just for as soon as you start caring about having a public API. Agents.jl is on v6 last I checked, and that's great!

@Tortar
Contributor

Tortar commented Sep 12, 2024

If that is the rationale of the proposal, @tecosaur, I'm more convinced now that it's probably a good idea, actually; I didn't consider the problem of having breaking releases for new features. The reasoning of @oxinabox also seems right, though maybe it's more debatable why maintainers didn't release a 1.0 for years. But maybe that is not the point. So, for what it's worth, I would reconsider my position :-) But it doesn't seem like a good hard rule to me either.

@goerz
Member

goerz commented Sep 21, 2024

the motivating issue here is that we have a bunch of 0.x packages that release versions with new features, don't break API, but are considered a semver breaking release

This is a misreading of SemVer. Pre-1.0, there is no such thing as a "breaking release". Pre-1.0 packages do not have a stable API, so by definition, they can't break that API. Quoting from the SemVer specification: "Major version zero (0.y.z) is for initial development. Anything MAY change at any time."

It is not a violation of SemVer to add features (or even make breaking changes) in a pre-1.0 "bugfix" release. This is actually something that, arguably, Pkg gets wrong. Or at the very least, the check in General that "All dependencies should have [compat] entries that are upper-bounded and only include a finite number of breaking releases" makes no sense for dependencies that are pre-1.0. If I'm using a pre-1.0 package as a dependency, it's my responsibility to figure out what kind of compatibility I might have with different versions of that package. It's very much going to depend on the specific package. We shouldn't be making or enforcing unwarranted assumptions on pre-1.0 dependencies.

Releasing 1.0 once "stability is achieved" is breaking for no reason.

That's also a misreading of SemVer. Breaking releases must change the major version number, but major version number changes do not need to contain breaking changes. They generally should, of course, but the change from 0.x to 1.0 especially is not a breaking change according to SemVer, because 0.x wasn't stable in the first place!
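
One way to spell out this reading of the spec is the small Python sketch below. The function name `violates_semver` is hypothetical, and the sketch assumes `new` is a later release than `old` and that someone has already judged whether the release contains breaking changes. It encodes a one-directional rule: breaking changes require a major bump, but a major bump does not require breaking changes, and nothing is a violation while the old version is 0.y.z.

```python
def violates_semver(old: str, new: str, has_breaking_changes: bool) -> bool:
    """Return True if releasing `new` after `old` breaks the SemVer rule on major versions."""
    old_major = int(old.split(".")[0])
    new_major = int(new.split(".")[0])
    if old_major == 0:
        # Major version zero: anything may change at any time, so nothing can be violated.
        return False
    # From 1.0.0 onwards, breaking changes must come with a major version bump.
    return has_breaking_changes and new_major <= old_major


print(violates_semver("0.4.2", "1.0.0", has_breaking_changes=False))  # False
print(violates_semver("0.4.2", "0.4.3", has_breaking_changes=True))   # False
print(violates_semver("1.2.0", "1.3.0", has_breaking_changes=True))   # True
print(violates_semver("1.2.0", "2.0.0", has_breaking_changes=False))  # False
```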

@goerz
Member

goerz commented Sep 21, 2024

This is not to say that there aren't plenty of 0.x packages that should be 1.0. It's also not to say that pre-1.0, there isn't some less formal notion of "breaking" vs "non-breaking". I certainly use 0.y.0 releases to indicate bigger changes that are likely to "break" other packages and 0.y.x to indicate smaller changes (both new features and bugfixes). But that's a convention on top of SemVer, and not something we should strictly enforce.

I would also claim that pre-1.0 is an important part of a package's development cycle. If the developers feel that they're still "figuring out" the API, it shouldn't be 1.0. A lot of projects in Julia are related to academic research, and in such a situation, "active research" should indicate pre-1.0. If you're eventually going to publish a paper about a package, the publication of that paper seems like a good time to make a 1.0 release.

I pretty strongly believe that most packages should use 0.1.0 for registration, not 1.0.0. I have a much higher bar for registrations if the initial version is 1.0. Such a package must explicitly define a stable API, so it must have complete documentation. For pre-1.0 packages, the bar is much lower: it must have a basic explanation for what the package does, and give some idea how to use it, e.g., via a usage example. If I see a registration for a 1.0 package that does not have full documentation, I will definitely leave a blocking comment. Certainly, most submissions to General right now do not actually meet the 1.0 bar.

@frankier
Contributor

To me the motivating issue here is that we have a bunch of 0.x packages that release versions with new features, don't break API, but are considered a semver breaking release because they don't have access to that crucial third channel of semver information.

If the problem is to do with pre-1.0 packages that cause everyone to have to update compat bounds when they release new backwards-compatible features, maybe that is where the focus should be placed.

  1. Find all the pre-1.0 packages with the most dependent packages - or over some threshold
  2. Open an issue asking them to release 1.0
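
The first step could be sketched along the following lines in Python. This is only an illustration under stated assumptions: the `packages` mapping, its package names, versions, and dependent counts are invented data, and `candidates_for_1_0` is a hypothetical helper, not an existing registry tool. How the real dependent counts would be obtained from the registry is left open.

```python
def candidates_for_1_0(packages: dict[str, dict], min_dependents: int) -> list[tuple[str, int]]:
    """List pre-1.0 packages with at least `min_dependents` dependents, most depended-on first."""
    selected = [
        (name, info["dependents"])
        for name, info in packages.items()
        if int(info["version"].split(".")[0]) == 0   # still at 0.y.z
        and info["dependents"] >= min_dependents     # over the chosen threshold
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Invented example data: latest version and number of dependent packages.
packages = {
    "PkgA": {"version": "0.21.4", "dependents": 340},
    "PkgB": {"version": "2.1.0", "dependents": 900},
    "PkgC": {"version": "0.3.1", "dependents": 12},
    "PkgD": {"version": "0.9.0", "dependents": 510},
}

for name, dependents in candidates_for_1_0(packages, min_dependents=100):
    print(f"{name}: {dependents} dependents, consider opening an issue asking for 1.0")
```

With this data the loop reports PkgD and then PkgA; PkgB is already past 1.0 and PkgC is below the threshold.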

There's an argument to be made that the approach is "too late", but on the other hand, adding extra steps for package registration could put people off registering packages, or make them delay, which could also have bad outcomes.

@goerz
Member

goerz commented Sep 25, 2024

The tips in the SemVer document certainly aren't wrong: If a package has a lot of dependents and thus should be worried about backward compatibility (whether the authors feel that the API is perfect or not), that package should probably make a 1.0 release. That does not mean we should be pushing people to make 1.0 releases prematurely, or impose version restrictions on registration. And I'll say it again: switching from 0.x to 1.0 simply to mark the API as "stable" is the right thing to do. SemVer does not require that 1.0 must include some kind of "breaking changes" relative to the last 0.x release. In fact, I would actively avoid changes besides "cleanup" and documentation in a 1.0 release. So finding packages that should be on 1.0 and asking them to just make that 1.0 release is a perfectly fine thing to do.

@araujoms

araujoms commented Sep 26, 2024

Right now I have a package that depends on six 0.x packages. Three of them are definitely not ready for 1.0, but are nevertheless very useful dependencies. In a hypothetical world where the 1.0 rule was in place before they got registered, there are two possibilities: the unstable packages would unhelpfully set their versions at 1.0, or they would remain unregistered.

If they remained unregistered I would have to deal with unregistered dependencies, which is a pain in the ass, and wouldn't be able to register my own package.

@Datseris
Contributor

Datseris commented Oct 3, 2024

the unstable packages would unhelpfully set their versions at 1.0.

How unhelpful is it really if you are already using them as a dependency in another package...? They clearly have an API, otherwise how would you be using them? I'd rather say it would be helpful instead if they were at 1.0. This gives the package authors the nudge they need to be thinking about their users. If you care about having users, you should also care about compatibility and breaking changes and SemVer is a great system for that, with the added advantage of being ultra-widely adopted. If you don't care about having users, well, why are you registering? Pkg.add(; url = ...) works perfectly fine anyways.

Overall I give a +1 to the idea of disabling the automerge for new packages that are < 1.0, and in general as a community showing that we value SemVer and hence would prefer registered packages to have an API that at least the authors themselves consider solid to be 1.0. I've expressed in many other situations that registering a package in the general registry is way too easy and having checks that disable the auto merge can be of benefit to the community.

Some packages I've registered would not pass the bar of this rule, and that's fine. I would have to wait longer before registering. Not really a big deal... In the next version of Pkg.jl we will be able to add packages to our projects without them being registered, and still ensure reproducibility by specifying the exact commit. Hence, there is even less reason to register everything as soon as possible.

For a counterexample that also gives a +1 to the automerge block suggestion: of two packages I am involved with, Agents.jl is at v6 and DynamicalSystems.jl is at v3. It's not like you can't release a breaking release post-1.0 if you have good reasons to. If anything, it makes you more conscious about breaking changes.

@Datseris
Contributor

Datseris commented Oct 3, 2024

@frankier and @goerz I agree with your comments; they are excellent: motivating a 1.0 for large packages that have been stable for a long time is a good idea. But I don't find it related to this issue, which is about newly registered packages. Additionally, I have done this several times so far and have been met either with no response or with no real arguments, sometimes with the authors saying they don't themselves know what breaking changes they would like to make (proving the point entirely that they should release 1.0, but oh well...). So that's why I think your suggestions should be a separate discussion altogether.

@tecosaur
Contributor

tecosaur commented Oct 3, 2024

I also think it's worth highlighting that this proposal is for automerge, not a hard rule. I still think there's the occasional well-justified 0.x package.

Drawing from my own experience, I'd point to DataToolkit, which is currently at 0.9 with major API changes every release so far. Once it settles down a bit 1.0 will be forthcoming, but it's still useful enough to register IMO. If I had to re-register it, I could flag it for manual acceptance saying I've thought about it and think that it's worthwhile at a 0.x state. To me, this case is an exception.

Like @Datseris, I think in general this would be a valuable nudge to authors to think about the API they're delivering, and the state of their package a bit more.

It's also not like we can't reverse this rule if it turns out not to be such a good idea in practice 🙂

@araujoms

araujoms commented Oct 3, 2024

How unhelpful is it really if you are already using them as a dependency in another package...? They clearly have an API, otherwise how would you be using them?

It's a matter of communication. They do have an API, but it changes frequently, and the packages are incomplete. That's what I expect from 0.x packages. If they had set their version to 1.0 just to get registered I would have expected otherwise.

@goerz
Member

goerz commented Oct 3, 2024

I feel very strongly that, if anything, we should go the other way: New packages should automerge only if the initial package version is <1.0, and require confirmation for >=1.0. There are very few new registrations that I've seen over the last couple of years that would have met the bar for a 1.0 version in terms of actually having a stable and fully documented (!) API [edit: and tested!]. I'd even say that except for very small and simple packages, it would be impossible for a package author to know if a v1.0 label is appropriate. No API survives contact with actually being used in the wild for the first time. The best way to know if your API is good is to have other people use it. Registration should be an indicator that you want to have people use it. I would generally recommend registering at v0.1 and then tagging v1.0 after the package has picked up some usage. There do not need to be any changes besides the documentation between v0.1 and v1.0.

I think we actually used to have an initial version 0.1 as a requirement in General. That causes issues, too, so I don't think we should enforce that at all, but 0.1 is still a good default, and something I would recommend for most registrations. A v0.1 tag is the definition of "public, but not stable".

Again, that is not to say that we shouldn't push for packages that have a lot of dependents to actually commit to their API and tag 1.0. There are numerous packages that are <1.0 and really should be 1.0. I also don't object to organizations having an internal policy of only releasing very polished packages into the public that basically could be at v1.0. But that's not a level that everyone can or should operate at. And even in that case, I think they should use a pre-1.0 tag as a "release candidate" and only tag v1.0 after registration.

@Datseris
Contributor

Datseris commented Oct 4, 2024

There are very few new registrations that I've seen over the last couple of years that would have met the bar for a 1.0 version in terms of actually having a stable and fully documented (!) API [edit: and tested!].

I'd argue this is due to the shockingly low threshold for registration. Indeed, every time I check out the Slack channel of new registrations I am like "but why" for half of them. If there are no quality criteria to register then of course that's what you would see. So the point would be to make it harder, not easier, for packages to autoregister.

I'd even say that except for very small and simple packages, it would be impossible for a package author to know if a v1.0 label is appropriate.

I am confused by this statement. I think you have a different idea of what "1.0" means than I do. If you don't know if 1.0 is appropriate then 1.0 is appropriate. Because if you don't know, it means you have no planned breaking changes. Otherwise you would know.

And to my understanding, the smaller the package, the easier it is to release 1.0, because you simply have far fewer possibilities to make a breaking change. If you have a package with 1,000s of exported functions, any breaking change in any of them would technically justify a major version increment. So it's harder to go 1.0 for larger projects than for smaller ones, to the best of my understanding of what "1.0" conveys.

@frankier
Contributor

frankier commented Oct 4, 2024

There are very few new registrations that I've seen over the last couple of years that would have met the bar for a 1.0 version in terms of actually having a stable and fully documented (!) API [edit: and tested!].

I'd argue this is due to the shockingly low threshold for registration. Indeed, every time I check out the Slack channel of new registrations I am like "but why" for half of them. If there are no quality criteria to register then of course that's what you would see. So the point would be to make it harder, not easier, for packages to autoregister.

I think this gets to the crux of the issue. Why should it be harder? Why should there be a higher threshold? Remember, having to be registered is transitive, so it means when I have a registered package, all the dependencies have to be registered too. Early-version registered code could be rough code extracted from a mature package. I don't see a good reason apart from a misplaced need to control or a notion that package registration should act as some kind of stamp of approval. In effect, it acts to make people's lives harder by adding busywork, and slows down code reuse.

@tecosaur
Contributor

tecosaur commented Oct 4, 2024

There are two responses I have to this, and I feel a need to explicitly state this because they are rather separate issues that I think are in danger of being conflated.

Firstly, I don't see how setting version to 1.0 for auto-merge is a high bar. It's a single-line change + re-trigger the register bot, or just a matter of saying "hey, can I get manual approval?". I don't see how this could be seen as setting a high bar. If this is truly so onerous that it pushes out potential packages, I really question whether we could reasonably hope those packages would be maintained at all in the first place.

There is value in having three channels of information in semver separately to the difficulty of registering, as has been articulated in earlier comments. To me, the crux of the issue is whether we want to push packages to have three channels of semver (breaking, feature, bugfix) or just two (breaking, bugfix).


Second, and separately, I think there should be a higher bar for the registry. In my mind, registering packages should be more than just bundling some related code and throwing it out into the world.* In my mind, a good registry is defined by having a broad collection of high-quality packages.

What exactly are "high-quality" packages? That's a complex question, but I think we can assert some general requirements:

  • Has a clear name, not prone to causing confusion
  • Has a clearly defined scope
  • Is able to be loaded without issue
  • Has tests, ideally with good test coverage
  • Is type-stable, well inferred
  • Does not have an excessive number of dependencies
  • Does not have needlessly narrow dependency compat bounds
  • Has documentation, ideally:
    • Has an overview/quickstart
    • Has a tutorial
    • Has a reference
    • Has a guide
    • Has diagrams, where sensible
  • Has a defined API
  • Has docstrings for all public API, ideally all functions/variables (including internals) and the module
  • Has an (informative) readme
  • Uses clear variable/function naming, in accordance with Julia naming conventions
  • Will be actively maintained

Does every high-quality package meet all of those requirements? No, there's always nuance and exceptions. However, I think it's worth encouraging actual packages to aspire to meet as many of these as possible, as early as possible.

* In 1.11+, we have project [sources] for this.


I'd like to re-emphasise that I only see a small overlap between these points (we're veering into a much broader discussion).

I'd advocate for considering whether we want to set a soft requirement for 1.0 on the semver merits specifically, and hold a conversation about package quality and the bar for registration in another issue (or elsewhere).

TL;DR: It's a low bar and not even about the bar for entry, but I think we should have a higher bar anyway.

@Datseris
Contributor

Datseris commented Oct 4, 2024

I don't see a good reason apart from a misplaced need to control or a notion that package registration should act as some kind of stamp of approval.

Interestingly, this semester I just started my first course as a lecturer, which of course I would do with Julia. It was a big problem that Julia does not have any sort of stamp of approval, like MATLAB's centralized toolbox or Python's Anaconda. They are installable instantly across my whole campus without anyone asking any questions. With Julia? Your whole package installation must get erased as soon as you log out because there is no "stamp of approval" that my IT will accept. Needless to say this is a bad experience, so maybe these stamps of approval aren't as useless as you suggest. As long as there is no community request for stamps of approval, we will never have one. And I see the suggestion of this issue as a good step forwards.

@tecosaur put the situation perfectly in their reply, so I don't have much to add besides requesting some sort of argument for how the suggestion here is making life so much harder. Is it because you would have to change Pkg.add("Name") to Pkg.add(url = x)? I don't buy this as such a big downgrade to anyone's life.

@goerz
Member

goerz commented Oct 8, 2024

I am confused by this statement. I think you have a different idea of what "1.0" means than I do.

Pre-1.0 and post-1.0 represent different stages of development. Pre-1.0, backwards compatibility is not a concern. You would accept PRs if they improve the API of the package in the absolute. Post-1.0, backwards compatibility is a concern. So it would be common to reject PRs or close issues because their benefit does not outweigh the cost of a breaking change.

Of course, people can disagree with how strongly they value API stability. I'm in the more conservative camp, where you would avoid breaking changes unless there is a very compelling reason for the new functionality and there is no way to implement it in a backwards-compatible way. That might mean delaying features for a future "2.0 milestone". If you value stability less, you might end up going from 1.0 to 8.0 in the span of a year. Fine by SemVer, but I might decide your package is more trouble than it's worth to use as a dependency. In any case, backwards compatibility should be part of the conversation post-1.0. Conversely, if you're pre-1.0, but you are worrying about backwards compatibility (because you have many dependents), then you really should tag 1.0. That's what the guidelines in the SemVer spec are about.

So the question isn't whether you have any breaking changes planned. The question is at what level you're going to evaluate issues and PRs. I would say that when you first open up a package to public use, you should be permissive in how you deal with new PRs. But of course, it's up to you.

Beyond that, my issue with pushing for 1.0 on new registrations is that it comes with additional requirements. For v0.1, my bar is more or less

  • Must have a name that is appropriate for the scope of the package
  • Must have some non-trivial functionality (no "placeholder" packages)
  • Must have a README that explains the purpose of the package and gives a basic usage example (or a link to documentation that is sufficient to figure out how to get started)

With 1.0, in addition to that,

  • Must define the stable API. That generally means complete documentation.
  • Must have tests with reasonable coverage (on the order of 80%). Otherwise, there's no way to know if the package actually implements its API correctly
  • Should somewhat reasonably fill the described scope of the package. No major gaps in functionality.

If we push for packages to register at 1.0, we should also have tooling in place to check these conditions, which is pretty hard. If people just use 1.0 as an initial version to get around the bot, but without actually being ready for 1.0, I think we're doing a disservice to the community and to the spirit of SemVer. Also, it denies people the pre-1.0 development stage, which, again, I think is an important part of a project's life cycle.

Of course, I'm very happy if people register packages that are mature enough for 1.0. I just wouldn't expect it from the majority of registrations. And I'll certainly review packages that register with an initial version 1.0 and potentially leave a blocking comment to the effect of "this doesn't seem like a v1.0 quite yet".


7 participants