
Decide on policy for choosing package versions when solving dependencies #8021

Closed
gridbugs opened this issue Jun 21, 2023 · 15 comments

@gridbugs
Collaborator

The two competing proposals for this are:

  • prefer_newest: the latest mutually-compatible versions of dependencies are chosen
  • prefer_oldest: the oldest mutually-compatible versions of dependencies are chosen (this is sometimes called MVS, short for minimal version selection)

The solver in opam currently uses prefer_newest. The opam_0install package which we use to solve dependencies can be configured to use either policy. At the time of writing, dune is using prefer_oldest, but this was chosen on a whim (by me) and without proper discussion. I'm creating this issue so we can properly discuss the issue and decide which policy we will use in dune.
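
For intuition, here is a minimal OCaml sketch of the difference. This is illustrative pseudocode, not the actual opam_0install API, and the version strings are made up; the point is that both policies choose from the same set of mutually-compatible candidates and differ only in which end of the ordering the solver tries first.

```ocaml
(* Illustrative only: not the opam_0install API. *)
type policy = Prefer_newest | Prefer_oldest

(* Order the candidates so that the head of the list is the version tried
   first. Plain string comparison is enough for these made-up versions. *)
let order_candidates policy versions =
  let ascending = List.sort compare versions in
  match policy with
  | Prefer_oldest -> ascending            (* MVS-style: oldest compatible *)
  | Prefer_newest -> List.rev ascending   (* opam's current behaviour *)

let () =
  let candidates = [ "1.2.0"; "1.0.0"; "2.0.0" ] in
  (* Prints "2.0.0" (prefer_newest) then "1.0.0" (prefer_oldest). *)
  [ Prefer_newest; Prefer_oldest ]
  |> List.iter (fun p -> print_endline (List.hd (order_candidates p candidates)))
```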

In addition to choosing a policy we also need to decide whether to make the policy configurable by users.

Brief description of high-level trade-offs

With prefer_newest, packages that don't specify an upper bound on their dependencies will automatically depend on the latest versions of their dependencies. This means that package maintainers don't need to explicitly update the dependencies in their package manifests when new versions of deps are released, but it also means that releasing a new version of the package requires testing that the change doesn't break any reverse dependencies of the package and possibly patching or adding upper bounds to packages that would otherwise be broken. On the other hand, prefer_oldest would avoid situations where an update to a package would break reverse dependencies, but it would require package maintainers to explicitly update their dependencies to track the latest versions.

Some things to consider:

  • opam will probably continue to use prefer_newest. Will it harm the package ecosystem if dune uses a different policy from opam? We really want to avoid splitting the ecosystem.
  • how will this affect the maintenance burden on package maintainers and the maintainers of the opam-repository, and how does the burden scale as the number of packages grows?
  • most package manifests are written with prefer_newest in mind, so there's little motivation to set lower bounds on their dependencies correctly. This would have to be fixed in order to solve these packages correctly with prefer_oldest.
@tmattio
Collaborator

tmattio commented Jun 21, 2023

Thanks a lot for opening the issue @gridbugs! To share more on the discussion we had offline, my main concern with changing from resolving the newest version to the oldest is that we would incentivise developers to add unnecessary lower bounds to their packages (for instance to benefit from a performance improvement that is not strictly necessary). I am worried about the impact this will have on opam-repository: we're currently optimising for having the largest possible subset of co-installable packages. If developers start adding stricter lower bounds, will we end up making it harder for opam-repository users to find solutions that respect their constraints?

@rgrinberg
Member

Thanks for the summary. It looks broadly correct to me.

I think the conflict can be summarized by two different sides:

  • Those who prefer the constraints to be optimistic
  • Those who prefer the constraints to be pessimistic

Neither side is clearly better than the other, but only one can be the default. My argument for making pessimism the default is simple: it leads to far fewer build failures. My preference would be to optimize for this quality above all others.

my main concern with changing from resolving the newest version to the oldest is that we would incentivise developers to add unnecessary lower bounds to their packages (for instance to benefit from a performance improvement that is not strictly necessary)

IMO this bump in the lower bounds would be warranted. If a developer switched to testing against a newer version, then how could the developer guarantee that the package is still working against older versions?

I am worried about the impact this will have on opam-repository: we're currently optimising for having the largest possible subset of co-installable packages. If developers start adding stricter lower bounds, will we end up making it harder for opam-repository users to find solutions that respect their constraints?

We will be reducing the number of correct build plans that respect their constraints. But we will also be reducing the number of incorrect build plans that respect their constraints. The failure modes are also different:

  • Not finding a correct build plan will require the user to relax lower bounds until they solve for a working build plan
  • Finding an incorrect build plan will require the user to add missing upper bounds until they solve for a working build plan

I quite like the first option more because it's easier to start from a solver failure and relax constraints until you get a working build than it is to start from a plan that builds incorrectly and add upper bounds until you get a working build.

I have to say though, as a user, I don't see much value in having very wide build plans, and I'm fine with bumping things along if my dependencies require me to. I realize others have different preferences.

As a side note, before submitting a package to opam one should be able to test their package against both build plans. Both types of errors are going to cause grief for end users, so it's always important to at least test both no matter what the default is.

@gridbugs
Collaborator Author

This (#8020 (comment)) comment from @Leonidas-from-XIV is relevant:

I am not sure we need to have this configurable; it is quite possible that, with people having it configured in different ways, we reach a point where neither side has any benefits, since the upside of MVS (namely that newer packages don't break older ones) doesn't actually materialize until people use it in a significantly large set of packages.

@avsm
Member

avsm commented Jun 22, 2023

Deviating from opam's default solving strategy is a huge user-observable change. I really think this needs to be configurable, since at a minimum there needs to be a way to make the opam solver behaviour and the dune locking behaviour as similar as possible.

In fact, for the first outing of this feature, doesn't it make the most sense to match the opam behaviour but make oldest a configurable option, and subsequently change the default in a future dune version if there is a problem?

@rgrinberg
Member

I am not sure we need to have this configurable; it is quite possible that, with people having it configured in different ways, we reach a point where neither side has any benefits, since the upside of MVS (namely that newer packages don't break older ones) doesn't actually materialize until people use it in a significantly large set of packages.

I can't see how we can avoid having this configurable. Without configuration, it would be impossible to test either the lower or upper bounds before submitting something. Did I misunderstand this point?

Deviating from opam's default solving strategy is a huge user-observable change. I really think this needs to be configurable, since at a minimum there needs to be a way to make the opam solver behaviour and the dune locking behaviour as similar as possible.

I'm for having it configurable as well. But it is not an explicit goal to avoid deviating from opam. We've already deviated from opam in many, and arguably more significant, ways.

In fact, for the first outing of this feature, doesn't it make most sense to match the opam behaviour but make oldest a configurable option, and subsequently change the default in a future dune-version if there is a problem?

Making oldest the configurable (non-default) option is what I'm leaning towards as well, because it sounds like this behavior will be less surprising to users. But note that neither mode will actually match opam's behavior: opam uses mccs as its solver and it wouldn't be feasible for us to copy that behavior.

@avsm
Member

avsm commented Jun 22, 2023

I don't understand the overall workflow well enough.

it would require package maintainers to explicitly update their dependencies to track the latest versions.

If oldest is the default, then how do I update to the latest versions of packages? Do I need to do this step-by-step with each package?

And similarly, if I get the oldest release when asking for a new package to be added, wouldn't that just be really...old...and not have the latest interfaces? Technically, an ancient version of Cohttp is still available in opam-repo, but why would I want that pulled in for a new package?

releasing a new version of the package requires testing that the change doesn't break any reverse dependencies of the package and possibly patching or adding upper bounds to packages that would otherwise be broken.

This work has to be done anyway to get it into opam-repository, and is by design, since it gives the maintainers of those packages that are broken a chance to fix them, and generally keep the ecosystem moving forward. What I don't understand is: how does this choice help with dune package management? You have to release packages into opam-repository for them to be available to dune, right?

All in all, I'm surprised to see old even get a look in the door as a plausible default. It seems a poor choice when adding new packages, has to be overridden when updating packages to the new strategy, and I'm not sure what the reverse dependency problem mentioned in the original ticket actually is.

@samoht
Member

samoht commented Jun 22, 2023

Let's try to keep as close as possible to opam's default here. There are already enough aspects of the dune package management proposal that users will complain about; let's not add more potential for contentious discussions. Our goal here is adoption. Let's avoid creating any friction/noise around stuff which is not critical to the core proposal :-)

So I suggest staying as close as possible to what the OCaml ecosystem has been used to over the last decade, and opening a discussion with the wider community if you think that the stuff people are used to isn't optimal and can be improved (but not only for Dune).

@Alizter
Collaborator

Alizter commented Jun 22, 2023

There are already enough aspects of the dune package management proposal that users will complain about

Could you expand on this?

@samoht
Member

samoht commented Jun 22, 2023

Could you expand on this?

What I mean is that:

  • not everyone in the community uses (or will be using) Dune -- they'll tell you dune is too "alien", with a weird configuration language, weird default choices (like a new set of compiler warnings by default), a weird compilation model (with fake namespaces), etc. Let's not add another reason to discuss. It's ok to change/experiment with better defaults temporarily, but that should be closely followed by upstreaming them in the right place (and I feel we haven't always done this correctly in the past)
  • dune package management is a HUGE feature that many (including me) have been pushing for and looking forward to for a long time. It's also very new and very different from what Dune has been doing until now. There are many design decisions core to Dune to make - let's focus on those instead of trying to change how the community does things. These are two different (big) problems. I would advise not coupling them, to reduce risk.

@rgrinberg
Member

And similarly, if I get the oldest release when asking for a new package to be added, wouldn't that just be really...old...and not have the latest interfaces? Technically, an ancient version of Cohttp is still available in opam-repo, but why would I want that pulled in for a new package?

We could just add the new package as a dependency with the appropriate lower bound (the latest version) on it. Otherwise, a naked dependency like cohttp implies that you support all versions of cohttp. That would be wildly optimistic.
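
To make this concrete, here is a toy OCaml sketch (the constraint type and version numbers are hypothetical, not dune's real representation) of how a naked dependency admits every published version while a lower bound prunes the candidate set:

```ocaml
(* Toy model of dependency constraints; not dune's real types. *)
type version_constraint = Any | At_least of string

let satisfies c version =
  match c with
  | Any -> true
  (* Plain string comparison is adequate for the example versions here. *)
  | At_least lower -> compare version lower >= 0

let () =
  let published = [ "0.9.8"; "1.0.0"; "5.3.0" ] in
  let admitted c = List.filter (satisfies c) published in
  (* A naked dependency admits all three versions, including the ancient
     0.9.8, which prefer_oldest would then happily select. *)
  assert (admitted Any = published);
  (* With a lower bound, the oldest admissible version is the bound itself. *)
  assert (admitted (At_least "1.0.0") = [ "1.0.0"; "5.3.0" ])
```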

This work has to be done anyway to get it into opam-repository, and is by design, since it gives the maintainers of those packages that are broken a chance to fix them, and generally keep the ecosystem moving forward. What I don't understand is: how does this choice help with dune package management? You have to release packages into opam-repository for them to be available to dune, right?

I imagine that this policy is particularly helpful for those that don't plan to publish to opam-repository (closed source, hobby projects, simple apps). If the user sticks to lower bounds, the user will tend to encounter fewer build errors between dependencies. For example, if I add a dependency on cohttp>=1.0.0 and regenerate my lock file, I will be far less likely to break something because the versions of all unrelated packages will remain the same. It's somewhat similar to opam's upgrade solver which minimizes installs/reinstalls, except it's easier to understand.

Now, for those users who are publishing to opam-repository: I don't think they should have the luxury of just testing one build plan. I would expect them to test their lower bounds and also test against a recently produced build plan to make sure the upper bounds are all correct. So I think this point is largely irrelevant to them.

However, I see that this issue is much more controversial than I expected, so I'm happy to defer this discussion to another time. I'm OK with having to configure dune to use MVS.

@gridbugs
Collaborator Author

gridbugs commented Jun 22, 2023

Thanks for the discussion! We just discussed this further in the build system meeting at Tarides and reached the conclusion of defaulting to prefer_newest, as it is closest to what users coming from opam will expect and is also the paradigm under which most package manifests in opam-repository are written. We'll add a flag to use prefer_oldest so users can test with both policies. If we decide prefer_oldest is something we really want, we can have a bigger conversation on the discuss forum, as we'd want to coordinate with the whole community.

@Leonidas-from-XIV
Collaborator

While the issue is closed and we agreed to prefer newer versions, I do want to write down my thoughts on this too.

not everyone in the community uses (or will be using) Dune -- they'll tell you dune is too "alien", with a weird configuration language, weird default choices (like a new set of compiler warnings by default), a weird compilation model (with fake namespaces), etc. Let's not add another reason to discuss.

Yes, but there will always be those people. I don't think that someone disliking the approach is a good reason to reject a proposal with a lot of potential benefits, especially since those who want to are free to stay with their current approach. There is a vocal minority of dune-rejectors, but for the majority of users dune provides tangible upsides, and rethinking packaging is an opportunity to solve existing pain points.

And yes, some of the things that are too "alien" are indeed issues that we could try to tackle going forward, not saying all the criticism is unwarranted.

we're currently optimising for having the largest possible subset of co-installable packages. If developers start adding stricter lower bounds, will we end up making it harder for opam-repository users to find solutions that respect their constraints?

Is this optimization strategy useful? For me as a commercial developer, the biggest pain point was that every now and then my existing project broke because some dependency changed, thus breaking CI (while I wasn't affected, since the packages on my local switch didn't change). In these cases one could pin the opam-repository, but that basically means that I am building an ad-hoc lockfile, which has the added disadvantage that to get a new version I'd need to re-pin the opam-repository. In fact, if I remember correctly, ocaml-ci seems to implement a form of MVS by looking at the dependencies and pinning the lowest version of the opam-repo that can satisfy them (which helps a lot with caching, too).

The other issue is that creating the biggest co-installable set is very labor-intensive, as it puts the duty of keeping the set large both on package submitters and on opam-repository maintainers, who will have to deal with adjusting package dependencies basically forever as new packages get submitted.

If oldest is the default, then how do I update to the latest versions of packages? Do I need to do this step-by-step with each package?

Why would you want to update all packages at once? If you need a feature that has been introduced in e.g. Lwt 5.3, you'd update your dependency to that, ask dune to install it, and you would get 5.3 while the rest of the packages stay the same. Currently what happens is that you get not only a newer Lwt but also a newer Fmt with fresh deprecation warnings that you now have to fix, creating unnecessary churn and work.

Finally, I would like to point out that whatever we choose does not need to be kept for all eternity; we can still change it if it turns out that some aspect is creating hard-to-fix issues down the road. The feature is not yet meant for mainstream usage, so changes can still be made while we're in a kind of "early access" phase.

@avsm
Member

avsm commented Jun 22, 2023

I wasn't looking to be controversial, just to understand exactly what is being proposed. @rgrinberg's explanation went a long way towards convincing me of why oldest might be useful:

I imagine that this policy is particularly helpful for those that don't plan to publish to opam-repository (closed source, hobby projects, simple apps). If the user sticks to lower bounds, the user will tend to encounter fewer build errors between dependencies. For example, if I add a dependency on cohttp>=1.0.0 and regenerate my lock file, I will be far less likely to break something because the versions of all unrelated packages will remain the same. It's somewhat similar to opam's upgrade solver which minimizes installs/reinstalls, except it's easier to understand.

Now, for those users who are publishing to opam-repository: I don't think they should have the luxury of just testing one build plan. I would expect them to test their lower bounds and also test against a recently produced build plan to make sure the upper bounds are all correct. So I think this point is largely irrelevant to them.

In other words, there are three basic operations which need to be supported:

  • install (add a new package)
  • upgrade (update a subset of packages, keep the rest unmodified)
  • upgrade all (upgrade all packages to latest possible versions)

These are all supported by opam today, and are a blend of old/new as appropriate (which seems to be what @rgrinberg is also saying above).

@Leonidas-from-XIV wrote:

Why would you want to update all packages at once? If you need a feature that has been introduced in e.g. Lwt 5.3, you'd update your dependency to that, ask dune to install it, and you would get 5.3 while the rest of the packages stay the same.

Ever since the dawn of opam time, I update all packages at once and then pin back breakages (which are the exception rather than the rule) with an upper bound. Conversely, how do you know what could be updated if you don't know what's potentially upgradeable? Do you have a workflow in mind for users who currently update everything?

gridbugs added a commit to gridbugs/dune that referenced this issue Jun 22, 2023
Based on the discussion at ocaml#8021,
dune will prefer the newest versions of packages when solving
dependencies. This policy can be configured by a command line argument
to `dune pkg lock` and by a field of each context in dune-workspace.

This change includes some formatting changes to the messages printed
when solving dependencies which were necessary to handle the fact that
different build contexts can now have different package solutions.

Signed-off-by: Stephen Sherratt <stephen@sherra.tt>
@rgrinberg
Member

Conversely, how do you know what could be updated if you don't know what's potentially upgradeable? Do you have a workflow in mind for users who currently update everything?

Yes. At least originally, my idea was to have a dune pkg outdated [--transitive] command that would do exactly that. This command would only print which dependencies have a newer version. It would not necessarily update anything.
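
As a rough sketch of what such a report could contain (the function and the data below are hypothetical, since the command doesn't exist yet): compare each locked version against the newest published one and list only the packages with a strictly newer release.

```ocaml
(* Hypothetical sketch, not dune's implementation: a package is "outdated"
   when the newest available version is strictly newer than the locked one. *)
let outdated ~locked ~available =
  List.filter_map
    (fun (name, locked_version) ->
      match List.assoc_opt name available with
      | Some newest when compare newest locked_version > 0 ->
          Some (name, locked_version, newest)
      | _ -> None)
    locked

let () =
  (* Illustrative lock-file contents and newest published versions. *)
  let locked = [ ("lwt", "5.3.0"); ("fmt", "0.9.0") ] in
  let available = [ ("lwt", "5.7.0"); ("fmt", "0.9.0") ] in
  (* Prints "lwt: 5.3.0 -> 5.7.0"; fmt is up to date, so it is not listed. *)
  outdated ~locked ~available
  |> List.iter (fun (name, from_v, to_v) ->
         Printf.printf "%s: %s -> %s\n" name from_v to_v)
```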

gridbugs added a commit to gridbugs/dune that referenced this issue Jun 23, 2023
@Leonidas-from-XIV
Collaborator

(I am a bit wary of continuing this discussion as we have already decided to prefer newest versions, so my response is somewhat in the category of bikeshedding)

Conversely, how do you know what could be updated if you don't know what's potentially upgradeable?

I would like to step back and ask why we want newer versions of packages. Updating a package makes sense to me in two cases:

  1. There is a feature in a newer version that I would like to use, in which case I am aware of it and adjust the lower bound accordingly.
  2. There is a security issue and the library needs to be updated. For this we don't have a good solution when preferring oldest versions (apart from maybe marking the previous release with avoid-version). However, I rarely see fix-only point releases (the only packages with release branches I can think of are OCaml, Jane Street packages, and Dune); it seems more common to just commit fixes to the main branch and release them as part of regular releases that may or may not break compatibility.

Updating all packages at once is rarely a reasonable thing to want, unless all your packages have security issues or you need to use new features from all of them. However, I might be missing some other case: I was initially also thinking that preferring newest packages is the thing to do, but upon consideration I could not put my finger on exactly why, except for the fact that we (in the broad software-developer sense, not even OCaml-specific) have traditionally always done so.

opam-monorepo has an interesting optional feature in that regard (possibly requested by Mirage folks), where you can update a dependency and it will prefer the currently locked versions of all other libraries if possible (tarides/opam-monorepo#305), thus minimizing the amount of change to the other dependencies.

On the other hand, the main downside of preferring the lower bound could be that code ossifies, with old versions of dependencies becoming the de facto versions forever (e.g. Yojson 1.x over Yojson 2.x) and any breaking API change being deemed not worth the upgrade. I have to agree with Rudi: no matter what we do, we will have to test both lower and upper bounds, otherwise they will not be accurate. Currently the lower bounds often aren't accurate, since package authors don't test them, and the upper bounds keep changing as we add new package versions to opam-repository.

rgrinberg pushed a commit that referenced this issue Jun 23, 2023