Support for "soft" or suggested constraints #7051

Open
2 tasks done
potiuk opened this issue Nov 17, 2022 · 27 comments
Labels
kind/feature Feature requests/implementations status/triage This issue needs to be triaged

Comments

@potiuk

potiuk commented Nov 17, 2022

Feature proposed: constraints

It would be great if Poetry supported the fantastic feature that pip has: constraints.

Constraints are extremely useful for more complex applications that have many extras, and therefore many optional and transitive dependencies. They are a great tool for providing reproducible installs of Python applications without imposing strict pinning of dependencies, and they allow users to manually upgrade and downgrade dependencies of the installed "main" application, even if those dependencies are released after the main application has been released.

Short summary of how constraints work

When installing an application with pip, the user can pass the --constraint flag with a specification of the constraints to use (in the same form as requirements: a local file, an HTTP URL, etc.). The constraints specified this way should be "pinned" versions (i.e. "==VERSION" only), and they change package resolution so that only the version specified for a package is considered during dependency resolution.

Constraints are not "requirements": if the user does not install a specific requirement (for example because it is part of an optional extra), the package will not be installed even if a specific version of it is listed in the constraints file. Also, constraints are used exclusively during resolution while the "install" process resolves packages, and they are immediately forgotten once that particular "install" command completes. This makes it possible to manually upgrade any of the packages that were "pinned" by constraints, as long as the new version stays within the "requirements" specified by other packages.
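
The two properties described above (a constraint narrows the candidate versions, but never causes a package to be installed) can be illustrated with a toy resolver step. This is only a minimal sketch and not how pip is implemented; the function name `filter_candidates` and all package data are made up for illustration.

```python
def filter_candidates(requested, available, constraints):
    """Return the versions a resolver may consider for each requested package.

    requested:   names of packages that are actually needed (directly or transitively)
    available:   mapping of package name -> list of published versions
    constraints: mapping of package name -> single pinned version
    """
    allowed = {}
    for name in requested:
        versions = available.get(name, [])
        pin = constraints.get(name)
        if pin is None:
            # Unconstrained package: every published version stays a candidate.
            allowed[name] = list(versions)
        else:
            # Constrained package: only the pinned version is considered.
            allowed[name] = [v for v in versions if v == pin]
    return allowed


# Illustrative data only (hypothetical versions).
available = {"flask": ["2.1.0", "2.2.2"], "celery": ["5.2.6", "5.2.7"], "redis": ["4.3.4"]}
constraints = {"flask": "2.1.0", "celery": "5.2.6"}

# celery is constrained but not requested, so it never shows up in the result.
result = filter_candidates({"flask", "redis"}, available, constraints)
```

Because the constraints are only an input to this one call, a later install of a newer flask is not blocked by anything stored from the earlier run.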

Why it is useful

It is useful to get reproducible installs of applications (not libraries) without limiting security upgrades (and non-security upgrades as well).

It allows for fully reproducible, yet secure, "from scratch" installs of Python applications (web apps, CLI apps: generally apps that provide user-facing features rather than libraries for other apps) without pinning specific versions of dependencies in a "hard" way. A fully reproducible install means that whether you install today or a few years from now, the application should install correctly, regardless of whether direct or transitive dependencies have released new versions.

Typically, applications pin their dependencies to specific versions (as opposed to libraries, which typically have "open" dependencies). If you want a truly "reproducible" install, you need to pin all your dependencies this way (including transitive ones), because otherwise transitive dependencies might break your "from scratch" install, impacting the "first contact" with your application.

However, there is a drawback: when you pin dependencies, the user cannot independently upgrade any of the pinned dependencies, and if one of them releases even a small security fix, the main application must be upgraded to take it into account. This is a limitation of pinning. It means that a user who wants to apply a security fix must wait for the main application to release a new version. In a world where supply chain attacks are a thing, and where security becomes more and more important, giving the user the option to upgrade dependencies independently after installing an application is crucial.

Constraints nicely allow "reproducible installs" while keeping the possibility of "security updates" for any dependency.

Another consequence of using constraints is that they also allow the user of an application to perform non-security updates of the dependencies. This is important in cases like Airflow, where Airflow is not only an application to run but also a platform that provides a library for Python developers (in Airflow, DAG authors develop workflows as Python code and they often want to be able to upgrade libraries installed by Airflow).

Lack of support for constraints is the reason why Airflow discourages usage of poetry (even though we would love to be able to get Airflow users to use poetry).

Example

Apache Airflow depends heavily on constraints: the project developed a mechanism to automatically upgrade its constraint files based on the results of automated tests, and the only "recommended" way of installing Airflow is via pip with constraints. Lack of constraint support is the only reason why poetry is discouraged: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html

The constraints of airflow are maintained automatically, tagged together with each version of Airflow - main version here: https://github.com/apache/airflow/tree/constraints-main.

Yes, Airflow is a bit of a special case, with almost 700 dependencies and ~100 extras, which is a bit extreme, but many other applications could benefit from this approach. The constraint mechanism has been used in Airflow for more than 3 years, and it has helped Airflow maintainers in numerous cases where 3rd-party dependencies would have broken an Airflow "clean install" for already released historical packages, while allowing users to upgrade many dependencies as needed. Why it is needed and what problems it solved for Airflow are explained in more detail in this talk from PyWaw #98, "Managing Python dependencies at scale": https://www.youtube.com/watch?v=mlOkkTuucSk

Alternatives

As @dimbleby mentioned in #7047 (comment), #3225 was closed on the grounds that it is the same as a lock file.

While lock files are "almost" constraints, there is one difference. Lock files are a development feature for someone who develops the application, and once the package is uploaded to PyPI, the lock file remains only in the source code of the application. Constraints, on the other hand, are a user-facing feature. Users should be able to install an application from PyPI (or another compliant repository) and apply constraints (for example taken from a published .lock file or a file following the requirements.txt format) in a single installation command, similar to what the Airflow installation does with pip:

pip install "apache-airflow[celery]==2.4.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.3/constraints-3.7.txt"

The above is just an example, there might be other conventions used by other applications.

The convention should allow for a different set of constraints per Python MAJOR.MINOR version (necessary for big applications, where the set of dependencies differs slightly between Python 3.7, 3.8, etc.). It would also be great to add other options, such as architecture (ARM/x86), but this is not as important as Python versions. This could be done via a convention for the location of the file, or in the case of poetry.lock it might be defined using poetry.lock features.
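
As an illustration of such a location convention, the sketch below builds the constraints URL and the matching pip command from an Airflow version and a Python MAJOR.MINOR version. The URL pattern is inferred only from the single example command above, so treat it as an assumption rather than a documented contract; the helper names `airflow_constraints_url` and `pip_install_command` are hypothetical.

```python
import sys


def airflow_constraints_url(airflow_version, python_version=None):
    """Build a constraints URL following the pattern of the example above."""
    if python_version is None:
        # Default to the interpreter running this script, e.g. "3.10".
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(airflow_version, extras, python_version=None):
    """Return the pip command as an argument list (nothing is executed here)."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    return [
        sys.executable, "-m", "pip", "install",
        f"apache-airflow{extras_part}=={airflow_version}",
        "--constraint", airflow_constraints_url(airflow_version, python_version),
    ]


url = airflow_constraints_url("2.4.3", "3.7")
command = pip_install_command("2.4.3", ["celery"], "3.7")
```

The list in `command` could be passed to `subprocess.run` to perform the install; an architecture dimension (ARM/x86) would need an extra path component that the example URL does not define.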

It should be possible to use lock files (and ideally also a format compatible with requirements.txt) as "constraints" when a user installs a package from PyPI, without having the sources or pyproject.toml, and without having to copy poetry.lock manually to a local folder. In this sense, poetry.lock is not far from constraints; what is lacking is support for single-line installation, where constraints are specified as a remote URL to be pulled automatically and used during installation by the end user from PyPI or another registry. In fact, publishing poetry.lock and using it as the "constraint" file could be one of the main use cases for poetry-managed applications.

Ideal properties of the constraints feature

  • it should be possible to run poetry install NNN --constraint http://..... to apply constraints remotely
  • it should support both poetry.lock and the traditional "requirements.txt" format to ease interoperability with pip
  • ideally there should be a tool that allows conversion between the lock file and requirements.txt

If there is a consensus among poetry maintainers that this is a worthy feature, I am happy to help with both the design and the implementation. I have extensive experience in managing dependencies in Airflow using constraints (I am the original author of the approach Airflow uses, and I have evolved and maintained it over the last 3 years or so).

  • I have searched the issues of this repo and believe that this is not a duplicate.

Actually there was a duplicate, #3225, but it has been closed; following the discussion in #7047 I decided to open it again.

  • I have searched the FAQ and general documentation and believe that my question is not already covered.
@potiuk potiuk added kind/feature Feature requests/implementations status/triage This issue needs to be triaged labels Nov 17, 2022
@neersighted neersighted changed the title Support for constraints Support for "soft" or suggested constraints Nov 17, 2022
@neersighted
Member

I edited the title for you -- it's important to make clear you want pip style soft/suggested constraints (aka the --constraint flag), as we very much do support "constraints" in the default sense (constraining dependencies, also known as "requirements" in Python packaging).

No comments on your proposal yet as I don't have time to read it, but I think you should make sure to think about the design holistically, if you haven't already. How does this interact with Poetry as a PEP 517 backend (e.g. packages installed from a sdist or using pip install .)? How does this affect wheels built with Poetry? Should this functionality live in Poetry itself, or in a plugin like poetry export?

Do we need a custom format? Is this metadata that is part of the lock file, or part of the project file, or external (in which case a plugin probably does make more sense)? Should we capture a remote file reference in the project format?

The reason the old proposal was closed was mostly because it didn't answer any of those questions, and introduced what was essentially a parallel lock file without addressing the downsides of a partial solution. To have something usable, it needs to clearly benefit a large number of users without harming existing users or maintainers, and it needs to not have surprising sharp-edged interactions with the wider Python ecosystem/Poetry's interaction with said ecosystem over stable, defined interfaces (like PEP 517).

@potiuk
Author

potiuk commented Nov 17, 2022

Good point - thanks for clarifying the name. Naming is key here, and indeed the more you use a name in one context, the less you realise it might be understood differently in another.

Yes. I was actually thinking (eventually) about making it a more holistic approach, even in the form of a PEP proposal on how it could be made part of PyPI and the generic packaging format, so that you could publish such constraints as part of the publishing process on PyPI. However, I think that before we approach standardisation/a PEP we should validate whether the concerns/problems we have in Airflow are also enough of a problem for others, and whether the approach can be validated with tools like poetry as well. Then we could base standardisation efforts on experience from those two tools, confirmed by other projects that might use the feature.

Re: PEP 517 - I think it is largely orthogonal to my proposal. I think of "soft constraints" more as an end-user feature, something that users of published packages should use, rather than developers. I do not necessarily see the need for such "soft" constraints for development purposes beyond what is already there (at least not now). Whatever Poetry and other tools (like pip-tools) provide with lock files already solves some of the problems well, and you can also do some custom things with pip, but the end-user problem I described is something different.

I believe this is an entirely new PEP, and I would love to explore the option of leading it together with pip and poetry maintainers eventually, but I think it is a bit too early to propose standards. It would be great if each tool leveraged its own mechanisms (poetry.lock for one) and let developers use them and share them with their users. Then we could think about how to turn it into a PEP-worthy proposal, drawing on experience from at least the pip and poetry worlds (and I am more than happy to make Airflow a playground for poetry's approach here).

@r-richmond

r-richmond commented Nov 17, 2022

I may be off base here but I think the following would address the concerns and be a minimal change with significant value for airflow and other python libraries as well.

Add a --constraints-if-dependent-upon option to the poetry lock command. This option would accept a remote or local path that would limit dependency packages to specific versions or ranges if those packages are required based on the contents of the pyproject.toml tool.poetry config.

The difference between lock files & these constraints is that constraints would only apply if a given package was a dependency or transitive dependency from the tool.poetry config in pyproject.toml. This differs from the poetry.lock file in that users won't install all the packages listed in the constraints file.

This would allow end users to use their pyproject.toml to generate a poetry.lock install of airflow/other-library that contained only supported versions of packages that they requested but also any other additional libraries/packages that they specify in their pyproject.toml

Example Workflow / Details & Overview

My pyproject.toml

[tool.poetry]
...

[tool.poetry.dependencies]
python = "^3.10"
apache-airflow = { version = "*" }
apache-airflow-providers-google = { version = "*" }

Commands

poetry lock --constraints-if-dependent-upon https://raw.githubusercontent.com/apache/airflow/constraints-2-4/constraints-source-providers-3.10.txt

poetry install --sync

Results

  1. My poetry lock file would contain all the packages required for apache-airflow-providers-google & apache-airflow and be limited to the supported versions of all dependencies.
  2. My poetry lock file would not contain packages only required for other packages like apache-airflow-providers-amazon despite those packages being listed in the constraints-if-dependent-upon file since those packages are not dependencies of my pyproject.toml
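
The two results above boil down to "keep only the pins whose package is in the dependency closure of my pyproject.toml". A minimal sketch of that filtering step, assuming a constraints file with plain `name==version` lines; the dependency graph, the versions and the helper names are all hypothetical, and real constraints files and real dependency metadata are more involved:

```python
def parse_constraints(text):
    """Parse 'name==version' lines, skipping blank lines and '#' comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def dependency_closure(roots, graph):
    """Collect the root packages plus everything reachable through `graph`."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, []))
    return seen


def applicable_pins(roots, graph, pins):
    """Keep only the pins for packages that are actually depended upon."""
    closure = dependency_closure(roots, graph)
    return {name: version for name, version in pins.items() if name in closure}


# Made-up versions and a made-up dependency graph, for illustration only.
constraints_text = """
# constraints published by the application
apache-airflow-providers-google==1.0.0
google-cloud-storage==2.0.0
boto3==3.0.0
"""
graph = {
    "apache-airflow": [],
    "apache-airflow-providers-google": ["google-cloud-storage"],
    "apache-airflow-providers-amazon": ["boto3"],
}
roots = ["apache-airflow", "apache-airflow-providers-google"]

pins = applicable_pins(roots, graph, parse_constraints(constraints_text))
```

Here `boto3` is pinned in the constraints text but dropped from `pins`, because nothing in `roots` depends on it.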

Benefits

  1. This allows library authors to publish a known set of working package versions that they can officially support while leaving the setup.py requirements looser so in general the python package can be a good library and not have so many version conflicts.
  2. Potential debug situation. If a user reports a problem with a Python library such as pandas, the user could re-run poetry lock --constraints-if-dependent-upon url to revert the packages used by pandas to the last versions they were tested with, and then see if they still have the issue. If the issue goes away, it would be a clear signal that something in pandas's dependencies broke (e.g. colorama) and that pandas should pin its colorama dependency until a fix is applied.

@potiuk
Author

potiuk commented Nov 27, 2022

Yes. I think something like that would work @r-richmond. One of the ideas that I had here is that many users seem to be using poetry to not only "develop" their applications but also to describe and manage their installation environment. I think this is a very valid case:

  • you describe some dependencies you want
  • you have a mechanism to keep them updated whenever you want to upgrade
  • but you use .lock file to keep the "installation" set of dependencies that should be used - in your CI when you build an image for your deployment, when your team want to install the complex set of dependencies in a consistent way for your team for development etc.

I believe poetry maintainers have foreseen that as a viable use case and something they want to optimise for. Adding some way of specifying the constraints when you run poetry upgrade would, I think, be really useful for users. For example, when you keep such an "environment" for Airflow 2.4.3 and you want to upgrade to 2.5, you would like to use the 2.5 constraints of Airflow to generate the new .lock (or simply find out that some of your "pinned" requirements conflict with the new constraints that Airflow 2.5.0 has, and be able to do something about it). For example:

  1. Update your other pinned constraints (manually or with poetry's help)
  2. Download the latest 2.4.0 constraints to a local storage/git etc. and modify it to update the constraints to other versions of the dependencies

Currently I think there is no such possibility (other than some manual manipulation), and that seems like a pretty valid and useful workflow for people who maintain an Airflow installation (and similar cases).

I find it really appealing as a case that we could work on collectively: a few people involved and discussing it, possibly also asking pip maintainers to take part, aiming for some kind of standardisation via a PEP for file/URL formats that would support similar cases not only for Poetry but also for other tools. The need seems to be generic enough for other packaging tools and should be brought to PyPA IMHO.

@neersighted (and other poetry maintainers) what do you think about such a case? Are @r-richmond's and my explanations somewhat appealing to your way of thinking?

@neersighted
Member

neersighted commented Nov 27, 2022

I ultimately think this does not belong in Poetry -- or rather, Poetry proper.

I've done some thinking about design here, and I strongly object to the idea of adding a new artifact type (e.g. a URL associated with a package with a constraints.txt) or a command-line flag (too inflexible, doesn't allow merging), especially on the lock command.

The lock file is (and for now, should remain; I don't think this is the issue to reconsider how we conceptualize/design it) a cached resolution; nothing more, nothing less. To begin to overload it as project metadata/not a semi-disposable/reproducible artifact is, I think, counterproductive.


Off the top of my head, I can think of a more generic design that satisfies this ask while not conflicting with Poetry's existing design/overloading the meaning of the lock file:

  • Implement a poetry-plugin-importer that reads requirements.txt (this has been a common user ask over the years) and constraints.txt.
  • Generalize groups to allow them to be used for 'main' dependencies as well, with an exported = true marker to make them named groups of 'main' dependencies (just like they are currently named groups of 'dev' dependencies).
  • poetry import --constraints would then update pyproject.toml with the constraints from a constraints.txt for any package found in pyproject.toml or present in a lock file (if it exists). It could be limited to work on a single group with a flag.
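
To make the last bullet concrete, here is a rough sketch of what such an import step might do to the dependency table. No `poetry import` command exists in this discussion beyond the idea itself, so the function `import_constraints`, its arguments and the versions used are purely hypothetical; dependency tables are modelled as plain dicts rather than parsed TOML.

```python
def import_constraints(dependencies, locked_names, pins):
    """Return a new dependency table with pins applied to known packages only.

    dependencies: mapping as found under [tool.poetry.dependencies]
    locked_names: package names present in an existing lock file (may be empty)
    pins:         mapping of package name -> pinned version from constraints.txt
    """
    known = set(dependencies) | set(locked_names)
    updated = dict(dependencies)
    for name, version in pins.items():
        if name == "python" or name not in known:
            # Pins for packages the project does not use are ignored.
            continue
        current = updated.get(name)
        if isinstance(current, dict):
            # Preserve extras and other keys, replace only the version.
            updated[name] = {**current, "version": f"=={version}"}
        else:
            updated[name] = f"=={version}"
    return updated


# Illustrative input; versions are examples, not recommendations.
dependencies = {
    "python": "^3.10",
    "apache-airflow": {"extras": ["celery"], "version": "*"},
    "pandas": "*",
}
locked_names = {"sqlalchemy"}
pins = {"apache-airflow": "2.4.3", "sqlalchemy": "1.4.44", "boto3": "1.26.0"}

updated = import_constraints(dependencies, locked_names, pins)
```

Note that `sqlalchemy`, previously only a locked transitive dependency, ends up as a top-level entry in this sketch, while `boto3` is skipped because the project does not use it.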

Poetry has in general taken the stance that if you care about versions, they belong in the pyproject.toml/are now a top-level dependency. This would be congruent with that, and you could avoid exporting direct dependencies by putting them in a development group.


I'm not married to this approach, but in general we want to keep project configuration in the pyproject.toml file, and keep a cached resolution in the lock file. Like I said above, I don't think overloading the meaning of the lock file is a productive direction, and would likely happen during a major bump, if it happened at all, as we'd need to significantly retool parts of the Poetry CLI to make the lock file less prone to unexpected changes.

@r-richmond

r-richmond commented Nov 27, 2022

but you use .lock file to keep the "installation" set of dependencies that should be used - in your CI when you build an image for your deployment, when your team want to install the complex set of dependencies in a consistent way for your team for development etc.

potiuk, Yes that is how I use it today.

For reference: example of a CI/CD process utilizing Poetry / poetry.lock

Files tracked in git

  1. pyproject.toml
  2. poetry.lock
  3. upgrade_packages.sh

Snippet of pyproject.toml

[tool.poetry.dependencies]
# python version
python = ">=3.10,<3.11"
### Generic tools
pydantic = "*"
devtools = "*"
### airflow[jdbc]
# this is done because the api changes in newer versions of jpype
JPype1 = "==0.6.3"
jaydebeapi = "==1.1.1"
apache-airflow = {extras = ["async", "celery", "google_auth", "cncf.kubernetes",
  "password", "redis"], version = ">=2.4.3"}
apache-airflow-providers-amazon = "*"
# apache-airflow-providers-apache-hdfs = "*"
# apache-airflow-providers-apache-hive = "*"
# apache-airflow-providers-apache-spark = "*"
# apache-airflow-providers-apache-sqoop = "*"
apache-airflow-providers-celery = "*"
# installing this without leveldb causes an error on startup
apache-airflow-providers-google = {extras = ["leveldb"], version=">=8.5.0"}
apache-airflow-providers-http = "*"
apache-airflow-providers-jdbc = "*"
apache-airflow-providers-mysql = "*"
apache-airflow-providers-oracle = "*"
apache-airflow-providers-postgres = "*"
apache-airflow-providers-salesforce = "*"
apache-airflow-providers-sftp = "*"
apache-airflow-providers-slack = "*"
apache-airflow-providers-ssh = "*"
apache-airflow-providers-tableau = "*"
# used in airflow dag repo
sqlparse = "*"
dacite = "*"
# used for tableau operators
tableauserverclient = "*"
tableauhyperapi = "*"
pandas = "*"
# used to mess with parquet files
pyarrow = "*"
# used for sql linting
sqlfluff = "*"
# DBT exploratory stuff
dbt-core = {version = ">=1.3.0"}
dbt-bigquery = {version = ">=1.3.0"}
# GE exploratory stuff
airflow-provider-great-expectations = "*"
great_expectations = "*"
# useful retry functionality
tenacity = "*"


[tool.poetry.dev-dependencies]
pytest = "*"
mypy = {version = ">=0.981"}
black = "*"
flake8 = "*"
isort = "*"
pre-commit = "*"
types-futures = "*"
types-protobuf = "*"
types-pytz  = "*"
types-PyYAML  = "*"
types-requests  = "*"
types-urllib3  = "*"

upgrade_packages.sh

poetry lock && \
poetry export -f requirements.txt --output requirements.txt --without-hashes && \
poetry export --with dev -f requirements.txt --output requirements-dev.txt --without-hashes

Snippet of Run inside Dockerfile

pip install -U pip setuptools && \
pip install --no-cache-dir --no-deps -r /resources/requirements.txt
# note originally I used `poetry install` but `pip install` generated smaller image sizes :shrug:

Summary

  1. pyproject.toml stores the packages I want to install. If needed for specific internal use cases I adjust versions in this file, for example jaydebeapi & mypy.
  2. When I want to upgrade the packages I manually run poetry lock via upgrade_packages.sh
  3. poetry.lock: this file, along with the generated requirements[-dev].txt files, stores the explicit versions of each package to install in my generated docker image.

Summary Continued

  1. Of note: how I'm using poetry.lock here is at odds with neersighted's take; however, I think that this usage of poetry.lock is in line with the official docs

The lock file is (and for now, should remain; I don't think this is the issue to reconsider how we conceptualize/design it) a cached resolution; nothing more, nothing less. To begin to overload it as project metadata/not a semi-disposable/reproducible artifact is, I think, counterproductive.

How this feature would help

  1. By passing a constraints file to poetry.lock I could restrict the resolved versions to known stable versions without being forced to maintain that version information inside of my pyproject.toml
  2. For large or complex projects such as Airflow, avoiding duplicate maintenance of the known good versions inside pyproject.toml would be a significant time saving and would significantly reduce the surface area for typos/mistakes. It is a bit of an extreme example, but Airflow's constraints file is over 600 lines long. This would increase the length of my pyproject.toml by nearly 10x and greatly reduce its readability.

Poetry has in general taken the stance that if you care about versions, they belong in the pyproject.toml/are now a top-level dependency.

I think there is a middle ground here that is being missed, one that users & Python library maintainers care deeply about, i.e. did you use known good versions? In fact, Poetry's own issue template asks for the poetry version for triaging. The purpose of this (I think) is to help inform the maintainers whether or not the user is using a known good version. For poetry the buck stops there, but for libraries that depend on other packages it gets more complicated (as we all know).

Thoughts on import suggestion

poetry import --constraints would then update pyproject.toml with the constraints from a constraints.txt for any package found in pyproject.toml or present in a lock file (if it exists). It could be limited to work on a single group with a flag.

This is an interesting directional pattern. However, it makes the pyproject.toml semi-discardable. I.e. what happens when I want to upgrade the packages 5 months later? I'd have to re-run poetry import --constraints, but what would the resolution be if pyproject.toml specifies version 1.1 and the constraints specify 1.2? Would it overwrite pyproject.toml? Would it error? Both choices have significant downsides IMO. I'd be inclined to disregard the pyproject.toml, start over and re-run import to get a clean version again, which feels unintended/suboptimal. With the proposed solution I'd just run poetry lock --constraints-if-dependent-upon {new file/ new url}
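
The overwrite-or-error question can be made explicit as a policy parameter. The sketch below is hypothetical (there is no such function in Poetry) and deliberately naive: any existing spec other than "*" that differs textually from the pin is treated as a conflict, whereas a real tool would need proper version-range comparison.

```python
def merge_pin(existing, pinned_version, on_conflict="error"):
    """Decide what an import should write when a pin meets an existing entry.

    existing:       current spec in pyproject.toml, e.g. "*" or "==1.1"
    pinned_version: version from the constraints file, e.g. "1.2"
    on_conflict:    "error", "overwrite" or "keep" (hypothetical policies)
    """
    new_spec = f"=={pinned_version}"
    if existing in ("*", new_spec):
        # Nothing to reconcile: the entry is unconstrained or already matches.
        return new_spec
    if on_conflict == "overwrite":
        return new_spec
    if on_conflict == "keep":
        return existing
    raise ValueError(f"{existing!r} conflicts with constraint {new_spec!r}")


# pyproject.toml says 1.1, the constraints file says 1.2.
overwritten = merge_pin("==1.1", "1.2", on_conflict="overwrite")
kept = merge_pin("==1.1", "1.2", on_conflict="keep")
```

With "overwrite" the user's own pin silently disappears, and with "error" every upgrade needs manual edits first, which is the downside described above.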

Closing thoughts

and keep a cached resolution in the lock file

  • We are in agreement here. the proposed solution does not want to change this.

Like I said above, I don't think overloading the meaning of the lock file is a productive direction, and would likely happen during a major bump, if it happened at all,

  • We don't want to change the meaning of the poetry.lock file at all.
  • However, as I noted above the poetry.lock file is more than just a cached resolution

To begin to overload it as project metadata/not a semi-disposable/reproducible artifact is, I think, counterproductive.

  • It bears repeating: I think this take is in disagreement with the docs and the benefits of poetry.lock
Full paragraph from the docs, with the relevant parts in bold:

Committing this file to VC is important because it will cause anyone who sets up the project to use the exact same versions of the dependencies that you are using. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments. Even if you develop alone, in six months when reinstalling the project you can feel confident the dependencies installed are still working even if your dependencies released many new versions since then. (See note below about using the update command.)

  • What we are looking for is a way to tell poetry about some extra constraints to consider while locking. Due to this use case (Complex python library) the only existing path (adding to pyproject.toml) has a lot of downsides (size, maintainability, upgrading). Personally I think adding a new option to poetry lock makes the most sense but I'm sure the maintainers are more familiar with Poetry than I am so perhaps providing the example above has provided more context & sparked some inspiration for the feature request? 🤞

@neersighted
Member

poetry.lock being a cached resolution is not at odds with the additional goal/benefit of reproducibility. But poetry.lock is not intended to represent configuration; the docs do not suggest this, and I believe that adding more facilities to manipulate the lock file directly takes us in this counterproductive direction.

Poetry's solver should come up with a working solution every time, assuming that the metadata it is fed (aka the metadata of packages in your dependency tree) is good and there are no regressions in the latest compatible version of a package. However, to promote reproducibility, the lock file exists and makes sure that runs of Poetry across platforms and versions use the same resolution every time.

So far, so good. However, if we start leaking configuration into the lock file (e.g. "I want to use versions in this range" -- where did they come from? Why did they change?), the lock file goes from being "a resolution that we know works and can be re-used" to "a vital project configuration file that should not be perturbed."

We in general try to avoid encouraging lock file ossification and encourage people to re-test updating their dependencies often. The lock file is intended to record a working solution, but not represent the only known-good way to install your project. Your proposal is a step in the wrong direction, in my opinion, for several reasons:

  • We have to teach Poetry proper to read the constraints.txt format, a non-standard (pip-specific/specified) format that is only loosely specified.
  • The lock file now represents the only known good way to install the project, and encourages ossification/a culture of fear regarding touching the lock file.
  • Automated tooling/the existing ecosystem of tools that work with Poetry do not know about/respect this data model for the lock file, and will cause conflict and confusion.
  • Likewise, if we're feeding in data to perturb the solver from an unknown/non-recorded source, reproducing the lock file becomes harder... What if someone runs poetry lock unknowingly?
  • We already have a mechanism for constraining versions that doesn't "leak" dependencies into the metadata of built packages/the PEP 517 interface... We can extend the existing development dependencies instead of overloading the meaning of the lock file.

And besides overloading the model of the lock file being conceptually impure, my additional objection is that support for a relatively minor user ask sets us down an entirely different direction for Poetry development. If the lock file is vital project configuration that cannot be damaged, as opposed to a cached resolution format that we have great flexibility with, development of improvements to Poetry gets harder and the maintenance burden increases. If the lock file is not semi-disposable but instead needs to be carefully tended to/managed/curated, we have to grow an entire set of (additional) tooling to curate it over time, and our surface area for user-impacting changes grows quite a bit. Currently we consider the lock file an implementation detail, and that has benefited Poetry quite a bit.

@potiuk
Author

potiuk commented Nov 28, 2022

I like the direction of poetry import --constraints to convert a constraints file to something poetry would understand. But I am a bit baffled why we are talking about updating pyproject.toml.

I would love to understand some of the ins and outs, and maybe I will try to explain where I am coming from in my own words.

Maybe poetry is not the best tool for the job and maybe it's completely off the radar for poetry what I am trying to achieve.

I believe most of poetry's behaviour, the .lock file etc., is about building and extending a package, i.e. developing a package and its dependencies. It is about a project that has its own Python code and a main package, and that has a number of dependencies. It is developed and updated by a developer, and the developers might continue to update and freeze (lock) the dependencies and share the .lock file with other developers. Eventually the project produces a package that can (if need be) be distributed via PyPI, for example as a wheel file.

That's where local pyproject.toml file (describing your package) is. I believe pyproject.toml is all about describing dependencies of your own package you develop.

For me --constraint feature of pip (and the way we use and recommend - or even require it - in airflow) is purely a user feature. User who wants to install (released by us) airflow 2.4.3 + n other packages and make sure they are not conflicting with each other, and when they want to upgrade airflow to 2.5.0 + n the same packages, they just want to change the version and run a magical command and voila - it is installed with the right set of dependencies.

I am not trying to improve the life of someone who has their own Python code, develops it and builds their own packages, but of someone who consistently (now and a few months from now - in the CI environment etc.) wants to pull and install a number of dependencies that make up the "virtual environment" and wants to have an easy path to upgrade those in the future by purely specifying "I want to now have airflow 2.5.0 installed - plus those other packages as well, go figure out which are the best versions".

This is the problem I am after - purely user feature.

I understand (and please correct me if I am wrong) @r-richmond you are one of the airflow users who uses (or maybe mis-uses - I do not know poetry that well - maybe this is not at all intended use of it) poetry to manage such "user installation".

For me such a user is purely a consumer of packages, not a producer of those (and IMHO pyproject.toml is all about producing packages). None of the relevant PEPs (PEP 517, PEP 518, PEP 621, or even PEP 660) mention the "user" case - they are all about building packages.

I am talking about purely installing packages from PyPI - without having to build those packages locally - using the wheel files (let's assume we have all the binary wheels needed). My "improvement" proposal is that such packages distributed on PyPI as wheel files could also have optional metadata (in PyPI) describing the "known good set of constraints for repeatable installation". Imagine a "main" application package. Whenever a user wants to get a consistent installation with those "known good dependencies", they might run (for example) pip install airflow[extras]==2.3.4 --use-known-constraints package1 package2, and the golden set of constraints of airflow 2.3.4 for that Python version will be used to resolve dependencies not only for airflow but also for package1 and package2. I imagined similar usage for poetry.

So I am not sure why we need to involve pyproject.toml in this case. Is it just to get the list of packages to install? Are we thinking of using it for something else? (For me pyproject.toml is a development tool, not something that users would like to keep just to say "those are the dependencies I want to install".)

Or maybe I am completely wrong about this? Maybe poetry simply should not be used in the case I described? Maybe it does not have an ambition to be used for that case and we should simply use pip install as we do now? Or maybe there is another usage of poetry for that case that I do not understand.

I would really love to know the answer to that question, because I feel we are talking about somewhat different use-cases.

@r-richmond

I understand (and please correct me if I am wrong) @r-richmond you are one of the airflow users who uses (or maybe mis-uses - I do not know poetry that well - maybe this is not at all intended use of it) poetry to manage such "user installation".

correct

But poetry.lock is not intended to represent configuration; the docs do not suggest this,

But it does represent a cached resolution; nothing more, nothing less. Which it still will if this feature is implemented.

and I believe that adding more facilities to manipulate the lock file directly are taking this counterproductive direction.

Likewise, if we're feeding in data to perturb the solver from an unknown/non-recorded source, reproducing the lock file becomes harder... What if someone runs poetry lock unknowingly?

These were good comments. You've convinced me that adding an option to poetry lock should be avoided, and that the better place for this information is pyproject.toml. However, it would be something more like an option passed as constraints-if-dependent = {url or path to local file}, preferably, or, if that is untenable, the entire specification (600+ lines in airflow's case, but copy-pasteable) under perhaps a [tool.poetry.constraints-if-dependent] header?

Note: this differs from your suggestion of using grouped dependencies since the packages in this section wouldn't be installed by default, they would only be installed if they were required by something in the dependencies and if installed the version would be constrained to a version(s) as specified in the corresponding constraint.
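A minimal sketch of the "constrain only if dependent" semantics described in the note above, written in plain Python rather than as anything Poetry implements. The function name `applicable_pins` and the sample package data are hypothetical; a real resolver would apply the pins during resolution rather than filtering afterwards.

```python
def applicable_pins(required: set[str], constraints: dict[str, str]) -> dict[str, str]:
    """Keep only the constraints whose package is actually required.

    `required` is the set of package names pulled in (directly or
    transitively) by the project's dependencies; `constraints` maps
    package name -> pinned version. Both inputs are illustrative.
    """
    return {name: constraints[name] for name in sorted(required) if name in constraints}


# Hypothetical data: flask is constrained, but nothing depends on it,
# so it is neither installed nor pinned.
constraints = {"celery": "5.3.0", "flask": "2.2.5", "kombu": "5.3.1"}
required = {"celery", "kombu", "my-own-package"}

pins = applicable_pins(required, constraints)
print(pins)
```

Packages without a constraint (here `my-own-package`) are simply left to the normal resolution rules.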

Lastly, I don't want to distract any more from potiuk's questions so I'll hold off for a bit, barring a side discussion or direct question.

@neersighted
Member

@potiuk, if this was additional metadata in the core metadata baked into a distfile, I would have no objections. We could introduce a package = { version = "", use-suggested-constraints = true } declaration in pyproject.toml, and generate a useful/meaningful error when two packages (or your top-level constraints) conflict.

However, I was thinking more along the lines of this as an addition to Poetry without any changes to Python packaging/the core metadata specification itself -- in that case, we do have to figure out how to map this onto existing constructs/in a Poetry-specific way.

In any case, I believe this belongs in pyproject.toml. My suggestion for an import-based workflow is because of the following:

  • We have a mechanism for constraining versions without leaking them into the built metadata already.
  • As already discussed, poetry.lock is not requirements.txt and has a different semantic meaning I do not want to overload.
  • I've also mentioned how I/the maintainers generally object to introducing non-standard features (especially ones outside our control, e.g. constraints.txt) into Poetry proper.
  • I also don't think having a constraints-if-dependent = {url or path to local file} artifact is good design or something we can commit to maintaining and supporting in the long term.

To answer other questions about an import-based approach:

  • Yes, importing would clobber versions already specified in pyproject.toml. I don't see how that is anything less than expected -- you're asking Poetry to narrow the range it solves for, after all.
  • We currently have an optional = true marker in pyproject.toml that lets us specify constraints for versions that are not otherwise installed. We currently use this for extras, but could take that as the basis for a "constraint if present feature" (e.g. indirect = true and extra-only = true instead of overloading the existing poorly-named marker).
  • [tool.poetry.constraints-if-dependent] is more tenable, but at that point I think we're reinventing dependency specification poorly.

I guess my opinion is this: If we can come up with a way to standardize this for the ecosystem, we'd be happy to contribute to the PEP process and eventually implement it. But if we're doing this on a one-off bespoke basis for Poetry, the bar will be pretty high as this is a (relatively, I know OpenStack, Ansible, and Airflow are all major users) niche feature that has far-reaching implications.

I also think that I/some of the other maintainers need to do a docs pass for design/philosophy as much of our thinking/perspective/the intended design of Poetry lives in our heads; hopefully, however, the stance on the lock file is starting to make sense? That is to say, the lock file is an implementation detail for speed and reproducibility and not a primary mechanism for specifying versions, unlike pip-tools, which is a tool to build frozen requirements.

@potiuk
Author

potiuk commented Nov 28, 2022

Very insightful, thanks @neersighted.

This is a cool discussion, I've learned a ton from it.

I think @r-richmond's case is a bit different from the one I had in mind. I think I am starting to see how pyproject.toml would fall into this picture. @r-richmond - do I understand correctly that you treat your pyproject.toml not as a "package build description" but mainly as a "deployment description" where you mix the packages to install and your own Python code (for example DAG files?) - and then you use poetry to keep it all in check (but you never intend to build a package of your own - you just have a bunch of files being developed and the installed packages that they use, for example airflow core)? Is this more or less a correct description?

I guess my opinion is this: If we can come up with a way to standardize this for the ecosystem, we'd be happy to contribute to the PEP process and eventually implement it. But if we're doing this on a one-off bespoke basis for Poetry, the bar will be pretty high as this is a (relatively, I know OpenStack, Ansible, and Airflow are all major users) niche feature that has far-reaching implications.

Yeah. I think that it is a small missing capability of PyPA that could make it into a PEP. And I do not particularly think that the "constraints" format from pip is the best one - some of the things that I've learned in Airflow would make it less than suitable (for example being able to vary it per use case and per Python version) - currently we have 15(!) constraint files - so we can definitely do better than that.

Knowing how opinionated (in different directions at that) the people who are part of the PyPA are, I think it might be a hell of a journey to propose something, get it approved and eventually implemented.

But I would love to try in the coming months.

I have an upcoming talk on Thursday at PyDataGlobal online https://global2022.pydata.org/cfp/talk/BPFCBT/ - "Managing Python Dependencies at Scale" - and I think I will try to end it with a question - "Is it something we should try to make into a PEP? Is it needed?". I hope I can get some feedback - and maybe I can get some pip and poetry team members on board to help make it happen.

@r-richmond

r-richmond commented Nov 28, 2022

I think @r-richmond's case is a bit different from the one I had in mind. I think I am starting to see how pyproject.toml would fall into this picture. @r-richmond - do I understand correctly that you treat your pyproject.toml not as a "package build description" but mainly as a "deployment description" where you mix the packages to install and your own Python code (for example DAG files?) - and then you use poetry to keep it all in check (but you never intend to build a package of your own - you just have a bunch of files being developed and the installed packages that they use, for example airflow core)? Is this more or less a correct description?

Yes, your summary as a "deployment description" is great/💯.

Reading about some of neersighted's concerns about things I'm not using (i.e. distfile, Python packaging/the core metadata specification itself) has made me realize that I'm perhaps not using Poetry in the standard way, which is probably why they have concerns about poetry.lock (i.e. that it is not intended to represent configuration) that I'm not thinking about. For my deployment description, the poetry.lock file already represents the specific config for my deployment, and pyproject.toml + poetry lock is how I generate new/upgraded valid configurations.

Background on how I ended up here
  • I started using pipenv many years ago since at the time the dependency resolver in pip was not good
  • The pattern of specifying top-level requirements only in the Pipfile while having the specific versions in Pipfile.lock for reproducibility was a great win for upgrading and stability
  • Eventually pipenv had issues and I migrated to poetry. If my memory serves me correctly, at the time the dependency resolver for poetry was much faster and our airflow environment was complex, so switching sped up the lock process by ~5x.
  • At the time I viewed poetry as a more or less drop in replacement for pipenv though I recognized at the time it also supported the python publisher's workflow (but that was an unneeded feature for my current use case).

Questions / Comments that should perhaps move to a discussion

Yes, importing would clobber versions already specified in pyproject.toml. I don't see how that is anything less than expected -- you're asking Poetry to narrow the range it solves for, after all.

  • For my "deployment description", if I pin, say, jaydebeapi = "==1.1.1" because of a subsequent API change, I don't want that pin to be overridden in my pyproject.toml. I'd rather have an error saying unresolvable.
  • More importantly IMHO, I don't want transitive dependencies added to my pyproject.toml tool.poetry.dependencies. If such a dependency is dropped by my core dependencies, I do not want to install it or have a reference to it in my pyproject.toml tool.poetry.dependencies. With the given import suggestion, how would such dependencies be removed/marked?
  • I also don't really want 600 packages in my tool.poetry.dependencies, especially since I have no easy way to remove packages I'm not using.

  • Rather than being prescriptive about how to fix the issue, I'm curious, @neersighted: how would you recommend I solve this? (Perhaps I've gone down the completely wrong path in using poetry for dependency management in an application deployment.)
    • Requirements / Goals
    1. Need to specify top level packages that will be installed (airflow, dbt, pydantic, see example ci/cd/ etc..)
    2. Don't want to pin the top level packages unless there is a version issue (i.e. want to be able to update regularly and easily).
    3. Want to have a list of all packages & versions that were installed into the deployment and for that to be in vcs.
    4. (What I currently can't do with poetry) When/if I open a bug with apache/airflow and get asked about the Python packages I'm using, I want to be able to confirm that I'm using versions that were tested and passed all tests. Currently, since Airflow uses library-style pinning (i.e. only pin/limit if there is a known issue), I can and occasionally do pull down package versions that were released after airflow's bump and that have incompatibilities. Airflow's response to me as a user is to make sure I'm using the versions that are specified in the constraints file, since those were tested. For my current workflow this is a little hard since I'm using poetry and don't have a way to tell poetry to use those constraints as part of the lock process.
      • I don't want to pull 600 lines of package information into the tool.poetry.dependencies section for the reasons stated above (don't need them all, 600+ lines of maintenance I don't want)
      • I also don't want to push airflow to use stringent version requirements in its published package since it will cause a ton of version conflicts when trying to resolve. (especially since Ability to override/ignore sub-dependencies #697 was closed I'd have no way to manually correct these conflicts).

p.s. let me know if you think that question would be better in a discussion.
p.p.s. I'm trying to help push for the constraints idea because I'd like to keep using poetry for my use case. I'd also like airflow to be able to provide a "supported" way to install airflow with poetry, but that is only possible if there is a consensus on how to tell poetry to use/reference the latest tested package versions for airflow (i.e. constraints).

@potiuk
Author

potiuk commented Nov 28, 2022

This thread is pure gold for my talk :). Thanks @r-richmond for such a detailed description and question :). I would love to hear if there is an interest in supporting a case like this - because my talk (and I hope maybe a future PEP?) is very much about this case.

@bjoernpollex-sc

This is a very interesting discussion, but I'm not sure I got all of it! I think the use-case that brought me here most closely fits the "deployment description" mentioned by @r-richmond. Curiously, it is also about Airflow :)

In my case, I want to build DAGs that will be deployed into an Airflow instance that I don't fully control. I know exactly which package versions are available in that environment. I can't change any of the existing packages, but I can add new ones. So I want to use poetry to specify the dependencies specific to my code, but then respect the constraints of the target environment when resolving dependencies.

Ideally, the package list from the target environment should not live in my pyproject.toml. I might be targeting multiple different environments (or the environment might be updated), and in that case, I don't see why my pyproject.toml should change.

Does this scenario fit into this discussion?

@potiuk
Author

potiuk commented Jan 12, 2023

Does this scenario fit into this discussion?

Very much so.

@KotlinIsland
Contributor

@fjmacagno

Bump, this would be really nice.

@douglaszickuhr

Really nice feature to have indeed!

@hstravis

Bump!

@codecakes

+1, but is there a simple, reproducible workaround using pyproject.toml with poetry (as is) to get airflow installed deterministically and working as the airflow docs recommend (as mentioned several times by @potiuk), instead of using pip, which takes care of this issue?

@mr-real

mr-real commented Jul 28, 2023

@potiuk, as of the latest Poetry version, it seems to support constraints in the form of an optional dependency group. You just don't install that optional dependency group. I experimented with it, and it does seem that such an optional group puts appropriate constraints on the poetry lock, while not being installed unless explicitly asked for. I also confirmed that when you install such a project with pip, such an optional dependency group does not get installed. Please let us know if you were looking for any different sort of functionality; otherwise, this issue can be closed!

@mr-real

mr-real commented Jul 28, 2023

Also, a ping for all who needed this feature: @codecakes, @hstravis, @douglaszickuhr, @fjmacagno, @bjoernpollex-sc, @r-richmond.

@potiuk
Author

potiuk commented Jul 28, 2023

This is a different feature @sabiroid .

An optional dependency group corresponds to pip extras. Constraints are a different idea. You can use constraints with extras to get reproducible installs, but that's about all that the two have in common.

In constraints you specify the limits on all possible packages that you might install. For example, the airflow constraints are here: https://github.com/apache/airflow/blob/constraints-2.6.2/constraints-3.10.txt

What the constraints say in this case is:

If - for whatever reason - you are installing dependency ABC, it should be version X.Y.Z. And they say so for potentially 650 dependencies of airflow (including transitive ones).

The optional dependency group means something different. It means that if you choose (as a user) to install this optional group, you also install those dependencies, following those requirements.

For example "pip install airflow[celery]" will install those dependencies https://github.com/apache/airflow/blob/main/setup.py#L268

    "celery>=5.2.3,<6"

This would correspond to poetry install --with celery - and it will install the latest celery release.

Now, constraints work as an extra "soft" limit. Following on from the example above:

pip install airflow[celery]==2.6.2 --constraint https://github.com/apache/airflow/blob/constraints-2.6.2/constraints-3.10.txt 

This builds on top of the optional feature of installing the celery package, but it does one thing more: it says "and the celery package version installed should be EXACTLY 5.3.0 - because that's what you are constrained to by the constraints". And it is the best way to get reproducible installs. It's really very similar to the "lock file" of poetry, but:

a) it can be stored externally - not in the source code of the project
b) you can have different "lock" file for different versions of python and versions of your software
c) you can tell your users to use it when they are installing your package rather than building it locally
d) it limits not the "required" dependencies of yours - but all potential optional dependencies you might want to install together with your software - so for example if you have multiple optional dependency groups, it can tell you versions of all dependencies (and their dependencies) of all potential dependency groups your software might have.
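Points (a) to (c) can be sketched as a small helper that picks the externally hosted constraints file for a given application version and Python version and builds the matching pip command. This is only an illustration: the raw.githubusercontent.com URL layout is an assumption derived from the branch and file naming in the links above (the github.com/.../blob/ form serves an HTML page), the PyPI distribution name apache-airflow is used where the thread writes airflow, and both function names are hypothetical.

```python
def constraints_url(app_version: str, python_version: str) -> str:
    """Build the URL of the constraints file for one app/Python combination.

    Assumed layout: one branch per application version, one file per
    Python version.
    """
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{app_version}/constraints-{python_version}.txt"
    )


def pip_install_args(
    app_version: str,
    python_version: str,
    extras: list[str],
    other_packages: list[str],
) -> list[str]:
    """Return the pip command line as an argument list (nothing is executed)."""
    extra_part = f"[{','.join(extras)}]" if extras else ""
    return [
        "pip",
        "install",
        f"apache-airflow{extra_part}=={app_version}",
        *other_packages,
        "--constraint",
        constraints_url(app_version, python_version),
    ]


args = pip_install_args("2.6.2", "3.10", ["celery"], [])
print(" ".join(args))
```

Upgrading then means changing only the version argument; the constraints file for the new version supplies the rest of the pins.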

@cloutierjo

cloutierjo commented Jul 29, 2023

After discussion in the #8251 feature request, it was deemed that my use case is apparently close enough to this one to merge them together. While I do see similar concerns, I'm not totally certain the solution for one would solve the other. (Even more so looking at the latest comment from potiuk.) But in any case, let me express my requirements and see what your thoughts are; hopefully we can move forward to a resolution for everyone.

As a developer on a project, I need to align my dependencies with the company-wide whitelisted dependencies. We also want, as a team, most of our projects to keep all dependency versions as close as possible, to avoid incompatibilities and feature gaps between the various projects we are working on. The needs for that are thus:

  • To be able to have a reference in our pyproject to those shared dependencies
  • Shared dependencies should be available in a similar way to normal dependencies (including PyPI and enterprise repositories)
  • To be able to have additional version rules from our own project
  • The shared dependencies version should allow the whole set of version rules
  • The shared dependencies should not be installed unless specified in our project
  • They should get resolved like any transitive dependencies
  • They should be re-resolved/updated when doing poetry lock or poetry update

Now my proposal was quite aligned with the BOM pattern from Java that is implemented in Maven and Gradle, though maybe a bit simplified, and it would use the optional dependency pattern to achieve it.

I see 2 ways of doing this (all the details are in #8251, I don't want to add too much here):

  • Having the optional dependency of a direct dependency of our project be used in the requirements resolution
  • Having the direct dependency of an optional dependency of our project be used in the requirements resolution

I believe both of them answer all the needs above, are really not far from the current poetry features, and wouldn't require much (or any) change to the pyproject or CLI. After all, optional dependencies of our project are already used to resolve versions of transitive dependencies, and extras allow installing and thus resolving optional dependencies of a dependency. Also, most importantly, that use case, IMO, is clearly in scope for a "dependency management and packaging tool".

So what are your thoughts, would this solve the issue requested here? Do you think it's a good approach to the issue described?

@honnix

honnix commented Jan 22, 2024

FWIW, I worked on #4005 years ago adding this feature but it was not considered to solve a real problem, and I gave up.

@elephantum

I would like to add my use case for the --constraints scenario.

In machine learning it is common to use pre-built Docker images with some Python packages already installed. And you really, really do not want to override these specific versions, because they might be tied to very specific hardware compatibility.

For example nvcr.io/nvidia/l4t-ml:r35.2.1-py3 for Nvidia Jetson hardware.

We constrain all the pip install operations to the pre-installed versions of packages so that none of the original packages is overridden.
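One way this could be done, sketched with only the standard library: list the distributions already installed in the image and emit one `name==version` line per package, which can then be saved and passed to pip install --constraint. This is an illustration rather than the exact procedure used here; the function name `installed_constraints` is hypothetical, and packages installed from URLs or in editable mode would need extra handling.

```python
from importlib.metadata import distributions


def installed_constraints() -> list[str]:
    """Return `name==version` lines for every distribution found on sys.path."""
    pins: dict[str, str] = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue  # skip distributions with unreadable metadata
        name = meta.get("Name")
        version = meta.get("Version")
        if name and version:
            # Key on the lowercased name so duplicates collapse to one entry.
            pins[name.lower()] = f"{name}=={version}"
    return [pins[key] for key in sorted(pins)]


lines = installed_constraints()
# Print a few entries; write "\n".join(lines) to a file to use it with pip.
for line in lines[:5]:
    print(line)
```

Subsequent installs that reference the saved file can add new packages but cannot move any pre-installed package to another version.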

@scriptdruid

Any update on this? Poetry works flawlessly for everything except Airflow, where it fails. We want to settle on Poetry for dependency management and packaging, but it does not make sense to use both pip and poetry only because of the lack of support for something like constraints in Poetry.
