Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

App data seeder: make pip by default periodically self-upgrade #1831

Closed
wants to merge 2 commits into from

Conversation

gaborbernat
Copy link
Contributor

@gaborbernat gaborbernat commented May 18, 2020

A conservative implementation of the outcome from https://discuss.python.org/t/how-should-virtualenv-behave-respective-of-seed-packages-e-g-pip-by-default/4146 poll.

The goal of the PR is to try to pull in new pip versions, however:

  • do not degrade virtual environment creation performance significantly,
  • do not trigger during CI runs (the feature is aimed mostly at end-users, that install virtualenv themselves).

The proposed change would make pip upgrade periodically:

  • every two weeks try to upgrade pip to a newer version

  • the upgrade runs not inline, but instead as a new background process (this makes the performance overhead not significant)

  • upgrade the seeded pip to a new version if there's a newer version that satisfies the following criteria

    • was discovered at least an hour ago (this is to ensure that CI environments don't start with an old version and mid-way through they switch to a newer version)
    • was released at least 28 days ago (allow 4 weeks for pip to do a bugfix release).

I consider this a reasonable middle-ground. We will mostly use bundled versions (and almost always in the CI), but for end-users, once a pip version is out there for a long time we do switch to it, as we consider that a proof of stability.

People who would like to always get the latest version should either set the environment variable VIRTUALENV_DOWNLOAD=1 or set download inside the virtualenv.ini file (or pass in the --download flag).

People who would like to always pin to a given version should either set the environment variable VIRTUALENV_PIP=20.1 or set pip inside the virtualenv.ini file (or pass in the --pip flag).

Resolves #1821

Copy link
Contributor Author

@gaborbernat gaborbernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfmoore, @pradyunsg, @dstufft, @asottile check out if this solution works for you.

The code needs cleaned up a bit, add documentation, and some tests. But the core idea is there.

updated_wheel = Wheel(cache_dir / filename)
# use it only if released long enough time ago - use version number to approximate release
# only use if it has been released for at least 28 days
if datetime.now() - release_date > timedelta(days=28):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use periodically upgraded bundle if it's older release than 28 days

def _get_release_date(dest):
wheel = Wheel(dest)
# the most accurate is to ask PyPi - https://pypi.org/pypi/pip/json
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get release date primarily from PyPi

except Exception: # noqa
pass
# otherwise can approximate from the version number https://pip.pypa.io/en/latest/development/release-process/
released = datetime(year=2000 + wheel.version_tuple[0], month=wheel.version_tuple[1] * 3 + 1, day=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or if that's not available then approximate from version number

]
kwargs = {"stdout": subprocess.PIPE, "stderr": subprocess.PIPE}
if sys.platform == "win32":
kwargs["creation_flags"] = DETACHED_PROCESS
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update is a detached new process to be run in the background (to not slow down creation).

return self._check_start(now)
else:
delta_since_last_completed = now - self.completed
if delta_since_last_completed < timedelta(days=13):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trigger periodic update once every two week

@@ -17,6 +17,13 @@


class PipInvoke(BaseEmbed):

packages = {
"pip": Version.latest,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using the legacy pip seeder, we always use the latest version. This will be latest version in bundled + search dir if running in no download mode, otherwise upstream latest.

" seeder {}".format(ensure_text(str(self.session.seeder))),
" added seed packages: {}".format(
", ".join(
sorted(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our report now we print exactly the version numbers of the seeded packages.

Signed-off-by: Bernat Gabor <bgabor8@bloomberg.net>
@gaborbernat gaborbernat changed the title Seeder - periodic update App data seeder: make pip by default periodically self-upgrade May 18, 2020
@asottile
Copy link
Contributor

my 2c: I don't think the upgrade should happen magically but should be an explicit command a user runs in order to pull in an upgrade.

always magically downloading the latest and only once every few weeks magically downloading the latest are the same problem set imo

@pradyunsg pradyunsg removed their request for review May 19, 2020 06:45
@pradyunsg
Copy link
Member

I'm gonna cop out of this discussion, since I'd like to avoid adding this to the things I'm actively thinking about. :)

@pfmoore
Copy link
Member

pfmoore commented May 19, 2020

I'm also going to opt out - my view as expressed on Discourse was that I prefer using the seeded version by default (for performance reasons, not reproducibility) but I'm OK with updating in the background if that's useful for the people who want the latest version to be used. I'll leave those people to say whether this solution suits them or not.

@pfmoore pfmoore removed their request for review May 19, 2020 07:26
@gaborbernat
Copy link
Contributor Author

gaborbernat commented May 19, 2020

Considering we have two maintainers of this and pip that don't actually want this now I'm strongly considering letting the always embedded camp win. If pip maintainers are not worried about this, Donalds worry of this holding back the evolution of the ecosystem feels unwarranted. (I'm also in favour of always embedded, but as the one driving the discussion I wanted to remove myself from the count 🤷‍♂️ sorry Donald - seems the always embed camp one both in public vote and maintainers vote)

Signed-off-by: Bernat Gabor <bgabor8@bloomberg.net>
@gaborbernat
Copy link
Contributor Author

Superseded by #1833

@pradyunsg
Copy link
Member

pradyunsg commented May 19, 2020

Considering we have two maintainers of this and pip that don't actually want this now I'm strongly considering letting the always embedded camp win

Hmph. I'm not saying that "I don't want this now" -- but rather that now isn't a good time for me to spend working/pushing on this specific thing, since I have other things to on my plate currently.

I do think the approach in this PR is better than always-embedded (which is better than status-quo). This is a compromise solution which makes everyone slightly unhappy but accomodates for every issue "well enough", vs any other choice which'll make some folks very unhappy and some folks very happy.

Further, and I've stated this before elsewhere, I don't think the public vote is a good argument for picking what functionality we provide, since I'm not sure we'd provided enough context and (more importantly) we had no metrics/checks in place to determine if/how it was swayed based on what "kinds" of users "showed up to the vote". As an example, all pip maintainers/distro packagers (except Paul and Debian's maintainer .-.) voted for the middle ground option. I don't know why these folks voted they way that they did but clearly, there's some additional weight to those?

I'd much rather prefer a what-are-the-arguments-and-trade-offs based choice, rather than what-won-a-popularity-contest where we have no direct way to know why it won.

@gaborbernat
Copy link
Contributor Author

gaborbernat commented May 19, 2020

I don't think the public vote is a good argument for picking what functionality we provide since I'm not sure we'd provided enough context

I don't agree on not providing enough context. There is a long wall of text before that poll (https://discuss.python.org/t/how-should-virtualenv-behave-respective-of-seed-packages-e-g-pip-by-default/4146) explaining the context, and multiple made follow-up posts to expand further on some context.

I also don't agree on public vote not being a good place. I agree it has some shortcomings but I'm not aware of any other better methods. Donald raised in the original issue that back then everyone was on board with always download. I challenged that statement, and made a more up-to-date request for comment/survey; which shows as things stand today there seem to be more people in favour of always embedded than download or periodic update. I've reached out on the usual PEP channels to advertise the topic, so anyone who had strong feelings about the topic could join in.

I'd much rather prefer a what-are-the-arguments-and-trade-offs based choice,

This has been debated in many posts on that topic. There's no clear winner here. It's a matter of preference: a choice between blazing fast and stable or always up to date (at some decrease in speed price).

@gaborbernat gaborbernat reopened this May 19, 2020
@gaborbernat gaborbernat deleted the set branch January 31, 2021 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure bundled pip does not get too out of date
4 participants