Change compression format and S3 URL for Python runtime archives #1567

Merged
merged 1 commit into from
Apr 18, 2024

Conversation


@edmorley edmorley commented Apr 16, 2024

(This change has been split out of the Heroku-24 PR for easier review.)

As part of the CNB multi-architecture support work, we need to change the Python runtime archive S3 URLs to include the architecture name. In addition, for the CNB transition from "stacks" to "targets", it would be helpful to switch from stack ID references (such as heroku-22) in the URL scheme, to the distro name+version (eg ubuntu and 22.04) available to CNBs via the CNB targets feature. See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
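
As a rough illustration of the naming change described above, the sketch below builds an archive URL from the distro name, distro version and architecture instead of a stack ID. The base URL, the filename layout and the `runtime_archive_url` helper are all assumptions made for this example; the actual URL scheme is not spelled out in this description.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical placeholder base URL (the real bucket URL is not given here).
S3_BASE_URL="https://example.com/python-runtimes"

# Build an archive URL from target metadata rather than a stack ID.
# The filename layout below is an assumed one, for illustration only.
runtime_archive_url() {
  local python_version="${1}"
  local distro_name="${2}"    # eg `ubuntu`
  local distro_version="${3}" # eg `22.04`
  local arch="${4}"           # eg `amd64`
  echo "${S3_BASE_URL}/python-${python_version}-${distro_name}-${distro_version}-${arch}.tar.zst"
}

runtime_archive_url "3.12.3" "ubuntu" "22.04" "amd64"
```

Because every component of the name comes from the target rather than a stack ID, the same archive can in principle be looked up by both the classic buildpack and the CNB.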

Rather than duplicate the Python archives on S3 under different filenames/locations, it makes sense to migrate this buildpack to the new archive names too, so the same S3 archives can be used by both this buildpack and the CNB.

Moving to new archive names/URLs also means we can safely regenerate all existing Python versions to pick up the changes in #1566 (and changes made in the past, such as #1319, #1320, #1321 and #1322), since we won't have to worry about overwriting the old archives (which is something we've typically avoided, since it isn't compatible with the model of being able to roll back to an older buildpack version to return to prior behaviour).

Since we're changing the S3 URLs anyway, now is also a good time to make another change that would otherwise cause churn in the S3 URLs again (which affects people who pin the buildpack version): switching the archive compression format from gzip to Zstandard (something that we've been wanting to do for a while).

Zstandard (aka zstd) is a far superior compression format to gzip (smaller archives and much faster decompression), and is seeing widespread adoption across multiple ecosystems (eg APT packages, Docker images, web browsers etc).

See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have zstd installed (and for Rust for the CNB, there is the zstd crate available), so it's an easy switch.

Various compression levels were tested using zstd's benchmarking feature, and in the end the highest level of compression was picked, since:

  1. Unlike some other compression algorithms, zstd's decompression speed is generally not affected by the compression level.
  2. We only have to perform the compression once (when compiling Python).
  3. Even at the highest compression level, it only takes 20 seconds to compress the Python archives, compared to the 10 minutes it takes to compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd (level 22, with long window mode enabled) results in a 26% reduction in compressed archive size.
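
For reference, a minimal sketch of compressing and unpacking an archive with these settings using the `zstd` CLI might look like the following. The function names and paths are hypothetical, and this is not necessarily how the build script in this PR invokes `zstd`. Levels above 19 require `--ultra`, and `--long` enables long-distance matching with zstd's default long window size.

```shell
#!/usr/bin/env bash
set -eu

# Pack a directory into a tar archive, then compress it with zstd at the
# highest level (22). Hypothetical helper, for illustration only.
compress_runtime() {
  local install_dir="${1}"
  local archive_path="${2}" # eg `python.tar.zst`
  local tar_path
  tar_path="$(mktemp)"
  tar --create --file "${tar_path}" --directory "${install_dir}" .
  # `--force` allows overwriting an existing output file without prompting.
  zstd -22 --ultra --long --quiet --force "${tar_path}" -o "${archive_path}"
  rm -f "${tar_path}"
}

# Decompress the archive and unpack it into the target directory.
extract_runtime() {
  local archive_path="${1}"
  local target_dir="${2}"
  local tar_path
  tar_path="$(mktemp)"
  mkdir -p "${target_dir}"
  zstd --decompress --quiet --force "${archive_path}" -o "${tar_path}"
  tar --extract --file "${tar_path}" --directory "${target_dir}"
  rm -f "${tar_path}"
}
```

For example, `compress_runtime /tmp/python python.tar.zst` followed by `extract_runtime python.tar.zst /app/.heroku/python` (both paths are placeholders). In practice the decompression step would more likely be streamed straight into `tar` rather than going via a temporary file; the two-step form is used here only to keep the sketch simple.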

GUS-W-15158299.
GUS-W-15505556.

@edmorley edmorley self-assigned this Apr 16, 2024
@edmorley
Member Author

Builds for all supported Python versions have been triggered using the GitHub CLI:

for v in 3.8.{0..19} 3.9.{0..19} 3.10.{0..14} 3.11.{0..9} 3.12.{0..3}; do
  gh workflow run build_python_runtime.yml --ref new-url-structure-and-zstd -F "python_version=${v}"
done

And can be viewed here:
https://github.com/heroku/heroku-buildpack-python/actions/workflows/build_python_runtime.yml?query=branch%3Anew-url-structure-and-zstd

@edmorley edmorley changed the base branch from main to improved-eol-error-messages April 18, 2024 09:56
@edmorley edmorley marked this pull request as ready for review April 18, 2024 09:59
@edmorley edmorley requested a review from a team April 18, 2024 09:59
@edmorley edmorley force-pushed the improved-eol-error-messages branch from acb5259 to b04f3eb April 18, 2024 15:46
Base automatically changed from improved-eol-error-messages to main April 18, 2024 15:48
edmorley added a commit that referenced this pull request Apr 18, 2024
For cases where a requested Python version is both (a) EOL, and (b) was
never built for that stack (such as is the case when we add new stacks),
previously the generic "version isn't available for this stack" error
message was shown instead of the more specific EOL Python version error
message.

Now, the EOL version check is performed first before the S3 presence
check, so the more specific EOL message is shown for this case.
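
The reordering can be sketched roughly as below. The function names, the error wording and the hard-coded version lists are hypothetical stand-ins (the real presence check queries S3); only the order of the two checks reflects the change described here.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical EOL check: treats only the 3.7 series as EOL for this sketch.
is_eol() {
  case "${1}" in
    3.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical stand-in for the S3 presence check, using a fixed list.
archive_exists() {
  case "${1}" in
    3.12.3) return 0 ;;
    *) return 1 ;;
  esac
}

check_python_version() {
  local version="${1}"
  # The EOL check runs first, so an EOL version that was never built for
  # this stack still gets the more specific message.
  if is_eol "${version}"; then
    echo "Error: Python ${version} has reached end-of-life." >&2
    return 1
  fi
  if ! archive_exists "${version}"; then
    echo "Error: Python ${version} isn't available for this stack." >&2
    return 1
  fi
  echo "Using Python ${version}"
}

check_python_version "3.12.3"
```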

In addition to improving the UX, making this change now reduces the
test fixture churn both when we add a new stack and for #1567.

I've also dropped the "PyPy is no longer supported" error message
and associated test, since very few apps ever used it and it's now been
19 months since support was removed in #1364, so it's fine to show the
generic "Python version isn't available" error message for it instead.

GUS-W-15541279.
As part of the CNB multi-architecture support work, we need to change
the Python runtime archive S3 URLs to include the architecture name. In
addition, for the CNB transition from "stacks" to "targets", it would be
helpful to switch from stack ID references (such as `heroku-22`) in the
URL scheme, to the distro name+version (eg `ubuntu` and `22.04`)
available to CNBs via the CNB targets feature.

See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1

Rather than duplicate the Python archives on S3 under different
filenames, it makes sense to migrate this buildpack to the new
archive names too, so the same S3 archives can be used by both
this buildpack and the CNB.

Moving to new archive names/URLs also means we can safely regenerate all
existing Python versions to pick up the changes in #1566 (and changes
made in the past, such as #1320), since we won't need to overwrite the
old archives and so rolling back to an older buildpack version will work
as expected.

Since we're changing the S3 URLs anyway, now is also a good time to
switch archive compression format from gzip to Zstandard (something
that's long overdue).

Zstandard (aka zstd) is a far superior compression format to gzip
(smaller archives and much faster decompression), and is seeing
widespread adoption across multiple ecosystems (eg APT packages, Docker
images, web browsers etc).

See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have `zstd` installed (and for Rust for the CNB,
there is the `zstd` crate available), so it's an easy switch.

Various compression levels were tested using zstd's benchmarking feature,
and in the end the highest level of compression was picked, since:
1. Unlike some other compression algorithms, zstd's decompression speed
   is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression level, it only takes 20 seconds to
   compress the Python archives, compared to the 10 minutes it takes to
   compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd
(level 22, with long window mode enabled) results in a 26% reduction in
compressed archive size.

GUS-W-15158299.
GUS-W-15505556.
@edmorley edmorley merged commit 2c35ea4 into main Apr 18, 2024
5 checks passed
@edmorley edmorley deleted the new-url-structure-and-zstd branch April 18, 2024 15:53
@heroku-linguist heroku-linguist bot mentioned this pull request Apr 18, 2024
edmorley added a commit to heroku/buildpacks-python that referenced this pull request May 2, 2024
…argets (#197)

A `libcnb.rs` release supports a single Buildpack API version, so
whenever we update to a libcnb release that now implements a newer
Buildpack API version, we must switch to that version in the buildpack
at the same time.

This change updates the buildpack to the latest libcnb release, which
requires a switch to Buildpack API 0.10, a switch from stacks to
targets, and some adjustments for layer API changes.

As part of the switch from stacks to targets, the buildpack now consumes
the Python runtime from the new S3 location/filenames (that use distro
name/version in the URL instead of stack ID), which were added in:
heroku/heroku-buildpack-python#1567

The new archives also now use Zstandard (aka zstd) for compression
instead of gzip, which results in a faster download due to the smaller
archive size (for example, the Ubuntu 22.04 Python 3.12.3 AMD64 archive
was 26% smaller) as well as faster decompression. This required
switching from the `flate2` crate to the `zstd` crate.

A side-effect of switching to the new S3 files is that the archives for
Python 3.7 are no longer available, since I intentionally did not build
them given that Python 3.7 is EOL. As such, this change also drops
support for Python 3.7 (something that the classic buildpack has already
done, and would have been done here already if it were not for being
blocked on #8).

The switch to targets unblocks Heroku-24/multi-architecture support,
which will be handled in a later PR.

See:
https://github.com/heroku/libcnb.rs/blob/main/CHANGELOG.md#0210---2024-04-30
https://github.com/buildpacks/spec/releases/tag/buildpack%2Fv0.10
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1
https://docs.rs/zstd/latest/zstd/

Closes #192.
Closes #194.
GUS-W-15261168.