Tarballs change after release #2781

Closed
freswa opened this issue Dec 5, 2022 · 22 comments

@freswa

freswa commented Dec 5, 2022

Arch Linux maintainer here. We've received 7 reports so far this year about tarball checksums that changed after a release had been pulled.
See also: https://bugs.archlinux.org/task/76747

Is this intentional?

@freswa freswa added bug new Triage required labels Dec 5, 2022
@konstruktoid
Contributor

So if you've pulled e.g. https://github.com/ansible/ansible-lint/archive/refs/tags/v6.9.1.tar.gz, the checksums changed later on? And where do you get those checksums from?

@freswa
Author

freswa commented Dec 5, 2022

We fetch the tarballs from https://github.com/ansible/ansible-lint/archive/v6.9.1/ansible-lint-6.9.1.tar.gz

Until today the b2sum was
7cbc525f0fa873b6309d9aded24ec29f4678aa8323555af7d2796bfb5d313761fec90e8033b61864fd28f6c69757c4f7fb62a674b694d0e4a08b5238ea417ad5

now it's
73035bdbd6c1bdee5566d5dee5a8461953d99bbaf896d7a9764c32419e74c3d23883ac1a7548de3e21372195f99c325c371cead7397bf64d6d033d9a2f81ed01
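
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of computing the same kind of digest in Python. It assumes the values above are the default 512-bit BLAKE2b digests that `b2sum` prints; the helper names `b2sum` and `checksum_matches` are made up for this example.

```python
import hashlib
from pathlib import Path


def b2sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the BLAKE2b-512 hex digest of a file (hypothetical helper)."""
    digest = hashlib.blake2b()  # default digest size is 64 bytes (512 bits)
    with path.open("rb") as handle:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a pinned value, ignoring case and whitespace."""
    return b2sum(path) == expected_hex.strip().lower()
```

Calling `checksum_matches(Path("ansible-lint-6.9.1.tar.gz"), pinned_value)` on a freshly downloaded file would then show whether the archive still matches the checksum recorded at packaging time.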

@konstruktoid
Contributor

73035bdbd6c1bdee5566d5dee5a8461953d99bbaf896d7a9764c32419e74c3d23883ac1a7548de3e21372195f99c325c371cead7397bf64d6d033d9a2f81ed01 matches https://github.com/ansible/ansible-lint/archive/refs/tags/v6.9.1.tar.gz but I don't have any history of the packages besides those at https://github.com/ansible/ansible-lint/releases so I can't go back two days.
Either way a package checksum shouldn't change.

@freswa
Author

freswa commented Dec 5, 2022

Guess we can't do much now, but I'll leave this open for the next checksum change

@ssbarnea
Member

ssbarnea commented Dec 6, 2022

That is indeed interesting. I wonder if it is related to one of these things:

  • Rarely, the release pipeline might fail, so nothing gets published to pypi.org, and in a case like this I remove the tag and trigger the pipeline again. This happened probably twice in the last year and the time delay between the tag "move" is likely less than an hour.
  • A few times I made minor modifications to the release notes at https://github.com/ansible/ansible-lint/releases for previous releases. This should not modify the tag, but I am not really sure how GitHub builds these archives and whether they include extra information that does not come from git.

It is easy to see when the release pipeline failed by looking at https://github.com/ansible/ansible-lint/actions/workflows/release.yml, and you can even check the run numbers, which are consecutive. If I am correct, the release last failed for lint about 7 months ago, and again a few days ago.

Indeed this should stay open until we identify the reason and find a solution.

I wonder if we can set up a webhook for spotting a tag deletion and/or move; it could prove useful. If anyone knows a system that can be used for that, let me know. I use https://newreleases.io/ for lots of projects, but that one only notifies when a new tag is created, which is not what we need in this case.
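
Short of a webhook, one way to spot a deleted or moved tag is to compare two snapshots of `git ls-remote --tags <url>` output taken at different times. The following is only a sketch of that idea: `parse_ls_remote` and `diff_tags` are hypothetical helper names, and it assumes the usual tab-separated `<sha>\t<ref>` output, where annotated tags also appear as a peeled `^{}` entry.

```python
from typing import Dict, List


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map tag names to the object they resolve to, from `git ls-remote --tags` text."""
    tags: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        if not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        if name.endswith("^{}"):
            # Peeled entry: the commit an annotated tag points at. Prefer it.
            tags[name[:-3]] = sha
        else:
            # Plain entry: keep it only if no peeled entry was seen first.
            tags.setdefault(name, sha)
    return tags


def diff_tags(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Report tags that were created, deleted, or moved between two snapshots."""
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "moved": sorted(
            tag for tag in before.keys() & after.keys() if before[tag] != after[tag]
        ),
    }
```

Running this from a scheduled job and alerting on a non-empty `deleted` or `moved` list would cover the case that a new-release notifier misses.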

Sadly, that checksum is not the git commit hash, so I cannot search the git history to identify it. We probably need to get hold of two archives built from the same tag that have different checksums and investigate them.

Googling for tar and changed checksums led me to https://stackoverflow.com/questions/52668432/tar-package-has-different-checksum-for-exactly-the-same-content -- apparently there are lots of reasons why an archive could change, as it can contain other "goodies" inside. Maybe we should ask github because they are producing these archives not us, we have zero control over their creation.
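
Once two archives of the same tag are available, the investigation can be narrowed down by hashing each member separately instead of the whole tarball. A minimal sketch using only the standard library; `member_digests` and `differing_members` are hypothetical names, and metadata such as timestamps or file modes is deliberately ignored here.

```python
import hashlib
import tarfile
from typing import Dict, List


def member_digests(archive_path: str) -> Dict[str, str]:
    """Return a SHA-256 digest for every regular file inside a tar archive."""
    digests: Dict[str, str] = {}
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, and other special entries
            handle = archive.extractfile(member)
            digests[member.name] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def differing_members(old_path: str, new_path: str) -> List[str]:
    """List member names whose content differs or that exist in only one archive."""
    old, new = member_digests(old_path), member_digests(new_path)
    return sorted(
        name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
    )
```

An empty result would point at the archive container itself (compression settings, headers, ordering) rather than at the file contents.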

@ssbarnea ssbarnea removed the new Triage required label Dec 6, 2022
@freswa
Author

freswa commented Dec 6, 2022

This happened probably twice in the last year and the time delay between the tag "move" is likely less than an hour.

I receive notifications for updated packages every 15 Minutes. So this could be the reason. Did the publish fail for the 6.9.1 release?

Maybe we should ask github because they are producing these archives not us, we have zero control over their creation.

We are building a lot of packages from github. Besides manually changed tarballs and moved tags, I'm not aware of a similar problem in a different project.

@ssbarnea
Member

ssbarnea commented Dec 7, 2022

@freswa Can you elaborate on "notifications for updated packages every 15 Minutes", for the ansible-lint itself? I really need to get my hands on a pair of such packages to compare them; you made me very curious and I also want to help you avoid headaches.

Out of curiosity, isn't it possible to monitor the sdist from PyPI instead of the git source? I know for sure that on PyPI it is impossible to update a package once pushed, something that cannot be achieved with GitHub tar archives.

And no, the release did not fail for v6.9.1; which jobs ran can be seen on the Actions tab.

My suspicion is that github might decide to change their tarballs on their side based on the text of the release notes page, which can be edited even after the release.

Digging through the GitHub forum I found a few threads, but I am not sure if these answer our questions:

As I have seen reports that tar cannot produce identical archives on different systems, I wonder if the zip archive checksum might be more stable? That assumes this is caused by the archiving step and not by the actual content changing.

@freswa
Author

freswa commented Dec 7, 2022

Can you elaborate on "notifications for updated packages every 15 Minutes", for the ansible-lint itself?

I'm using nvchecker with this configuration:

[ansible-lint]
source = 'github'
github = 'ansible/ansible-lint'
use_latest_release = true
prefix = 'v'

It's run every 15 Minutes with /usr/bin/nvchecker-notify to generate desktop notifications.

@dvzrv

dvzrv commented Dec 7, 2022

My suspicion is that github might decide to change their tarballs on their side based on the text of the release notes page, which can be edited even after the release.

The release tarball only changes if the commit from which it was generated changes (aka. "moving the tag") or if github decided to change their way of generating the tarball (which, given that it would break everyone, is unlikely to ever happen, knocks on wood).
The former is something you as a project control, the latter is something that is a) baked into git (git-archive) and b) controlled by github itself.

@ssbarnea
Member

Closing because there is nothing to track for now, we will reopen if needed.

@Toolybird

Toolybird commented Dec 24, 2022

It's happened again. Here is the diff between the tarball I downloaded previously and the one I downloaded just now:

--- 1/ansible-lint-6.10.0/.git_archival.txt     2022-12-15 07:51:47.000000000 +1100
+++ 2/ansible-lint-6.10.0/.git_archival.txt     2022-12-15 07:51:47.000000000 +1100
@@ -1 +1 @@
-ref-names: HEAD -> main, tag: v6.10.0
+ref-names: tag: v6.10.0
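
The change can also be read mechanically. Below is a minimal sketch that parses the `ref-names` line from two copies of `.git_archival.txt` and reports which ref decorations were added or removed. It assumes the file uses simple `key: value` lines as in the diff above, and that the `ref-names` value is filled in by git at the time the archive is generated; `parse_ref_names` and `ref_name_changes` are hypothetical helper names.

```python
from typing import Dict, List, Set


def parse_ref_names(archival_text: str) -> Set[str]:
    """Extract the comma-separated ref decorations from a `ref-names:` line."""
    for line in archival_text.splitlines():
        # Split on the first colon only, since values like "tag: v1" contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "ref-names":
            return {ref.strip() for ref in value.split(",") if ref.strip()}
    return set()


def ref_name_changes(old_text: str, new_text: str) -> Dict[str, List[str]]:
    """Report which ref decorations differ between two archival files."""
    old, new = parse_ref_names(old_text), parse_ref_names(new_text)
    return {"removed": sorted(old - new), "added": sorted(new - old)}
```

For the two versions in the diff above, this reports `HEAD -> main` as removed and nothing as added, which matches a tag that was at the tip of `main` when the first tarball was generated and no longer was when the second one was.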

@konstruktoid
Contributor

oh, it includes git stuff as well when calculating, not just the code.
but why not use the release packages or the repo .zip?

@freswa
Author

freswa commented Dec 24, 2022

I can confirm the report by @Toolybird. The zip file includes the .git_archival.txt as well - that won't help.

@ssbarnea
Member

Basically that needs to be raised and tracked with github, not us. If I remember, they stated clearly that they do not make any guarantees regarding checksums of archives, so basically one needs to avoid using archives if they want to check checksums.

@freswa
Author

freswa commented Dec 24, 2022

@ssbarnea So github runs python setuptools scm on your repo periodically and uploads a new tar and zip file?

The behavior has been introduced with this commit, just for reference.

@ssbarnea
Member

ssbarnea commented Dec 24, 2022

Only when a new tag is created, basically on a release. src tar.gz is uploaded to pypi.

@freswa
Author

freswa commented Dec 24, 2022

But we clearly see that there have been two v6.10.0 tarballs with different versions of the .git_archival.txt. So somehow either the actions or someone manually regenerated the tarball, and now the HEAD ref is gone since more commits were added in the meantime.
I doubt this is done by github automagically.

@ssbarnea
Member

That is the idea, these archives are magically done by github not by us and anyone looking at our open source gha workflows can see that.

Please note that the moment when this thread will be locked is approaching fast, as it has already become too spammy. There is nothing to be done on our side AFAIK.

@GhostLyrics
Contributor

GhostLyrics commented Dec 25, 2022

@ssbarnea have you considered implementing what's stated in one of the threads you linked?

So if you need to be absolutely certain that a release artifact won’t change, uploading a release artifact is the way to go, whether it’s a duplication in the common case or not.

That way it's also easier to avoid a potential supply-chain attack (provided I understood this thread correctly).

@ssbarnea
Member

Unless someone else contributes such changes (in a way that does not make maintenance harder), this will not happen. Anyone looking for extra security should avoid using GitHub as a source of truth and instead look at the PyPI package and implement checksum verification.
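
As a rough illustration of that kind of verification, the sketch below checks downloaded sdist bytes against the SHA-256 digest listed in PyPI release metadata. It assumes the metadata has already been fetched as a dict from the PyPI JSON API (`https://pypi.org/pypi/<project>/<version>/json`) and has the `urls` / `packagetype` / `digests` layout used in the code; `sdist_sha256` and `verify_sdist` are hypothetical helper names, and downloading is left out.

```python
import hashlib
from typing import Any, Dict, Optional


def sdist_sha256(release_json: Dict[str, Any]) -> Optional[str]:
    """Return the published SHA-256 of the sdist, or None if no sdist is listed."""
    for entry in release_json.get("urls", []):
        if entry.get("packagetype") == "sdist":
            return entry.get("digests", {}).get("sha256")
    return None


def verify_sdist(data: bytes, release_json: Dict[str, Any]) -> bool:
    """Check downloaded sdist bytes against the digest from the release metadata."""
    expected = sdist_sha256(release_json)
    # Fail closed: a missing digest counts as a verification failure.
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

A packager would still pin the expected digest in the build recipe; the metadata lookup only helps when recording that value for a new release.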

@konstruktoid
Contributor

https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
