Documentation Error: Hashes are for integrity, not security #12541
No, they can't. From the first line under the Hash-checking Mode section:
We're not using the hashes in the URLs/index to check for malicious alteration; we're checking the download against a known-good hash embedded in a local requirements.txt file.
It is alarming if the PyPI team thinks it's OK that their documentation misleads users into thinking that their unsigned hashes provide security, especially considering the numerous supply chain incidents in recent years in which a compromise of publishing infrastructure caused problems for open-source projects. The documentation should clearly state that hashes do not provide security, as you said above. Please re-open this issue; it's not OK to lie to users.
Checking against a known-good local hash does protect against tampering and compromise of the index/infrastructure, since you're checking that the artifact has not been modified compared to the last known-good copy (which the hash is generated from, assuming a decent enough hash function). If you disagree with this, I'd like to understand what kind of compromise you're thinking of that would lead to a requirements file such as the following to install a malicious copy of
I have literally quoted the documentation, which states what I have rephrased. I did not say that hashes do not provide security here.
Sorry, but the whole point is that there is no such thing as a "known-good" hash if you didn't cryptographically verify the signature of the hash. How did the hash get from the Internet onto the computer? That's the vulnerability. Please re-open this ticket so we can properly educate users about the risks and limitations of PyPI.
In that case, I believe you're looking past what I'm saying.
I think you're mixing up provenance with the ability to tell whether something has changed unexpectedly (i.e. integrity). Provenance guarantees will be available once https://peps.python.org/pep-0458/ and https://peps.python.org/pep-0480/ are implemented; until then, validating that files haven't been tampered with between two uses has significant and meaningful benefits over the default of not doing even that. If you want me to reopen this ticket, please answer the request for clarification I made in my previous comment.
The security property that I'm referring to is authenticity: verifying that the software is authentic, i.e. that the downloaded software originated from the developers (and not from someone malicious). The documentation suggests that the hashes provide security that protects the user from downloading maliciously modified software. That's authenticity. And it's wrong; the hashes do not provide any assurance of authenticity.
Which question did I not address?
I personally do not find the documentation misleading; I find it very well placed, and your request unfounded, @maltfield. Security is not […] As a Security Committee member of the Apache Software Foundation, we are working on achieving more security with our […] Yet we are actually looking at adding hashes to our constraint files https://github.com/apache/airflow/blob/constraints-2.8.2/constraints-3.8.txt as to add […] Consider that the opinion of someone who helps to oversee security in hundreds of ASF projects, and helps to define policies and the security approach in a Foundation that did security well before it was fashionable.
This one:
The hash feature doesn't provide "some security". It provides zero security. Without a signature, it protects against corrupt downloads. That's not security.
The compromise is that the user downloads maliciously modified software. I'm not sure why this isn't obvious. How did the user get the hash in the example command?
How did the user ensure that their copy of Python, and their copy of pip, are not compromised? The way you are framing everything in terms of absolutes, and accusing the pip maintainers of bad faith in the way we document pip's features, is neither helpful nor welcome. Please consider both your tone and the message you are giving. If you, personally, don't feel that pip is sufficiently secure, then by all means don't use it. No-one is forcing you to do so. Others can make their own decision.
Most OS package managers, apt included, verify the authenticity of all packages with cryptographic signatures. This is documented here: […] I believe I've answered your question. Please re-open this issue.
It is some level of security. There are scenarios where it matters […] Here we simply disagree. […] Ultimately […] On a human level, I have a suggestion for you, @maltfield, before this goes any further. Remember there are other humans on the other side. I propose that you consider that you've been listened to, and that your opinion was considered and rejected. It happens. I think appreciating all the effort that maintainers put in to make things work well (and better every day) so that you can use the software for free, often in their own personal time, away from their families and the things they get paid for, is better in the long run than fighting with them over minute and completely meaningless details. If I may suggest something from my experience: even when I finally got the small thing I fought for, getting confrontational was overall a bad idea for me. If I regret something, it is how short-sighted and stupid I was back then to go down that rabbit hole, and I would gladly go back in time and revert it. IMHO you will get much more by accepting and trying to understand the other side, and accepting that other people might have different opinions, and that when they have earned merit in the project, they have the right to make decisions that are right for them and their users. But, well, that's my opinion, experience and advice; you might take it or not, up to you.
If this project were open to listening to proposals, this ticket wouldn't have been immediately closed (before any dialog). The PyPI team is being rude by immediately closing something as "won't fix" when a user informs them of harm that they're causing users, and contributes to the project by taking the time to highlight bugs and their solutions. I am a human taking time out of my day to file this bug to better all Python users. We all make mistakes. Not all bugs are shallow. It shouldn't be considered rude to report bugs and advocate for harm reduction. Please re-open this ticket.
I don't have an opinion on the larger conversation, but hashes protect you when recreating an environment, not when initially creating it. For example, if you had sourced your hashes before 25 December, you would have been safe from the PyTorch dependency confusion attack: https://pytorch.org/blog/compromised-nightly-dependency/ There are many other situations in which an attacker may be able to insert a version that matches but not a hash that matches. This of course does not fully solve supply chain security, but it does protect against certain attacks in some situations.
AFAIK, the hashes downloaded "before the 25th of December" would still lack signatures. So, no, you would not have been safe from several supply chain vulnerabilities, including a publishing infrastructure compromise.
You would have been safe from that attack; the attacker's files had different hashes. I was talking about preventing a real-world security attack, not about security as an abstract concept.
Also, @maltfield, since you are insisting, I would heartily recommend that you do your homework: rather than throwing around a bunch of random links and expressing your opinion about what "supply chain security" is, we should refer to standards. I'd heartily recommend getting familiar with this widely accepted industry standard describing supply chain security: https://slsa.dev/spec/v1.0/levels The standard described there does not narrowly focus on having or not having cryptographic signatures. Cryptographic signatures are only a small part of supply chain security, and when you focus exclusively (as you do) on cryptographic signatures as a sign of "security", you are making a huge mistake, one that gives you a false sense of security. "Supply chain security" is way, way, way more than that: processes, build platforms and a wealth of other things, nicely described in the standard. And I have a very interesting surprise for you. Regardless of whether the hashes are cryptographically signed or not, it is still L0 (zero) level in SLSA. Signing hashes or packages cryptographically does not, on its own, move the needle when it comes to a security level in SLSA. Go and check it yourself. What DOES change it and moves it to L1 is having a verifiable way of determining how the packages were built, which cryptographic signatures tell absolutely nothing about. Reproducible builds, however, do. In fact, they do, and they bring SLSA level 1. And, this might be even more surprising to you, @maltfield, it does not matter whether the packages are signed at all to be at level 1:
So if we used your arguments, YOUR solution could not be called "security" either, because it does not provide security, not even Level 1 of SLSA. Even more, anything that provides L1 could not be called security, because there are L2 and L3 levels that provide even more security (note: there are currently no known public platforms of any sort that provide level 3 security, though a number of platforms out there strive to achieve it). So if we take your way of thinking beyond the pretty narrow and small part of supply chain security where cryptographic signing is the only criterion for calling something "security", we should not call anything "security", because nothing achieves the highest level of security. I suggest you do some more research and reading in this area; it might help you expand your knowledge of supply chain security, which is currently pretty narrowly focused on cryptographic signing. Security is like Ogres: it has layers. And many of them.
potluk, in my Python project we have reproducible builds and hashes, and I cryptographically sign all my hashes. Reproducible builds are important. If you're suggesting that we add a note to the documentation indicating that reproducible builds are an important aspect of supply chain security that PyPI is currently not enforcing, I would agree that it is a valuable addition to the documentation. Cryptographic signatures are necessary. Currently the documentation is spreading misinformation to developers, making them think that adding an unsigned hash to their download provides security. I encounter a lot of devs who think hashes provide security. I think part of the problem is documentation like this. Let's fix the docs so that devs aren't misinformed, leaving their build process (and therefore all of their users) vulnerable. Please re-open this issue so the documentation can be fixed.
Please, just stop. You've made your point, and been heard. We (the pip developers) don't agree with you. I'm sorry if that frustrates you, but it's a reality you have to deal with. Re-stating the same assertions won't change the outcome. Plenty of other people with security expertise have read pip's documentation and no-one other than you has tried to make the claims you've made here. Is your only concern here with the word "secure"? Because if so, it is only used twice in that whole section. If all you wanted was to change those two occurrences to use a different word, you could have submitted a PR and done that. I don't personally think it's necessary, but we could have assessed a simple terminology change without all of the aggression and name-calling that your original post included. Pip's documentation is not the place to educate users about security best practices. Nor is pip the only weak link in the Python package distribution chain. People are aware of the weaknesses, and working on them. Yelling specifically about one section of the pip documentation shows a stunning lack of awareness of the bigger picture here. Edit: If you continue the discussion in the same way as you have up to now, I'm going to lock this issue as "too heated". If you wish to avoid that, moderate your future comments. You know by now the tone we expect from you - it's simply that you're "open, considerate and respectful" as stated in the code of conduct.
Yes, I'd like to submit a PR to improve this documentation. Generally, I think it's best-practice to:
Instead of the above expected process, this bug report was closed immediately without any dialog. That's not very encouraging to community members who want to contribute to improve the project. Please re-open this issue so we can fix the bugs in the documentation, as described above.
Description
Currently the documentation has a subsection titled "Hash-checking Mode" under a section called "Secure installs".
This is extremely misleading, because checking hashes that are not cryptographically signed does not provide additional security: it does not authenticate the package cryptographically.
Hash checking is for verifying that the file wasn't corrupted while it was being downloaded. It verifies integrity. It does not verify that the package wasn't maliciously altered. This is because if the package was maliciously altered (e.g. by a publishing infrastructure compromise), the attacker could just as easily modify the hashes so that pip will happily install the malicious module.
Security via payload authentication is provided by cryptographic signatures. Commonly, this involves hashes: a digest file listing the hashes is signed. But if the hashes are not cryptographically signed, then only integrity is assured; authenticity is not.
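The integrity-versus-authenticity distinction argued in this thread can be sketched in a few lines of Python. This is a minimal illustration, not pip's implementation: it assumes SHA-256 (the algorithm named in pip's `--hash=sha256:...` requirement entries), and the helper names and sample bytes are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_expected_hash(artifact: bytes, expected_hex: str) -> bool:
    """Hypothetical helper: compare an artifact's digest with an expected value.

    This is an integrity check only. A match means the bytes are identical to
    whatever was hashed when `expected_hex` was recorded; it says nothing
    about who produced those bytes.
    """
    return sha256_hex(artifact) == expected_hex.lower()


# Illustrative bytes standing in for a downloaded archive (not a real package).
original = b"contents of example_pkg-1.0.tar.gz"

# Case 1: the hash was recorded locally from a known-good copy, before any
# compromise, and pinned (e.g. in a local requirements file).
pinned_hash = sha256_hex(original)

# An attacker later replaces the file on the server.
tampered = original + b"\n# injected malicious code"

print(matches_expected_hash(original, pinned_hash))  # True
print(matches_expected_hash(tampered, pinned_hash))  # False: tampering detected

# Case 2: the "expected" hash is fetched from the same compromised source as
# the file, so the attacker publishes a hash that matches the tampered file.
attacker_published_hash = sha256_hex(tampered)
print(matches_expected_hash(tampered, attacker_published_hash))  # True: not detected
```

Both positions in the thread show up here: a hash pinned from a known-good copy rejects a file altered afterwards (the second check), while a hash obtained from the same place as the file offers no protection against an attacker who controls both (the third check). Neither case establishes who published the original bytes; that requires something like a signature.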
Expected behavior
The documentation should move the "Hash-checking Mode" subsection into another section titled "Corruption Checking", and it should add a warning indicating that pip currently does not have a built-in mechanism to cryptographically verify the authenticity of packages that it downloads, which leaves users vulnerable to downloading malicious software due to attacks such as publishing infrastructure compromise.
How to Reproduce
[…] docs dir) https://pip.pypa.io/en/stable/topics/secure-installs/