Documentation Error: Hashes are for integrity, not security #12541

Closed
1 task done
maltfield opened this issue Feb 26, 2024 · 21 comments
Labels
resolution: not a bug Determined as not a bug in pip

Comments

@maltfield

Description

Currently the documentation has a subsection titled Hash-checking Mode under a section called Secure installs.

This is extremely misleading because checking hashes (that are not cryptographically signed) does not provide additional security via authenticating the package cryptographically.

Hash checking is for verifying that the file wasn't corrupted when it was being downloaded. It verifies integrity. It does not verify that the package wasn't maliciously altered. Note that this is the case because if the package was maliciously altered (eg by a publishing infrastructure compromise), then the attacker could just as easily modify the hashes such that pip will happily install the malicious module.
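To make the check being discussed concrete, here is a minimal sketch in Python of what comparing a downloaded file against an expected digest amounts to. It is an illustration under stated assumptions, not pip's implementation: the file name `samplepackage-0.1.tar.gz` and the helper names `sha256_of_file` and `matches_expected` are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """True if the file's digest equals the expected digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical artifact name, used only for this illustration.
    artifact = Path(tmp) / "samplepackage-0.1.tar.gz"
    artifact.write_bytes(b"original contents")
    expected = sha256_of_file(artifact)

    intact = matches_expected(artifact, expected)

    # Any change to the bytes, accidental or deliberate, changes the digest.
    artifact.write_bytes(b"original contents, altered")
    altered = matches_expected(artifact, expected)

print(intact, altered)
```

The comparison itself cannot tell corruption from tampering; what it proves depends entirely on where the expected digest came from, which is the point argued in the rest of this thread.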

Security via payload authentication is done with cryptographic signatures. Commonly, this involves hashes: a digest file containing the hashes is signed. But if the hashes are not cryptographically signed, then only integrity is assured; authenticity is not assured.

Expected behavior

The documentation should move the Hash-checking Mode into another section titled Corruption Checking and it should add a warning indicating that pip currently does not have a built-in mechanism to cryptographically verify the authenticity of packages that it downloads, which leaves users vulnerable to downloading malicious software due to attacks such as publishing infrastructure compromise

pip version

Python version

OS

How to Reproduce

  1. Go to the Sphinx docs (generated by this repo's docs dir) https://pip.pypa.io/en/stable/topics/secure-installs/
  2. See misleading documentation with factual errors

Output

No response

Code of Conduct

@maltfield maltfield added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Feb 26, 2024
@pradyunsg
Member

Note that this is the case because if the package was maliciously altered (eg by a publishing infrastructure compromise), then the attacker could just as easily modify the hashes such that pip will happily install the malicious module.

No, they can't. From the first line under the Hash-checking Mode section:

This mode uses local hashes, embedded in a requirements.txt file, to protect against remote tampering and network issues.

We're not using the hashes in the URLs/index to check for malicious alteration; we're checking it against a known good hash embedded in a local requirements.txt file.

@pradyunsg pradyunsg added resolution: not a bug Determined as not a bug in pip and removed type: bug A confirmed bug or unintended behavior S: needs triage Issues/PRs that need to be triaged labels Feb 26, 2024
@maltfield
Author

maltfield commented Feb 26, 2024

We're not using the hashes in the URLs/index to check for malicious alteration

It is alarming if the PyPI team thinks it's OK that their documentation misleads users into thinking that their unsigned hashes provide security, especially considering the numerous supply chain incidents that have affected open-source projects through publishing infrastructure compromise in recent years.

The documentation should clearly state that hashes do not provide security, as you said above.

Please re-open this issue; it's not OK to lie to users.

@pradyunsg
Member

Checking against a known-good local hash does protect against tampering and compromise of the index/infrastructure, since you're checking that the artifact has not been modified compared to the last known-good copy (which the hash is generated from, assuming a decent enough hash function).

If you disagree with this, I'd like to understand what kind of compromise you're thinking of that would lead to a requirements file such as the following to install a malicious copy of samplepackage.

samplepackage==0.1 \
    --hash=sha256:0b93408e04eeef3bdbc97ff29eb819c46a4d610c649f5101999cff7ed9781396
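As a rough illustration of the check described above, the following Python sketch parses the `--hash=` options out of a requirement line and accepts an artifact only if its digest matches one of the locally pinned values. This is a simplified model, not pip's actual code: the names `pinned_hashes` and `artifact_allowed` and the byte strings standing in for package files are assumptions made for the example.

```python
import hashlib
import re

# The requirement from the comment above, joined onto a single line.
REQUIREMENT = (
    "samplepackage==0.1 "
    "--hash=sha256:0b93408e04eeef3bdbc97ff29eb819c46a4d610c649f5101999cff7ed9781396"
)

HASH_OPTION = re.compile(r"--hash=(?P<algo>[a-z0-9]+):(?P<digest>[0-9a-f]+)")


def pinned_hashes(requirement_line: str) -> dict[str, set[str]]:
    """Collect the allowed digests per algorithm from one requirement line."""
    pins: dict[str, set[str]] = {}
    for match in HASH_OPTION.finditer(requirement_line):
        pins.setdefault(match["algo"], set()).add(match["digest"])
    return pins


def artifact_allowed(data: bytes, pins: dict[str, set[str]]) -> bool:
    """Accept the artifact only if its digest matches a locally pinned value."""
    return any(
        hashlib.new(algo, data).hexdigest() in digests
        for algo, digests in pins.items()
    )


pins = pinned_hashes(REQUIREMENT)

# Hypothetical bytes standing in for a release file and its local pin.
known_good = b"release bytes"
local_pins = {"sha256": {hashlib.sha256(known_good).hexdigest()}}

accepted = artifact_allowed(known_good, local_pins)
rejected = artifact_allowed(b"release bytes with a backdoor", local_pins)
print(accepted, rejected)
```

In this model the index never supplies the digest used for the comparison, so whoever controls the index can change the file but not the value it is checked against.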

The documentation should clearly state that hashes do not provide security, as you said above.

I have literally quoted the documentation, which states what I have rephrased. I did not say that hashes do not provide security here.

@maltfield
Author

Checking against a known-good local hash

Sorry, but the whole point is that there is no such thing as a "known-good" hash if you didn't cryptographically verify the signature of the hash.

How did the hash get from the Internet onto the computer? That's the vulnerability.

Please re-open this ticket so we can properly educate users of the risks and limitations of PyPI.

@pradyunsg
Member

the whole point is that there is no such thing as a "known-good" hash if you didn't cryptographically verify the signature of the hash.

In that case, I believe you're looking past what I'm saying.

How did the hash get from the Internet onto the computer? That's the vulnerability.

I think you're mixing provenance with the ability to tell if something has changed unexpectedly (i.e. integrity).

Provenance guarantees will be available once https://peps.python.org/pep-0458/ and https://peps.python.org/pep-0480/ are implemented; until then, validating that the files haven't been tampered with between two uses has significant and meaningful benefits over the default of not even doing that.

If you want me to reopen this ticket, please provide an answer to the request I made in my previous comment for a clarification/explanation.

@maltfield
Author

maltfield commented Feb 26, 2024

The security property that I'm referring to is authenticity -- verifying that the software is authentic, i.e. that the downloaded software originated from the developers (and not someone malicious).

The documentation suggests that the hashes provide security that protects the user from downloading maliciously modified software. That's authenticity. And it's wrong; the hashes do not provide any assurance of authenticity.

If you want me to reopen this ticket, please provide an answer to the request I made in my previous comment for a clarification/explanation.

Which question did I not address?

@potiuk
Contributor

potiuk commented Feb 26, 2024

I personally do not find the documentation misleading; I find it very well placed and your request unfounded, @maltfield.

Security is not all-or-nothing. Never. There are various levels of security you can achieve by applying various techniques. This document says nothing about authenticity or provenance. It speaks about "more security", not "ultimate security", and very nicely describes what the hash feature provides, without even hinting at provenance or authenticity. I'm not quite sure where you see authenticity being hinted at. If all you are going on is the 'Security' header, then your definition of security is pretty narrow.

As a Security Committee member of the Apache Software Foundation, I can say we are working on achieving more security with our PyPI distribution - likely by becoming a trusted publisher and, once the PEPs that @pradyunsg mentioned are implemented, by adding provenance and cryptographic signatures (likely with sigstore) - and we are in close discussion with various communities on ways we can improve security. But that does not mean that "some security levels" cannot be achieved without it. For example, in the case of the Apache Software Foundation, the binaries already have cryptographic signatures and checksums: https://downloads.apache.org/airflow/2.8.2/. Even more, our builds are binary reproducible, which in connection with cryptographic signatures is even MORE security than what you claim is the only thing that can be named "security".

Yet we are actually looking at adding hashes to our constraint files https://github.com/apache/airflow/blob/constraints-2.8.2/constraints-3.8.txt to add more security soon, and actually the title of that chapter led me to start considering doing it - not for corruption checking, but precisely to freeze those constraints for our users so that in case of possible future breaches they can rely on the packages not having been modified after we released our software. That definitely adds more security, without providing ultimate security.

Consider that the opinion of someone who helps to oversee security in 100s of ASF projects, and helps to define policies and the security approach in a Foundation that did security well before it was fashionable.

@pradyunsg
Member

Which question did I not address?

This one:

I'd like to understand what kind of compromise you're thinking of that would lead to a requirements file such as the following to install a malicious copy of samplepackage.

samplepackage==0.1 \
    --hash=sha256:0b93408e04eeef3bdbc97ff29eb819c46a4d610c649f5101999cff7ed9781396

@maltfield
Author

Security is not all-or-nothing

The hash feature doesn't provide "some security". It provides zero security. Without a signature, it protects against corrupt downloads. That's not security.

@maltfield
Author

Which question did I not address?

This one:

I'd like to understand what kind of compromise you're thinking of that would lead to a requirements file such as the following to install a malicious copy of samplepackage.

samplepackage==0.1 \
    --hash=sha256:0b93408e04eeef3bdbc97ff29eb819c46a4d610c649f5101999cff7ed9781396

The compromise is that the user downloads maliciously modified software. I'm not sure why this isn't obvious.

How did the user get the hash in the example command?

@pfmoore
Member

pfmoore commented Feb 26, 2024

How did the user get the hash in the example command?

How did the user ensure that their copy of Python, and their copy of pip, is not compromised?

The way you are framing everything in terms of absolutes, and accusing the pip maintainers of bad faith in the way we document pip's features, is neither helpful nor welcome. Please consider both your tone and the message you are giving.

If you, personally, don't feel that pip is sufficiently secure, then by all means don't use it. No-one is forcing you to do so. Others can make their own decision.

@maltfield
Author

maltfield commented Feb 26, 2024

How did the user ensure that their copy of Python, and their copy of pip, is not compromised?

apt-get install python3 python3-pip

Most OS package managers, apt included, verify the authenticity of all packages with cryptographic signatures. This is documented here:

I believe I've answered your question. Please re-open this issue.

@potiuk
Contributor

potiuk commented Feb 26, 2024

The hash feature doesn't provide "some security". It provides zero security. Without a signature, it protects against corrupt downloads. That's not security.

It is some level of security. There are scenarios where it matters. Here we simply disagree.

Ultimately, pip maintainers decide how they want to communicate with their users. That's their right, and you may argue as much as you want, but all you can offer are opinions and proposals. The decisions are not yours to make.

On a human level, I have a suggestion for you, @maltfield, before it gets any further: remember there are other humans on the other side.

I suggest you consider that you've been listened to, and that your opinion was considered and rejected. It happens.

I think appreciating all the effort that maintainers put into making things work well (and better every day) so that you can use the software for free, often in their own personal time, away from their families and from the things they get paid for, is better than fighting with them over minute details that are completely meaningless in the long run.

If I may suggest something from my experience - even though I finally got the small thing I fought for - getting confrontational was overall a bad idea for me. If I regret something, it is how short-sighted and stupid I was back then to go down that rabbit hole, and I would gladly go back in time and revert it.

IMHO you will get much more by trying to understand the other side and accepting that other people might have different opinions, and that when they have merit in the project, they have the right to make decisions that are right for them and their users.

But, well that's my opinion, experience and advice - you might take it or not, up to you.

@maltfield
Author

maltfield commented Feb 26, 2024

If this project was open to listening to proposals, this ticket wouldn't have been immediately closed (before some dialog).

The PyPI team is being rude by immediately closing something as "won't fix" when a user informs them of harm that they're causing users, and contributes to the project by taking the time to highlight bugs and their solutions. I am a human taking time out of my day to file this bug to better all python users.

We all make mistakes. Not all bugs are shallow. It shouldn't be considered rude to report bugs and advocate for harm reduction.

Please re-open this ticket.

@notatallshaw
Member

The hash feature doesn't provide "some security". It provides zero security. Without a signature, it protects against corrupt downloads. That's not security.

I don't have an opinion on the larger conversation, but hashes protect you when recreating an environment, though not when initially creating it.

For example if you had sourced your hashes before the 25th December for the PyTorch Dependency Confusion attack you would have been safe: https://pytorch.org/blog/compromised-nightly-dependency/

There are many other situations in which an attacker may be able to insert a version that matches but not a hash that matches.

This of course does not fully solve supply chain security, but it does protect against certain attacks in some situations.
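The distinction between initial creation and recreation can be sketched in Python as a trust-on-first-use model. This is a toy model under stated assumptions, not a description of pip or PyPI: the index is represented as a plain dict of file name to bytes, and `lock` and `verify` are hypothetical helpers.

```python
import hashlib


def lock(index: dict[str, bytes]) -> dict[str, str]:
    """Record the digest of whatever the index serves right now."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in index.items()}


def verify(index: dict[str, bytes], lockfile: dict[str, str]) -> list[str]:
    """Return the names whose served bytes no longer match the lock."""
    return [
        name
        for name, digest in lockfile.items()
        if hashlib.sha256(index.get(name, b"")).hexdigest() != digest
    ]


# Scenario A: hashes are recorded first, and the index is compromised afterwards.
index = {"samplepackage-0.1": b"genuine release"}
lockfile = lock(index)
index["samplepackage-0.1"] = b"malicious replacement"
caught_on_recreation = verify(index, lockfile)

# Scenario B: the index was already compromised when hashes were first recorded.
poisoned_index = {"samplepackage-0.1": b"malicious replacement"}
poisoned_lock = lock(poisoned_index)
missed_on_creation = verify(poisoned_index, poisoned_lock)

print(caught_on_recreation, missed_on_creation)
```

Scenario A is the case where pinned hashes catch the swap; scenario B is the case where they record the attacker's file as the baseline and so detect nothing.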

@maltfield
Author

maltfield commented Feb 26, 2024

For example if you had sourced your hashes before the 25th December for the PyTorch Dependency Confusion attack you would have been safe: https://pytorch.org/blog/compromised-nightly-dependency/

AFAIK, the hashes downloaded "before the 25th of December" would still lack signatures. So, no, you would not have been safe from several supply chain vulnerabilities, including a publishing infrastructure compromise.

@notatallshaw
Member

AFAIK, the hashes downloaded "before the 25th of December" would still lack signatures. So, no, you would not have been safe from several supply chain vulnerabilities, including a publishing infrastructure compromise.

You would have been safe from that attack; the attacker's files had different hashes. That is preventing a real-world security attack; I wasn't talking about security as an abstract concept.

@potiuk
Contributor

potiuk commented Feb 27, 2024

Also, @maltfield, since you are insisting, I would heartily recommend that you do your homework: rather than throwing around a bunch of random links and expressing your opinion about what "security of supply chain" is, we should refer to standards.

I'd heartily recommend that you get familiar with this standard, widely accepted in the industry, describing supply chain security: https://slsa.dev/spec/v1.0/levels

The standard described there does not narrowly focus on having or not having cryptographic signatures. Cryptographic signatures are only a small part of supply chain security, and when you focus exclusively (as you do) on cryptographic signatures as a sign of "security", you are making a huge mistake, one that gives you a false sense of security. "Security of supply chain" is way, way, way more than that - it covers processes, build platforms and a wealth of other things - nicely described in the standard.

And I have a very interesting surprise for you. Regardless of whether the hashes are cryptographically signed or not, it is still L0 (zero) level in SLSA. Signing hashes or packages cryptographically does not, on its own, move the needle when it comes to the SLSA security level. Go and check it yourself.

What DOES change it and moves it to L1 is having a verifiable way of determining how the packages were built, which cryptographic signatures tell you absolutely nothing about. Reproducible builds, however, do, and they bring SLSA level 1. And - this might be even more surprising to you, @maltfield - the packages do not need to be signed at all to be at level 1:

Provenance may be incomplete and/or unsigned at L1. Higher levels require more complete and trustworthy provenance.

So if we use your arguments, YOUR solution cannot be named "security" either, because it does not provide security - not even Level 1 of SLSA. Even more, anything that provides L1 cannot be named security, because there are L2 and L3 levels that provide even more security (note: there are currently no known public platforms of any sort that provide level 3 security, though a number of platforms out there strive to achieve it).

So if we take your way of thinking and apply it beyond the pretty narrow part of supply chain security where cryptographic signing is the only criterion for calling something "security", we should not call anything "security", because it does not achieve the highest level of security.

I suggest you do some more research and reading on this; it might help you expand your knowledge of supply chain security, which is currently focused pretty narrowly on cryptographic signing.

Security is like Ogres - it has layers. And many of them.

@maltfield
Author

maltfield commented Feb 27, 2024

@potiuk, in my Python project we have reproducible builds and hashes, and I cryptographically sign all my hashes.

Reproducible builds are important. If you're suggesting that we add a note to the documentation indicating that reproducible builds are an important aspect in supply chain security that PyPI is currently not enforcing, I would agree that is a valuable addition to the documentation.

Cryptographic signatures are necessary. Currently the documentation is spreading misinformation to developers, making them think that adding an unsigned hash to their download provides security.

I encounter a lot of devs that think hashes provide security. I think part of the problem is documentation like this. Let's fix the docs so that devs aren't misinformed, leaving their build process (and therefore all of their users) vulnerable.

Please re-open this issue so the documentation can be fixed.

@pfmoore
Member

pfmoore commented Feb 27, 2024

Please re-open this issue so the documentation can be fixed.

Please, just stop. You've made your point, and been heard. We (the pip developers) don't agree with you. I'm sorry if that frustrates you, but it's a reality you have to deal with. Re-stating the same assertions won't change the outcome. Plenty of other people with security expertise have read pip's documentation and no-one other than you has tried to make the claims you've made here.

Is your only concern here with the word "secure"? Because if so, it is only used twice in that whole section. If all you wanted was to change those two occurrences to use a different word, you could have submitted a PR and done that. I don't personally think it's necessary, but we could have assessed a simple terminology change without all of the aggression and name-calling that your original post included.

Pip's documentation is not the place to educate users about security best practices. Nor is pip the only weak link in the Python package distribution chain. People are aware of the weaknesses, and working on them. Yelling specifically about one section of the pip documentation shows a stunning lack of awareness of the bigger picture here.

Edit: If you continue the discussion in the same way as you have up to now, I'm going to lock this issue as "too heated". If you wish to avoid that, moderate your future comments. You know by now the tone we expect from you - it's simply that you're "open, considerate and respectful" as stated in the code of conduct.

@maltfield
Author

maltfield commented Feb 27, 2024

Yes, I'd like to submit a PR to improve this documentation. Generally, I think it's best-practice to:

  1. Submit a bug report
  2. Have some dialog with the devs about the issue
  3. Wait for the devs to indicate that the PR is welcome
  4. Do the work
  5. Submit the PR

Instead of the above expected process, this bug report was closed immediately without any dialog. That's not very encouraging to community members who want to contribute to improve the project.

Please re-open this issue so we can fix the bugs in the documentation, as described above.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 29, 2024