.load() and FullLoader still vulnerable to fairly trivial RCE #420
Comments
I agree it should be well documented that FullLoader is not safe. A blacklist approach like the one I implemented in #386 is just too hard to maintain and although it can always be improved, it is just easier if users don't expect any kind of security protection from it. |
Is a CVE going to be assigned for this? Also, to "avoid" these kinds of problems in the future (and dealing with CVEs and stuff) I think it should be made clear that FullLoader is not safe. Because as it is right now, a user may think it is a good alternative to safe_load while it is not. |
I have not filed for a new CVE, but I contacted RedHat to see if the previous one (CVE-2020-1747) could be updated (I don't know if this is standard protocol here or not) +1 on making it very clear that FullLoader is not safe. As of right now, I believe the only mention of FullLoader is in the page https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation , which claims that it |
No, that is not possible. This should get a new CVE, with a wording like "Incomplete fix for CVE-2020-1747 still allows to execute arbitrary code through FullLoader", or something like that. |
@ingydotnet @perlpunk @arxenix Did any of you already request a CVE (except from Red Hat)? If not, Red Hat can assign the CVE for this. Also, @ingydotnet what do you intend to do with regard to documenting the unsafe behaviour of FullLoader? |
My personal preference would still be to make SafeLoader the default. |
@ret2libc I sent a request for a new CVE ID to RedHat already |
I was considering pulling |
I know, I am from Red Hat. I just want to make sure no one else has already requested the CVE from someone else. @ingydotnet @perlpunk did you? Otherwise, I'm going to assign the CVE from the Red Hat pool. |
@ret2libc no, I did not request one. |
@ret2libc no. |
@perlpunk I don't see defaulting to SafeLoader as an option. We have no idea how much code that would break. ... Let's step back a sec and look at the problem we are trying to solve. PyYAML was loud and clear in its doc from the first release that PyYAML (a serialization module) had the same vulnerability landscape as Pickle. ie It is not safe to load data from a source you cannot trust. Bad things could happen.
Somewhat ironically, we are encouraged and don't bat an eye about running
Who are we trying to protect? This is a literal question. I would like to see the actual applications that are affected. Show me the people that are asking for untraceable YAML from unknown sources and loading it for them with PyYAML.
This "became my problem" when a CVE was filed against PyYAML for something that was effectively "people can get hurt when they use your dining utensils, and don't read the instructions and use it to eat things from a dumpster". Not that I cared about the CVE, since I felt exactly the same as I do now, but the CVE triggered all kinds of production systems. I was forced to take an action that was unnecessary. I never knew who filed the CVE or how to respond to them. I never got a confirmation that the original CVE was closed.
I tried to find a middle ground with warnings to read the docs, be explicit in your code, and provide a safer default. I agree that the FullLoader safety is proving to be not that useful. I'm currently leaning towards making FullLoader an alias for UnsafeLoader, keeping that as the default, documenting it, and calling it a day. |
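The Pickle comparison in this comment can be illustrated with the standard library alone. This is a minimal sketch and not PyYAML code: `Surprise`, `NoGlobalsUnpickler` and `restricted_loads` are hypothetical names, and the harmless `sorted` stands in for whatever callable an untrusted document might name.

```python
import io
import pickle


class Surprise:
    """Hypothetical class whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" instead of any object state.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Surprise())

# Loading calls sorted() and returns its result, not a Surprise instance.
loaded = pickle.loads(blob)


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so only plain built-in data can load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Load pickled data while rejecting anything that needs a global."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

The restricted unpickler plays roughly the role that a safe loader plays for YAML: plain containers still load, while anything that needs to look up a callable is refused.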
@ingydotnet it's still possible to get arbitrary code execution with only
Ultimately, it's your choice what you decide to do with the library, but let me state my opinions. I definitely have seen projects in the wild that are loading YAML via
Developers tend to be lazy, no one wants to read the docs. This is why it's important to follow the principle of having secure defaults. It's important for a library to attempt to protect its users (even if they don't read :p)
Here's a quote from the ReactJS (popular Facebook-made frontend library) documentation which explains their reasoning for their function
My observations are that the mental model that many developers have of YAML is that it's a simple data interchange format exactly like JSON. Not a complex serialization language. In the same way they don't expect
I also believe that it is okay to break backwards compatibility in favor of security. How many people are really relying on PyYAML's ability to serialize complex objects? I don't have too much insight into this, but my thoughts are -- not many.
From some quick GitHub code search results that I did, there are ~762k files that use PyYAML. Of those, up to 529k files are currently using the default FullLoader as the loading mechanism, which is vulnerable to arbitrary code execution. 220k call safe_load or specify SafeLoader, and only 13k explicitly use unsafe_load or specify UnsafeLoader. |
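To put the code search figures above in proportion, here is a small arithmetic sketch. The counts are the approximate, rounded numbers quoted in the comment (the 529k figure is stated as an upper bound), so the percentages are only indicative.

```python
# File counts in thousands, as quoted from the GitHub code search above.
total = 762
by_loader = {
    "default load() / FullLoader": 529,
    "safe_load / SafeLoader": 220,
    "unsafe_load / UnsafeLoader": 13,
}

# Share of all PyYAML-using files per loading style, in percent.
shares = {name: round(100 * count / total, 1) for name, count in by_loader.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

On these numbers, roughly 69% of files rely on the default, about 29% opt in to the safe loader, and under 2% explicitly ask for the unsafe one.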
I'm not a Python expert, but I think that FullLoader in its current state already prevents (de)serializing most objects as they are dumped with
Regarding the default: The problem I see is that most use cases for YAML do not involve serializing objects. Many people simply do not expect the default
I think anything different from the default YAML Core Schema (or the basic 1.1 types) should be optional for any YAML implementation. Yes, it is documented that PyYAML does this, and yes, changing it will break more things (in addition to code that already broke). |
This was assigned CVE-2020-14343 |
I am thinking about how to move towards safe_load as the default load() action. I need more time to plan that. @arxenix, just wondering, can you think of an exploit that involves just
prints:
|
@ingydotnet if it's limited to only those two tags (
But the moment you start allowing classes other than
If you're concerned about breaking too many projects, maybe a reasonable thing could be to only allow classes from the
My recommendation would be to make FullLoader not allow |
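The general idea of allowing only a known set of classes can be sketched with PyYAML's existing extension points. This is an illustration under assumptions, not what the comment proposes verbatim: `Point`, `AllowlistLoader`, `construct_point` and the `!point` tag are hypothetical, while `yaml.SafeLoader`, `add_constructor` and `construct_mapping` are existing PyYAML APIs.

```python
from dataclasses import dataclass

import yaml


@dataclass
class Point:
    """Hypothetical application class we explicitly choose to allow."""
    x: int
    y: int


class AllowlistLoader(yaml.SafeLoader):
    """SafeLoader plus only the constructors registered below."""


def construct_point(loader, node):
    # Build the object from plain mapping data; no arbitrary callables run.
    fields = loader.construct_mapping(node)
    return Point(**fields)


# Registering on the subclass leaves yaml.SafeLoader itself unchanged.
AllowlistLoader.add_constructor("!point", construct_point)

loaded = yaml.load("!point {x: 1, y: 2}", Loader=AllowlistLoader)
```

Anything not on the list, including the `!!python/...` tags, still fails with a constructor error because the loader inherits SafeLoader's behaviour for unknown tags.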
Also, note that it is written nowhere that FullLoader should not be used on untrusted input (or at least I could not find it). Actually, the documentation says that FullLoader does not execute arbitrary code, so that is why it becomes, again, "your problem" and a CVE-worthy issue. By saying
Independently of what you choose for the default load method, I suggest making this point clear in the documentation. |
Have you considered signing (HMACing) the serialized blob to ensure at unserialization time that it is trusted (has been serialized by someone who had the key)? This would solve the security problem and may be done in a way that the current parser ignores... so that existing apps keep functioning and no behaviour change is necessary. This is a very common way (for other frameworks in other languages) to solve the security problem. |
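A minimal sketch of the signing idea above, using only the standard library. The header format is an assumption made for illustration: the signature is carried in a leading `# hmac-sha256:` line, which a YAML parser would treat as a comment, so unmodified readers would keep working. `sign_yaml` and `verify_yaml` are hypothetical helper names.

```python
import hashlib
import hmac

# Hypothetical header; to a YAML parser this line is just a comment.
PREFIX = b"# hmac-sha256: "


def sign_yaml(payload: bytes, key: bytes) -> bytes:
    """Prepend a comment line carrying the HMAC-SHA256 of the payload."""
    digest = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    return PREFIX + digest + b"\n" + payload


def verify_yaml(blob: bytes, key: bytes) -> bytes:
    """Return the payload if the signature matches, else raise ValueError."""
    header, sep, payload = blob.partition(b"\n")
    if not sep or not header.startswith(PREFIX):
        raise ValueError("missing signature header")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(header[len(PREFIX):], expected):
        raise ValueError("signature mismatch")
    return payload
```

A caller would run `verify_yaml` first and only hand the returned bytes to a loader. This establishes that the document came from a key holder; it does not make the document's content safe if the key holder is compromised.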
@ret2libc Agreed. I'll update the doc this weekend, even if it takes longer to decide how to play the whole thing. |
This was also flagged on our side, thanks for looking at this. Just to understand, does that mean the fix will just be mentioning it in the documentation? Would it be possible to rename the method as "insecure_full_load"? I'm conscious that just adding this in the docs would not be enough to actually remove this from being a common vulnerability.
Edit: Just caught up with the thread. It sounds like it may require further assessment given the projects that would be affected by the breaking change caused by updating the top level load function. One question: are there currently any capabilities for potential feature flags that could provide backwards compatibility if this breaking change were to be introduced, at least until the next major update (e.g. env vars, config file vars, build params, etc.)? |
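The feature-flag question could be approached from the application side with a thin wrapper. The sketch below is an assumption-laden illustration: PyYAML does not define such a flag, and both the `PYYAML_LEGACY_LOAD` variable and the `load_with_flag` helper are hypothetical names. `yaml.safe_load`, `yaml.load` and `yaml.FullLoader` are existing PyYAML APIs.

```python
import os

import yaml

# Hypothetical opt-in switch; not something PyYAML itself reads.
FLAG = "PYYAML_LEGACY_LOAD"


def load_with_flag(stream, environ=None):
    """Safe by default; fall back to FullLoader only when explicitly enabled."""
    environ = os.environ if environ is None else environ
    if environ.get(FLAG) == "1":
        # Legacy behaviour, kept only for deployments that opt in.
        return yaml.load(stream, Loader=yaml.FullLoader)
    return yaml.safe_load(stream)
```

Passing `environ` explicitly keeps the helper easy to test; in normal use it reads the process environment.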
Hi, is there any news about this? |
Hello, |
I have an idea of how I'd like to proceed. I'll write about it here this week, and also update the wiki page. |
RCE resolved in new version yaml/pyyaml#420
PyYAML 5.4 was released a couple of days ago with a fix for:
- https://ubuntu.com/security/CVE-2020-14343
- yaml/pyyaml#420
- https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation
The changes otherwise appear to be backwards compatible:
- https://github.com/yaml/pyyaml/blob/5.4.1/CHANGES
Being able to use a later version is important for companies that have automatic dependency scanning for CVEs.
5.3.1 fixed partially vulnerabilities disclosed in CVE-2020-1747. A complete fix was debated at yaml/pyyaml#420 and eventually got patched in 5.4.1 Changeset: yaml/pyyaml@3.12...5.4.1 Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
Any updates on when this might be fixed ? |
6.0 is ready for beta now. Will go out this week. |
Original commit: a001f27 Per suggestion yaml#420 (comment) move a few constructors from full_load to unsafe_load.
As of 5.3.1 .load() defaults to using FullLoader and FullLoader is still vulnerable to RCE when run on untrusted input. As demonstrated by the examples below, #386 was not enough to fix this issue.
Some example payloads:
I do not believe this is entirely fixable unless PyYAML decides to use secure defaults, and make .load() equivalent to .safe_load() ( #5 )
FullLoader should probably be removed, as I don't see the purpose of it.
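What the issue asks for as the default can already be chosen explicitly by callers. A minimal sketch, assuming PyYAML is installed; the tagged document uses a harmless callable purely to show that `safe_load` refuses the tag instead of resolving it.

```python
import yaml

# Plain data loads exactly as a JSON-like reader would expect.
plain = yaml.safe_load("name: demo\nports: [80, 443]\n")

# A document that asks the loader to call a Python function.
tagged = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(tagged)
    rejected = False
except yaml.YAMLError:
    # SafeLoader has no constructor for python/* tags, so loading fails.
    rejected = True
```

Code that only needs mappings, sequences and scalars loses nothing by calling `safe_load`, which is the argument made above for making it the default.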