Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVE-2023-27043] Parsing errors in email/_parseaddr.py lead to incorrect value in email address part of tuple #102988

Closed
tdwyer opened this issue Mar 24, 2023 · 31 comments
Assignees
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error type-security A security issue

Comments

@tdwyer
Copy link
Contributor

tdwyer commented Mar 24, 2023

Bug report

I have discovered a way to cause email.utils.parsaddr() and email.utils.getaddresses() to erroneously return the Real Name portion of a RFC 2822 Address Header in the Email Address portion of the returned tuple. This vulnerability enables me to bypass security systems which allow or deny access or actions based on the email address.

Example one, this is an issue if the system should only allow for accounts to be created for email addresses which are part of a whitelist e.g. only government emails, or a blacklist e.g. block attacker@example.com. If ether parseaddr() or getaddresses() is used to parse the Address Header and the Email Address part of the returned tuple is checked ageist the whitelist/blacklist, it would be checking the value in the Real Name part of of the Address Header instead of the Email Address. Then, to finish account registration the system sends the reply to the full Address Header, which is normally the case, the reply would be sent to the actual Email Address and not the Real Name part of the header.

This is similar to the type of attack which was used to exploit the French governments Tchap chat system a few years back.

Tchap: The super (not) secure app of the French government
https://medium.com/@fs0c131y/tchap-the-super-not-secure-app-of-the-french-government-84b31517d144

Example two, this would also be an issue if an email security system is written in Python and uses parseaddr() or getaddresses() to parse the Address Header of incoming email and then checks the Email Address part of the tuple against a whitelist/blacklist. I can use this vulnerability to trick that system into checking the Real Name part of the Address Header instead of the actual Email Address which would have been blocked. Then after this, the email system which actually replies to the email dose not use Python's parseaddr() or getaddresses() functions to get the email address, or simply uses the full Address Header to send a reply.

For all of these test cases Golang returns a Parsing Error, but all versions of Python including Python 2, Python 3, and the most resent version in Github all result in the Real Name part of the address being returned in the Email Address part of the tuple.

NOTE: This issue has been assigned: CVE-2023-27043
Link: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-27043

[VulnerabilityType Other]
CWE-1286: Improper Validation of Syntactic Correctness of Input

I have been in communication with security@python.org about this issue, and was told they are fine with addressing this issue publicly. Additionally, I have created a PR which will detect these parsing errors and return an empty tuple to indicate a parsing error instead of returning the erroneous output.

Example Script to show this erroneous output:

from email.utils import parseaddr
rfc2822_ADDRESS = "Thomas Dwyer<tdwyer@example.org>"
print("Valid RFC2822 Address Parse Output")
parseaddr(rfc2822_ADDRESS)
print()
print("Test output...")
print()
specials = '()<>@,:;.\"[]'
a = "alice@example.org"
b = "<bob@example.org>"
for i in specials:
    c = a + i + b
    print(c)
    parseaddr(c)

Example Golang program which shows that Golang returns a parsing error:

package main

import (
 "fmt"
 "net/mail"
)

func main() {
   fmt.Println((&mail.AddressParser{}).Parse("Thomas Dwyer <thomas@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("thomas@example.org"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org(<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org)<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org<<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org><bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org@<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org,bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org:<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org;<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org.<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse(`alice@example.org"<bob@example.org>`))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org[<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org]<bob@example.org>"))
   fmt.Println((&mail.AddressParser{}).Parse("alice@example.org<bob@example.org>"))
}

Your environment

  • CPython versions tested on: All versions of Python

Linked PRs

@bitdancer
Copy link
Member

These addresses are problematic. Take, for example, alice@example.org(bob@example.org. The new email parser views this as a valid address (alice@example.org) followed by an incomplete comment, which is the correct parsing of that string per the RFCs. alice@example.org@bob@example.org on the other hand is treated as an invalid address because of the extra @. I haven't actually looked at the others to see what the new parser does, but in general these look like valid addresses followed by junk (some of which also happens to be a valid address).

In general, as has been pointed out repeatedly, if you depend on the content of the address fields of an email message for security you are going to get burnt; we can't fix that.

I personally don't care a lot about the old API, but "fixing" this could break applications, since addresses that were previously considered valid (and arguably are valid in at least one case) will be rejected. I think the risk is low, but the benefit is low as well. In the new API, these should all have non-empty defects (might want to check that), so running with policy(raise_on_defect=True) or checking the defects attribute of the header will give you as much protection as there is to be had.

Now, you will note that you can easily put "From: Alice alice@example.org" as the from header on a message that did not come from alice@examp.org. So this fix does not provide any useful protection against your example 2. Being able to exploit "put the real address in the display name, some junk, and my address at the end" requires that the email system doing the replying treat the address differently than python and chooses to use the trailing part of the invalid address as the one to send to rather than the leading valid address that is followed by junk. How likely is that? Have you run tests? Do you have examples of this occurring? I suppose an email system could choose to look first at the address in angle brackets, especially if it is regex based rather than a standard RFC tokenizing parser.

So, I can see an argument to be made that we shouldn't apply postel's rule (be liberal in what you accept) so strongly here, and should in fact treat any address that does not end in a valid character as invalidating that address. I don't know what the backward compatibility consequences of that would be, but if you implement this fix maybe we'll find out. If it turns out not to be a problem, we could change the rules in the new parser as well, and invalidate the address rather than just registering a defect (for most of these examples, at least).

@tdwyer
Copy link
Contributor Author

tdwyer commented Mar 24, 2023

Hello, I agree that not all of the cases being checked in the example python script are a problem. In the case of adding a ,, that is actually how a list of addresses is delimitated so for getaddresses() to return 2 address tuples is valid. However, parseaddr() is only for parsing a single address. So much so, that it is hard coded to only return the first element of the list returned by the parser return addrs[0]. The main solution for this in parseaddr() in my PR is to check if the returned list of parsed addresses is longer than one. The proposed solution in my PR for getaddresses() is to count the number of addresses which are in the input to be the same as the number of addresses returned by the parser. Additionally, it will check for a [ in the returned email address and replace that with an empty tuple instead. I even added all of the wacky address header examples from RFC 2822 to the test_email.py file. With my change in place, all valid inputs are still parsed correctly and all tests pass.

#102990

For all of the cases being checked, they all cause the parser to return both alice address and the bob address in the email address part of their own tuple. So, if the alice@example.com(bob@example.org is a valid address followed by an incomplete comment, why should it return the incomplete comment in the email address part of it's own tuple? When Golang parses that address it returns a parsing error <nil> mail: misformatted parenthetical comment

Python getaddresses() output which puts the misformatted parenthetical comment in the email address part of an additional tuple:

[('', 'alice@example.com'), ('', 'bob@example.com')]

My feeling is that if what is put in the email address part of the tuple is not a valid email address, it should return a parsing error instead.

As for the attacks, the email address may not have actually come in via an email. For example, if users signup for an account with an email address, that email address may be put into a HTML form field. If parseaddr() is used to parse this and check the email address returned with a whitelist, it will check alice@example.com and pass but signup emails would be sent to bob@exmaple.com.

This was the case for the Tchap system

Tchap: The super (not) secure app of the French government
https://medium.com/@fs0c131y/tchap-the-super-not-secure-app-of-the-french-government-84b31517d144

The need to validate email addresses for things like account signup is very common practice. Even the company I work for allows for user signup with an email address, and we used parseaddr() to check the addresses with a blocklist. In retrospect, the attack examples I provided in the original post are not good. I should have described these 2 attacks instead. I agree that an email server can send whatever it likes in any of the email headers. However, in the case of account signup the attack only takes place if the email address is a valid email address where the attacker can receive the signup completion link.

Golang will return a parsing error indicating that when these headers are parsed they are returning more than a single address.

"Thomas Dwyer" <thomas@example.org> <nil>
<thomas@example.org> <nil>
<nil> mail: expected single address, got "<bob@example.org>"
<nil> mail: misformatted parenthetical comment
<nil> mail: expected single address, got ")<bob@example.org>"
<nil> mail: expected single address, got "<<bob@example.org>"
<nil> mail: expected single address, got "><bob@example.org>"
<nil> mail: expected single address, got "@<bob@example.org>"
<nil> mail: expected single address, got ",bob@example.org>"
<nil> mail: expected single address, got ":<bob@example.org>"
<nil> mail: expected single address, got ";<bob@example.org>"
"alice@example.org." <bob@example.org> <nil>
<nil> mail: expected single address, got "\"<bob@example.org>"
<nil> mail: expected single address, got "[<bob@example.org>"
<nil> mail: expected single address, got "]<bob@example.org>"
<nil> mail: expected single address, got "<bob@example.org>"

@gpshead gpshead added the type-security A security issue label Mar 27, 2023
@hhucke
Copy link

hhucke commented Apr 24, 2023

Erm, @bitdancer ...

in the relevant RFCs (here likely the RFC 2822) you usually have EBNFs expressing the grammar to follow. And according to this one there is no such thing as an "incomplete comment".
Either a string satisfies the grammar and is therewith valid or it doesn't and is therewith something which should throw an error or at least a "parsing error".
IMHO its as easy as that.

@bitdancer
Copy link
Member

Look up Postel's rule. The grammar doesn't recognize an incomplete comment as a valid comment, but that doesn't change the fact that syntactically it is an incomplete comment. But I've already agreed that I think we probably shouldn't apply Postel's rule so strongly in this case.

Also, we never throw parsing errors, we only register defects. This is a hard rule for the email parser, and there are many practical reasons for this. What we'd be changing here is whether or not we extract valid addresses out of questionable strings or treat them as invalid addresses and not return them. Which is what you are proposing, and I am in agreement, even though I worry a little bit about backward compatibility (breaking applications that currently handle the broken output of some mail programs that is not intentionally malicious).

@tdwyer
Copy link
Contributor Author

tdwyer commented Apr 25, 2023

First, I'd like to say, my PR added even more checks to Lib/test/test_email/test_email.py to check all of the wacky examples in RFC 2822 and all of those and all of the current tests pass. All correctly formatted addresses are still parsed correctly after my fix is applied.

PR: #102990

I'd also like to address the concerns about returning an empty 2-tuple ('', '') to indicate a parsing error. The comment for the def parseaddr(adds): function in Lib/email/utils.py says that is the correct thing to do

Link:

def parseaddr(addr):

def parseaddr(addr):
    """
    Parse addr into its constituent realname and email address parts.

    Return a tuple of realname and email address, unless the parse fails, in
    which case return a 2-tuple of ('', '').
    """

This issue is actually caused by a couple of bugs.

  1. The code should detect incorrect syntax and return a 2-tuple of ('', '') to indicate that parsing has failed
  2. There are 2 functions to fix, parseaddr() and getaddresses(). Both of these functions need to check for syntax errors. However, the parseaddr() also needs to check for invalid input. That function should only be passed in a single address to check. However, if it is given a list of addresses it is hard coded to just returns the first 2-tuple from the list. What it should do is return ('', '') here too or maybe this one could actually return a real Error. In my PR, I have it check for this and return ('', ''). This is part of what causes the issue in parseaddr() because when these invalid addresses are passed in, the bug causes these to be parsed as a list of addresses each with one part in the Email Address part of the tuple. The first 2-tuple of that list is the one which contains the Real Name part in the Email Address part of the 2-tuple.

The fix for CVE-2019-16056 which was very similar to this CVE-2023-27043 ... was to return an empty 2-tuple ('', '')
Link: https://bugs.python.org/issue34155#msg347157

Code example
Link:

elif self.field[self.pos] == '@':

elif self.field[self.pos] == '@':
    # bpo-34155: Don't parse domains with two `@` like
    # `a@malicious.org@important.com`.
    return EMPTYSTRING

Additionally, the code return ('', '') if it failed to parse out a valid domain
Link:

if not domain:

if not domain:
    # Invalid domain, return an empty address instead of returning a
    # local part to denote failed parsing.
    return EMPTYSTRING

@gpshead gpshead added 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.8 (EOL) end of life 3.7 (EOL) end of life 3.12 bugs and security fixes labels Apr 28, 2023
@gpshead gpshead self-assigned this Apr 28, 2023
@vstinner vstinner changed the title Parsing errors in email/_parseaddr.py lead to incorrect value in email address part of tuple [CVE-2023-27043] Parsing errors in email/_parseaddr.py lead to incorrect value in email address part of tuple May 9, 2023
@vstinner
Copy link
Member

vstinner commented May 9, 2023

I created https://python-security.readthedocs.io/vuln/email-parseaddr-realname.html to track fixes of this issue.

@tdwyer
Copy link
Contributor Author

tdwyer commented May 31, 2023

@gpshead I had to create a new PR because I messed up Git when trying to push the changes: #105127

For some reason, the new PR is not being picked up by this Issue. I do have a link to this Issue in the PR Description. I even tried creating a new PR with the Branch name issues102988 like before, but that didn't get linked ether.

Oh, it is linked. I was looking at the wrong spot on this page.

tdwyer added a commit to tdwyer/cpython that referenced this issue May 31, 2023
… tupleto indicate the parsing error (old API)
@gpshead gpshead removed the 3.7 (EOL) end of life label Jul 10, 2023
gpshead added a commit that referenced this issue Jul 10, 2023
… to indicate the parsing error (old API) (#105127)

Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). This fixes or at least ameliorates CVE-2023-27043.

---------

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 10, 2023
… tuple to indicate the parsing error (old API) (pythonGH-105127)

Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). This fixes or at least ameliorates CVE-2023-27043.

---------

(cherry picked from commit 18dfbd0)

Co-authored-by: Thomas Dwyer <github@tomd.tel>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
gpshead added a commit that referenced this issue Jul 10, 2023
…y tuple to indicate the parsing error (old API) (GH-105127) (#106612)

gh-102988: Detect email address parsing errors and return empty tuple to indicate the parsing error (old API) (GH-105127)

Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). This fixes or at least ameliorates CVE-2023-27043.

---------

(cherry picked from commit 18dfbd0)

Co-authored-by: Thomas Dwyer <github@tomd.tel>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
hrnciar pushed a commit to fedora-python/cpython that referenced this issue Aug 7, 2024
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
….parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
encukou pushed a commit to encukou/cpython that referenced this issue Sep 6, 2024
…n email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
(cherry picked from commit 4a153a1)
encukou added a commit to encukou/cpython that referenced this issue Sep 6, 2024
…n email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
encukou pushed a commit to encukou/cpython that referenced this issue Sep 6, 2024
…n email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
(cherry picked from commit 4a153a1)
encukou pushed a commit to encukou/cpython that referenced this issue Sep 6, 2024
…n email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
encukou pushed a commit to encukou/cpython that referenced this issue Sep 6, 2024
…n email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
ambv pushed a commit that referenced this issue Sep 6, 2024
…l.parseaddr() (GH-111116) (#123766)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
ambv pushed a commit that referenced this issue Sep 6, 2024
…l.parseaddr() (GH-111116) (#123767)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Thomas Dwyer <github@tomd.tel>
encukou pushed a commit to encukou/cpython that referenced this issue Sep 6, 2024
… email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
encukou pushed a commit to encukou/cpython that referenced this issue Sep 6, 2024
… email.parseaddr() (pythonGH-111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
encukou added a commit to encukou/cpython that referenced this issue Sep 6, 2024
encukou added a commit to encukou/cpython that referenced this issue Sep 6, 2024
encukou added a commit to encukou/cpython that referenced this issue Sep 6, 2024
encukou added a commit to encukou/cpython that referenced this issue Sep 6, 2024
ambv pushed a commit that referenced this issue Sep 6, 2024
….parseaddr() (GH-111116) (#123769)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
ambv pushed a commit that referenced this issue Sep 6, 2024
….parseaddr() (GH-111116) (#123770)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
ambv pushed a commit that referenced this issue Sep 6, 2024
…l.parseaddr() (GH-111116) (#123768)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

(cherry picked from commit 4a153a1)

Co-authored-by: Victor Stinner <vstinner@python.org>
Co-Authored-By: Thomas Dwyer <github@tomd.tel>
@ambv ambv closed this as completed Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

No branches or pull requests