Create new Python Bug to Header Parsing Issue #41

Open
malvidin opened this issue Feb 26, 2020 · 7 comments

@malvidin
Contributor

malvidin commented Feb 26, 2020

The test test_headeremail2list_2 mentions Python bug 27257. However, bug 27257 appears to concern empty groups in the header, not the obsolete period in a display name. With Python 3.7, I do not see any problem with the decoded value, unless eml_parser is supposed to include address groups.

with open(os.path.join(samples_dir, 'sample_bug27257.eml'), 'rb') as fhdl:

From the bug:

To: unlisted-recipients: ;,
""@pop.kundenserver.de (no To-header on input)
The current output below appears to be the expected output.
'to': ['@pop.kundenserver.de']

From the RFC:

To: A Group:Ed Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;
Again, the current output below appears to be the expected output.
'to': ['c@a.test', 'joe@where.test', 'jdoe@one.test']
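
For reference, the group example from the RFC can be checked directly against the standard library. This is only a minimal sketch using email.policy.default; the variable names are illustrative and it does not go through eml_parser at all.

```python
import email
import email.policy

# Group syntax from the RFC example: one group named "A Group" with three members.
raw = (
    "To: A Group:Ed Jones <c@a.test>,joe@where.test,John <jdoe@one.test>;\n"
    "\n"
    "Body.\n"
)

msg = email.message_from_string(raw, policy=email.policy.default)
to_header = msg["to"]

# .groups keeps the group structure, .addresses flattens it to individual mailboxes.
group_names = [group.display_name for group in to_header.groups]
addr_specs = [address.addr_spec for address in to_header.addresses]

print(group_names)
print(addr_specs)
```

The flattened addr_spec list is what corresponds to the 'to' value shown above; whether the group name itself should be surfaced is a separate design question for eml_parser.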

I have not found a related issue in the Python bug tracker, but perhaps something like the following in _header_value_parser.py would be appropriate to prevent the exception:

@sim0nx
Member

sim0nx commented Feb 28, 2020

Thanks for your analysis.
I agree that 27257 does not seem to be related. I unfortunately don't recall this exactly, but I probably meant another one instead.

The workaround is still necessary, though, on 3.7 as well as on 3.8. I just retested it with the problematic sample included in the samples folder of this repo.

Regarding your suggestion, _header_value_parser is private, so I can't include that one. I haven't tested it, but from looking at that function I don't think it would solve the issue I am trying to work around ("Test.test@example.com").
Did you test this?
Would you be interested in making a pull request?
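
To make the idea of a workaround that stays out of private modules concrete, here is a hedged sketch. It is not the workaround eml_parser actually ships; from_address is a hypothetical helper, and it assumes that falling back to the legacy email.utils.parseaddr parser is acceptable when the default policy raises.

```python
import email
import email.policy
import email.utils


def from_address(raw_message: str) -> str:
    """Return the addr-spec of the From header, or "" if there is none.

    Hypothetical helper: tries the modern header parser first and falls back
    to the legacy parser if the modern one raises AttributeError.
    """
    try:
        # With policy.default the header is parsed when it is accessed.
        header = email.message_from_string(
            raw_message, policy=email.policy.default
        )["from"]
    except AttributeError:
        # Fallback: compat32 returns the raw header text without parsing it.
        legacy = email.message_from_string(
            raw_message, policy=email.policy.compat32
        )
        header_text = legacy["from"]
        return email.utils.parseaddr(str(header_text))[1] if header_text else ""

    if header is None or not header.addresses:
        return ""
    return header.addresses[0].addr_spec


print(from_address("From: John Doe.<john.doe@example.com>\n\nBody.\n"))
```

Only public APIs are used here, so the fallback keeps working regardless of whether the interpreter carries the upstream fix.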

sim0nx self-assigned this Feb 28, 2020
malvidin added a commit to malvidin/cpython that referenced this issue Feb 28, 2020
When an email header address field contains a period directly before the angle bracket, the default header policy raises an AttributeError rather than adding a header defect.

GOVCERT-LU/eml_parser#41
@malvidin
Contributor Author

With the modification to the Python 3.7 email._header_value_parser.py, the following is my output. This causes test_headeremail2list_2 to fail, as intended, because the default Python header parser succeeds.

I created pull request 18687 to address this issue.
python/cpython#18687

>>> msg_test = email.message_from_string("""From:         John Doe.<john.doe@example.com>

Test e-mail. with a https://www.google.com:5000?test
""", policy=email.policy.default)

>>> msg_from = msg_test.get_all('from')
>>> print(msg_from[0].addresses[0].display_name, msg_from[0].addresses[0].addr_spec)
John Doe. john.doe@example.com

>>> print(json.dumps(eml_parser.eml_parser.parse_email(msg_test), indent=2, default=json_serial))
{
  "body": [
    {
      "uri_hash": [
        "ac6bb669e40e44a8d9f8f0c94dfc63734049dcf6219aac77f02edf94b9162c09"
      ],
      "content_header": {},
      "hash": "a46645c9d7598af7036fc173380b1bce4fe6a4e16313523e29e31cbee6eec6e2"
    }
  ],
  "header": {
    "subject": "",
    "from": "john.doe@example.com",
    "to": [],
    "date": "1970-01-01T00:00:00+00:00",
    "received": [],
    "header": {
      "from": [
        "\"John Doe.\" <john.doe@example.com>"
      ]
    }
  }
}

@sim0nx
Member

sim0nx commented Mar 3, 2020

> With the modification to the Python 3.7 email._header_value_parser.py, the following is my output. This causes test_headeremail2list_2 to fail, as intended, because the default Python header parser succeeds.
>
> I created pull request 18687 to address this issue.
> python/cpython#18687

Great! Thank you!

@malvidin
Contributor Author

malvidin commented Mar 3, 2020

This appears to be related to the Python issue linked below. The pull request I made only addresses one case; I'll look at addressing the other later this week.
https://bugs.python.org/issue30988

@malvidin
Contributor Author

malvidin commented Mar 4, 2020

The following pull request addresses the issue more completely, so I closed my own pull request.
python/cpython#15600

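The following can be used to apply the changes from that pull request to the installed email._header_value_parser module for testing: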

import inspect
import email
import email.policy

# Source text of the whole installed email._header_value_parser module.
display_name_source = inspect.getsource(email._header_value_parser)
# (old, new) text pairs mirroring the changes in python/cpython#15600.
header_parser_15600 = [
    ("if res[0][0].token_type == 'cfws':",
     "if isinstance(res[0], TokenList) and res[0][0].token_type == 'cfws':"),
    ("if res[-1][-1].token_type == 'cfws':",
     "if isinstance(res[-1], TokenList) and res[-1][-1].token_type == 'cfws':"),
    ('''
        if leader is not None:
            token[0][:0] = [leader]
            leader = None
        name_addr.append(token)
''', '''
        if leader is not None:
            if isinstance(token[0], TokenList):
                token[0][:0] = [leader]
            else:
                token[:0] = [leader]
            leader = None
        name_addr.append(token)
''')
]

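# Apply each textual replacement, then re-execute the patched source in the
# module's own namespace so the patched functions replace the originals.
display_name_source_new = display_name_source
for prev, fix in header_parser_15600:
    display_name_source_new = display_name_source_new.replace(prev, fix)

exec(display_name_source_new, email._header_value_parser.__dict__)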


email.message_from_string("""From:         John Doe.<john.doe@example.com>
To: . Doe <jxd@example.com>

Test e-mail body.
""", policy=email.policy.default).items()

@malvidin
Contributor Author

An upstream fix should have been deployed by now; I'll try to find time to check this week.

python/cpython#15600

@malvidin
Contributor Author

It is fixed in Python 3.13.0b1, so it should make it into Python 3.13 this fall.

tests\test_emlparser.py:250 (TestEMLParser.test_headeremail2list_2)
self = <tests.test_emlparser.TestEMLParser object at 0x00000000050F50F0>

    def test_headeremail2list_2(self) -> None:
        """Here we test the headeremail2list function using an input which should trigger
        a email library bug 27257
        """
        with pathlib.Path(samples_dir, 'sample_bug27257.eml').open('rb') as fhdl:
            raw_email = fhdl.read()
    
        msg = email.message_from_bytes(raw_email, policy=email.policy.default)
    
        # just to be sure we still hit bug 27257 (else there is no more need for the workaround)
>       with pytest.raises(AttributeError):
E       Failed: DID NOT RAISE <class 'AttributeError'>

test_emlparser.py:261: Failed
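
Since the test now fails only because the interpreter no longer raises, one option is to probe the behaviour at runtime instead of asserting it. The snippet below is a sketch under that assumption; default_policy_raises is a hypothetical helper and is not part of eml_parser or its test suite.

```python
import email
import email.policy

# Header with a period directly before the angle bracket, as discussed above.
PROBLEM_MESSAGE = "From:         John Doe.<john.doe@example.com>\n\nTest e-mail body.\n"


def default_policy_raises(raw_message: str) -> bool:
    """Return True if reading the headers with policy.default raises AttributeError."""
    msg = email.message_from_string(raw_message, policy=email.policy.default)
    try:
        # Headers are parsed when they are accessed, so items() triggers the parser.
        msg.items()
    except AttributeError:
        return True
    return False


# True on interpreters that still have the bug, False where it is fixed.
needs_workaround = default_policy_raises(PROBLEM_MESSAGE)
print(needs_workaround)
```

A test could use such a probe to skip or invert the AttributeError assertion depending on the running interpreter, rather than hard-coding a version number.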
