fast_mail_parser should return all headers #12

Open
bra-fsn opened this issue Sep 16, 2024 · 0 comments

bra-fsn commented Sep 16, 2024

Currently, fast_mail_parser returns only the last header value when there are multiple headers with the same key (like Received, which is nearly always the case).

An example program with Python's built-in parser:

# Hardcoded email data (for the sake of example, the email data is included here as a raw string)
email_data = """\
From: sender@example.com
To: recipient@example.com
Subject: Test Email
Date: Mon, 13 Sep 2024 10:00:00 +0200
Received: from mail.example.com (mail.example.com [192.0.2.1])
        by smtp.example.com with ESMTP id abc123
        for <recipient@example.com>; Mon, 13 Sep 2024 09:55:00 +0200
Received: from smtp2.example.com (smtp2.example.com [192.0.2.2])
        by mail.example.com with ESMTP id def456
        for <recipient@example.com>; Mon, 13 Sep 2024 09:50:00 +0200
Received: from relay.example.com (relay.example.com [192.0.2.3])
        by smtp2.example.com with ESMTP id ghi789
        for <recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This is a test email.
"""

from email import policy
from email.parser import BytesParser
from io import BytesIO

def parse_received_headers(email_data):
    """Parse the Received: headers from a hardcoded EML string."""
    # Convert the email string to bytes (as we would normally read an EML file in bytes)
    email_bytes = BytesIO(email_data.encode('utf-8'))

    # Parse the email message
    email_message = BytesParser(policy=policy.default).parse(email_bytes)

    # Extract Received headers in the order they appear
    received_headers = []
    for header, value in email_message.items():
        if header.lower() == 'received':
            received_headers.append(value)

    return received_headers

# Call the function with the hardcoded email
received_headers = parse_received_headers(email_data)

# Print out the Received headers in order
for i, header in enumerate(received_headers, 1):
    print(f"Received Header {i}:\n{header}\n")

$ python test.py
Received Header 1:
from mail.example.com (mail.example.com [192.0.2.1])        by smtp.example.com with ESMTP id abc123        for <recipient@example.com>; Mon, 13 Sep 2024 09:55:00 +0200

Received Header 2:
from smtp2.example.com (smtp2.example.com [192.0.2.2])        by mail.example.com with ESMTP id def456        for <recipient@example.com>; Mon, 13 Sep 2024 09:50:00 +0200

Received Header 3:
from relay.example.com (relay.example.com [192.0.2.3])        by smtp2.example.com with ESMTP id ghi789        for <recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200
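
For what it's worth, the standard library can also return the repeated values directly through `Message.get_all`, without looping over `items()`. A minimal sketch, using a shortened, made-up message rather than the full one above:

```python
import email

# Shortened sample message (illustrative only, not the full example above).
raw = (
    "From: sender@example.com\n"
    "Received: from mail.example.com by smtp.example.com; Mon, 13 Sep 2024 09:55:00 +0200\n"
    "Received: from relay.example.com by smtp2.example.com; Mon, 13 Sep 2024 09:45:00 +0200\n"
    "\n"
    "This is a test email.\n"
)

message = email.message_from_string(raw)

# get_all returns every value for the given field name, in message order.
# The second argument is returned when the header is absent.
received = message.get_all("Received", [])
missing = message.get_all("X-Not-There", [])

for i, value in enumerate(received, 1):
    print(f"Received Header {i}: {value}")
```

This only shows the stdlib behaviour that the request is comparing against; it says nothing about how fast_mail_parser works internally.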

The same message parsed with fast_mail_parser:

email_data = """\
From: sender@example.com
To: recipient@example.com
Subject: Test Email
Date: Mon, 13 Sep 2024 10:00:00 +0200
Received: from mail.example.com (mail.example.com [192.0.2.1])
        by smtp.example.com with ESMTP id abc123
        for <recipient@example.com>; Mon, 13 Sep 2024 09:55:00 +0200
Received: from smtp2.example.com (smtp2.example.com [192.0.2.2])
        by mail.example.com with ESMTP id def456
        for <recipient@example.com>; Mon, 13 Sep 2024 09:50:00 +0200
Received: from relay.example.com (relay.example.com [192.0.2.3])
        by smtp2.example.com with ESMTP id ghi789
        for <recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

This is a test email.
"""

from fast_mail_parser import parse_email, ParseError
from pprint import pprint as pp
email = parse_email(email_data)
pp(email.headers)

$ python test-fmp.py
{'Content-Transfer-Encoding': '7bit',
 'Content-Type': 'text/plain; charset="utf-8"',
 'Date': 'Mon, 13 Sep 2024 10:00:00 +0200',
 'From': 'sender@example.com',
 'MIME-Version': '1.0',
 'Received': 'from relay.example.com (relay.example.com [192.0.2.3]) by '
             'smtp2.example.com with ESMTP id ghi789 for '
             '<recipient@example.com>; Mon, 13 Sep 2024 09:45:00 +0200',
 'Subject': 'Test Email',
 'To': 'recipient@example.com'}

Specifically for the Received header, the most significant one is the first, but fast_mail_parser returns only the last one. Could you please add, so as not to break compatibility, another header representation that correctly lists all headers, perhaps as a list of tuples?
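
To make the request concrete, here is a rough sketch of the shape I have in mind, built with the standard library only. The helper names `header_pairs` and `group_headers` are hypothetical and are not part of fast_mail_parser; the sample message is shortened and made up.

```python
import email
from collections import defaultdict


def header_pairs(raw_message: str) -> list[tuple[str, str]]:
    """Return every header as a (name, value) tuple, in message order.

    Hypothetical helper: duplicates such as Received are all kept.
    """
    message = email.message_from_string(raw_message)
    return list(message.items())


def group_headers(pairs):
    """Group values by lower-cased header name, preserving order per name."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name.lower()].append(value)
    return dict(grouped)


# Shortened sample message (illustrative only).
raw = (
    "From: sender@example.com\n"
    "Received: from mail.example.com by smtp.example.com\n"
    "Received: from relay.example.com by smtp2.example.com\n"
    "Subject: Test Email\n"
    "\n"
    "This is a test email.\n"
)

pairs = header_pairs(raw)
grouped = group_headers(pairs)

print(pairs)
print(grouped["received"])
```

A list of tuples keeps both the duplicates and the original ordering, and a dict of lists can be derived from it when lookup by name is more convenient. The existing `headers` dict could stay as it is, so current users would not be affected.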
