Signed messages only show one attachment, the smime.p7m file itself #223

Closed
2 of 4 tasks
z3r0privacy opened this issue Jan 11, 2022 · 10 comments
Labels
Accepted This feature request has been accepted and will be developed Complete This feature has been fully implemented. enhancement

Comments

@z3r0privacy

Bug Metadata

  • Version of extract_msg: 0.28.7
  • Your python version: Python 3.7.2
  • How did you launch extract_msg?
    • My command line or
    • I used the extract_msg package

Describe the bug
When opening a .msg file that is digitally signed and has attachments, the Message() object shows only a single attachment, which happens to be the signature (smime.p7m). The other attachment(s) are not loaded or shown.

What code did you use or can we use to reproduce this error?

# mail.msg is a digitally signed mail with a pdf attachment
import extract_msg
msg = extract_msg.Message("path/to/mail.msg")
len(msg.attachments)  # yields 1
msg.attachments[0].longFilename  # yields smime.p7m
# msg.attachments[1] throws list index out of range, therefore the pdf cannot be accessed
msg.body  # yields the correct body

Is there a message.msg file you want to share to help us reproduce this?

  • Uploaded message example_msg.zip
  • Emailed message as an attachment to admins: [Enter Subject Line Here]

Traceback

No useful Traceback received

Screenshots
None

Additional context

  • The signature of the provided example is invalid because it uses a self-signed certificate
  • Therefore, you may get a warning when opening the file with Outlook
@TheElementalOfDestruction
Collaborator

After looking at it, the PDF is embedded inside that file you have, inside a base64 stream. In fact, it looks like that stream contains the entire data section of the msg, in some kind of plain text format that I'm not familiar with. It'll take some time to write something that can actually read through that file, unless I can manage to find something that can already do so.

Take a look at the data of that attachment and you'll be able to see what I mean, as it's just plain text.

Not exactly an error, just something annoying to have to deal with that wasn't known about in the past.

@TheElementalOfDestruction TheElementalOfDestruction added Accepted This feature request has been accepted and will be developed enhancement labels Jan 12, 2022
@TheElementalOfDestruction
Collaborator

If you can confirm that the content of a similar message that you sign comes out looking to be the same format as this one, then I can see about properly adding support for it. Probably won't be included in 0.29.0 but may be included in a version of it.

Also, if you happen to know of something in Python that can already handle it, let me know and I can see about quickly adding support in 0.29.0.

@z3r0privacy
Author

Thanks for your feedback. I looked through some other similar mails. It seems like it's always the same format.

  1. Some header information
  2. Metadata about the content to be followed (format, encoding, etc)
  3. The data (in some samples, the HTML content of the message was there in plain HTML, in some it was base64 encoded)
  4. More content

Maybe this helps? https://tools.ietf.org/doc/python-m2crypto/howto.smime.html
And this RFC looks a lot like what I have seen: https://www.rfc-editor.org/rfc/rfc8551#section-3.5.3.3
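The four-part layout listed above can be made visible with Python's standard `email` package. The following is only a sketch: `SAMPLE` is a hand-written stand-in for the text found inside the smime.p7m attachment (assuming the clear-signed `multipart/signed` form described in the linked RFC), and `mime_tree` is a hypothetical helper name, not something from extract_msg.

```python
from email import message_from_bytes
from email.message import Message
from typing import List, Tuple

# Hand-written stand-in for the plain-text content of a signed message.
# Real data would come from the smime.p7m attachment instead.
SAMPLE = (
    b'Content-Type: multipart/signed; protocol="application/pkcs7-signature";\r\n'
    b' micalg=sha-256; boundary="outer"\r\n'
    b"\r\n"
    b"--outer\r\n"
    b'Content-Type: multipart/mixed; boundary="inner"\r\n'
    b"\r\n"
    b"--inner\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"\r\n"
    b"<p>Hello</p>\r\n"
    b"--inner\r\n"
    b'Content-Type: application/pdf; name="report.pdf"\r\n'
    b'Content-Disposition: attachment; filename="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0xLjQK\r\n"
    b"--inner--\r\n"
    b"\r\n"
    b"--outer\r\n"
    b'Content-Type: application/pkcs7-signature; name="smime.p7s"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AAAA\r\n"
    b"--outer--\r\n"
)


def mime_tree(msg: Message, depth: int = 0) -> List[Tuple[int, str, str]]:
    """Return (depth, content type, transfer encoding) for every MIME part."""
    encoding = msg.get("Content-Transfer-Encoding", "7bit")
    rows = [(depth, msg.get_content_type(), encoding)]
    if msg.is_multipart():
        for part in msg.get_payload():
            rows.extend(mime_tree(part, depth + 1))
    return rows


if __name__ == "__main__":
    for depth, ctype, encoding in mime_tree(message_from_bytes(SAMPLE)):
        print("  " * depth + f"{ctype} ({encoding})")
```

Printing the tree for a few real samples would show whether the attachment always sits at the same position or whether it has to be located by content type and filename.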

@TheElementalOfDestruction
Collaborator

Not sure yet if the attachments will always be in the right place to appropriately just grab from an index, but I'll take a look at the other stuff when I have a chance, thanks.

@TheElementalOfDestruction
Collaborator

Yeah, this is rather complicated to do as it looks like it will require the use of another dependency, although I would probably only add this one as a soft dependency rather than a hard one. It also looks like decryption of anything that uses the signing would take a long time to implement as it would need to be able to appropriately load your keys.

It does look like the email module can at least vaguely parse the mime data, although using it so far has been rather difficult. Trying to get attachments has not yet succeeded in allowing me access to the PDF file, and I honestly don't know why. Still seeing what I can do about that.
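For readers unfamiliar with the term, a "soft dependency" as mentioned above is usually an import that is allowed to fail, with the error deferred until the optional feature is actually used. A minimal sketch of that pattern follows; the module name `hypothetical_smime_backend` and both function names are made up for illustration and do not refer to any real package or to extract_msg's code.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Made-up module name, used only to illustrate the pattern.
_backend = optional_import("hypothetical_smime_backend")


def parse_signed(data: bytes):
    """Parse signed data, failing only when the optional backend is missing."""
    if _backend is None:
        raise ImportError(
            "Signed-message support needs an optional package that is not installed."
        )
    # 'parse' is likewise a placeholder for whatever the backend would offer.
    return _backend.parse(data)
```

With this shape, users who never open signed messages never need the extra package, and those who do get a clear error at the point of use.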

@TheElementalOfDestruction TheElementalOfDestruction added the In Progress This issue or feature request has been confirmed or approved, respectively, and is being worked on. label May 30, 2022
@TheElementalOfDestruction
Collaborator

TheElementalOfDestruction commented May 30, 2022

Hello there, it's been a bit and I haven't had much time to look at this until now. I've created a small script for you to test that helps handle the single signed attachment. It contains a function, processAttachment, which returns a list of SignedFile instances, each with directly accessible name, data, and mime properties. I've tested it on the example you gave and it worked well, so I'm handing it off for further testing. Should your testing go well, I am prepared to start properly integrating it into extract-msg so signed messages can be parsed properly.

Let me know how it goes.

signed_helper_test.zip

Edit: Quick note, this test file requires the mailbits python package from pypi to work right now. I will likely drop it as a requirement when implementing this into the module if I can comfortably do so.
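To give a rough idea of the shape described above (a function returning objects with name, data, and mime), here is a sketch built only on the standard library. It is not the script from signed_helper_test.zip and does not use mailbits; `extract_signed_files` is a hypothetical name, the `SignedFile` fields simply mirror the three properties mentioned, and `SAMPLE` is a hand-written stand-in for real attachment data.

```python
from dataclasses import dataclass
from email import message_from_bytes
from typing import List

# Content types used for detached S/MIME signatures; these parts are skipped.
SIGNATURE_TYPES = ("application/pkcs7-signature", "application/x-pkcs7-signature")


@dataclass
class SignedFile:
    name: str    # filename declared in the MIME headers
    data: bytes  # decoded file content
    mime: str    # declared content type


def extract_signed_files(raw: bytes) -> List[SignedFile]:
    """Collect named, non-signature leaf parts from clear-signed MIME data."""
    files = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # containers carry no file data themselves
        name = part.get_filename()
        if name is None or part.get_content_type() in SIGNATURE_TYPES:
            continue  # unnamed body text, or the signature itself
        files.append(
            SignedFile(name, part.get_payload(decode=True), part.get_content_type())
        )
    return files


# Hand-written sample; in practice the bytes would be the content of the
# smime.p7m attachment (assumed to be reachable as msg.attachments[0].data).
SAMPLE = (
    b'Content-Type: multipart/signed; protocol="application/pkcs7-signature";\r\n'
    b' boundary="outer"\r\n'
    b"\r\n"
    b"--outer\r\n"
    b'Content-Type: multipart/mixed; boundary="inner"\r\n'
    b"\r\n"
    b"--inner\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"See attached.\r\n"
    b"--inner\r\n"
    b'Content-Type: application/pdf; name="report.pdf"\r\n'
    b'Content-Disposition: attachment; filename="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"JVBERi0xLjQK\r\n"
    b"--inner--\r\n"
    b"\r\n"
    b"--outer\r\n"
    b'Content-Type: application/pkcs7-signature; name="smime.p7s"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AAAA\r\n"
    b"--outer--\r\n"
)

if __name__ == "__main__":
    for f in extract_signed_files(SAMPLE):
        print(f.name, f.mime, len(f.data))
```

This sketch only covers the clear-signed case where the content is readable MIME text; it makes no attempt to verify the signature or to handle encrypted messages.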

@TheElementalOfDestruction
Collaborator

Signed attachments are now accessible, although embedded message files in a signed message are not handled yet (in theory they should be directly savable as an MSG file, but I have no examples to work with). I'll keep working on this feature, though.

@z3r0privacy
Author

Sorry for the very late response, but somehow I missed the notification on the update... However, I've tried a few more samples (version 0.36.2) and it seemed to work fine in a short test. Going to add it to our test systems in the next few weeks. Thanks!

@TheElementalOfDestruction
Collaborator

Glad to hear things are working.

@TheElementalOfDestruction
Collaborator

Closing as this should be completed.

@TheElementalOfDestruction TheElementalOfDestruction added Complete This feature has been fully implemented. and removed In Progress This issue or feature request has been confirmed or approved, respectively, and is being worked on. labels Jul 4, 2023