KeyError caused by name stream not parsing correctly #320

Closed
1 of 2 tasks
programmarchy opened this issue Jan 11, 2023 · 7 comments
@programmarchy
Contributor

programmarchy commented Jan 11, 2023

Bug Metadata

  • Version of extract_msg: 0.38.4
  • Your python version: Python 3.9.14
  • How did you launch extract_msg?
    • My command line or
    • I used the extract_msg package

Describe the bug

In some messages, entry names do not parse correctly. Then, when the properties dictionary is built, a KeyError is thrown because a name for the corresponding entry does not exist.

What code did you use or can we use to reproduce this error?

Just called the constructor on my file: msg = extract_msg.Message(file_path)

Is there a message.msg file you want to share to help us reproduce this?

I can't attach the .msg file since it has sensitive data, but I did extract the binary data from the names stream to share. I've attached that as a .bin file, along with the code I used to extract the binary data, and code that shows the resulting malformed dictionary from the parsing done in the constructor of the Named class.

Traceback

Traceback (most recent call last):
  File "/home/user/project/.venv/lib/python3.9/site-packages/extract_msg/msg.py", line 792, in named
    return self.__named
AttributeError: 'Message' object has no attribute '_MSGFile__named'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/project/names_repro.py", line 6, in <module>
    msg = extract_msg.Message(file_path, delayAttachments=True)
  File "/home/user/project/.venv/lib/python3.9/site-packages/extract_msg/message_base.py", line 109, in __init__
    self.named
  File "/home/user/project/.venv/lib/python3.9/site-packages/extract_msg/msg.py", line 804, in named
    self.__named = Named(self)
  File "/home/user/project/.venv/lib/python3.9/site-packages/extract_msg/named.py", line 68, in __init__
    self.__properties.append(StringNamedProperty(entry, names[entry['id']]) if entry['pkind'] == NamedPropertyType.STRING_NAMED else NumericalNamedProperty(entry))
KeyError: 2906

Additional context

names_repro.zip

  • names.bin - a binary representation of the names section of my .msg file
  • names_bin_extract.py - code I used to extract the bin file
  • names_bin_parse.py - isolated code from Named constructor showing resulting dictionary
@programmarchy
Contributor Author

programmarchy commented Jan 11, 2023

Here's the output of names_bin_parse.py. Notice the last item in the dictionary.

{0: 'x-ms-exchange-organization-authas', 72: 'XmlExtractedAddresses', 120: 'XmlExtractedPhones', 160: 'XmlExtractedEmails', 200: 'XmlExtractedUrls', 236: 'PropertyExistenceTracker', 288: 'NetworkMessageId', 324: 'ExchangeApplicationFlags', 376: 'acceptlanguage', 408: 'ConversationIndexTrackingEx', 468: 'DetectedLanguage', 504: 'x-ms-exchange-organization-authsource', 584: 'IsSigned', 604: 'IsReadReceipt', 636: 'x-ms-exchange-organization-originalclientipaddress', 740: 'x-ms-exchange-organization-originalserveripaddress', 844: 'x-originating-ip', 880: 'x-forefront-antispam-report', 940: 'LatestMessageWordCount', 988: 'InferenceMessageIdentifier', 1044: 'InferenceClassificationTrackingEx', 1116: 'InferenceClassificationResult', 1180: 'TriageFeatureVector', 1224: 'EntityDocument', 1256: 'EntityNames', 1284: 'EntityExtractionServiceDiagnosticContext', 1368: 'EntityExtractionSuccess', 1420: 'CurrentMessageSafetyFlags', 1476: 'TeeVersion', 1500: 'BigFunnelCorrelationId', 1548: 'HeaderBodyFragmentList', 1596: 'ItemExchangeDiagnostics', 1648: 'IsQuotedTextChanged', 1692: 'ConversationTreeParentRecordKey', 1760: 'LastIndexingAttemptTime', 1812: 'IsPartiallyIndexed', 1852: 'x-ms-publictraffictype', 1900: 'HasQuotedText', 1932: 'X-Microsoft-Antispam-Message-Info', 2004: 'EntityExtraction/ExtractLanguage1.0', 2080: 'TeeInferResult', 2112: 'EntityExtraction/Url1.0', 2164: 'EntityExtraction/PostalAddress', 2228: 'EntityExtraction/PhoneNumber', 2288: 'authentication-results', 2336: 'EntityExtraction/Email', 2384: 'EntityExtraction/Contact1.0', 2444: 'received-spf', 2472: 'ConversationContributions', 2528: 'EntityExtraction/OpportunityAtRisk1.0', 2608: 'x-ironport-av', 2640: 'XmlExtractedMeetings', 2684: 'EntityExtraction/Events1.0', 2740: 'XmlExtractedContacts', 2784: 'ImageNaturalHeight', 2824: 'ImageNaturalWidth', 2864: 'ImageThumbnailError', 2908: 
'mageNaturalHeight"\x00ImageNaturalWidth\x00&\x00ImageThumbnailError0\x00ExchangeApplicationFlags6\x00ConversationIndexTrackingEx$\x00ImageNaturalHeight"\x00ImageNaturalWidth\x00&\x00ImageThumbnailError$\x00ImageNaturalHeight"\x00ImageNaturalWidth\x00&\x00ImageThumbnailError'}
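The mismatch between the `KeyError: 2906` and the last dictionary key (`2908`) can be checked with a little arithmetic. This is only a sketch, assuming each entry in the names stream is a 4-byte length prefix followed by the name encoded as UTF-16-LE, which is consistent with the offsets in the output above. The helper names `entry_end` and `align4` are made up for illustration and are not part of extract_msg.

```python
def entry_end(offset: int, name: str) -> int:
    """Return the offset just past an entry that starts at `offset`."""
    # Assumed layout: 4-byte length prefix, then the name as UTF-16-LE bytes.
    return offset + 4 + len(name.encode("utf-16-le"))


def align4(offset: int) -> int:
    """Round `offset` up to the next multiple of 4."""
    return (offset + 3) & ~3


# 'ImageThumbnailError' is the last correctly parsed name, at offset 2864.
end = entry_end(2864, "ImageThumbnailError")
print(end)          # 2906, the key from the KeyError
print(align4(end))  # 2908, the key the parsed dictionary actually holds
```

So the entry table points at 2906 (no padding after `ImageThumbnailError`), while the parsed dictionary has its next key at the padded offset 2908, which would explain the garbled final value.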

[image: names bin]

@TheElementalOfDestruction
Collaborator

According to the documentation, your files would actually be considered a blatant violation of the standard. However, looking at them, I can see a better way to handle the named stream that would allow your files to be understood: parse each name as it is requested, instead of parsing them all up front and hoping the creating program did it right.
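As a rough illustration of the "parse each name as it is requested" idea, here is a minimal sketch. It is not the code that went into extract_msg; `read_name_at` and `pack_entry` are hypothetical helpers, and the entry layout (4-byte little-endian length, then a UTF-16-LE name) is an assumption based on the dump shared above.

```python
import struct


def read_name_at(stream: bytes, offset: int) -> str:
    """Read one name directly at `offset`, ignoring alignment entirely."""
    if offset < 0 or offset + 4 > len(stream):
        raise ValueError(f"offset {offset} is outside the names stream")
    (length,) = struct.unpack_from("<I", stream, offset)
    start = offset + 4
    end = start + length
    if end > len(stream):
        raise ValueError(f"name at offset {offset} runs past the end of the stream")
    return stream[start:end].decode("utf-16-le")


def pack_entry(name: str, pad: bool = True) -> bytes:
    """Build one synthetic entry, optionally padded to a 4-byte boundary."""
    raw = name.encode("utf-16-le")
    data = struct.pack("<I", len(raw)) + raw
    if pad:
        data += b"\x00" * (-len(data) % 4)
    return data


# First entry is deliberately left unpadded, so the second starts at 42.
stream = pack_entry("ImageThumbnailError", pad=False) + pack_entry("ImageNaturalHeight")
print(read_name_at(stream, 0))   # ImageThumbnailError
print(read_name_at(stream, 42))  # ImageNaturalHeight
```

Because the lookup starts from the offset recorded for the entry, it does not matter whether the writer padded the previous name, and a bad offset produces a descriptive error rather than a bare `KeyError`.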

The reason I say your files are in violation is that the stream has a named entry that does not start on a 4-byte boundary, as stated in the documentation here:
[image]

The weird thing is that earlier in the stream your file was actually doing this correctly, as you can see it added 2 null characters at 0x9DE and 0x9DF before it started with the size for the next one. I'd honestly recommend reporting this inconsistency to whatever program made those MSG files (if it was Microsoft Outlook, I would urge you even more).
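To make the inconsistency concrete, the sketch below builds a small synthetic stream that pads after one name but not after the next, then runs a simplified parse-everything loop that assumes padding. This is a stand-in for illustration only, not extract_msg's actual parser; `pack_entry` and `parse_all` are hypothetical names, and the entry layout (4-byte little-endian length plus UTF-16-LE name) is assumed.

```python
import struct


def pack_entry(name: str, pad: bool = True) -> bytes:
    """Build one synthetic entry, optionally padded to a 4-byte boundary."""
    raw = name.encode("utf-16-le")
    data = struct.pack("<I", len(raw)) + raw
    if pad:
        data += b"\x00" * (-len(data) % 4)
    return data


def parse_all(stream: bytes) -> dict:
    """Eagerly parse every entry, assuming each one starts 4-byte aligned."""
    names = {}
    pos = 0
    while pos + 4 <= len(stream):
        (length,) = struct.unpack_from("<I", stream, pos)
        start = pos + 4
        # Slicing clamps to the end of the stream if `length` is garbage.
        names[pos] = stream[start:start + length].decode("utf-16-le", errors="replace")
        pos = (start + length + 3) & ~3  # skip to the next 4-byte boundary
    return names


stream = (
    pack_entry("ConversationContributions")        # padded: next entry at 56
    + pack_entry("ImageThumbnailError", pad=False)  # unpadded: next entry at 98
    + pack_entry("ImageNaturalHeight")
)
names = parse_all(stream)
print(names)
# {0: 'ConversationContributions', 56: 'ImageThumbnailError', 100: 'mageNaturalHeight'}
print(98 in names)  # False
```

The third name really starts at 98, but the loop resumes at 100, reads the middle of the length field as a huge length, and stores a truncated name under the wrong key. Looking up offset 98 then fails the same way the `KeyError: 2906` does in the report.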

With a slight adjustment of my code, those files with bad streams will still work fine, and any errors like this in the future will have a much better error message to them.

TheElementalOfDestruction added the In Progress label Jan 11, 2023
@programmarchy
Contributor Author

Interesting... I will try to find out more about where these files came from. Thanks for your help.

@TheElementalOfDestruction
Collaborator

I just pushed a commit to the next-release branch with some new code for named properties. Can you let me know if it fixes those files?

You can install from that branch with this command:

python -m pip install git+https://github.com/TeamMsgExtractor/msg-extractor@next-release

@programmarchy
Contributor Author

@TheElementalOfDestruction That fixes the KeyError I was receiving. Nice work!

TheElementalOfDestruction added the Nearly Implemented label Jan 12, 2023
@TheElementalOfDestruction
Collaborator

I'll mark this as complete once it's in a final release.

@TheElementalOfDestruction
Collaborator

This should now be completely fixed in 0.39.0.

TheElementalOfDestruction added the Complete label and removed the In Progress and Nearly Implemented labels Jan 18, 2023