UUencoded attachment parsing #80
Comments
I added the respective uuencoded mail sample from the SpamScope repository (its name is

File "mailparser/utils.py", line 562, in write_sample
    f.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0612' in position 57: ordinal not in range(128)

SpamScope doesn't run into that exception, because it ships with its own write method that succeeds in writing the (now broken) file.

AssertionError: 'c2d03e7cb36e48c625d9243ee3fdc401' != '4f2cf891e7cfb349fca812091f184ecc'
- c2d03e7cb36e48c625d9243ee3fdc401
+ 4f2cf891e7cfb349fca812091f184ecc

As for the fix, I decided to simply re-encode the uuencoded attachment in base64 and call it a day. That's because the
if binary:
    with open(sample, "wb") as f:
        f.write(base64.b64decode(payload))

if raw_dict["binary"]:
    try:
        payload = base64.b64decode(raw_dict["payload"])

However, just keeping For a better solution it might be beneficial to simply treat |
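To illustrate the re-encoding idea described above, here is a minimal sketch, not the actual patch from this PR. It assumes the uuencoded body is available as text; the helper name `uu_body_to_base64` and the sample file name are hypothetical. It decodes each uuencoded line with the standard library's `binascii.a2b_uu` and returns the bytes as a base64 string, so that a writer which calls `base64.b64decode(payload)` gets the original binary back.

```python
import base64
import binascii


def uu_body_to_base64(uu_text: str) -> str:
    """Decode a uuencoded body and return its content as base64 text.

    Hypothetical helper for illustration; not part of mail-parser.
    """
    decoded = bytearray()
    for line in uu_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "begin <mode> <name>" / "end" framing.
        if not stripped or stripped.startswith("begin ") or stripped == "end":
            continue
        # Each data line carries its own length prefix; a2b_uu decodes one line.
        decoded += binascii.a2b_uu(line)
    return base64.b64encode(bytes(decoded)).decode("ascii")


# Build a small uuencoded sample from arbitrary binary data (45 bytes per line).
payload = bytes(range(256))
lines = ["begin 644 sample.bin"]
for i in range(0, len(payload), 45):
    lines.append(binascii.b2a_uu(payload[i:i + 45]).decode("ascii").rstrip("\n"))
lines += ["`", "end"]
uu_text = "\n".join(lines)

as_base64 = uu_body_to_base64(uu_text)
restored = base64.b64decode(as_base64)
```

With the payload carried as base64, the existing binary branch shown above can write the attachment without passing the bytes through a text codec.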
Hi @sylencecc, thanks a lot for your PR. Are you working on SpamScope porting? |
Yeah, we have some py3-only dependencies, which is why I'm currently in the process of porting it over. If you're interested in the results, I'll happily send a pull request as soon as I'm done. However, I'm not keeping backwards compatibility: it won't run with py2 anymore. In addition to that, I'm only using the Docker-based version, so I won't touch the Ansible stuff for now. Docker-wise, since the SpamScope image depends on
Let me know if I should send you pull requests for all that stuff. A separate branch might be appropriate. |
Yes, you can send me all PRs.
Good job. |
Please have a look at my SpamScope fork and the Storm Dockerfile which the new SpamScope image depends on. So far, the included tests run fine, as does the default debug topology. The project we're using SpamScope in also seems to run on py3 without further issues. However, due to the mediocre test coverage we can't be all too confident that this update doesn't break anything. Moreover, I didn't update the Ansible playbooks due to a lack of time.
Hi @sylencecc, I will check your code, but I can't right now. I will test it as well. |
Any news on that? Did you find time to test the py3 changes? |
Hi @sylencecc, I will test it. I'm working on it on a best-effort basis. |
When dealing with attachments encoded via uuencoding (Content-transfer-encoding is uuencode or x-uuencode), mail-parser treats them as text, as can be seen in parse() (mailparser.py:378):

Within the else block, the payload is correctly decoded with p.get_payload(decode=True), but then passed to ported_string(), which attempts to encode the returned bytes to UTF-8 in utils.py:85:

Since errors are ignored, encoding doesn't fail, but returns an attachment stripped of all bytes that can't be encoded in UTF-8 (that can be easily verified by attempting to write that binary to disk with write_attachments).

I encountered this issue while porting SpamScope to Python 3, which has a test test_store_samples_unicode_error that parses and saves a uuencoded attachment. According to the test, the resulting file should have an MD5 checksum of 2ea90c996ca28f751d4841e6c67892b8. That test passes with Python 2, because the incorrectly parsed payload does indeed have that hash. With Python 3, the hash changes due to differences in unicode handling. The correct checksum, however, is actually 4f2cf891e7cfb349fca812091f184ecc.
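The byte loss described above can be reproduced in isolation. The following is a minimal sketch that assumes the text handling boils down to a UTF-8 conversion with `errors="ignore"`; the function name `lossy_text_roundtrip` and the sample bytes are made up for illustration and are not taken from mail-parser or the SpamScope sample.

```python
import hashlib


def lossy_text_roundtrip(raw: bytes) -> bytes:
    """Treat binary data as UTF-8 text, silently dropping invalid bytes.

    Hypothetical stand-in for passing a decoded attachment through a
    text conversion that ignores errors.
    """
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


# Arbitrary binary data: four ASCII-range bytes, three bytes that are
# not valid UTF-8 on their own, and one more ASCII byte.
binary_payload = bytes([0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE, 0x80, 0x41])
damaged = lossy_text_roundtrip(binary_payload)

print(len(binary_payload), len(damaged))  # the damaged copy is shorter
print(hashlib.md5(binary_payload).hexdigest())
print(hashlib.md5(damaged).hexdigest())  # differs from the line above
```

No exception is raised, which is why the problem only becomes visible when checksums of the written file are compared.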