Mail::Body#to_s uses wrong String encoding #1413
Comments
This was originally reported in #809 but closed based on a misunderstanding of later comments.
Nicely spotted, that is the exact issue we're seeing. And misunderstandings so easily occur here, when we're talking about encodings, but not those encodings, the other encoding, which is really charset. 🙄 And it's especially hard because you can't visually tell the difference and in so many cases everything still works even though it is wrong.
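To make the two meanings of "encoding" concrete, here is a minimal sketch in plain Ruby (no mail gem involved). The sample text and variable names are my own assumptions, not taken from the reported message. Base64 stands in for the Content-Transfer-Encoding, and ISO-8859-1 for the charset:

```ruby
# The text "blåbær" as ISO-8859-1 bytes, tagged ASCII-8BIT (i.e. "just bytes").
latin1_bytes = "bl\xE5b\xE6r".b

# Meaning 1: transfer encoding. It makes the bytes mail-safe (Base64 here)
# and says nothing about which characters the bytes represent.
transfer_encoded = [latin1_bytes].pack("m0")
decoded_bytes    = transfer_encoded.unpack1("m0") # still tagged ASCII-8BIT

# Meaning 2: charset. It says how the decoded bytes map to characters.
# force_encoding only relabels the string; encode actually transcodes it.
text = decoded_bytes.dup.force_encoding(Encoding::ISO_8859_1)
utf8 = text.encode(Encoding::UTF_8)
```

Undoing the transfer encoding gives back the right bytes with a binary label, which is why everything can look fine until something tries to treat those bytes as characters.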
It does seem, though, that using That said, it still seems problematic that
Only if the email is multi-part. Otherwise,
I agree, it's broken.
I'm running into this issue too, while processing a message sent into the app from Cloudmailin.
So what's the solution? What do we expect (And why is this 5-year-old (#1216), very serious, app-crashing bug still unresolved?)
Not a proper solution to the underlying problem, but here's the quick workaround I came up with, in case it's helpful to anyone...

    html = fix_encoding(@inbound.html_part)

    private def fix_encoding(part)
      body_str = part.body.decoded
      # Why is the encoding sometimes ASCII-8BIT (and sometimes already converted to UTF-8) instead of
      # respecting the claimed encoding? Why does trying to encode it as UTF-8 result in:
      # Encoding::UndefinedConversionError ("\xE2" from ASCII-8BIT to UTF-8)? We may never know. (See
      # https://github.com/mikel/mail/issues/1413, etc.) But here is a workaround.
      logger.debug "body_str.encoding: #{body_str.encoding} (part.charset: #{part.charset})"
      unless body_str.encoding == Encoding.find('utf-8')
        new_encoding =
          if (body_str.encode('utf-8') rescue false)
            'utf-8'
          else
            part.charset
          end
        logger.warn "Using force_encoding to change from #{body_str.encoding} to #{new_encoding} (part.charset: #{part.charset})"
        body_str = body_str.force_encoding(new_encoding)
      end
      body_str
    end
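For anyone who wants to try the same idea outside a Rails app, here is a rough standalone variant. It is a sketch under my own assumptions, not the mail gem's API: `decode_body_bytes` is a hypothetical helper that takes the raw bytes and the declared charset as plain arguments. It checks `valid_encoding?` instead of rescuing from `encode`, and it transcodes to UTF-8 rather than only relabelling.

```ruby
# Hypothetical helper (not part of the mail gem): turn raw body bytes plus a
# declared charset into a UTF-8 string.
def decode_body_bytes(bytes, charset)
  # Already valid UTF-8 and labelled as such: nothing to do.
  return bytes if bytes.encoding == Encoding::UTF_8 && bytes.valid_encoding?

  # If the bytes happen to be valid UTF-8, relabelling is enough.
  as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Otherwise trust the declared charset, falling back to binary when the
  # charset is missing or unknown to Ruby.
  declared =
    begin
      Encoding.find(charset.to_s)
    rescue ArgumentError
      Encoding::BINARY
    end

  # Relabel with the declared charset, then transcode; bytes that cannot be
  # mapped are replaced instead of raising.
  bytes.dup
       .force_encoding(declared)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

latin1_body = decode_body_bytes("bl\xE5b\xE6r".b, "ISO-8859-1")
utf8_body   = decode_body_bytes("bl\u00E5b\u00E6r".b, "UTF-8")
```

Replacing unmappable bytes is a deliberate choice for this sketch; code that must not lose data would probably prefer to raise.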
Given an email encoded with an ISO-8859-1 charset, I'd expect the strings coming out of mail methods to be either:
However, we see them returned with ASCII-8BIT encoding, which doesn't seem right and which causes problems in subsequent handling of those strings.
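As an illustration of the kind of downstream trouble this causes, the following plain-Ruby sketch uses a hand-built binary string as a stand-in for what the mail methods return (the sample bytes are an assumption, not from the original message):

```ruby
# Stand-in for a body string: ISO-8859-1 bytes, but tagged ASCII-8BIT.
body = "bl\xE5b\xE6r".b

conversion_error = nil
compat_error     = nil

# Transcoding fails: Ruby has no idea which characters the high bytes mean.
begin
  body.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
end

# Mixing it with non-ASCII UTF-8 text fails too.
begin
  "\u2713 " + body
rescue Encoding::CompatibilityError => e
  compat_error = e
end
```

Both operations work fine as long as the body is pure ASCII, which is probably why the problem goes unnoticed for so long.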
How to reproduce
The following script illustrates the issue:
Running it outputs
While the exception comes from inside the JSON gem, I'd risk assessing that the root cause is that body_string.encoding is #<Encoding:ASCII-8BIT> and not #<Encoding:ISO-8859-1>. To verify this we can add a before performing the JSON encoding, which makes the script run without exceptions.
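The exact snippet did not survive in this copy of the issue, so the following is only a guess at the kind of check meant here, assuming it was a `force_encoding` call with the message's charset. `body_string` is a hand-built stand-in rather than the output of the original script:

```ruby
require "json"

# Stand-in for the string the script got back: ISO-8859-1 bytes, binary label.
body_string  = "bl\xE5b\xE6r".b
bytes_before = body_string.bytes

# Relabel in place with the charset the message declared. The bytes stay
# the same; only the encoding tag changes.
body_string.force_encoding(Encoding::ISO_8859_1)

# With a correct label the JSON gem can transcode the value to UTF-8.
json = JSON.generate({ "body" => body_string })
```

If relabelling alone makes the error go away, that points at the encoding tag rather than the bytes themselves.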
Versions
Possibly related issues
I did look through the existing issues, and while I wasn't able to find any that matches the issue exactly, there are quite a few that seem related. Many are older, though, and/or missing reproduction steps: