PreviewText field in MessageSummary is incorrect for the email with charset="ks_c_5601-1987" #1755
Comments
Could you construct a sample message with such a message body for me? That way I could add it to my unit tests, debug it, and figure out a proper solution. Thanks!
Sure, here is a text file with the message source. Please let me know if it's what you need.
Quick question: I started to take a look at this and there are 2 implementations for preview text.
Option 2 is by far the most prevalent because very few servers actually support the PREVIEW feature. That said, based on your original bug report, it sounds like you've already done some debugging and have narrowed the bug down to the ReadLiteral method doing incorrect character conversion. Does that mean your server supports the PREVIEW feature? I just want to make sure I'm looking into the correct bits of code.
Yes, I have been trying to debug it. The server doesn't support the PREVIEW feature, so as far as I can see the previewText value ends up being read as a small chunk of the body and decoded inside ReadLiteralAsync as UTF-8, which is incorrect because the data isn't UTF-8. It seems we need to somehow pass in the correct encoding from the body charset.
I would have assumed that https://github.com/jstedfast/MailKit/blob/master/MailKit/Net/Imap/ImapFolderFetch.cs#L918 would have taken care of this because QueueFetchPreviewCommand sets up a FetchPreviewContext which gets the raw body data added to it via the FetchStreamAsync callback method which consumes the body content into a Perhaps what is happening is that there is a DecoderException for ks_c_5601-1987 (due to incomplete data that ends in the middle of a multibyte character?) which then causes fallback to iso-8859-1?
Take a look at the above commit: it attempts to reproduce the issue in the unit tests, but it works fine for me. That said, I suspect it only works for me because the unit test is missing something important, and I don't know what that is. As per my previous comment, perhaps the unit test has a shorter message body than the one causing you problems; if the message body were longer, it might hit a charset conversion issue that results in fallback to iso-8859-1. Would it be possible for you to take a look at the unit test that I just added and use it as a template for obtaining an IMAP server response that causes this issue?
I've scraped some Korean text off of Wikipedia to try and get a long string of text and it still works, so my thinking is that perhaps you did not register the international text encodings in your app? Make sure at program startup, you call the following line of code: System.Text.Encoding.RegisterProvider (System.Text.CodePagesEncodingProvider.Instance);
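To illustrate the symptom under discussion, here is a minimal sketch using only the .NET base class library (no MailKit). It assumes a modern .NET runtime where CodePagesEncodingProvider is available; the class name PreviewDecodingDemo and the sample string are made up for this illustration. It encodes a short Korean string as ks_c_5601-1987 and then decodes the same bytes two ways: as UTF-8 (which yields replacement characters) and with the correct charset.

```csharp
using System;
using System.Text;

public static class PreviewDecodingDemo
{
    static PreviewDecodingDemo()
    {
        // Without this, Encoding.GetEncoding("ks_c_5601-1987") throws on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decode raw bytes using the named charset (for example the charset
    // parameter taken from a body part's Content-Type header).
    public static string Decode(byte[] raw, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(raw);
    }

    public static void Main()
    {
        const string original = "안녕하세요";

        // Simulate what a server would send if it returned the text in the body's charset.
        byte[] raw = Encoding.GetEncoding("ks_c_5601-1987").GetBytes(original);

        // Decoding those bytes as UTF-8 produces U+FFFD replacement characters.
        string wrong = Encoding.UTF8.GetString(raw);
        Console.WriteLine("Decoded as UTF-8 contains U+FFFD: " + (wrong.IndexOf('\uFFFD') >= 0));

        // Decoding with the right charset round-trips the text.
        string right = Decode(raw, "ks_c_5601-1987");
        Console.WriteLine("Decoded as ks_c_5601-1987 matches: " + (right == original));
    }
}
```

Registering the provider only makes the charset available; it does not change which charset a caller chooses when decoding a given byte sequence.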
Sorry, I didn't have time to try your recommendations above. I'll do it tomorrow or during the weekend. Regarding encodings: it's registered exactly as you wrote, and I'm able to read the body and subject in the correct format when I call MailFolder.GetMessageAsync().
I've tried to copy-paste responses from the server into the unit test that you mentioned, but I'm not sure that I did it correctly.
Okay, so this is what we're interested in:
C: D00000007 UID FETCH 284 (UID FLAGS INTERNALDATE ENVELOPE BODYSTRUCTURE PREVIEW BODY.PEEK[HEADER.FIELDS (IMPORTANCE SECURE-REPLY-SENDER)])
S: * 144 FETCH (UID 284 FLAGS (\Seen) INTERNALDATE "17-May-2024 07:48:43 +0000" PREVIEW {213}
S: � � � �ȳ�ϼ� �, � �⸦ �ٶ�ϴ�. �̹� �ָ� � �ȹ� � �ֳ�? �ٵ� � �ñ� �ٶ�Կ�! � �ȹ�ϴ� �̵� �ֽ�ϴ�. � � �غ�. � �! ENVELOPE [snip] BODY[HEADER.FIELDS (IMPORTANCE SECURE-REPLY-SENDER)] {2}
S:
S: )
S: D00000007 OK Fetch completed (0.001 + 0.000 secs).
Unfortunately, the IMAP server does claim to support the PREVIEW feature, and it is providing the preview text in the wrong charset. It MUST provide the preview text in UTF-8, but it is providing it in ks_c_5601-1987, as you've noted. You could try submitting a bug here: https://github.com/dovecot/core
The specification in question can be found here: https://www.rfc-editor.org/rfc/rfc8970.html
If you file a bug against Dovecot, feel free to /cc me so that I can follow along and chime in if needed. As a work-around, you could try disabling server-side PREVIEW by doing this in your code after authenticating: client.Capabilities &= ~ImapCapabilities.Preview; This will unfortunately make the query a bit slower, but it should provide better results until Dovecot fixes the bug. Hope that helps.
Yes, you're right, the IMAP server supports preview.
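For context, the work-around could sit in a program roughly like the sketch below. The host name, port, and credentials are placeholders, and the choice to fetch every message in the inbox is only for illustration; it needs the MailKit package and a reachable IMAP server to run.

```csharp
using System;
using System.Threading.Tasks;
using MailKit;
using MailKit.Net.Imap;
using MailKit.Security;

public static class PreviewWorkaround
{
    public static async Task Main()
    {
        using var client = new ImapClient();

        // Placeholder server and credentials; substitute your own.
        await client.ConnectAsync("imap.example.com", 993, SecureSocketOptions.SslOnConnect);
        await client.AuthenticateAsync("user@example.com", "password");

        // Clear the PREVIEW capability flag after authenticating so that
        // MailKit builds the preview text itself instead of asking the server.
        client.Capabilities &= ~ImapCapabilities.Preview;

        var inbox = client.Inbox;
        await inbox.OpenAsync(FolderAccess.ReadOnly);

        // Request the preview text along with the UID for each message.
        var summaries = await inbox.FetchAsync(0, -1,
            MessageSummaryItems.UniqueId | MessageSummaryItems.PreviewText);

        foreach (var summary in summaries)
            Console.WriteLine($"{summary.UniqueId}: {summary.PreviewText}");

        await client.DisconnectAsync(true);
    }
}
```

The flag is cleared after AuthenticateAsync, as suggested above, because servers may re-announce their capabilities once the client has authenticated.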
Don't worry about it, I can understand why you misunderstood.
We can say it's a bug because clients and servers are supposed to implement specifications exactly or they aren't worth writing.
You don't have to disable it in Dovecot in a config file, just disable it in MailKit by using the snippet I pasted in my previous comment.
Because we don't necessarily know the body charset.
If we don't know anything about the body charset, how do we read Korean text in the subject? I thought that in most cases it would be the same.
Every header (or even every email address in a To/From/Cc header) and every MIME part of the message can use a different charset. You can't rely on the subject charset being the same as the body's.
IMailFolder.FetchAsync returns incorrect PreviewText in case when email has Content-Type: text/plain; charset="ks_c_5601-1987"
The result is "� � � �ȳ�ϼ� �, � �⸦ �ٶ�ϴ�. �̹� �ָ� � �ȹ� � �ֳ�? �ٵ� � �" but should be like this "안녕하세요 여러분 내 주제는 여기 안녕하세요 여러분"
This charset is registered in the environment and I can read the correct body when I fetch body parts, but I can't get the correct value in PreviewText.
I can see in the code that the PreviewText value is ultimately read as UTF-8, but what should be done when the text uses another encoding? Is there a way to customize it?
Platform (please complete the following information):
Exception
NO
To Reproduce
Call FetchAsync and check the result when the message has charset="ks_c_5601-1987"
Expected behavior
PreviewText should have a readable value