Fix incorrect serialization of Unicode characters in NewtonSoftJsonSerializer #1508

Merged: 1 commit, Dec 6, 2015


easuter (Contributor) commented Dec 6, 2015

Currently, Akka.NET uses System.Text.Encoding.Default.GetBytes() to obtain the byte "blob" representation of messages that need to be sent outside the local actor system.

This is what the .NET documentation has to say regarding the use of Encoding.Default:

Different computers can use different encodings as the default, and the default encoding can even change on a single computer. Therefore, data streamed from one computer to another or even retrieved at different times on the same computer might be translated incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these two reasons, using the default encoding is generally not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding, with a preamble. Another option is to use a higher-level protocol to ensure that the same format is used for encoding and decoding.

The system ANSI code page defined by Default covers the ASCII set of characters, but the encoding is different from the encoding for ASCII. Because all Default encodings lose data, you might use UTF8 instead. UTF-8 is often identical in the U+00 to U+7F range, but can encode other characters without loss.
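To make the quoted point concrete, here is a minimal sketch (C# top-level statements, .NET 5 or later assumed). Encoding.ASCII stands in for a lossy, non-Unicode encoding; it is an illustrative substitute, not what Encoding.Default returns on any particular machine. The sketch shows that UTF-8 and ASCII agree byte for byte on ASCII text, that UTF-8 encodes ★ (U+2605) as a three-byte sequence, and that the lossy encoding quietly replaces it with '?'.

```csharp
using System;
using System.Linq;
using System.Text;

// Encoding.ASCII is a stand-in for a lossy, non-Unicode encoding;
// it is not what Encoding.Default returns on any particular machine.
byte[] asciiBytes = Encoding.ASCII.GetBytes("Akka");
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Akka");

// In the U+0000..U+007F range, UTF-8 produces exactly the ASCII bytes.
bool sameInAsciiRange = asciiBytes.SequenceEqual(utf8Bytes);
Console.WriteLine($"ASCII range identical: {sameInAsciiRange}");

// Outside that range, UTF-8 uses a multi-byte sequence for U+2605 (the star)...
byte[] utf8Star = Encoding.UTF8.GetBytes("\u2605");
Console.WriteLine($"UTF-8 bytes for U+2605: {BitConverter.ToString(utf8Star)}");

// ...while the lossy encoding silently substitutes '?' when encoding.
string asciiRoundTrip = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes("\u2605"));
Console.WriteLine($"ASCII round trip of U+2605: {asciiRoundTrip}");
```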

I ran into this problem in the project I'm currently working on: results returned from locally deployed actors were fine, while results from remote actors had Unicode characters replaced with question marks.

Akka.NET's object serialization should have no effect on the actual content of a message, and I expect this to cause problems in clusters whose nodes have different default encodings.
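A minimal sketch of that cluster scenario, under loudly hypothetical assumptions: one node's default encoding is Latin-1 (ISO-8859-1) and the other's is UTF-8. The specific encodings are illustrative only; the point is that when encoder and decoder disagree, the receiver gets different text than the sender wrote, with no exception raised.

```csharp
using System;
using System.Text;

// Hypothetical two-node scenario: the sender's default encoding is Latin-1
// and the receiver's is UTF-8. These choices are illustrative only.
Encoding senderEncoding = Encoding.GetEncoding("iso-8859-1");
Encoding receiverEncoding = Encoding.UTF8;

string sent = "{\"Name\":\"caf\u00e9\"}";
byte[] wire = senderEncoding.GetBytes(sent);        // 'é' becomes the single byte 0xE9
string received = receiverEncoding.GetString(wire); // 0xE9 followed by '"' is not valid UTF-8

// The receiver sees U+FFFD (the replacement character) instead of 'é'.
Console.WriteLine($"Corrupted on arrival: {received.Contains('\uFFFD')}");
```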

Fortunately, the bug is easy to reproduce, for example with a LINQPad script:

// LINQPad script (C# Statements); needs a reference to Newtonsoft.Json and the Newtonsoft.Json and System.Text namespaces.
// Note: Encoding.Default is the system ANSI code page on .NET Framework; on .NET Core and .NET 5+ it is always UTF-8.
var originalJson = JsonConvert.SerializeObject(new { UnicodeChar = "★" }, Newtonsoft.Json.Formatting.None);

// Round-trip the same JSON text through each encoding.
var encodingDefaultBytes = Encoding.Default.GetBytes(originalJson);
var encodingUtf8Bytes = Encoding.UTF8.GetBytes(originalJson);

var encodingDefaultJson = Encoding.Default.GetString(encodingDefaultBytes);
var encodingUtf8Json = Encoding.UTF8.GetString(encodingUtf8Bytes);

originalJson.Dump("Original JSON object:");
encodingDefaultJson.Dump("Encoding.Default GetBytes/GetString JSON object:");
encodingUtf8Json.Dump("Encoding.UTF8 GetBytes/GetString JSON object:");

Script output:

Original JSON object:
{"UnicodeChar":"★"}

Encoding.Default GetBytes/GetString JSON object:
{"UnicodeChar":"?"}

Encoding.UTF8 GetBytes/GetString JSON object:
{"UnicodeChar":"★"}

The solution is to use Encoding.UTF8.GetBytes() instead. I've built the src/core/Akka project with this patch applied and am currently using the resulting DLL to work around this issue in my cluster.
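The idea behind the fix can be sketched as a pair of helpers that pin the same encoding on both sides. ToBinary and FromBinary here are hypothetical local functions, not Akka.NET's actual serializer API, and the JSON string is assumed to have already been produced by Json.NET; only the string-to-bytes step is shown (C# top-level statements, .NET 5 or later assumed).

```csharp
using System;
using System.Text;

// Hypothetical helpers (not Akka.NET's actual API). Both directions use
// UTF-8 explicitly instead of relying on Encoding.Default.
static byte[] ToBinary(string json) => Encoding.UTF8.GetBytes(json);
static string FromBinary(byte[] bytes) => Encoding.UTF8.GetString(bytes);

string original = "{\"UnicodeChar\":\"\u2605\"}";
byte[] payload = ToBinary(original);
string roundTripped = FromBinary(payload);

// U+2605 takes three bytes in UTF-8, so the payload is two bytes longer than the string.
Console.WriteLine($"{original.Length} chars -> {payload.Length} bytes");
Console.WriteLine($"Round trip preserved the text: {roundTripped == original}");
```

Encoding.UTF8.GetBytes does not prepend a byte-order mark, so the payload contains only the encoded characters; the receiving side just has to agree on UTF-8 when decoding.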

Updated the Json.NET serializer to encode strings as UTF-8 instead of
using the current framework/system default
(System.Text.Encoding.Default).

This fixes corruption of Unicode characters when the default encoding
isn't capable of representing them correctly.

rogeralsing (Contributor) commented:

Ah, you are correct ofc! 👍

rogeralsing added a commit that referenced this pull request Dec 6, 2015
Fix incorrect serialization of Unicode characters in NewtonSoftJsonSerializer
rogeralsing merged commit 95827ae into akkadotnet:dev on Dec 6, 2015