Fix incorrect serialization of Unicode characters in NewtonSoftJsonSerializer #1508

easuter · 2015-12-06T14:46:15Z

Currently, Akka.NET uses System.Text.Encoding.Default.GetBytes() to obtain a "blob" representation of messages that need to be sent outside the local actor system.

This is what the .NET documentation has to say regarding the use of Encoding.Default:

Different computers can use different encodings as the default, and the default encoding can even change on a single computer. Therefore, data streamed from one computer to another or even retrieved at different times on the same computer might be translated incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these two reasons, using the default encoding is generally not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding, with a preamble. Another option is to use a higher-level protocol to ensure that the same format is used for encoding and decoding.

The system ANSI code page defined by Default covers the ASCII set of characters, but the encoding is different from the encoding for ASCII. Because all Default encodings lose data, you might use UTF8 instead. UTF-8 is often identical in the U+00 to U+7F range, but can encode other characters without loss.

I ran into this problem in the project I'm currently working on. Results returned from locally deployed actors were fine, while results from remote actors had their Unicode characters replaced with question marks.

Akka.NET's object serialization should have no effect on the actual content of the message, and I imagine this will cause problems in clusters that have nodes with different default encodings.

Fortunately, the bug is easy to reproduce, for example with a LINQPad script:

// Serialize an object containing a character outside the typical ANSI code pages
var originalJson = JsonConvert.SerializeObject(new { UnicodeChar = "★" }, Newtonsoft.Json.Formatting.None);
var encodingDefaultBytes = Encoding.Default.GetBytes(originalJson);
var encodingUtf8Bytes = Encoding.UTF8.GetBytes(originalJson);

// Decode each byte array with the same encoding that produced it
var encodingDefaultJson = Encoding.Default.GetString(encodingDefaultBytes);
var encodingUtf8Json = Encoding.UTF8.GetString(encodingUtf8Bytes);

originalJson.Dump("Original JSON object:");
encodingDefaultJson.Dump("Encoding.Default GetBytes/GetString JSON object:");
encodingUtf8Json.Dump("Encoding.UTF8 GetBytes/GetString JSON object:");

Script output:

Original JSON object:
{"UnicodeChar":"★"} 

Encoding.Default GetBytes/GetString JSON object:
{"UnicodeChar":"?"} 

Encoding.UTF8 GetBytes/GetString JSON object:
{"UnicodeChar":"★"}

The solution is to use Encoding.UTF8.GetBytes() instead. I've built the src/core/Akka project with this patch applied and am currently using the resulting DLL to work around this issue in my cluster.
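To illustrate the idea, here is a minimal sketch of encoding and decoding the JSON text with a fixed encoding. It is not the actual Akka.NET serializer code: the JsonBlob class and its ToBinary, FromBinary and RoundTrip methods are hypothetical names used only for this example, and it works on an already-serialized JSON string so that it depends on nothing beyond System.Text.

```csharp
using System.Text;

// Hypothetical helper for illustration only; not the Akka.NET serializer itself.
public static class JsonBlob
{
    // Encode the JSON text with a fixed encoding, so every node produces the same bytes.
    public static byte[] ToBinary(string json) => Encoding.UTF8.GetBytes(json);

    // Decode with the same fixed encoding that ToBinary used.
    public static string FromBinary(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // Round-trip through an arbitrary encoding, for comparison against UTF-8.
    public static string RoundTrip(string json, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(json));
}
```

With this sketch, JsonBlob.FromBinary(JsonBlob.ToBinary(json)) returns the original string for any JSON text, whereas JsonBlob.RoundTrip(json, Encoding.ASCII) replaces the star with a question mark. Encoding.ASCII stands in here for a default code page that cannot represent the character, since the result of Encoding.Default depends on the machine.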

Updated the Json.NET serializer to encode strings as UTF-8 instead of using the current framework/system default (System.Text.Encoding.Default). This fixes corruption of Unicode characters when the default encoding isn't capable of representing them correctly.

rogeralsing · 2015-12-06T16:13:05Z

Ah, you are correct ofc! 👍

Fix incorrect serialization of Unicode characters in NewtonSoftJsonSerializer

rogeralsing added the needs review label Dec 6, 2015

rogeralsing added a commit that referenced this pull request Dec 6, 2015

Merge pull request #1508 from easuter/dev

95827ae

Fix incorrect serialization of Unicode characters in NewtonSoftJsonSerializer

rogeralsing merged commit 95827ae into akkadotnet:dev Dec 6, 2015

rogeralsing removed the needs review label Dec 6, 2015

This was referenced Jan 18, 2016

Akka.NET v1.0.6 release (DEV) #1653

Merged

Akka.NET v1.0.6 release (PRODUCTION) #1654

Merged
