
Implement JSON token stream deserializer #454

Merged: 4 commits into smithy-lang:main on Jun 4, 2021

Conversation

@jdisanti (Collaborator) commented Jun 3, 2021:

This adds a JSON token streaming deserializer to smithy-json for #161.

It passes all tests in JSONTestSuite, except for some notable ones:

  • Number parsing is more lenient. We could put in more effort to adhere strictly to the JSON spec, but it may not be worth it, and the leniency won't lead to misinterpretation of valid JSON data.
  • Some test cases around invalid Unicode sequences are expected to pass but always fail, since this implementation coerces all JSON strings into Rust strings, and Rust requires valid UTF-8. I don't think accepting these invalid sequences would be valuable for our SDK implementation since the protocols call for valid UTF-8 in strings. I was wrong here.
  • JSONTestSuite has some tests around multiple values, such as [][], which is perfectly valid for this implementation since it results in start_array, end_array, start_array, end_array (see the sketch after this list).
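To make that last point concrete, here is a minimal, self-contained sketch (illustrative only, not this PR's implementation) of why multiple top-level values fall out naturally from a token-stream design:

// Sketch: a token stream simply yields four structural tokens for `[][]`
// instead of rejecting the input as "multiple values".
#[derive(Debug, PartialEq)]
enum Token {
    StartArray,
    EndArray,
}

fn tokenize(input: &[u8]) -> Vec<Token> {
    input
        .iter()
        .copied()
        .filter_map(|b| match b {
            b'[' => Some(Token::StartArray),
            b']' => Some(Token::EndArray),
            _ => None,
        })
        .collect()
}

fn main() {
    assert_eq!(
        tokenize(b"[][]"),
        vec![Token::StartArray, Token::EndArray, Token::StartArray, Token::EndArray],
    );
}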

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@rcoh (Collaborator) left a comment:

Overall I like the direction, but we need to avoid allocating strings during parsing.

}

/// Reads a JSON string out of the stream.
fn read_string(&mut self) -> Result<String, Error> {
@rcoh (Collaborator) commented:
The XML parsing library I used didn't unescape in the tokenizer. That kept the tokenizer allocation-free, and for input that you know won't contain escapes (like base64), you can even skip the unescape step later.

I don't think we want to allocate for every string we read out of the input stream, especially since many of them (e.g. a key in a map) never need to be owned.
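For illustration, a minimal sketch of the deferred-unescaping approach being suggested (the EscapedStr name and to_unescaped method are assumptions here, not the crate's API):

use std::borrow::Cow;

/// Sketch: the tokenizer hands back the raw escaped slice, and unescaping
/// only happens, and only allocates, when the caller needs the value.
struct EscapedStr<'a>(&'a str);

impl<'a> EscapedStr<'a> {
    fn to_unescaped(&self) -> Cow<'a, str> {
        if self.0.contains('\\') {
            // Only this path allocates. A real implementation would decode
            // all of \n, \t, \", \uXXXX, etc.; this handles just \n to keep
            // the sketch short.
            Cow::Owned(self.0.replace("\\n", "\n"))
        } else {
            // Common case (map keys, base64 payloads): borrow, no allocation.
            Cow::Borrowed(self.0)
        }
    }
}

fn main() {
    assert!(matches!(EscapedStr("abc123").to_unescaped(), Cow::Borrowed("abc123")));
    assert_eq!(EscapedStr("a\\nb").to_unescaped(), "a\nb");
}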

@jdisanti (Collaborator, Author) replied:
This makes a lot of sense. Will refactor.

pub enum Token {
    StartArray,
    EndArray,
    ObjectKey(String),
@rcoh (Collaborator) commented:

These almost certainly want to be &'a str values that refer into the input.
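A sketch of the borrowed shape being suggested (abbreviated, and an assumption about the eventual design rather than the merged code):

// Tokens borrow from the input buffer instead of allocating.
pub enum Token<'a> {
    StartArray,
    EndArray,
    ObjectKey(&'a str),
    // ...the remaining variants would borrow from the input the same way.
}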

@rcoh (Collaborator) left a comment:

Also, if there is a harness to attach this to JSONTestSuite, we should definitely include it.

@jdisanti (Collaborator, Author) commented Jun 3, 2021:

Looks like the unescaping has a bug around UTF-16 surrogate pairs. I'll look into fixing that in addition to decoupling parsing from unescaping.
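For context, a sketch of the surrogate-pair math a JSON unescaper needs to get right (illustrative, not this PR's code): an escape like "\uD834\uDD1E" must decode to the single char U+1D11E rather than being treated as two standalone \u escapes.

/// Combine a UTF-16 surrogate pair into a single Unicode scalar value.
fn combine_surrogates(high: u16, low: u16) -> Option<char> {
    // High surrogates are 0xD800..=0xDBFF, low surrogates 0xDC00..=0xDFFF.
    if !(0xD800..=0xDBFF).contains(&high) || !(0xDC00..=0xDFFF).contains(&low) {
        return None;
    }
    let code_point = 0x10000 + (((high as u32 - 0xD800) << 10) | (low as u32 - 0xDC00));
    std::char::from_u32(code_point)
}

fn main() {
    // U+1D11E MUSICAL SYMBOL G CLEF.
    assert_eq!(combine_surrogates(0xD834, 0xDD1E), Some('\u{1D11E}'));
    // A lone or out-of-order surrogate is an error the parser must report.
    assert_eq!(combine_surrogates(0xDD1E, 0xD834), None);
}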

@rcoh (Collaborator) left a comment:

LGTM! We should add a proper fuzzing harness, either here or in a follow-up PR. A bench suite would also be good, but it's not critical right now.
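A sketch of what such a cargo-fuzz target could look like (the json_token_iter entry point is a hypothetical name for the tokenizer under test, not confirmed by this PR): the invariant checked is simply that arbitrary bytes never panic the parser; they produce tokens or an Err.

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // json_token_iter is a hypothetical entry point for the tokenizer.
    // Drain the token stream, stopping at the first error.
    for token in json_token_iter(data) {
        if token.is_err() {
            break;
        }
    }
});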

@jdisanti (Collaborator, Author) commented Jun 3, 2021:

I will add fuzzing in a follow-up PR.

@jdisanti merged commit 2cef1e6 into smithy-lang:main on Jun 4, 2021.
@jdisanti deleted the json-de branch on June 4, 2021 at 21:41.