Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add json module #8111

Merged
merged 3 commits into from
Mar 8, 2024
Merged

Add json module #8111

merged 3 commits into from
Mar 8, 2024

Conversation

michalmuskala
Copy link
Contributor

@michalmuskala michalmuskala commented Feb 12, 2024

This implements the json module as specified in EEP 68.

Copy link
Contributor

github-actions bot commented Feb 12, 2024

CT Test Results

    2 files     94 suites   34m 31s ⏱️
2 041 tests 1 993 ✅ 48 💤 0 ❌
2 350 runs  2 300 ✅ 50 💤 0 ❌

Results for commit c714e21.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@rickard-green rickard-green added the team:PS Assigned to OTP team PS label Feb 12, 2024
Copy link
Contributor

@dgud dgud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments
Docs are a bit sparse :-)
Misses meta-data since 27.

But overall looks good to me.

lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/test/json_SUITE.erl Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
Comment on lines 52 to 64
-ifdef(PROPER).
-export([
property_string_roundtrip/1,
property_integer_roundtrip/1,
property_float_roundtrip/1,
property_object_roundtrip/1,
property_escape_all/1
]).

-define(PROPER_NO_TRANS, true).
-define(PROPER_NO_IMPORT_PARSE, true).
-include_lib("proper/include/proper.hrl").
-endif.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, property tests are usually put in their own suite, using the ct_property_test framework as described here. At least all the property tests for other modules in OTP that I know of are organized this way. For example, see stdlib/test/base64_property_test_SUITE.erl and the corresponding stdlib/test/property_test/base64_prop.erl.

@michalmuskala
Copy link
Contributor Author

michalmuskala commented Feb 19, 2024

This changes the string escaping implementation to read multiple bytes at once, and applies several other encoding optimisations. With those changes, in my benchmarks on an M1 mac, the new json module significantly outperforms Jason by 1.5-2.5x, and jiffy (except for one case where it's on-par).
This includes implementing custom UTF8 validation that significantly outperforms /utf8 patterns in case where we need to know the length of the codepoint, but don't need to know its value.
Unfortunately constructing binaries with private_append turned out to still be slower than building an IOlist of parts - even if benchmarked with a final iolist_to_binary call.

Btw, in the implementation of UTF8 validation using tuples and element calls to do lookups, rather than functions with integer patterns seems to be considerably faster - I believe this is because of optimisations the compiler tries to do by fusing certain clauses, which in this case leads to a pesimisation.

@josevalim
Copy link
Contributor

This includes implementing custom UTF8 validation that significantly outperforms /utf8 patterns in case where we need to know the length of the codepoint, but don't need to know its value.

I can see such a function being useful in other scenarios. Perhaps it makes sense to export it as part of unicode?

lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
lib/stdlib/src/json.erl Show resolved Hide resolved
@michalmuskala
Copy link
Contributor Author

This includes implementing custom UTF8 validation that significantly outperforms /utf8 patterns in case where we need to know the length of the codepoint, but don't need to know its value.

I can see such a function being useful in other scenarios. Perhaps it makes sense to export it as part of unicode?

Unfortunately this function being efficient in here very much depends on it being placed in the same module as the caller allowing the compiler to analyse and optimise them together. I'm not convinced this would be particularly efficient as a standalone function. I'd prefer to focus on further optimising /utf8 patterns, so the "intuitive" way of doing things is also the fastest.

@codeadict
Copy link
Contributor

This includes implementing custom UTF8 validation that significantly outperforms /utf8 patterns in case where we need to know the length of the codepoint, but don't need to know its value.

I can see such a function being useful in other scenarios. Perhaps it makes sense to export it as part of unicode?

Would this help with UTF-8 validation in Cowboy and Bandit?

@michalmuskala
Copy link
Contributor Author

I've pushed commits that tune the performance of decoder (and tiny changes to encoder). This also includes the change to decoder to remove the empty_object and empty_array settings in favour of calling callbacks, e.g. ArrayFinish(ArrayStart(Acc)).

Current benchmark results for an M1 mac can be found in https://gist.github.com/michalmuskala/00e615f5de4305027abecffed45af07b

The decoder is the fastest out of all libraries I've checked, except for jiffy in couple specific data sets (notably lots of tiny strings and unicode-heavy data). In other cases the decoder is usually ~30% faster than Jason and jiffy (and up to 2x faster) and 2x faster than jsone (and up to 5x faster).

I've also improved documentation and added some more examples.

@michalmuskala
Copy link
Contributor Author

Would this help with UTF-8 validation in Cowboy and Bandit?

cowboy already does this - this is actually how I learned about this technique, though I'm pretty sure my implementation is more performant.

https://github.com/ninenines/cowlib/blob/cc04201c1d0e1d5603cd1cde037ab729b192634c/src/cow_ws.erl#L581-L588

lib/stdlib/src/json.hrl Outdated Show resolved Hide resolved
lib/stdlib/src/json.hrl Outdated Show resolved Hide resolved
@saleyn saleyn mentioned this pull request Feb 23, 2024
lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
@essen
Copy link
Contributor

essen commented Feb 26, 2024

Would this help with UTF-8 validation in Cowboy and Bandit?

cowboy already does this - this is actually how I learned about this technique, though I'm pretty sure my implementation is more performant.

Nice! I might have to revisit this, the VM has changed a lot this past decade.

lib/stdlib/src/json.erl Outdated Show resolved Hide resolved
@dgud dgud added the testing currently being tested, tag is used by OTP internal CI label Feb 26, 2024
@michalmuskala
Copy link
Contributor Author

I've squashed the commits and rebased on top of latest master

@dgud dgud removed the testing currently being tested, tag is used by OTP internal CI label Feb 27, 2024
@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Feb 28, 2024
michalmuskala and others added 2 commits March 7, 2024 14:43
This implements the `json` module as specified in
[EEP 68](https://github.com/erlang/eep/blob/master/eeps/eep-0068.md).
Add a separate API for streaming data, this is needed to make numbers work as expected since
there is no way of knowing when a number is complete and doesn't continue in the next package.

Allow the user to call decode_continue(NewBin, State) to complete the parsing.
We also need 'end_of_input' argument to let the user signal that there is no more data in the
case that stream only contained an integer or is an incomplete Json object.
@dgud dgud added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Mar 7, 2024
And also add the copyright to the OTP copyright file.
@dgud dgud merged commit 80f48bd into erlang:master Mar 8, 2024
16 checks passed
@michalmuskala michalmuskala deleted the json branch March 8, 2024 15:48
@lpil
Copy link
Contributor

lpil commented Mar 8, 2024

Congratulations and thank you @michalmuskala!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.