-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the option to iteratively encode JSON. #29
Conversation
This is related to matrix-org/synapse#6998. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts here.
Nowadays, the speed benefit of setting ensure_ascii=True
and fixing up afterwards is much reduced on simplejson, and non-existent on stdlib json (see #9 (comment)), and the thing about U+2028 is incorrect as of simplejson 3.14.0.
I think it would be entirely reasonable to drop the _unascii
optimisation, with the expectation that anyone looking for optimal performance should use stdlib json.
I also suspect (without evidence) that it is more efficient to do a single str.encode("utf-8")
on a larger result than it is to do many of them on smaller results.
In short: can we avoid the for/yield loops? (maybe we should just make _canonical_encoder
and _pretty_encoder
public?)
Just to make sure I understand -- this would require bumping the minimum version of simplejson to 3.14.0 for correctness? I'll check this as a separate PR.
We could also define the interface for the new functions as returning strings, not bytes, that would just differ from the old interface. 😄 My concern with making |
From some testing locally this seems to be true. See #30 |
4d71cdb
to
825a031
Compare
I think this is the remaining open question about this PR. I agree that it is probably more efficient to encode the longer string than a shorter one (although I'll try to do some quick perf checks of this). |
fair point. maybe better to do as you suggest and just define the new functions to return strings. Or possibly: supply two variants: |
I was curious to benchmark this a bit: https://gist.github.com/clokep/20c7cf34006099120bea5bbbb1c76c97 The results with Python 3.7.7 were:
I think the tl;dr is that it doesn't matter too much which way we do it? Not sure if this changes your opinion or not! |
interesting. slightly surprising. In that case maybe we just stick with what you've got. Up to you, really. |
I think keeping the return type consistent is nice. We can use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
This expands the APIs available from canonicaljson to include methods that return iterators (essentially calling
iterencode
instead ofencode
).