Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove simplejson dependency #23

Open
palango opened this issue Jun 4, 2019 · 10 comments
Open

Remove simplejson dependency #23

palango opened this issue Jun 4, 2019 · 10 comments

Comments

@palango
Copy link

palango commented Jun 4, 2019

Currently the package pulls simplejson as a dependency.

This caused some unexpected trouble in raiden-network/raiden#4174 . It also seems that since python 2.6 simplejson is no longer a big performance gain: https://github.com/kennethreitz/requests/issues/3052#issuecomment-197021335

So my idea would be to remove the dependency. If you agree I can create a PR.

@richvdh
Copy link
Member

richvdh commented Jun 5, 2019

Hrm.

python 2.7 improved the performance of the stdlib json serialiser, but it did not integrate all of the optimisations from simplejson, and as the comment says, I still found it to be substantially slower than simplejson.

I'd therefore be reluctant to simply discard simplejson without further benchmarking. I'd be open to dropping the dep on simplejson if it could be shown that on recent (say, python 3.5) versions of the stdlib the performance was comparable with simplejson.

I'd also consider making the dependency optional, but am wary of introducing strange effects like that seen in https://github.com/kennethreitz/requests/issues/3052.

I'm not quite clear on what problem you're trying to solve: is simplejson failing to parse valid json?

@richvdh
Copy link
Member

richvdh commented Jun 5, 2019

I guess a final option would be to use json by default but allow the caller to pass in an alternative json implementation as a keyword param.

@konradkonrad
Copy link

I'm not quite clear on what problem you're trying to solve: is simplejson failing to parse valid json?

No, the problem we experience is ultimately to blame on https://github.com/kennethreitz/requests/issues/3052 -- when trying to handle parsing exceptions, we mismatch on the exception type, because requests uses simplejson if available. Since pulling matrix-synapse as a dependency, this is the case now.

Unfortunately it doesn't look like https://github.com/kennethreitz/requests/pull/3056 will ever make it into the requests 2.x series, or be released before 2020 (see yellow note here), so politely asking to get rid of the dependency is our way of avoiding monkey patching or code arounds.

I guess a final option would be to use json by default but allow the caller to pass in an alternative json implementation as a keyword param.

Allowing simplejson as an extra could be a possibility, too, no?

@palango
Copy link
Author

palango commented Jun 7, 2019

I'd therefore be reluctant to simply discard simplejson without further benchmarking. I'd be open to dropping the dep on simplejson if it could be shown that on recent (say, python 3.5) versions of the stdlib the performance was comparable with simplejson.

According to https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/ json in python 3.5 is slightly faster than simplejson, but slower in python 2.7. SO I guess it depends on how important python 2.7 is for the user base.

Allowing simplejson as an extra could be a possibility, too, no?

That sounds like a good solution, keeping the option to use simplejsonn for python 2.7 users, but defaulting to json.

@richvdh
Copy link
Member

richvdh commented Jun 7, 2019

According to https://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/ json in python 3.5 is slightly faster than simplejson, but slower in python 2.7. SO I guess it depends on how important python 2.7 is for the user base.

I looked at those metrics when I did the benchmarking I referred to earlier. What I found is that the performance of the various json libraries varied dramatically according to other options. In particular, stock json was much slower than simplejson with sort_keys=True (as used by canonicaljson). So I'm afraid those metrics aren't much use here.

Allowing simplejson as an extra could be a possibility, too, no?

That sounds like a good solution, keeping the option to use simplejsonn for python 2.7 users, but defaulting to json.

Isn't this what requests did, and which caused you all sorts of problems? I'm not enthusiastic about repeating that mistake. I'd prefer the idea of letting the caller actively choose to pass in an alternative implementation.

@richvdh
Copy link
Member

richvdh commented Nov 20, 2019

@palango any further thoughts here?

@richvdh
Copy link
Member

richvdh commented Jun 10, 2020

hrm, I did some benchmarking (based https://github.com/richvdh/json_benchmark).

using python 3.7.5 (seconds per loop):

                             dumps (large obj)   dumps (small objs)
json 2.0.9                               0.015                0.020
simplejson 3.17.0                        0.030                0.046

So yeah, we should probably switch back to stock json.

@richvdh
Copy link
Member

richvdh commented Jun 10, 2020

it seems like there was a substantial change to the performance of the stdlib json somewhere between python 2.7 and 3.5. Now that 2.7 is unsupported, I'm not too worried about any performance regressions for it.

@richvdh
Copy link
Member

richvdh commented Jun 10, 2020

as per the benchmarks on #9, this would also allow us to remove the _unascii magic.

@richvdh
Copy link
Member

richvdh commented Aug 7, 2020

An update on the situation here: as of #25, it is possible to use stdlib json rather than simplejson for encoding; however, we still pull it in with setup.py, so this wouldn't solve the original reporter's problem.

Properly dropping the simplejson dependency requires thought about how to do so without breaking compatibility with older applications that might pass bytes values into canonicaljson.encode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants