-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NaN and Inf serialization #706
Conversation
82d2306
to
e22d04d
Compare
e22d04d
to
6daba70
Compare
@@ -27,6 +26,7 @@ | |||
compare_digest, | |||
) # We are using compare_digest to limit the surface of timing attacks | |||
|
|||
import simplejson as json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the added dependency? I think we should be able to do this with stdlib json, since we always have before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, we had json_clean
in ipykernel that was going through all messages (making copies of lists and dicts) and replacing math.inf
and math.nan
manually.
Unfortunately the standard json lib only allows two options:
allow_nan=True
: nan is serialized toNaN
, inf is serialized toInfinity
, which are both valid JS but not valid JSONallow_nan=False
: throw an exception for nan and inf without letting us a chance to serialize them tonull
(valid JSON) in thedefault
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arg, that's frustrating. And support for nan/inf is required somewhere? This is an artifact of earlier "never raise" goals, but I think maybe something that can't actually be serialized should raise. What breaks if these raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both bqplot and pythreejs are failing now due to this. Arguably, those libraries should maybe provide valid JSON.
Edit:, for example, ipydatagrid has special serializers for those: https://github.com/bloomberg/ipydatagrid/blob/main/ipydatagrid/datagrid.py#L172-L178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplejson does the wrong thing for bytes:
In [2]: simplejson.dumps({'buf': b'bytes',}, default=repr)
Out[2]: '{"buf": "bytes"}'
we should have a test that verifies this, because the switch to simplejson should have caused tests to fail.
Can you add
({"key": b"\xFF"}, {"key": "/w==\n"}),
to the list of pairs in test_json_default
? And make sure the same json implementation is used?
I think the bytes behavior is more important than the unsupported floats. matplotlib figures rely on it, for one.
What I think is the ideal behavior is to trigger a DeprecationWarning when these values are seen, but still coerce to null. I don't see a good way to do either with simplejson or stdlib json, though.
If supporting these values is deprecated, maybe the thing to do is to fallback on the old inefficient path:
try:
dumps(obj, default=json_default)
except ValueError as e:
# maybe double check that it's the nan/inf error
# DeprecationWarning about inf/nan support
obj = json_clean(obj)
return dumps(obj, default=json_default)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's do that. Thanks!
Let's close this PR. I will add an extra test here like you suggested, and push a fix to ipykernel for the fallback to json_clean
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and push a fix to ipykernel
Actually, should we do this fallback to json_clean
in jupyter_client
? It's jupyter_client
that does the dumps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, should we do this fallback to json_clean in jupyter_client? It's jupyter_client that does the dumps
Yeah, I think doing it here makes sense. I'm not sure ipykernel can add a catch in the right place.
cc. @SylvainCorlay @minrk
This is fixing serialization of
math.nan
andmath.inf
, preventing:ValueError: Out of range float values are not JSON compliant
We used to clean NaNs and Inf in
json_clean
from ipykernel before serializing them, but since ipython/ipykernel#708 we need to take care of them