Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NaN and Inf serialization #706

Closed

Conversation

martinRenou
Copy link
Member

cc. @SylvainCorlay @minrk

This is fixing serialization of math.nan and math.inf, preventing: ValueError: Out of range float values are not JSON compliant

We used to clean NaNs and Inf in json_clean from ipykernel before serializing them, but since ipython/ipykernel#708 we need to take care of them

@martinRenou martinRenou force-pushed the fix_nan_serialization branch 2 times, most recently from 82d2306 to e22d04d Compare September 30, 2021 07:33
@@ -27,6 +26,7 @@
compare_digest,
) # We are using compare_digest to limit the surface of timing attacks

import simplejson as json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the added dependency? I think we should be able to do this with stdlib json, since we always have before.

Copy link
Member Author

@martinRenou martinRenou Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, we had json_clean in ipykernel that was going through all messages (making copies of lists and dicts) and replacing math.inf and math.nan manually.

Unfortunately the standard json lib only allows two options:

  • allow_nan=True: nan is serialized to NaN, inf is serialized to Infinity, which are both valid JS but not valid JSON
  • allow_nan=False: throw an exception for nan and inf without letting us a chance to serialize them to null (valid JSON) in the default function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arg, that's frustrating. And support for nan/inf is required somewhere? This is an artifact of earlier "never raise" goals, but I think maybe something that can't actually be serialized should raise. What breaks if these raise?

Copy link
Member Author

@martinRenou martinRenou Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both bqplot and pythreejs are failing now due to this. Arguably, those libraries should maybe provide valid JSON.

Edit:, for example, ipydatagrid has special serializers for those: https://github.com/bloomberg/ipydatagrid/blob/main/ipydatagrid/datagrid.py#L172-L178

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplejson does the wrong thing for bytes:

In [2]: simplejson.dumps({'buf': b'bytes',}, default=repr)
Out[2]: '{"buf": "bytes"}'

we should have a test that verifies this, because the switch to simplejson should have caused tests to fail.

Can you add

        ({"key": b"\xFF"}, {"key": "/w==\n"}),

to the list of pairs in test_json_default? And make sure the same json implementation is used?

I think the bytes behavior is more important than the unsupported floats. matplotlib figures rely on it, for one.

What I think is the ideal behavior is to trigger a DeprecationWarning when these values are seen, but still coerce to null. I don't see a good way to do either with simplejson or stdlib json, though.

If supporting these values is deprecated, maybe the thing to do is to fallback on the old inefficient path:

try:
    dumps(obj, default=json_default)
except ValueError as e:
    # maybe double check that it's the nan/inf error
    # DeprecationWarning about inf/nan support
    obj = json_clean(obj)
    return dumps(obj, default=json_default)

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's do that. Thanks!

Let's close this PR. I will add an extra test here like you suggested, and push a fix to ipykernel for the fallback to json_clean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and push a fix to ipykernel

Actually, should we do this fallback to json_clean in jupyter_client? It's jupyter_client that does the dumps

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, should we do this fallback to json_clean in jupyter_client? It's jupyter_client that does the dumps

Yeah, I think doing it here makes sense. I'm not sure ipykernel can add a catch in the right place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants