jupyter_client 7.x implicitly depends on pyzmq>22.2.0 (matplotlib UnicodeDecodeError) #702

Closed
diurnalist opened this issue Sep 28, 2021 · 5 comments · Fixed by #703

@diurnalist

I think this is a bug in the latest release and it regresses matplotlib for anybody who happens to have pyzmq < 22.2.0.

I was able to reproduce just by having a simple cell w/ a plot in an IPython notebook:

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()

This will return the following output:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/tmp/ipykernel_480/1959157813.py in <module>
      2 plt.plot([1, 2, 3, 4])
      3 plt.ylabel('some numbers')
----> 4 plt.show()

/opt/conda/lib/python3.9/site-packages/matplotlib/pyplot.py in show(*args, **kwargs)
    376     """
    377     _warn_if_gui_out_of_main_thread()
--> 378     return _backend_mod.show(*args, **kwargs)
    379 
    380 

/opt/conda/lib/python3.9/site-packages/matplotlib_inline/backend_inline.py in show(close, block)
     39     try:
     40         for figure_manager in Gcf.get_all_fig_managers():
---> 41             display(
     42                 figure_manager.canvas.figure,
     43                 metadata=_fetch_figure_metadata(figure_manager.canvas.figure)

/opt/conda/lib/python3.9/site-packages/IPython/core/display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
    325                 # kwarg-specified metadata gets precedence
    326                 _merge(md_dict, metadata)
--> 327             publish_display_data(data=format_dict, metadata=md_dict, **kwargs)
    328     if display_id:
    329         return DisplayHandle(display_id)

/opt/conda/lib/python3.9/site-packages/IPython/core/display.py in publish_display_data(data, metadata, source, transient, **kwargs)
    117         kwargs['transient'] = transient
    118 
--> 119     display_pub.publish(
    120         data=data,
    121         metadata=metadata,

/opt/conda/lib/python3.9/site-packages/ipykernel/zmqshell.py in publish(self, data, metadata, transient, update)
    136                 return
    137 
--> 138         self.session.send(
    139             self.pub_socket, msg, ident=self.topic,
    140         )

/opt/conda/lib/python3.9/site-packages/jupyter_client/session.py in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    828         if self.adapt_version:
    829             msg = adapt(msg, self.adapt_version)
--> 830         to_send = self.serialize(msg, ident)
    831         to_send.extend(buffers)
    832         longest = max([len(s) for s in to_send])

/opt/conda/lib/python3.9/site-packages/jupyter_client/session.py in serialize(self, msg, ident)
    702             content = self.none
    703         elif isinstance(content, dict):
--> 704             content = self.pack(content)
    705         elif isinstance(content, bytes):
    706             # content is already packed, as in a relayed message

/opt/conda/lib/python3.9/site-packages/jupyter_client/session.py in json_packer(obj)
     93 
     94 def json_packer(obj):
---> 95     return jsonapi.dumps(
     96         obj,
     97         default=json_default,

/opt/conda/lib/python3.9/site-packages/zmq/utils/jsonapi.py in dumps(o, **kwargs)
     39         kwargs['separators'] = (',', ':')
     40 
---> 41     s = jsonmod.dumps(o, **kwargs)
     42 
     43     if isinstance(s, unicode):

/opt/conda/lib/python3.9/site-packages/simplejson/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, use_decimal, namedtuple_as_object, tuple_as_array, bigint_as_string, sort_keys, item_sort_key, for_json, ignore_nan, int_as_string_bitcount, iterable_as_array, **kw)
    383     if cls is None:
    384         cls = JSONEncoder
--> 385     return cls(
    386         skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    387         check_circular=check_circular, allow_nan=allow_nan, indent=indent,

/opt/conda/lib/python3.9/site-packages/simplejson/encoder.py in encode(self, o)
    294         # exceptions aren't as detailed.  The list call should be roughly
    295         # equivalent to the PySequence_Fast that ''.join() would do.
--> 296         chunks = self.iterencode(o, _one_shot=True)
    297         if not isinstance(chunks, (list, tuple)):
    298             chunks = list(chunks)

/opt/conda/lib/python3.9/site-packages/simplejson/encoder.py in iterencode(self, o, _one_shot)
    376                 self.iterable_as_array, Decimal=decimal.Decimal)
    377         try:
--> 378             return _iterencode(o, 0)
    379         finally:
    380             key_memo.clear()

/opt/conda/lib/python3.9/site-packages/simplejson/encoder.py in encode_basestring(s, _PY3, _q)
     42     if _PY3:
     43         if isinstance(s, bytes):
---> 44             s = str(s, 'utf-8')
     45         elif type(s) is not str:
     46             # convert an str subclass instance to exact str

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

The issue seemed to be the JSON serialization of the IPython mimebundle-formatted outputs, which return dicts that have some of the values (the image contents) in bytes.

I couldn't find any obvious place where jupyter_client was doing the wrong thing, or a new thing, but PyZMQ has this in the 22.2.0 release notes:

deprecate zmq.utils.jsonapi and remove support for non-stdlib json implementations in send/recv_json. Custom serialization methods should be used instead.

I was able to fix my issues by upgrading past 22.2.0:

pip install --upgrade 'pyzmq>22.2.0'

22.2.0 does seem to be the cutoff: if you run pip install --upgrade 'pyzmq<22.2.0', the error occurs.

@davidbrochart
Member

cc @martinRenou

@minrk
Member

minrk commented Sep 29, 2021

I believe it's the presence of simplejson that is causing this. It handles bytes differently from stdlib json (which started as a direct import of simplejson):

In [1]: import json

In [2]: import simplejson

In [3]: d = dict(a=b'bytes')

In [4]: json.dumps(d, default=repr)
Out[4]: '{"a": "b\'bytes\'"}'

In [5]: simplejson.dumps(d, default=repr)
Out[5]: '{"a": "bytes"}'

jupyter-client 7 relies on default being called for bytes, whereas simplejson assumes (incorrectly in our case) that bytestrings are utf8 text.
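
To illustrate the stdlib behavior described above, here is a minimal sketch in which json.dumps hands bytes values to the default handler, which base64-encodes them. The name encode_bytes_default and the sample bundle are hypothetical, made up for this example; this is not jupyter-client's actual handler.

```python
import base64
import json


def encode_bytes_default(obj):
    """Hypothetical fallback handler: base64-encode bytes, reject anything else."""
    if isinstance(obj, (bytes, bytearray)):
        return base64.b64encode(obj).decode("ascii")
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


# The 8-byte PNG signature; its first byte, 0x89, is not a valid UTF-8 start byte,
# which matches the byte reported in the traceback.
png_header = b"\x89PNG\r\n\x1a\n"
bundle = {"image/png": png_header, "text/plain": "<Figure>"}

# Stdlib json never treats bytes as text, so the handler is invoked for them.
packed = json.dumps(bundle, default=encode_bytes_default)
print(packed)
```

With simplejson in place of json, the handler would not be reached for the bytes value, because simplejson tries to decode it as UTF-8 first, as the traceback shows.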

For some background: zmq.utils.jsonapi was introduced when there was no json implementation in the standard library. There were several competing implementations, so we went with trying to support a few widespread ones. All we really wanted was "json encode to utf8-bytes" which was fiddly in the days of Python 2.5 + 3.0. If you have a third-party json implementation like simplejson, pyzmq assumes you'd prefer to use that (in early days, stdlib's fork of simplejson was necessarily older, and the public release was often faster but otherwise identical). That's a much easier story today, and jupyter-client should probably have its own implementation, either assuming stdlib (as in pyzmq 22.2), or also supporting the current generation of optimized JSON implementations, if available. #703 takes the simplest approach to start.

This issue would also go away if whatever code was putting bytes in a to-be-json dict did the b2a_base64 call itself, which I think is IPython's print_figure.
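
A rough sketch of that alternative, assuming the producer of the dict encodes the image itself: once every value is already text, no default handler is needed, so the choice of JSON implementation should no longer matter for this field. The helper name png_to_text is hypothetical and is not IPython's API.

```python
import json
from binascii import a2b_base64, b2a_base64


def png_to_text(png_bytes):
    """Hypothetical helper: base64-encode raw PNG bytes into ASCII text."""
    # b2a_base64 appends a trailing newline by default; a2b_base64 accepts it.
    return b2a_base64(png_bytes).decode("ascii")


png_bytes = b"\x89PNG\r\n\x1a\n"

# The dict holds only str values, so no default handler is involved.
bundle = {"image/png": png_to_text(png_bytes)}
packed = json.dumps(bundle)

# Round trip: the receiver recovers the original bytes.
restored = a2b_base64(json.loads(packed)["image/png"])
print(restored == png_bytes)
```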

@diurnalist
Author

@minrk thank you for the quick attention and the detailed explanation. So, I guess some recent refactors did not notice this behavior in simplejson because later versions of pyzmq had the jsonapi module using the native json builtin?

In any event thank you very much!

@minrk
Member

minrk commented Sep 30, 2021

later versions of pyzmq had the jsonapi module using the native json builtin?

pyzmq has always used the stdlib json (once it was added in Python 2.6, at least), but it prefers simplejson if available. simplejson is installed a lot less often these days, so pyzmq is almost always using the stdlib. That was the case until 22.2, which removed support for alternative implementations, so the behavior is now always consistent with what was previously only the most common case.

You've pretty much got it - I think it was missed mainly due to the combination of:

  • pyzmq 22.2 came out first, then jupyter-client 7, so no one with an up-to-date env would have been impacted at any point in time, and
  • with json in the standard library, simplejson isn't that common anymore

so it would only come up if all three of these conditions are met:

  • have the latest jupyter-client,
  • do not have the latest pyzmq, and
  • have simplejson

which I suspect is a pretty small number of envs
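
The three conditions above can be sketched as a quick environment check. The function names are hypothetical, the version parsing is deliberately simplistic, and the check treats every jupyter-client 7.x as affected, so it does not account for releases that include the fix from #703.

```python
from importlib import metadata
from importlib.util import find_spec


def version_tuple(version):
    """Parse leading numeric parts of a version string, e.g. '22.2.0' -> (22, 2, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(jupyter_client_version, pyzmq_version, has_simplejson):
    """True when all three conditions from the thread hold."""
    return (
        version_tuple(jupyter_client_version) >= (7,)
        and version_tuple(pyzmq_version) < (22, 2)
        and has_simplejson
    )


def check_environment():
    """Apply is_affected to the packages installed in the current environment."""
    try:
        jc = metadata.version("jupyter_client")
        zmq = metadata.version("pyzmq")
    except metadata.PackageNotFoundError:
        return False  # one of the packages is missing, so the bug cannot occur
    return is_affected(jc, zmq, find_spec("simplejson") is not None)
```

Calling check_environment() inside the kernel's environment returns True only when the combination described in the thread is present.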

@diurnalist
Author

diurnalist commented Sep 30, 2021

Got it! Our env checks all the bad boxes for sure. The culprit in my view is that we have made OpenStack client libraries available in our Jupyter environment, and those strongly prefer to be installed using a global set of requirements tied to major releases. So, when we install these clients we try to respect these requirements and constraints (there are also a LOT of dependencies that can be pulled in by these ;_;).

I just checked and simplejson is in the list (doesn't necessarily mean anything, other than it's likely that some of the client libraries or their dependencies could pull it in), and there's an upper constraint pinning pyzmq to <22.2.0 in all but the most bleeding edge release branches of that thing. So that explains things some more. If others are mixing OpenStack libs with the latest Jupyter client presumably they would also have experienced this.

Thanks again!
