umlauts in dataframe cannot be displayed in ipython notebook #2458

jankatins · 2012-12-09T00:44:31Z

When I put this into an IPython notebook cell, an exception is thrown:

import pandas
data2 = [u"test", u"ß", u"ä", u"á"]
df2 = pandas.DataFrame({"a":data2})
print(df2["a"][1])
df2

ß
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-0bb7d6f7d515> in <module>()
      2 df2 = pandas.DataFrame({"a":data2})
      3 print(df2["a"][1])
----> 4 df2

C:\portabel\Python27\lib\site-packages\IPython\core\displayhook.pyc in __call__(self, result)
    244             self.update_user_ns(result)
    245             self.log_output(format_dict)
--> 246             self.finish_displayhook()
    247 
    248     def flush(self):

C:\portabel\Python27\lib\site-packages\IPython\zmq\displayhook.pyc in finish_displayhook(self)
     59         sys.stdout.flush()
     60         sys.stderr.flush()
---> 61         self.session.send(self.pub_socket, self.msg, ident=self.topic)
     62         self.msg = None
     63 

C:\portabel\Python27\lib\site-packages\IPython\zmq\session.pyc in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
    576 
    577         buffers = [] if buffers is None else buffers
--> 578         to_send = self.serialize(msg, ident)
    579         to_send.extend(buffers)
    580         longest = max([ len(s) for s in to_send ])

C:\portabel\Python27\lib\site-packages\IPython\zmq\session.pyc in serialize(self, msg, ident)
    484             content = self.none
    485         elif isinstance(content, dict):
--> 486             content = self.pack(content)
    487         elif isinstance(content, bytes):
    488             # content is already packed, as in a relayed message

C:\portabel\Python27\lib\site-packages\IPython\zmq\session.pyc in <lambda>(obj)
     78 _version_info_list = list(IPython.version_info)
     79 # ISO8601-ify datetime objects
---> 80 json_packer = lambda obj: jsonapi.dumps(obj, default=date_default)
     81 json_unpacker = lambda s: extract_dates(jsonapi.loads(s))
     82 

C:\portabel\Python27\lib\site-packages\zmq\utils\jsonapi.pyc in dumps(o, **kwargs)
     70         kwargs['separators'] = (',', ':')
     71 
---> 72     return _squash_unicode(jsonmod.dumps(o, **kwargs))
     73 
     74 def loads(s, **kwargs):

C:\portabel\Python27\lib\site-packages\simplejson\__init__.pyc in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, use_decimal, namedtuple_as_object, tuple_as_array, bigint_as_string, sort_keys, item_sort_key, **kw)
    332         sort_keys=sort_keys,
    333         item_sort_key=item_sort_key,
--> 334         **kw).encode(obj)
    335 
    336 

C:\portabel\Python27\lib\site-packages\simplejson\encoder.pyc in encode(self, o)
    235         # exceptions aren't as detailed.  The list call should be roughly
    236         # equivalent to the PySequence_Fast that ''.join() would do.
--> 237         chunks = self.iterencode(o, _one_shot=True)
    238         if not isinstance(chunks, (list, tuple)):
    239             chunks = list(chunks)

C:\portabel\Python27\lib\site-packages\simplejson\encoder.pyc in iterencode(self, o, _one_shot)
    309                 Decimal=Decimal)
    310         try:
--> 311             return _iterencode(o, 0)
    312         finally:
    313             key_memo.clear()

UnicodeDecodeError: 'utf8' codec can't decode byte 0xdf in position 22: invalid continuation byte

This also happens when I have umlauts in CSV files and specify an encoding:

df3 = pandas.read_csv("file_with_unlauts.csv", encoding="iso-8859-15")
print(df3.head(10)["name"][8]) # prints the right chars
df3.head(10) # throws the above error
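
A note for readers: the last line of the traceback can be reproduced without pandas or IPython. The sketch below assumes (the thread does not confirm this) that the table text reached the JSON serializer as cp1252-encoded bytes rather than as unicode. It is written in Python 3 syntax, not the Python 2.7 used above, and the sample string and variable names are purely illustrative.

```python
# Hypothetical sample: a cell value containing "ß", followed by a newline.
text = u"1  ß\n"

# Assumption: something upstream encoded the text with a Windows code page.
raw = text.encode("cp1252")

# Decoding those bytes as UTF-8 fails, because 0xdf announces a two-byte
# sequence and the byte that follows is not a valid continuation byte.
try:
    raw.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Decoding with the codec that produced the bytes restores the text.
roundtrip = raw.decode("cp1252")

if failure is not None:
    print("offending byte: 0x%02x at position %d"
          % (raw[failure.start], failure.start))
print(roundtrip == text)
```

If this is what happened, it would also explain why `print` shows the right characters while the rich display fails: the two go through different encoding steps.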


jankatins · 2012-12-09T00:46:35Z

pandas version: '0.10.0.dev-07318fa'

ghost · 2012-12-09T04:42:04Z

I cannot reproduce this in the notebook or qtconsole with IPython git master (0.14-dev).
Which IPython version are you using?

jankatins · 2012-12-09T12:35:36Z

I also updated to today's IPython git master, but I can still reproduce it in both the qtconsole and the notebook, and with different browsers.

Win7 64bit

The notebook: https://gist.github.com/4244669 http://nbviewer.ipython.org/4244669/

jankatins · 2012-12-09T12:59:42Z

This also looks wrong to me:

df2.to_html()

results in

u'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>a</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td><strong>0</strong></td>\n      <td> test</td>\n    </tr>\n    <tr>\n      <td><strong>1</strong></td>\n      <td>    \xdf</td>\n    </tr>\n    <tr>\n      <td><strong>2</strong></td>\n      <td>    \xe4</td>\n    </tr>\n    <tr>\n      <td><strong>3</strong></td>\n      <td>    \xe1</td>\n    </tr>\n  </tbody>\n</table>'

The result is a unicode string, but the umlauts nevertheless appear as the byte values they would have in iso-8859-15; compare u"ß".encode("iso-8859-15")
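
A note for readers: the `\xdf` in that repr is arguably not a sign of a wrong encoding. For characters up to U+00FF, the repr escape of a code point looks the same as its single-byte Latin-1 value, so a correct unicode string and an iso-8859-15 byte string can look alike. A small sketch (Python 3 syntax, illustrative variable names) that separates the code point from its encoded forms:

```python
s = u"ß"

code_point = ord(s)                  # the character's Unicode code point
as_latin9 = s.encode("iso-8859-15")  # one byte with the same numeric value
as_utf8 = s.encode("utf-8")          # two bytes, neither of them 0xdf

print(hex(code_point), as_latin9, as_utf8)
```

The string only becomes a problem once it is turned into bytes with one codec and read back with another.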

ghost · 2012-12-09T13:19:05Z

What are the values of sys.stdout.encoding, sys.stdin.encoding, and pandas.get_option("print.encoding")?
Try with qtconsole for now.

ghost · 2012-12-09T13:21:08Z

Also, sys.getdefaultencoding().

jankatins · 2012-12-09T14:19:34Z

import sys
sys.stdout.encoding, sys.stdin.encoding, pandas.get_option("print.encoding"), sys.getdefaultencoding()
('UTF-8', None, 'cp1252', 'ascii')
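
For anyone who wants to post the same information, here is a minimal sketch of a helper that gathers the interpreter-level settings in one go. The function name `encoding_report` and its dictionary keys are made up for illustration. The pandas option is left out so that the snippet needs only the standard library, and it is written in Python 3 syntax.

```python
import locale
import sys


def encoding_report():
    """Collect the interpreter's encoding settings in one dict."""
    return {
        # Streams can be replaced by objects without an `encoding`
        # attribute, or report None as stdin does above, hence getattr.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(),
    }


for name, value in sorted(encoding_report().items()):
    print("%-10s %s" % (name, value))
```

A mismatch between these values, such as a UTF-8 stdout next to a cp1252 print encoding, is the kind of thing worth looking for.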

ghost · 2012-12-09T14:28:08Z

I think I see what's happening.
Try the updated git master and post the results again.

jankatins · 2012-12-09T14:34:21Z

Yay, that fixed it! Thanks!

ghost · 2012-12-09T14:34:57Z

you're welcome.

closed via 7cc9779

hmeine · 2014-07-20T08:52:27Z

I have the same problem now with 0.14.0 and 0.14.1, running latest IPython from git.

sys.stdin.encoding: UTF-8 (in the terminal; in the notebook I get None)
sys.stdout.encoding: UTF-8
sys.stderr.encoding: UTF-8
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): utf-8
locale.getpreferredencoding(): UTF-8

(I did not notice this problem before, but I am also unsure how often I had umlauts in my data.)

ghost self-assigned this Dec 9, 2012

ghost closed this as completed Dec 9, 2012

jankatins unassigned ghost Jul 20, 2014

This issue was closed.
