__repr__ wrong column alignment with non-ascii characters #1620

manuteleco · 2012-07-14T10:04:40Z

Hi,

it seems that when DataFrame, Series, and possibly other objects contain non-ASCII characters inside non-unicode (byte) strings, the __repr__ method does not align the column values correctly. Unicode strings are not affected by this issue. I'm using pandas '0.8.1.dev-70c3deb' on a Linux box.

Sample code:

# -*- coding: utf-8 -*-

from pandas import Series, DataFrame

df1 = DataFrame([["aaaa", 1], ["bbbb", 2]])
df2 = DataFrame([["aaää", 1], ["bbbb", 2]])
df3 = DataFrame([[u"aaää", 1], ["bbbb", 2]])

# Comparison between "similar dataframes"
print df1
print
print df2
print
print df3
print

# Other cases:
s1 = Series(["ä", "bbbb", "ßß"])
print
print s1

This results in:

      0  1
0  aaaa  1
1  bbbb  2

        0  1
0  aaää  1
1    bbbb  2

      0  1
0  aaää  1
1  bbbb  2


0      ä
1    bbbb
2    ßß
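A minimal sketch of what is probably happening, written for Python 3 even though the report above is Python 2: if the padding is computed from the UTF-8 byte count (which is what `len()` returns for a Python 2 `str`), each "ä" counts as two columns although it is displayed as one. The helper names `pad_by_byte_length` and `pad_by_char_length` are hypothetical and are not pandas functions; this only illustrates the arithmetic, not the actual pandas formatting code.

```python
# Sketch only: reproduces the width arithmetic, not pandas internals.
# "\u00e4" is "ä" (one code point, two bytes in UTF-8).
values = ["aa\u00e4\u00e4", "bbbb"]


def pad_by_byte_length(text, width):
    """Right-justify using the UTF-8 byte count, as len() on a Python 2 str would."""
    n_bytes = len(text.encode("utf-8"))
    return " " * max(width - n_bytes, 0) + text


def pad_by_char_length(text, width):
    """Right-justify using the number of code points."""
    return text.rjust(width)


# Column width as seen through bytes versus through code points.
byte_width = max(len(v.encode("utf-8")) for v in values)
char_width = max(len(v) for v in values)

byte_padded = [pad_by_byte_length(v, byte_width) for v in values]
char_padded = [pad_by_char_length(v, char_width) for v in values]

for row in byte_padded:
    print(repr(row))
for row in char_padded:
    print(repr(row))
```

With byte-based widths the "bbbb" row receives two extra spaces of padding, which matches the shifted second row in the df2 output above.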

Thanks and regards.


jseabold · 2012-08-22T16:41:46Z

I don't think this should've been closed. The original problem in the given example still exists AFAICT.

>>> from pandas import Series, DataFrame
>>> 
>>> df1 = DataFrame([["aaaa", 1], ["bbbb", 2]])                                    
>>> df2 = DataFrame([["aaää", 1], ["bbbb", 2]])                                    
>>> df3 = DataFrame([[u"aaää", 1], ["bbbb", 2]])                                   
>>>                                                                                
>>> # Comparison between "similar dataframes"                                      
>>> print df1                                                                      
      0  1                                                                         
0  aaaa  1                                                                         
1  bbbb  2                                                                         
>>> print                                                                          

>>> print df2                                                                      
      0    1                                                                       
0  aaää  1                                                                         
1  bbbb    2                                                                       
>>> print                                                                          

>>> print df3                                                                      
      0  1                                                                         
0  aaää  1                                                                         
1  bbbb  2  
>>> pandas.version.version
'0.8.2.dev-c99d9cd'

jseabold · 2012-08-22T16:43:45Z

Though maybe this is intentional for strings?

jseabold · 2012-08-22T17:15:14Z

Thinking about this a bit more, I think this should probably be fixed, but does that imply always using unicode in to_string? Regardless, force_unicode=True fails for df2. I'll push a fix for this.
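One way to read "always using unicode" is to decode every byte string to text before measuring widths. The following is a rough sketch of that idea in Python 3, under the assumption that the byte strings are UTF-8; `to_text` and `format_column` are hypothetical names for illustration and are not the pandas implementation or the fix referenced in this thread.

```python
# Sketch only: decode first, then measure and pad on text.
def to_text(value, encoding="utf-8"):
    """Return a text string for any cell value, decoding bytes if needed."""
    if isinstance(value, bytes):
        # errors="replace" keeps formatting from raising on undecodable bytes.
        return value.decode(encoding, errors="replace")
    return str(value)


def format_column(values, encoding="utf-8"):
    """Right-justify all cells to the widest cell, measured in code points."""
    texts = [to_text(v, encoding) for v in values]
    width = max((len(t) for t in texts), default=0)
    return [t.rjust(width) for t in texts]


# A column mixing a UTF-8 byte string, a text string, and an integer.
column = format_column(["aa\u00e4\u00e4".encode("utf-8"), "bbbb", 1])
for cell in column:
    print(repr(cell))
```

Counting code points is still an approximation of display width: combining marks and East Asian wide characters occupy a different number of terminal columns than `len()` reports.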

Version 0.8.1
* tag 'v0.8.1': (126 commits)
  RLS: Version 0.8.1
  DOC: tweak
  DOC: set_index/reset_index examples
  DOC: doc fixes and what's new in 0.8.1, vectorized string methods
  ENH: better string element access/slicing notation close pandas-dev#1656
  DOC: minor additions to release notes for 0.8.1
  BUG: handle Yahoo! finance returning duplicate dates for prev bus day, doc fixes
  BUG: fix windows/32-bit builds
  BUG: get pandas-dev#1620 fix working on python 3
  ENH: handling of UTF-8 strings in DataFrame columns, close pandas-dev#1620
  TST: span unit test pandas-dev#1635
  TST: skip another @network test if no internet connection
  ENH/BUG: handle tz-aware datetime.datetime in to_datetime, add utc=True option to allow conversion to utc, close pandas-dev#1581
  ENH: hack to not compress single group keys, accelerate single-key and Categorical groupby operations
  BUG: fix merge bug with left joins on length-0 DataFrame, close pandas-dev#1628
  BUG: Series.interpolate bug with method='values' and datetime64[ns], close pandas-dev#1646
  BUG: properly handle None values in dict input to concat, close pandas-dev#1649
  BUG: len-0 Series min/max/describe pandas-dev#1650
  Fix describe() failure for None and empty Series.
  BUG: string date aliases now work with tz-aware time series close pandas-dev#1647
  ...

ghost assigned changhiskhan Jul 19, 2012

wesm mentioned this issue Jul 20, 2012

Configurability of unicode/console encoding #1654

Closed

wesm closed this as completed in ae70acc Jul 21, 2012

wesm added a commit that referenced this issue Jul 21, 2012

BUG: get #1620 fix working on python 3

4e94fba

changhiskhan added a commit that referenced this issue Jul 23, 2012

BUG: try to convert non-unicode non-ascii characters in repr #1620

00b31f1

jseabold mentioned this issue Aug 22, 2012

Fix tostring unicode #1804

Merged

wesm unassigned changhiskhan Oct 12, 2016
