__repr__ wrong column alignment with non-ascii characters #1620

Closed
manuteleco opened this issue Jul 14, 2012 · 3 comments

@manuteleco

Hi,

It seems that when DataFrame, Series, and possibly other objects contain non-ASCII characters inside non-unicode strings, the __repr__ method does not align the column values correctly. Unicode strings are not affected. I'm using pandas '0.8.1.dev-70c3deb' on a Linux box.

Sample code:

# -*- coding: utf-8 -*-

from pandas import Series, DataFrame

df1 = DataFrame([["aaaa", 1], ["bbbb", 2]])
df2 = DataFrame([["aaää", 1], ["bbbb", 2]])
df3 = DataFrame([[u"aaää", 1], ["bbbb", 2]])

# Comparison between "similar dataframes"
print df1
print
print df2
print
print df3
print

# Other cases:
s1 = Series(["ä", "bbbb", "ßß"])
print
print s1

This results in:

      0  1
0  aaaa  1
1  bbbb  2

        0  1
0  aaää  1
1    bbbb  2

      0  1
0  aaää  1
1  bbbb  2


0      ä
1    bbbb
2    ßß

Thanks and regards.
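
A possible explanation for the output above, sketched in Python 3 where bytes objects stand in for Python 2 non-unicode strings. It assumes the column width is taken from len() of each cell, which is a guess about the formatter rather than something confirmed here. Each "ä" occupies two bytes in UTF-8, so a byte count overstates the displayed width by two for "aaää".

```python
# Python 3 sketch: bytes stand in for Python 2 non-unicode str values.
cells_text = ["aaää", "bbbb"]
cells_bytes = [c.encode("utf-8") for c in cells_text]

# len() on bytes counts bytes; len() on text counts characters.
byte_lengths = [len(b) for b in cells_bytes]
char_lengths = [len(c) for c in cells_text]
print(byte_lengths, char_lengths)

# Assumed padding rule: right-justify every cell to the longest len() seen.
width = max(byte_lengths)
padded = [b.rjust(width).decode("utf-8") for b in cells_bytes]
for row in padded:
    print(repr(row))
```

Under that assumption "bbbb" receives two extra leading spaces while "aaää" receives none, which matches the shifted rows shown for df2 and s1.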

@jseabold
Contributor

I don't think this should've been closed. The original problem in the given example still exists AFAICT.

>>> from pandas import Series, DataFrame
>>> 
>>> df1 = DataFrame([["aaaa", 1], ["bbbb", 2]])
>>> df2 = DataFrame([["aaää", 1], ["bbbb", 2]])
>>> df3 = DataFrame([[u"aaää", 1], ["bbbb", 2]])
>>>
>>> # Comparison between "similar dataframes"
>>> print df1
      0  1
0  aaaa  1
1  bbbb  2
>>> print

>>> print df2
      0    1
0  aaää  1
1  bbbb    2
>>> print

>>> print df3
      0  1
0  aaää  1
1  bbbb  2
>>> pandas.version.version
'0.8.2.dev-c99d9cd'

@jseabold
Contributor

Though maybe this is intentional for strings?

@jseabold
Contributor

Thinking about this a bit more, I think this probably should be fixed, but does that imply always using unicode in to_string? Regardless, force_unicode=True fails for df2. I'll push a fix for this.
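
One way to read "always using unicode" is to decode byte strings before measuring them. The sketch below is Python 3 and is not the pandas implementation. The helpers to_text and pad_column are hypothetical names, and UTF-8 is an assumed input encoding.

```python
def to_text(cell, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass other values through str()."""
    if isinstance(cell, bytes):
        return cell.decode(encoding)
    return str(cell)


def pad_column(cells, encoding="utf-8"):
    """Right-justify cells using character counts rather than byte counts."""
    texts = [to_text(c, encoding) for c in cells]
    width = max(len(t) for t in texts)
    return [t.rjust(width) for t in texts]


# Mixed byte-string and text input, mirroring df2's first column.
column = ["aaää".encode("utf-8"), "bbbb"]
for row in pad_column(column):
    print(repr(row))
```

Both rows come out four characters wide. Character count is still only an approximation of display width (wide East Asian characters, for example, take two terminal cells), so this addresses the case in this issue and not every script.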

yarikoptic added a commit to neurodebian/pandas that referenced this issue Sep 12, 2012
Version 0.8.1

* tag 'v0.8.1': (126 commits)
  RLS: Version 0.8.1
  DOC: tweak
  DOC: set_index/reset_index examples
  DOC: doc fixes and what's new in 0.8.1, vectorized string methods
  ENH: better string element access/slicing notation close pandas-dev#1656
  DOC: minor additions to release notes for 0.8.1
  BUG: handle Yahoo! finance returning duplicate dates for prev bus day, doc fixes
  BUG: fix windows/32-bit builds
  BUG: get pandas-dev#1620 fix working on python 3
  ENH: handling of UTF-8 strings in DataFrame columns, close pandas-dev#1620
  TST: span unit test pandas-dev#1635
  TST: skip another @network test if no internet connection
  ENH/BUG: handle tz-aware datetime.datetime in to_datetime, add utc=True option to allow conversion to utc, close pandas-dev#1581
  ENH: hack to not compress single group keys, accelerate single-key and Categorical groupby operations
  BUG: fix merge bug with left joins on length-0 DataFrame, close pandas-dev#1628
  BUG: Series.interpolate bug with method='values' and datetime64[ns], close pandas-dev#1646
  BUG: properly handle None values in dict input to concat, close pandas-dev#1649
  BUG: len-0 Series min/max/describe pandas-dev#1650
  Fix describe() failure for None and empty Series.
  BUG: string date aliases now work with tz-aware time series close pandas-dev#1647
  ...