Cannot save DataFrame with unicode to CSV #705

Closed
jtbates opened this issue Jan 27, 2012 · 4 comments
Labels
Unicode Unicode strings

Comments

@jtbates

jtbates commented Jan 27, 2012

In [1]: from pandas import DataFrame
In [2]: df = DataFrame({u'c/\u03c3':[1,2,3]})
In [3]: df.to_csv('test')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
.../<ipython-input-3-9b2e5ea53beb> in <module>()
----> 1 df.to_csv('test')

.../lib/python2.7/site-packages/pandas-0.7.0.dev_88fcac5-py2.7-macosx-10.4-x86_64.egg/pandas/core/frame.pyc in to_csv(self, path, sep, na_rep, cols, header, index, index_label, mode, nanRep)
    891                     # given a string for a DF with Index
    892                     index_label = [index_label]
--> 893                 csvout.writerow(list(index_label) + list(cols))
    894             else:
    895                 csvout.writerow(cols)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c3' in position 2: ordinal not in range(128)

I think this should be separate from #680. The CSV issue is also mentioned in this comment on bug #300.

@adamklein
Contributor

I presume you're using a Python version < 3? Unfortunately, the Python 2 csv module does not handle unicode. I'll see if there is a workaround, but as the recurring issues show, pandas isn't exactly unicode-friendly on Python <= 2.7, and neither is Python 2.7 itself ...
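To make the failure above concrete, here is a minimal stdlib-only sketch, written for Python 3 and not taken from the pandas source. It assumes that the Python 2 csv writer coerced unicode cells through the default ASCII codec, which is what the traceback suggests; the explicit `encode("ascii")` call stands in for that implicit step. The names `header`, `buffer`, and `text_layer` are illustrative only.

```python
import csv
import io

# Header row with the same non-ASCII label as in the report (sigma, U+03C3).
header = ["index", "c/\u03c3"]

# Stand-in for the implicit ASCII coercion assumed to happen in Python 2's csv writer.
try:
    header[1].encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc  # the offending character sits at position 2

# Writing through an explicit UTF-8 text layer avoids the ASCII codec entirely.
buffer = io.BytesIO()
text_layer = io.TextIOWrapper(buffer, encoding="utf-8", newline="")
csv.writer(text_layer).writerow(header)
text_layer.flush()
raw_bytes = buffer.getvalue()

# Decoding on the read side recovers the original label.
decoded = io.StringIO(raw_bytes.decode("utf-8"), newline="")
round_tripped = next(csv.reader(decoded))
```

The bytes on disk are UTF-8, and the reader only sees text after decoding, which is the same encode-on-write, decode-on-read idea the fix for this issue describes.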

@adamklein
Contributor

I had to rewrite this because it slowed down CSV reading/writing. To write a UTF-8 encoded CSV on Python < 3, you need to pass the encoding explicitly: df.to_csv(..., encoding='utf-8').
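A short sketch of that call, written for Python 3 and a current pandas release rather than the 0.7.0 development build from the report. The temporary directory and the `test.csv` file name are assumptions made for illustration; on Python 3, `to_csv` already defaults to UTF-8, so the explicit argument mainly documents intent.

```python
import os
import tempfile

import pandas as pd

# Same frame as in the report: one column whose label contains sigma (U+03C3).
df = pd.DataFrame({"c/\u03c3": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.csv")  # illustrative file name

    # Pass the encoding explicitly, as suggested in the comment above.
    df.to_csv(path, encoding="utf-8")

    # Inspect the raw bytes to confirm the label was written as UTF-8.
    with open(path, "rb") as handle:
        raw_bytes = handle.read()

    # Read back with the matching encoding; column 0 holds the saved index.
    restored = pd.read_csv(path, encoding="utf-8", index_col=0)
```

Reading with the same encoding that was used for writing keeps the column label intact across the round trip.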

@jtbates
Author

jtbates commented Jan 31, 2012

Yes, I'm on 2.7. Thanks @adamklein !

yarikoptic added a commit to neurodebian/pandas that referenced this issue Feb 10, 2012
* commit 'v0.7.0rc1-94-ge3df4e2':
  DOC: added info on encoding parameter for csv i/o
  TST: renamed io b/c module conflict, made suite check for config
  added vbench for write csv
  BUG: made encoding optional on csv read/write, addresses pandas-dev#717
  BUG: float64 hash table for handling NAs in Series.unique, close pandas-dev#714
  TST: add bench_unique.py
  TST: added better testing for pandas-dev#709
  BUG: closes pandas-dev#709, bug in ix + multiindex use case
  DOC: release notes
  BUG: don't assume that each object contains every unique block type in concat, GH pandas-dev#708
  BUG: inconsistency in .ix with integer label and float index
  Fix test that assumed py2.
  Don't use unnecessary UnicodeReader on Python 3.
  BUG: remove poor man's breakpoint
  BUG: closes pandas-dev#705, csv is encoded utf-8 and then decoded on the read side
  updated support contact info
  DOC: note EWMA adjustment, closes pandas-dev#703
  ENH: close pandas-dev#694, pandas-dev#693, pandas-dev#692
  BUG: Bar plot fails if axis parameter supplied, closes pandas-dev#702
@imsrgadich

imsrgadich commented Jun 14, 2018

@adamklein It's 2018 and still the same issue. Your trick helped. Thanks!

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019
Fixes build breakage due to pandas 0.24.0 upgrade