case insensitive bool parsing #1295

Closed
changhiskhan opened this issue May 23, 2012 · 3 comments
Labels: Enhancement, IO Data

@changhiskhan
Contributor

From mailing list:

This is something that surprised me.
When reading from CSV, the values "True" and "False" are converted to bool, while "TRUE" and "FALSE" are left as object.
Using a converter {...: np.bool} I could take it into account.
Is it useful to take more possibilities into account, knowing that there are many of them (True/False, Yes/No, ...)?

In [19]: data = '''A;B
....: True;False
....: False;True'''
In [20]: df = pd.read_csv(StringIO(data), sep=';')
In [22]: df.dtypes
Out[22]:
A bool
B bool

In [23]: data = '''A;B
....: TRUE;FALSE
....: FALSE;TRUE'''

In [24]: df = pd.read_csv(StringIO(data), sep=';')

In [25]: df.dtypes
Out[25]:
A object
B object

In [27]: df = pd.read_csv(StringIO(data), sep=';', converters={'A':np.bool})

In [28]: df.dtypes
Out[28]:
A bool
B object
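
A note on the converter used in the last step: at the time, `np.bool` was an alias for Python's built-in `bool`, and `bool()` applied to any non-empty string returns `True`. Column A therefore gets a bool dtype, but the "FALSE" entries would also come out as `True`. A minimal sketch of a case-insensitive converter follows; `parse_bool` and the two string sets are hypothetical names chosen for illustration, not part of pandas.

```python
# Hypothetical helper (not a pandas API): case-insensitive boolean parsing.
TRUE_STRINGS = {"true", "yes"}
FALSE_STRINGS = {"false", "no"}


def parse_bool(text):
    """Map a string to True/False regardless of case; raise on anything else."""
    key = text.strip().lower()
    if key in TRUE_STRINGS:
        return True
    if key in FALSE_STRINGS:
        return False
    raise ValueError(f"not a recognised boolean: {text!r}")


# bool() only tests whether the string is empty, so it cannot tell
# "TRUE" from "FALSE".
print(bool("FALSE"))        # True
print(parse_bool("FALSE"))  # False
print(parse_bool("True"))   # True
```

Passing such a function through `converters={'A': parse_bool, 'B': parse_bool}` should give the intended values, at the cost of calling a Python function for every cell.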

@moleary
Contributor

moleary commented Jul 25, 2012

Hi,

I'm new to pandas, but I'd like to get involved. Is this something I could take a crack at?

Mark

@changhiskhan
Contributor Author

Absolutely. If you haven't already done so, a good starting point would be to read the pandas developer page and the pandas examples on read_csv etc., and then dive into the pandas.io.parsers module.
We look forward to your pull request!

@moleary
Contributor

moleary commented Jul 25, 2012

Okay, I'll take a look at it. Thanks.

@wesm wesm closed this as completed Sep 18, 2012
yarikoptic added a commit to neurodebian/pandas that referenced this issue Sep 27, 2012
* commit 'v0.8.1-203-g67121af': (193 commits)
  BUG: DataFrame column formatting issue in length-truncated column close pandas-dev#1906
  BUG: override min/max in DatetimeIndex to function as expected close pandas-dev#1895
  BUG: DataFrame mixed-type arithmetic column-wise, fix DataFrame.diff upcasting->object bug close pandas-dev#1896
  BUG: treat nobs=1 >= min_periods case in rolling_std/variance as 0 trivially. close pandas-dev#1884
  TST: skip to_file test if URLError occurs on some systems
  VB: resolve test name conflict and update make script
  DOC: minor change to build script to help auto build process
  DOC: fixed extlinks in sphinx conf
  TST: oops import in wrong place
  TST: skip test_console_encode if sys.stdin.encoding is None
  TST: unit test for pandas-dev#1902 and default to csv.QUOTE_MINIMAL
  Make it possible to set quoting for to_csv
  ENH: clean up pandas-dev#1691 changes, rls note
  ENH: add more possible bool values to read_csv pandas-dev#1295
  BUG: fix rolling_max/min for small inputs and large windows. Add a check that the min_period <= window size. Fixes pandas-dev#1897.
  Mention Ubuntu for NeuroDebian repository
  BUG: don't clobber color keyword in Series.plot, close pandas-dev#1890
  DOC: add intersphinx mapping for python library, close pandas-dev#1556
  BUG: fix mixed-integer .ix indexing bugs. close#1799
  BUG: unicode sheet name in to_excel pandas-dev#1828
  ...
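
The commit list above includes "ENH: add more possible bool values to read_csv" for this issue. Whether that commit introduced them is not stated here, but `read_csv` in current pandas accepts `true_values` and `false_values` keyword arguments, which cover the Yes/No case raised in the original question. A minimal sketch, assuming a reasonably recent pandas and made-up sample data:

```python
from io import StringIO

import pandas as pd

# Sample data invented for illustration; same layout as the issue's example.
data = "A;B\nYes;No\nNo;Yes"

# Declare which strings should be read as True and which as False.
df = pd.read_csv(
    StringIO(data),
    sep=";",
    true_values=["Yes"],
    false_values=["No"],
)

print(df.dtypes)
print(df)
```

Both columns should come back with a bool dtype, because every value in them matches one of the listed strings.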