ENH: astype() can now take col label -> dtype mapping as arg; GH7271 #13375

Closed

Conversation

StephenKappel
Contributor

New PR for what was started in #12086.

closes #7271

By passing a dict of {column name/column index: dtype}, multiple columns can be cast to different data types in a single command. Now users can do:

df = df.astype({'my_bool': 'bool', 'my_int': 'int'})

or:

df = df.astype({0: 'bool', 1: 'int'})

instead of:

df['my_bool'] = df.my_bool.astype('bool')
df['my_int'] = df.my_int.astype('int')
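
For illustration, here is a minimal sketch of the dict form described above. The frame and the `label` column are made up for this example and are not taken from the PR's tests; columns that are not named in the mapping are expected to keep their dtype.

```python
import pandas as pd

# Hypothetical example frame; the column names mirror the description above.
df = pd.DataFrame({
    'my_bool': [1, 0, 1],
    'my_int': [1.0, 2.0, 3.0],
    'label': ['x', 'y', 'z'],   # not in the mapping, so it is left alone
})

# Cast two columns to different dtypes in a single call.
converted = df.astype({'my_bool': 'bool', 'my_int': 'int64'})

print(converted.dtypes)
```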

@codecov-io

codecov-io commented Jun 6, 2016

Current coverage is 84.38%

Merging #13375 into master will increase coverage by <.01%

@@             master     #13375   diff @@
==========================================
  Files           142        142          
  Lines         51223      51241    +18   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43224      43242    +18   
  Misses         7999       7999          
  Partials          0          0          

Powered by Codecov. Last updated by ada6bf3...5fe82e3

@jreback added the Enhancement, Dtype Conversions, and API Design labels Jun 6, 2016
@@ -2980,18 +2981,39 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):

Parameters
----------
-dtype : numpy.dtype or Python type
+dtype : numpy.dtype, Python type, or dict
Contributor

add pandas extension type

@StephenKappel
Contributor Author

Updated

'the key in Series dtype mappings.')
new_type = list(dtype.values())[0]
return self.astype(new_type, copy, raise_on_error, **kwargs)
if self.ndim > 2:
Contributor

elif

give a nice error message with the exception
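
As a rough sketch of the Series branch shown in the fragment above: a dict passed to `Series.astype` is only meaningful when its single key is the Series name. The names `abc` and `xyz` are invented for this example, and the assumption that a mismatched key raises `KeyError` is based on the error text in the fragment and on how released pandas versions behave, not on this revision of the diff.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], name='abc')

# The only accepted key is the Series name.
as_int = s.astype({'abc': 'int64'})

# Any other key is rejected (assumed here to be a KeyError).
error_message = None
try:
    s.astype({'xyz': 'int64'})
except KeyError as exc:
    error_message = str(exc)

print(as_int.dtype, error_message)
```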

@StephenKappel force-pushed the 7271-df-astype-dict-2 branch 2 times, most recently from 9a64742 to d679e35 Compare June 11, 2016 21:46
@@ -2980,18 +2981,45 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):

Parameters
----------
-dtype : numpy.dtype or Python type
+dtype : numpy.dtype, Python type, Pandas extension type, or dict
Member

Can you make this something like "data type, or dict of column -> type".
And then you can sum the different options (numpy.dtype, Python type, pandas extension type) in the explanation below.

@jorisvandenbossche
Member

@StephenKappel Can you rebase this?

This will raise an error when you have duplicate columns (also if it is not the column you are astyping). But maybe we are OK with that?

for col in other_col_labels]
new_df = concat(casted_cols + other_cols, axis=1, copy=False)
return new_df.reindex(columns=self.columns, copy=False)

Contributor

@jreback jreback Jul 4, 2016

This is going to raise if the columns are duplicated (the frame's columns, not the dtype dict). Put them in the original order first, before passing them to concat. Further, doing self[col] also will not work with duplicates. Something like:

results = []

# iterating over (label, column) pairs is safe for duplicates
for c, col in self.items():
    if c in dtype:
        # cast col to the dtype mapped to its label
        results.append(col.astype(dtype[c], copy=copy))
    else:
        results.append(col)

new_df = concat(results, axis=1, copy=False)
...
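
The loop suggested above can be tried outside of pandas internals. This is a minimal sketch, not the code that was merged: `cast_columns` is a hypothetical helper written for this example, and it relies only on `DataFrame.items()` yielding each column by position, which is what makes it safe for duplicate labels.

```python
import pandas as pd


def cast_columns(df, dtype):
    """Hypothetical helper: cast the columns named in `dtype`, keep the rest."""
    results = []
    # items() yields (label, Series) pairs by position, so duplicates are fine.
    for label, col in df.items():
        if label in dtype:
            results.append(col.astype(dtype[label]))
        else:
            results.append(col)
    out = pd.concat(results, axis=1)
    # Restore the original labels explicitly, duplicates included.
    out.columns = df.columns
    return out


# A frame with a duplicated 'a' label.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'a'])
out = cast_columns(df, {'a': 'float64'})
print(out.dtypes)
```

Both `a` columns are cast because the mapping is keyed by label, while `b` keeps its integer dtype and the column order is unchanged.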

@jorisvandenbossche modified the milestones: 1.0, 0.19.0 Jul 8, 2016
@jreback
Contributor

jreback commented Jul 15, 2016

can you rebase / update

@StephenKappel
Contributor Author

rebased and updated to work with duplicate column names

expected = DataFrame({
'a': a1_str,
'b': b_str,
'a': a2_str
Contributor

You cannot construct a duplicate this way; a dict only has non-duplicate keys.

Contributor

construct via unique named series and then set the columns
or use a list
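
To make the point concrete, here is a small sketch of why the dict literal fails and of the first workaround mentioned above. The values and the temporary names `a1` and `a2` are placeholders invented for this example, not the test data from the PR.

```python
import pandas as pd

# A dict literal silently keeps only the last value for a repeated key.
dup = {'a': [1, 2], 'b': [3, 4], 'a': [5, 6]}
print(list(dup))   # only two keys survive

# Workaround: build the frame with unique names, then assign the labels.
expected = pd.DataFrame({'a1': [1, 2], 'b': [3, 4], 'a2': [5, 6]})
expected.columns = ['a', 'b', 'a']
print(expected)
```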

Contributor Author

Jeez. I swear I know better than that... not sure what I was thinking. Fixed now.

@jreback
Contributor

jreback commented Jul 20, 2016

thanks! nice PR

Development

Successfully merging this pull request may close these issues.

ENH: df.astype could accept a dict of {col: type}
4 participants