ENH: astype() can now take col label -> dtype mapping as arg; GH7271 #13375
Branch updated from 1a9ae16 to b654ffe.
Current coverage is 84.38%

@@           master   #13375   diff @@
==========================================
  Files         142      142
  Lines       51223    51241    +18
  Methods         0        0
  Messages        0        0
  Branches        0        0
==========================================
+ Hits        43224    43242    +18
  Misses       7999     7999
  Partials        0        0
@@ -2980,18 +2981,39 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):

Parameters
----------
- dtype : numpy.dtype or Python type
+ dtype : numpy.dtype, Python type, or dict
add pandas extension type
Branch updated from b654ffe to f8af32c.
Updated.
Branch updated from f8af32c to a0a7712.
            'the key in Series dtype mappings.')
        new_type = list(dtype.values())[0]
        return self.astype(new_type, copy, raise_on_error, **kwargs)
    if self.ndim > 2:
Use elif here, and give a nice error message with the exception.
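For context, the Series branch in the hunk above accepts a mapping only when its single key is the Series name. A minimal sketch of how that looks from the caller's side, assuming a pandas version that includes this change (the Series name `price` and the sample values are illustrative only):

```python
import pandas as pd

# 'price' is a made-up name for illustration.
s = pd.Series([1, 2, 3], name='price', dtype='int64')

# The only key allowed in a Series dtype mapping is the Series name.
as_float = s.astype({'price': 'float64'})

# Any other key is rejected with a KeyError.
try:
    s.astype({'other': 'float64'})
    rejected = False
except KeyError:
    rejected = True
```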
Branch updated from 9a64742 to d679e35.
@@ -2980,18 +2981,45 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):

Parameters
----------
- dtype : numpy.dtype or Python type
+ dtype : numpy.dtype, Python type, Pandas extension type, or dict
Can you make this something like "data type, or dict of column -> type"?
Then you can summarize the different options (numpy.dtype, Python type, pandas extension type) in the explanation below.
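To illustrate the two forms that wording would cover, here is a small sketch, assuming a pandas version that includes this change. The column names `flag` and `count` are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; both columns start out as int64.
df = pd.DataFrame({'flag': [0, 1, 1], 'count': [1, 2, 3]}, dtype='int64')

# A single data type is applied to every column.
all_float = df.astype('float64')

# A dict of column -> type casts only the listed columns;
# columns that are not mentioned keep their dtype.
mixed = df.astype({'flag': 'bool'})
```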
@StephenKappel Can you rebase this? This will raise an error when you have duplicate columns (even if the duplicated column is not the one you are casting). But maybe we are OK with that?
                  for col in other_col_labels]
    new_df = concat(casted_cols + other_cols, axis=1, copy=False)
    return new_df.reindex(columns=self.columns, copy=False)
This is going to raise if the columns are duplicates (not the dtype dict). Put them in the original order first, before passing them to concat. Further, doing self[col] also will not work with duplicates. Something like:
    results = []
    # this is safe for duplicates
    for c, col in self.items():
        if c in dtype:
            # cast col here
            results.append(col.astype(dtype[c]))
        else:
            results.append(col)
    new_df = concat(results, axis=1, copy=False)
    ...
Can you rebase / update?
Branch updated from fe49999 to 880b311.
Rebased and updated to work with duplicate column names.
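The duplicate-safe approach discussed in this thread can be sketched as a standalone function. This is an illustration only, not the code merged in the PR: `astype_by_label` is a hypothetical helper name, and the sample frame is invented. It iterates over columns by position, so repeated labels are each visited once and the original order is preserved.

```python
import pandas as pd


def astype_by_label(df, dtype_map):
    """Hypothetical helper: cast columns whose label appears in dtype_map."""
    results = []
    # DataFrame.items() yields (label, Series) once per column position,
    # so duplicate labels are handled without df[label] lookups.
    for label, col in df.items():
        if label in dtype_map:
            results.append(col.astype(dtype_map[label]))
        else:
            results.append(col)
    out = pd.concat(results, axis=1)
    # Restore the original labels, duplicates included.
    out.columns = df.columns
    return out


df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'a'], dtype='int64')
casted = astype_by_label(df, {'a': 'float64'})
```

Both columns labelled `a` are cast, while `b` is passed through unchanged. The sketch assumes a non-empty frame, since `pd.concat` raises on an empty list.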
expected = DataFrame({
    'a': a1_str,
    'b': b_str,
    'a': a2_str
You cannot construct a duplicate this way; a dict only has non-duplicate keys.
Construct via uniquely named Series and then set the columns, or use a list.
Jeez. I swear I know better than that... not sure what I was thinking. Fixed now.
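As a sketch of the point made in this thread: a dict literal with a repeated key silently keeps only the last value, so the expected frame has to be built another way. The variable names and values below are invented for illustration and are not taken from the PR's test.

```python
import pandas as pd

# A repeated key in a dict literal is collapsed: the later value wins.
as_dict = {'a': ['1', '2'], 'b': ['x', 'y'], 'a': ['3', '4']}

# Option 1: build from uniquely named Series, then set the column labels.
a1 = pd.Series(['1', '2'], name='a1')
b = pd.Series(['x', 'y'], name='b')
a2 = pd.Series(['3', '4'], name='a2')
expected = pd.concat([a1, b, a2], axis=1)
expected.columns = ['a', 'b', 'a']

# Option 2: pass the rows as a list and give the labels explicitly.
from_list = pd.DataFrame([['1', 'x', '3'], ['2', 'y', '4']],
                         columns=['a', 'b', 'a'])
```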
Branch updated from 880b311 to 5fe82e3.
Thanks! Nice PR.
New PR for what was started in #12086.
closes #7271
By passing a dict of {column name/column index: dtype}, multiple columns can be cast to different data types in a single command. Now users can do:
df = df.astype({'my_bool': 'bool', 'my_int': 'int'})
or:
df = df.astype({0: 'bool', 1: 'int'})
instead of: