Preserve Extension type on cross section #22785
Hello @TomAugspurger! Thanks for updating the PR.
Comment last updated on September 20, 2018 at 16:50 Hours UTC
Codecov Report
@@ Coverage Diff @@
## master #22785 +/- ##
==========================================
- Coverage 92.19% 92.18% -0.01%
==========================================
Files 169 169
Lines 50825 50806 -19
==========================================
- Hits 46857 46836 -21
- Misses 3968 3970 +2
result = np.empty(n, dtype=dtype)
if is_extension_array_dtype(dtype):
    # we'll eventually construct an ExtensionArray.
    result = np.empty(n, dtype=object)
Do people find this confusing? I can either
- duplicate the for loop, using list.append for EAs and inserting into result for others
- use lists everywhere
- use this
I chose this implementation because I assume it's slightly faster for wide dataframes with a numpy type, compared to building a list and then calling np.asarray(result) at the end.
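A minimal sketch of the third option (preallocate once, then convert at the end), assuming a recent pandas with the nullable ``Int64`` dtype. The helper name `cross_section` and its arguments are hypothetical and only stand in for the block-manager internals shown in the diff; this is not the code from the PR.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def cross_section(columns, loc, dtype):
    """Hypothetical helper: take element `loc` from each column."""
    n = len(columns)
    if is_extension_array_dtype(dtype):
        # Extension types cannot live in a typed NumPy buffer, so the
        # scalars are collected in a 1-D object array first.
        result = np.empty(n, dtype=object)
    else:
        # Plain NumPy dtypes can be written straight into a typed buffer.
        result = np.empty(n, dtype=dtype)

    # One loop serves both cases, which is the point of this option.
    for i, col in enumerate(columns):
        result[i] = col[loc]

    if is_extension_array_dtype(dtype):
        # Rebuild the extension array from the collected scalars.
        result = dtype.construct_array_type()._from_sequence(result, dtype=dtype)
    return result


ints = [pd.array([1, 2], dtype="Int64"), pd.array([3, 4], dtype="Int64")]
row = cross_section(ints, 0, ints[0].dtype)

floats = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
row_np = cross_section(floats, 1, np.dtype("float64"))
```

The NumPy path never builds an intermediate Python list, which is the presumed saving for wide frames; the extension path pays for one extra conversion at the end.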
This implementation looks good to me
With the caveats that a) I generally trust @TomAugspurger to know what he's doing and b) this can wait until the upcoming chat: wouldn't it be simpler just to allow EA to support 2D arrays? I expect we're going to have to go that way eventually for #22614 etc
I'm still a bit hesitant about 2D extension arrays, since it pushes additional complexity on the implementations, and goes a bit against our (someday?) goal of removing the BlockManager in favor of a simpler data structure. But I'll read through #22614 and related issues, and try to collect my thoughts.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -546,6 +546,7 @@ Other API Changes
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
- :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
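The behaviour described in the last whatsnew entry can be checked with a small example. This is a sketch that assumes a pandas version including this change and uses the nullable ``Int64`` dtype as the extension type; the column names are arbitrary.

```python
import pandas as pd

# Two columns backed by ExtensionArrays of the same type.
df = pd.DataFrame(
    {
        "a": pd.array([1, 2], dtype="Int64"),
        "b": pd.array([3, 4], dtype="Int64"),
    }
)

# Take a single row (a cross section across the columns).
row = df.iloc[0]
print(row.dtype)
```

With the change in place the resulting Series should keep the extension dtype instead of falling back to object.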
double backticks on DataFrame
shouldn't this be in the EA section?
Fixed.
for blk in self.blocks:
    # Such assignment may incorrectly coerce NaT to None
    # result[blk.mgr_locs] = blk._slice((slice(None), loc))
    for i, rl in enumerate(blk.mgr_locs):
        result[rl] = blk._try_coerce_result(blk.iget((i, loc)))

if is_extension_array_dtype(dtype):
    result = dtype.construct_array_type()._from_sequence(
Is this guaranteed to be 1d at this point?
result is created a few lines above with np.empty(n, dtype=object), so I assume yes
I think everything has been addressed. Merging in a few hours.
I am fine with merging now, if that makes it easier for updating your sparse PR
closes #22784
Builds on #22780 (first commit).
0197e0c has the relevant changes.