Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve Extension type on cross section #22785

Merged
merged 10 commits into from
Sep 26, 2018

Conversation

TomAugspurger
Copy link
Contributor

closes #22784

Builds on #22780 (first commit).

0197e0c has the relevant changes.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Sep 20, 2018
@TomAugspurger TomAugspurger added Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Sep 20, 2018
@pep8speaks
Copy link

pep8speaks commented Sep 20, 2018

Hello @TomAugspurger! Thanks for updating the PR.

Comment last updated on September 20, 2018 at 16:50 Hours UTC

@codecov
Copy link

codecov bot commented Sep 20, 2018

Codecov Report

Merging #22785 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22785      +/-   ##
==========================================
- Coverage   92.19%   92.18%   -0.01%     
==========================================
  Files         169      169              
  Lines       50825    50806      -19     
==========================================
- Hits        46857    46836      -21     
- Misses       3968     3970       +2
Flag Coverage Δ
#multiple 90.6% <100%> (-0.01%) ⬇️
#single 42.33% <60%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.45% <ø> (ø) ⬆️
pandas/core/base.py 97.61% <ø> (ø) ⬆️
pandas/core/frame.py 97.2% <ø> (ø) ⬆️
pandas/core/internals/managers.py 96.66% <100%> (+0.1%) ⬆️
pandas/core/groupby/base.py 91.11% <0%> (-0.73%) ⬇️
pandas/io/parquet.py 73.04% <0%> (-0.69%) ⬇️
pandas/core/ops.py 97.07% <0%> (-0.3%) ⬇️
pandas/core/resample.py 96.78% <0%> (-0.19%) ⬇️
pandas/core/window.py 96.28% <0%> (-0.12%) ⬇️
pandas/io/parsers.py 95.54% <0%> (-0.06%) ⬇️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 739e6be...78dd81e. Read the comment docs.

result = np.empty(n, dtype=dtype)
if is_extension_array_dtype(dtype):
# we'll eventually construct an ExtensionArray.
result = np.empty(n, dtype=object)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do people find this confusing? I can either

  1. duplicate the for loop, using list.append for EAs and inserting into result for other
  2. use lists everywhere
  3. use this

I chose this implementation because I assume it's slightly for wide dataframes with a numpy type, compared to building a list an then np.asarray(result) at the end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks good to me

@TomAugspurger
Copy link
Contributor Author

cc @jreback for 78798cf

@jbrockmendel
Copy link
Member

With the caveats that a) I generally trust @TomAugspurger to know what he's doing and b) this can wait until the upcoming chat: wouldn't it be simpler just to allow EA to support 2D arrays? I expect we're going to have to go that way eventually for #22614 etc

@TomAugspurger
Copy link
Contributor Author

I'm still a bit hesitant about 2D extension arrays, since it pushes additional complexity on the implementations, and goes a bit against our (someday?) goal of removing the BlockManager in favor of a simpler data structure. But I'll read through #22614 and related issues, and try to collect my thoughts.

@@ -546,6 +546,7 @@ Other API Changes
- :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
- :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on DataFrame

shouldn't this be in the EA section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

for blk in self.blocks:
# Such assignment may incorrectly coerce NaT to None
# result[blk.mgr_locs] = blk._slice((slice(None), loc))
for i, rl in enumerate(blk.mgr_locs):
result[rl] = blk._try_coerce_result(blk.iget((i, loc)))

if is_extension_array_dtype(dtype):
result = dtype.construct_array_type()._from_sequence(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this gauaranteed to be 1d at this point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is created a few lines above with np.empty(n, dtype=object), so I assume yes

result = np.empty(n, dtype=dtype)
if is_extension_array_dtype(dtype):
# we'll eventually construct an ExtensionArray.
result = np.empty(n, dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation looks good to me

for blk in self.blocks:
# Such assignment may incorrectly coerce NaT to None
# result[blk.mgr_locs] = blk._slice((slice(None), loc))
for i, rl in enumerate(blk.mgr_locs):
result[rl] = blk._try_coerce_result(blk.iget((i, loc)))

if is_extension_array_dtype(dtype):
result = dtype.construct_array_type()._from_sequence(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is created a few lines above with np.empty(n, dtype=object), so I assume yes

@TomAugspurger
Copy link
Contributor Author

I think everything has been addressed. Merging in a few hours.

@jorisvandenbossche
Copy link
Member

I am fine with merging now, if that makes it easier for updating your sparse PR

@TomAugspurger TomAugspurger merged commit 9df8065 into pandas-dev:master Sep 26, 2018
@TomAugspurger TomAugspurger deleted the ea-xs branch September 26, 2018 14:27
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indexing a row of a DataFrame with multiple ExtensionArrays coerces to object
5 participants