Preserve Extension type on cross section #22785

TomAugspurger · 2018-09-20T13:50:36Z

closes #22784

Builds on #22780 (first commit).

0197e0c has the relevant changes.

pep8speaks · 2018-09-20T13:50:47Z

Hello @TomAugspurger! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/base.py !
There are no PEP8 issues in the file pandas/core/frame.py !
There are no PEP8 issues in the file pandas/core/indexes/multi.py !
There are no PEP8 issues in the file pandas/core/internals/managers.py !
There are no PEP8 issues in the file pandas/tests/frame/test_dtypes.py !
There are no PEP8 issues in the file pandas/tests/indexing/test_indexing.py !
There are no PEP8 issues in the file pandas/tests/indexing/test_multiindex.py !
There are no PEP8 issues in the file pandas/tests/series/test_dtypes.py !

Comment last updated on September 20, 2018 at 16:50 Hours UTC

codecov · 2018-09-20T16:25:11Z

Codecov Report

Merging #22785 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22785      +/-   ##
==========================================
- Coverage   92.19%   92.18%   -0.01%     
==========================================
  Files         169      169              
  Lines       50825    50806      -19     
==========================================
- Hits        46857    46836      -21     
- Misses       3968     3970       +2

Flag	Coverage Δ
#multiple	`90.6% <100%> (-0.01%)`	⬇️
#single	`42.33% <60%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`95.45% <ø> (ø)`	⬆️
pandas/core/base.py	`97.61% <ø> (ø)`	⬆️
pandas/core/frame.py	`97.2% <ø> (ø)`	⬆️
pandas/core/internals/managers.py	`96.66% <100%> (+0.1%)`	⬆️
pandas/core/groupby/base.py	`91.11% <0%> (-0.73%)`	⬇️
pandas/io/parquet.py	`73.04% <0%> (-0.69%)`	⬇️
pandas/core/ops.py	`97.07% <0%> (-0.3%)`	⬇️
pandas/core/resample.py	`96.78% <0%> (-0.19%)`	⬇️
pandas/core/window.py	`96.28% <0%> (-0.12%)`	⬇️
pandas/io/parsers.py	`95.54% <0%> (-0.06%)`	⬇️
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 739e6be...78dd81e.

TomAugspurger · 2018-09-20T16:28:50Z

pandas/core/internals/managers.py

-        result = np.empty(n, dtype=dtype)
+        if is_extension_array_dtype(dtype):
+            # we'll eventually construct an ExtensionArray.
+            result = np.empty(n, dtype=object)


Do people find this confusing? I can either

- duplicate the for loop, using list.append for EAs and inserting into result for other dtypes
- use lists everywhere
- use this

I chose this implementation because I assume it's slightly faster for wide dataframes with a numpy type, compared to building a list and then calling np.asarray(result) at the end.
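As a rough illustration of the trade-off described above, the sketch below contrasts pre-allocating the result and assigning by position with collecting values in a list and converting at the end. The `blocks` layout of `(positions, values)` pairs and both function names are hypothetical stand-ins for illustration, not the actual BlockManager internals.

```python
import numpy as np


def fill_preallocated(blocks, n, dtype):
    # Pre-allocate once, then write each value at its column position.
    result = np.empty(n, dtype=dtype)
    for locs, values in blocks:
        for loc, value in zip(locs, values):
            result[loc] = value
    return result


def fill_via_list(blocks, n, dtype):
    # Collect into a Python list first, then convert in one extra pass.
    collected = [None] * n
    for locs, values in blocks:
        for loc, value in zip(locs, values):
            collected[loc] = value
    return np.asarray(collected, dtype=dtype)


# Hypothetical layout: two "blocks" holding columns (0, 2) and (1,).
blocks = [((0, 2), (1.0, 3.0)), ((1,), (2.0,))]
preallocated = fill_preallocated(blocks, 3, np.float64)
via_list = fill_via_list(blocks, 3, np.float64)

# For extension dtypes the diff above pre-allocates with dtype=object instead.
as_object = fill_preallocated(blocks, 3, object)
```

Both helpers produce the same values; the pre-allocated variant simply skips the intermediate list, which is the assumed (unmeasured) advantage for wide frames.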

TomAugspurger · 2018-09-20T16:49:56Z

cc @jreback for 78798cf

jbrockmendel · 2018-09-20T18:48:29Z

With the caveats that a) I generally trust @TomAugspurger to know what he's doing and b) this can wait until the upcoming chat: wouldn't it be simpler just to allow EA to support 2D arrays? I expect we're going to have to go that way eventually for #22614 etc

TomAugspurger · 2018-09-20T19:31:21Z

I'm still a bit hesitant about 2D extension arrays, since it pushes additional complexity on the implementations, and goes a bit against our (someday?) goal of removing the BlockManager in favor of a simpler data structure. But I'll read through #22614 and related issues, and try to collect my thoughts.

jreback · 2018-09-23T12:28:01Z

doc/source/whatsnew/v0.24.0.txt

@@ -546,6 +546,7 @@ Other API Changes
 - :class:`pandas.io.formats.style.Styler` supports a ``number-format`` property when using :meth:`~pandas.io.formats.style.Styler.to_excel` (:issue:`22015`)
 - :meth:`DataFrame.corr` and :meth:`Series.corr` now raise a ``ValueError`` along with a helpful error message instead of a ``KeyError`` when supplied with an invalid method (:issue:`22298`)
 - :meth:`shift` will now always return a copy, instead of the previous behaviour of returning self when shifting by 0 (:issue:`22397`)
+- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)


double backticks on DataFrame

shouldn't this be in the EA section?
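The behaviour described in the whatsnew entry can be sketched as follows. This assumes a pandas version that includes this change, and it uses categorical columns purely as a convenient built-in extension type; the column names and values are made up for illustration.

```python
import pandas as pd

# Two columns sharing exactly the same extension dtype.
cat_dtype = pd.CategoricalDtype(["a", "b"])
df = pd.DataFrame(
    {
        "A": pd.Categorical(["a", "b"], dtype=cat_dtype),
        "B": pd.Categorical(["b", "a"], dtype=cat_dtype),
    }
)

# Selecting a single row takes a cross section across both columns.
row = df.iloc[0]
print(row.dtype)
```

With the change in place, the row is expected to keep the shared categorical dtype rather than falling back to object.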

jreback · 2018-09-23T12:29:08Z

pandas/core/internals/managers.py

        for blk in self.blocks:
            # Such assignment may incorrectly coerce NaT to None
            # result[blk.mgr_locs] = blk._slice((slice(None), loc))
            for i, rl in enumerate(blk.mgr_locs):
                result[rl] = blk._try_coerce_result(blk.iget((i, loc)))

+        if is_extension_array_dtype(dtype):
+            result = dtype.construct_array_type()._from_sequence(


is this guaranteed to be 1d at this point?

jorisvandenbossche · 2018-09-24T09:00:51Z

pandas/core/internals/managers.py

-        result = np.empty(n, dtype=dtype)
+        if is_extension_array_dtype(dtype):
+            # we'll eventually construct an ExtensionArray.
+            result = np.empty(n, dtype=object)


This implementation looks good to me

jorisvandenbossche · 2018-09-24T09:01:31Z

pandas/core/internals/managers.py

        for blk in self.blocks:
            # Such assignment may incorrectly coerce NaT to None
            # result[blk.mgr_locs] = blk._slice((slice(None), loc))
            for i, rl in enumerate(blk.mgr_locs):
                result[rl] = blk._try_coerce_result(blk.iget((i, loc)))

+        if is_extension_array_dtype(dtype):
+            result = dtype.construct_array_type()._from_sequence(


result is created a few lines above with np.empty(n, dtype=object), so I assume yes
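A small NumPy sketch of why the pre-allocated object array stays one-dimensional: assigning element by element stores each value as a single object, whereas letting NumPy infer the shape from nested sequences can produce a 2-D array. The tuple values here are hypothetical and only serve to make the difference visible.

```python
import numpy as np

n = 3

# Pre-allocated 1-D object array, filled one position at a time.
result = np.empty(n, dtype=object)
for i in range(n):
    result[i] = (i, i + 1)  # the tuple is stored as one element

# Shape inferred from nested sequences instead.
inferred = np.array([(i, i + 1) for i in range(n)], dtype=object)

print(result.shape, inferred.shape)
```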

TomAugspurger · 2018-09-26T13:47:49Z

I think everything has been addressed. Merging in a few hours.

jorisvandenbossche · 2018-09-26T14:03:50Z

I am fine with merging now, if that makes it easier for updating your sparse PR

TomAugspurger added 2 commits September 20, 2018 08:48

ENH: is_homogenous

e8b37da

BUG: Preserve dtype on homogeneous EA xs

0197e0c

TomAugspurger added this to the 0.24.0 milestone Sep 20, 2018

TomAugspurger added the Indexing, Dtype Conversions, and ExtensionArray labels Sep 20, 2018

TomAugspurger mentioned this pull request Sep 20, 2018

ENH: is_homogeneous #22780

Merged

TomAugspurger added 3 commits September 20, 2018 11:04

asarray test

62326ae

Fixed asarray

f008c38

Merge remote-tracking branch 'upstream/master' into ea-xs

88c6126

TomAugspurger commented Sep 20, 2018

TomAugspurger added 2 commits September 20, 2018 11:48

is_homogeneous -> is_homogeneous_type

78798cf

lint

b051424

lint

d6a2479

TomAugspurger mentioned this pull request Sep 20, 2018

SparseArray is an ExtensionArray #22325

Merged

4 tasks

jreback requested changes Sep 23, 2018

jorisvandenbossche approved these changes Sep 24, 2018

TomAugspurger added 2 commits September 26, 2018 06:38

Merge remote-tracking branch 'upstream/master' into ea-xs

f796138

Moved whatsnew to correct section

78dd81e

[ci skip]

TomAugspurger merged commit 9df8065 into pandas-dev:master Sep 26, 2018

TomAugspurger deleted the ea-xs branch September 26, 2018 14:27

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

Preserve Extension type on cross section (pandas-dev#22785)

916c931
