SparseArray is an ExtensionArray #22325

Merged 236 commits on Oct 13, 2018
Changes from 66 commits
Commits
ee187eb
wip
TomAugspurger Jul 12, 2018
32c1372
from scratch
TomAugspurger Jul 13, 2018
b265659
Updates
TomAugspurger Jul 13, 2018
8dfc898
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 13, 2018
9c57725
WIP
TomAugspurger Jul 13, 2018
13952ab
wip
TomAugspurger Jul 13, 2018
7a6e7fa
wip take
TomAugspurger Jul 13, 2018
1016af1
wip take
TomAugspurger Jul 16, 2018
072abec
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 22, 2018
0ad61cc
take
TomAugspurger Jul 22, 2018
5b0b524
take working
TomAugspurger Jul 22, 2018
224744a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 23, 2018
620b5fb
remove registry
TomAugspurger Jul 23, 2018
164c401
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 24, 2018
65f83d6
missing
TomAugspurger Jul 24, 2018
0b3c682
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 27, 2018
69a5d13
wip ops
TomAugspurger Jul 27, 2018
f2b5862
More ops wip
TomAugspurger Jul 27, 2018
fa80fc5
segfault!
TomAugspurger Jul 28, 2018
3f20890
wip
TomAugspurger Jul 28, 2018
484adb0
start docs
TomAugspurger Jul 28, 2018
1df1190
2 failing extension tests
TomAugspurger Jul 30, 2018
4246ac4
wip fillna
TomAugspurger Jul 30, 2018
a849699
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 1, 2018
c4da319
registry dtype, asarray
TomAugspurger Aug 1, 2018
a2f158f
astype interface
TomAugspurger Aug 1, 2018
26b671a
"passing" extension tests
TomAugspurger Aug 1, 2018
375e160
no sparse block
TomAugspurger Aug 1, 2018
0a37050
wip
TomAugspurger Aug 2, 2018
3c2cb0f
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 2, 2018
27c6378
wip
TomAugspurger Aug 3, 2018
e52dae9
a bit on concat
TomAugspurger Aug 3, 2018
b6d8430
revert concat changes
TomAugspurger Aug 3, 2018
640c4a5
passing again
TomAugspurger Aug 3, 2018
6b61597
More concat
TomAugspurger Aug 3, 2018
427234f
fillna...
TomAugspurger Aug 3, 2018
e055629
wip
TomAugspurger Aug 6, 2018
a79359c
wip
TomAugspurger Aug 6, 2018
de3aa71
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 6, 2018
21f4ee3
reductions, ufuncs
TomAugspurger Aug 6, 2018
c1e594a
failing on ufuncs
TomAugspurger Aug 6, 2018
dc7f93f
wipo
TomAugspurger Aug 6, 2018
eb09d21
concat is broken
TomAugspurger Aug 7, 2018
7dcf4b2
formatting failing
TomAugspurger Aug 7, 2018
b39658a
more wip
TomAugspurger Aug 7, 2018
a8b76bd
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 8, 2018
e041313
Extension test fixups
TomAugspurger Aug 8, 2018
595535e
some indexing, sparse string
TomAugspurger Aug 9, 2018
7700299
passing indexing
TomAugspurger Aug 9, 2018
f1ff7da
passing pivot
TomAugspurger Aug 9, 2018
33fa6f7
broken broken broken
TomAugspurger Aug 10, 2018
40c035e
sanitize
TomAugspurger Aug 10, 2018
1d49cc7
broken broken broken
TomAugspurger Aug 10, 2018
6f4b6b6
wip
TomAugspurger Aug 13, 2018
6f037b5
working through series
TomAugspurger Aug 13, 2018
7da220e
working through series
TomAugspurger Aug 13, 2018
bfbe4ab
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 13, 2018
c5666b6
series passing
TomAugspurger Aug 13, 2018
ff6037c
more tests
TomAugspurger Aug 13, 2018
5c362ef
wip
TomAugspurger Aug 13, 2018
55cac36
wip
TomAugspurger Aug 13, 2018
c4e8784
More test
TomAugspurger Aug 13, 2018
a00f987
skip internals tests
TomAugspurger Aug 13, 2018
a6d7eac
linting
TomAugspurger Aug 13, 2018
4b4f9bd
cleanup
TomAugspurger Aug 13, 2018
82801be
cleanup
TomAugspurger Aug 13, 2018
1a149dc
cleanup
TomAugspurger Aug 13, 2018
fde19d7
remove debug code
TomAugspurger Aug 13, 2018
a7ba8f6
API: dispatch to EA.astype
TomAugspurger Aug 13, 2018
5064217
API: ExtensionDtype._is_numeric
TomAugspurger Aug 14, 2018
e31e8aa
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 14, 2018
79c8e9c
update type
TomAugspurger Aug 14, 2018
26993fe
Merge remote-tracking branch 'upstream/master' into ea-astype-dispatch
TomAugspurger Aug 14, 2018
6eeec11
py2 compat
TomAugspurger Aug 14, 2018
50de326
fixed test
TomAugspurger Aug 14, 2018
5ef1747
test fill value
TomAugspurger Aug 14, 2018
f31970c
Test nbytes
TomAugspurger Aug 14, 2018
f1b860f
explainers
TomAugspurger Aug 14, 2018
5c44275
linting
TomAugspurger Aug 14, 2018
33bc8f8
Allow concatenating with different sparse dtypes
TomAugspurger Aug 14, 2018
9bf13ad
Linting
TomAugspurger Aug 14, 2018
de1fb5b
lint
TomAugspurger Aug 14, 2018
da580cd
Wip
TomAugspurger Aug 14, 2018
88b73c3
Merge branch 'ea-astype-dispatch' into ea-sparse-2
TomAugspurger Aug 14, 2018
afde64d
Merge branch 'ea-is-numeric' into ea-sparse-2
TomAugspurger Aug 14, 2018
e603d3d
fixup 33bc8f836
TomAugspurger Aug 15, 2018
ec5eb9a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 15, 2018
a72ee1a
Fixed DataFrame.__setitem__ for updating to sparse.
TomAugspurger Aug 15, 2018
f147635
try removing
TomAugspurger Aug 15, 2018
c35c7c2
Merge branch 'ea-astype-dispatch' into ea-sparse-2
TomAugspurger Aug 15, 2018
e159ef2
wip
TomAugspurger Aug 16, 2018
d48a8fa
Fixup
TomAugspurger Aug 16, 2018
3bcf57e
astype works
TomAugspurger Aug 16, 2018
31d401f
Squashed commit of the following:
TomAugspurger Aug 16, 2018
a4369c2
Squashed commit of the following:
TomAugspurger Aug 16, 2018
608b499
Fixed Series[sparse].to_sparse
TomAugspurger Aug 16, 2018
14e60c9
Shift works
TomAugspurger Aug 16, 2018
550f163
parametrize shift test
TomAugspurger Aug 16, 2018
821cc91
Removed bogus test
TomAugspurger Aug 16, 2018
e21ed21
Un-xfail more
TomAugspurger Aug 16, 2018
aeb8c8c
scalar take raises
TomAugspurger Aug 16, 2018
34c90ed
Move fill_value to dtyep
TomAugspurger Aug 17, 2018
2103959
Move fill_value to dtyep
TomAugspurger Aug 17, 2018
26af959
Merge branch 'ea-sparse-dtype-fill-value' into ea-sparse-2
TomAugspurger Aug 18, 2018
e5920c2
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 18, 2018
084a967
cleanup
TomAugspurger Aug 18, 2018
bb17760
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
dde7852
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
f1b4e6b
Setting fill value (but that's bad)
TomAugspurger Aug 20, 2018
6a31077
Explicit fill value
TomAugspurger Aug 20, 2018
02aa7f7
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
3a7ee2d
Fixed merge conflicts
TomAugspurger Aug 20, 2018
d6fe191
subdtype -> subtype
TomAugspurger Aug 20, 2018
b1ea874
subdtype -> subtype
TomAugspurger Aug 20, 2018
2213b83
Fixed pickle
TomAugspurger Aug 21, 2018
94664c4
test dtype
TomAugspurger Aug 21, 2018
e54160c
astype update
TomAugspurger Aug 21, 2018
04a2dbb
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 21, 2018
fb01d1a
more
TomAugspurger Aug 21, 2018
f78ae81
lint
TomAugspurger Aug 21, 2018
11d5b40
py2 compat
TomAugspurger Aug 21, 2018
ba70753
dtype tests
TomAugspurger Aug 21, 2018
82bab3c
explainer
TomAugspurger Aug 21, 2018
2990124
Delete things
TomAugspurger Aug 21, 2018
a9d0f17
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 22, 2018
0c52c37
NumPy 1.9 compat
TomAugspurger Aug 22, 2018
998f113
implement divmod
TomAugspurger Aug 22, 2018
38b0356
Fix broken fill value setting
TomAugspurger Aug 22, 2018
7206d94
compare with lists
TomAugspurger Aug 22, 2018
fe771b5
clean
TomAugspurger Aug 22, 2018
12e424c
fixed index ctor fail
TomAugspurger Aug 22, 2018
3bd567f
New xfail
TomAugspurger Aug 22, 2018
f816346
Handle sparse reindex
TomAugspurger Aug 22, 2018
1a1dcf4
concat mixed
TomAugspurger Aug 22, 2018
e3d9173
take note
TomAugspurger Aug 22, 2018
2715cdb
Remove test.
TomAugspurger Aug 22, 2018
4e40599
concat NA and empty
TomAugspurger Aug 22, 2018
0aa3934
dum
TomAugspurger Aug 22, 2018
a3becb6
Fix lost fill value
TomAugspurger Aug 22, 2018
5660b9a
override
TomAugspurger Aug 22, 2018
dd3cba5
Handle fill in unique
TomAugspurger Aug 23, 2018
cc65b8a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 23, 2018
06dce5f
Faster isna
TomAugspurger Aug 23, 2018
f7351d3
Support old numpy
TomAugspurger Aug 23, 2018
2055494
clean
TomAugspurger Aug 23, 2018
f310322
Simplified setter
TomAugspurger Aug 23, 2018
0008164
Inplace not supported.
TomAugspurger Aug 23, 2018
027f6d8
compat
TomAugspurger Aug 24, 2018
c0d9875
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 24, 2018
44b218c
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 28, 2018
47fa73a
32-bit compat
TomAugspurger Aug 28, 2018
c2c489f
Lint
TomAugspurger Aug 28, 2018
3729927
Test fixups
TomAugspurger Aug 28, 2018
9ba49e1
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 29, 2018
543ac7c
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 30, 2018
f66ef6f
CI passing
TomAugspurger Aug 30, 2018
ba8fc9d
Right numpy version
TomAugspurger Aug 30, 2018
9185e33
linting
TomAugspurger Aug 30, 2018
11799ab
Try intp
TomAugspurger Aug 31, 2018
73e7626
32-bit compat
TomAugspurger Aug 31, 2018
ebece16
Doc cleanup
TomAugspurger Aug 31, 2018
7db6990
Simplify is_sparse
TomAugspurger Aug 31, 2018
be21f42
Updated factorize
TomAugspurger Sep 4, 2018
e857363
Use ABC
TomAugspurger Sep 4, 2018
d0ee038
simplify interleave_dtype
TomAugspurger Sep 4, 2018
54f4417
docstring, simplify
TomAugspurger Sep 4, 2018
2082d86
fixup supers
TomAugspurger Sep 4, 2018
f846606
Linting
TomAugspurger Sep 4, 2018
ce8e0ac
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 4, 2018
1f6590e
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 5, 2018
b758469
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 6, 2018
f6b0924
move and fix conflict
TomAugspurger Sep 6, 2018
232518c
doc note
TomAugspurger Sep 6, 2018
e8b37da
ENH: is_homogenous
TomAugspurger Sep 20, 2018
0197e0c
BUG: Preserve dtype on homogeneous EA xs
TomAugspurger Sep 20, 2018
62326ae
asarray test
TomAugspurger Sep 20, 2018
f008c38
Fixed asarray
TomAugspurger Sep 20, 2018
88c6126
Merge remote-tracking branch 'upstream/master' into ea-xs
TomAugspurger Sep 20, 2018
5c8662e
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 20, 2018
78798cf
is_homogeneous -> is_homogeneous_type
TomAugspurger Sep 20, 2018
b051424
lint
TomAugspurger Sep 20, 2018
78979b6
Squashed commit of the following:
TomAugspurger Sep 20, 2018
2333db1
Merge followup
TomAugspurger Sep 20, 2018
b41d473
Followup from merge
TomAugspurger Sep 20, 2018
d6a2479
lint
TomAugspurger Sep 20, 2018
a23c27c
Merge remote-tracking branch 'origin/ea-xs' into ea-sparse-2
TomAugspurger Sep 20, 2018
7372eb3
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
cab8c54
handle unary ops
TomAugspurger Sep 26, 2018
52ae275
linting
TomAugspurger Sep 26, 2018
9c9b49e
compat, lint
TomAugspurger Sep 26, 2018
f5d7492
SparseSeries unary ops
TomAugspurger Sep 26, 2018
b4b4cbc
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
bf98b9d
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
f3d2681
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 29, 2018
7d4d3ba
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 4, 2018
57c03c2
splib
TomAugspurger Oct 4, 2018
0dbc33e
collections -> compat
TomAugspurger Oct 4, 2018
c217cf5
updates
TomAugspurger Oct 8, 2018
2ea7a91
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 8, 2018
8f2f228
Set dtype
TomAugspurger Oct 8, 2018
c83bed7
reveret
TomAugspurger Oct 8, 2018
53e494e
clarify fillna
TomAugspurger Oct 8, 2018
627b9ce
Remove old invert
TomAugspurger Oct 8, 2018
df0293a
some cleanup
TomAugspurger Oct 8, 2018
a590418
remove redundant whatsnew
TomAugspurger Oct 9, 2018
7821f19
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 9, 2018
ee26c52
Update hashing, eq
TomAugspurger Oct 9, 2018
40390f1
wip-comments
TomAugspurger Oct 11, 2018
15a164d
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 11, 2018
88432c8
hashing
TomAugspurger Oct 11, 2018
3e7ec90
dtype and datetime64
TomAugspurger Oct 11, 2018
7b0a179
Updates
TomAugspurger Oct 11, 2018
20d8815
index
TomAugspurger Oct 11, 2018
3e81c69
wip
TomAugspurger Oct 11, 2018
1098a7a
quantile test
TomAugspurger Oct 11, 2018
10d204a
merge conflict
TomAugspurger Oct 11, 2018
69075d8
use is_homogenous_type
TomAugspurger Oct 11, 2018
0764baa
use assert_frame_equal
TomAugspurger Oct 11, 2018
a4a47c5
merge exp construction
TomAugspurger Oct 11, 2018
a5b6c39
API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
70d8268
document and test map
TomAugspurger Oct 11, 2018
7aed79f
table formatting
TomAugspurger Oct 11, 2018
11e55aa
fixup! API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
11606af
Restore subclass test
TomAugspurger Oct 11, 2018
2f73179
Revert changes to test
TomAugspurger Oct 11, 2018
1b3058a
quote
TomAugspurger Oct 11, 2018
f4ec928
fixup! API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
8c67ca2
lint
TomAugspurger Oct 11, 2018
cc89ec7
COMPAT: NumPy 1.9 bool-like indexing
TomAugspurger Oct 12, 2018
3f713d4
misc. comments
TomAugspurger Oct 12, 2018
886fe03
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 12, 2018
75099af
asarray on bool key for numpy compat
TomAugspurger Oct 12, 2018
731fc06
Raise for non-default values
TomAugspurger Oct 12, 2018
f91141d
groupby / reduce compat
TomAugspurger Oct 12, 2018
37a4b57
lint
TomAugspurger Oct 12, 2018
4aad8e1
fix docs
jorisvandenbossche Oct 13, 2018
18 changes: 17 additions & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -320,6 +320,22 @@ is the case with :attr:`Period.end_time`, for example

p.end_time

.. _whatsnew_0240.api_breaking.sparse_values:

``SparseArray`` is now an ``ExtensionArray``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This has some notable changes

- ``SparseArray`` is no longer a subclass of :class:`numpy.ndarray`
Member

Do we know of specific consequences that people might run into because of this?

Contributor Author

mmm not sure. isinstance(sparse_array, np.ndarray)? :)

The main thing is that there are many methods implemented on ndarray that are not on SparseArray. Hard to say which are most used.

- ``SparseArray.dtype`` and ``SparseSeries.dtype`` are now instances of ``SparseDtype``, rather than ``np.dtype``. Access the underlying dtype with ``SparseDtype.subdtype``.
- :meth:`numpy.asarray(sparse_array)` now returns a dense array with all the values, not just the non-fill-value values (:issue:`todo`)
- Providing a ``sparse_index`` to the SparseArray constructor no longer defaults the na-value to ``np.nan`` for all dtypes. The correct na_value for ``data.dtype`` is now used.
- passing ``fill_value`` to ``SparseArray.take`` no longer implies ``allow_fill=True``.
- ``SparseArray.astype(np.dtype)`` will create a dense NumPy array. To keep astype to a SparseArray with a different subdtype, use ``.astype(sparse_dtype)`` or a string like ``.astype('Sparse[float32]')``.
Contributor Author

What should the .astype behavior be? IMO .astype(np_dtype) should return a dense array, but we could also automatically wrap np_dtype in SparseDtype and return a SparseArray whose sp_values has np_dtype dtype. People who want a dense astype can then do np.asarray(sparse_array, np_dtype). I think for backwards compat SparseSeries.astype keeps things sparse. These should match behavior.

- Setting ``SparseArray.fill_value`` to a fill value with a different dtype is now allowed.
Member

Will this then change the dtype of the SparseArray?

Contributor Author

Yes. It's a bad idea though (see SparseArray.fill_value.setter).

Contributor Author

I may have misunderstood your question earlier. The answer may be no.

SparseArray.dtype is a SparseDtype, which consists of two fields: the array dtype (SparseArray.sp_values.dtype) and the fill_value. This changes just the fill value. The array type is unchanged. There's no restriction mixing the array dtype and the type of fill value.

Member

It seems a bit strange though to have sp_values and a fill_value that don't have compatible dtypes?

- Bug in ``SparseArray.nbytes`` under-reporting its memory usage by not including the size of its sparse index.
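
The behaviours listed in this whatsnew entry can be sketched roughly as follows. This assumes a released pandas (0.25 or later) where the class is exposed as `pd.arrays.SparseArray` and the underlying dtype attribute is spelled `subtype` (the commit list above includes a later "subdtype -> subtype" rename). The data values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; fill_value is passed explicitly so nothing relies on defaults.
arr = pd.arrays.SparseArray([0.0, 0.0, 1.0, 2.0], fill_value=0.0)

# The dtype is a SparseDtype that carries the value dtype and the fill value.
dtype = arr.dtype
subtype = dtype.subtype
fill_value = dtype.fill_value

# Only the non-fill values are stored internally...
stored = arr.sp_values.tolist()

# ...while np.asarray returns every value as a dense ndarray.
dense = np.asarray(arr)

# Casting to another SparseDtype keeps the result sparse.
as_f32 = arr.astype(pd.SparseDtype(np.float32, fill_value=0.0))

# nbytes counts the sparse index in addition to the stored values.
total_bytes = arr.nbytes
index_bytes = arr.sp_index.nbytes
```

Passing the target `SparseDtype` with an explicit `fill_value` avoids depending on whatever default fill value a string alias such as `'Sparse[float32]'` would imply.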

.. _whatsnew_0240.api.datetimelike.normalize:

Tick DateOffset Normalize Restrictions
@@ -418,7 +434,7 @@ ExtensionType Changes
- Bug in :meth:`Series.get` for ``Series`` using ``ExtensionArray`` and integer index (:issue:`21257`)
- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)
-
- Added ``ExtensionDtype._is_numeric`` for controlling whether an extension dtype is considered numeric.
Member

this should not be here anymore (since the other PRs are already merged?) (the same for the shift entry above)


.. _whatsnew_0240.api.incompatibilities:

8 changes: 8 additions & 0 deletions pandas/_libs/sparse.pyx
@@ -71,6 +71,10 @@ cdef class IntIndex(SparseIndex):
output += 'Indices: %s\n' % repr(self.indices)
return output

@property
def nbytes(self):
return self.indices.nbytes

def check_integrity(self):
"""
Checks the following:
@@ -362,6 +366,10 @@ cdef class BlockIndex(SparseIndex):

return output

@property
def nbytes(self):
return self.blocs.nbytes + self.blengths.nbytes

@property
def ngaps(self):
return self.length - self.npoints
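
The two `nbytes` properties differ because the two index kinds store different things: an integer index keeps one entry per non-fill position, while a block index keeps a start and a length per run of consecutive positions. A rough pure-Python sketch of that difference follows. It uses the standard-library `array` module as a stand-in for the integer buffers, and the helper names (`to_blocks`, `int_index_nbytes`, `block_index_nbytes`) are hypothetical, not pandas functions.

```python
from array import array


def to_blocks(indices):
    """Group sorted integer positions into (starts, lengths) of consecutive runs."""
    starts, lengths = [], []
    for i in indices:
        if starts and i == starts[-1] + lengths[-1]:
            lengths[-1] += 1      # position extends the current run
        else:
            starts.append(i)      # position begins a new run
            lengths.append(1)
    return starts, lengths


def int_index_nbytes(indices):
    # Integer-style index: one stored integer per non-fill position.
    buf = array("i", indices)
    return buf.itemsize * len(buf)


def block_index_nbytes(indices):
    # Block-style index: one start and one length per run of positions.
    starts, lengths = to_blocks(indices)
    buf_starts = array("i", starts)
    buf_lengths = array("i", lengths)
    return (buf_starts.itemsize * len(buf_starts)
            + buf_lengths.itemsize * len(buf_lengths))


# Illustrative positions: two runs, 2..5 and 10..11.
positions = [2, 3, 4, 5, 10, 11]
int_bytes = int_index_nbytes(positions)
block_bytes = block_index_nbytes(positions)
```

For data with long runs the block form needs fewer stored integers, which is why the reported index size depends on the index kind.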
2 changes: 1 addition & 1 deletion pandas/api/extensions/__init__.py
@@ -5,4 +5,4 @@
from pandas.core.algorithms import take # noqa
from pandas.core.arrays.base import (ExtensionArray, # noqa
ExtensionScalarOpsMixin)
from pandas.core.dtypes.dtypes import ExtensionDtype # noqa
from pandas.core.dtypes.dtypes import registry, ExtensionDtype # noqa
11 changes: 8 additions & 3 deletions pandas/core/common.py
@@ -15,7 +15,7 @@
from pandas import compat
from pandas.compat import iteritems, PY36, OrderedDict
from pandas.core.dtypes.generic import ABCSeries, ABCIndex, ABCIndexClass
from pandas.core.dtypes.common import is_integer
from pandas.core.dtypes.common import is_integer, is_bool_dtype
from pandas.core.dtypes.inference import _iterable_not_string
from pandas.core.dtypes.missing import isna, isnull, notnull # noqa
from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike
@@ -100,7 +100,12 @@ def maybe_box_datetimelike(value):


def is_bool_indexer(key):
if isinstance(key, (ABCSeries, np.ndarray, ABCIndex)):
# TODO: This is currently broken for ExtensionArrays.
# We currently special case SparseArray, but that should *maybe* be
# just ExtensionArray.
from pandas.core.sparse.api import SparseArray
Contributor

I think we have an ABCSparseArray


if isinstance(key, (ABCSeries, np.ndarray, ABCIndex, SparseArray)):
if key.dtype == np.object_:
key = np.asarray(values_from_object(key))

@@ -110,7 +115,7 @@ def is_bool_indexer(key):
'NA / NaN values')
return False
return True
elif key.dtype == np.bool_:
elif is_bool_dtype(key.dtype):
return True
elif isinstance(key, list):
try:
16 changes: 16 additions & 0 deletions pandas/core/dtypes/base.py
@@ -94,6 +94,17 @@ def is_dtype(cls, dtype):
except TypeError:
return False

@property
def _is_numeric(self):
"""
Whether columns with this dtype should be considered numeric.

By default ExtensionDtypes are assumed to be non-numeric.
They'll be excluded from operations that exclude non-numeric
columns, like groupby reductions.
"""
return False


class ExtensionDtype(_DtypeOpsMixin):
"""A custom data type, to be paired with an ExtensionArray.
@@ -109,6 +120,11 @@ class ExtensionDtype(_DtypeOpsMixin):
* name
* construct_from_string

The following properties affect the behavior of extension arrays
in operations:

* _is_numeric

Optionally one can override construct_array_type for construction
with the name of this dtype via the Registry
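
As a rough illustration of how a flag like `_is_numeric` can be consulted by numeric-only operations, here is a toy sketch. `ToyDtype`, `ToyNumericDtype`, and `numeric_columns` are hypothetical stand-ins written for this example; they are not pandas classes, and the real selection logic inside pandas may differ.

```python
class ToyDtype:
    """Stand-in for a dtype base class: non-numeric unless overridden."""
    name = "toy"

    @property
    def _is_numeric(self):
        return False


class ToyNumericDtype(ToyDtype):
    """A dtype that opts in to numeric-only operations."""
    name = "toy-numeric"

    @property
    def _is_numeric(self):
        return True


def numeric_columns(column_dtypes):
    """Return the names of columns whose dtype reports itself as numeric."""
    return [col for col, dtype in column_dtypes.items() if dtype._is_numeric]


# Illustrative mapping of column name to dtype instance.
columns = {"label": ToyDtype(), "amount": ToyNumericDtype()}
selected = numeric_columns(columns)
```

Keeping the default at `False` means a dtype author has to opt in explicitly, which matches the docstring's statement that extension dtypes are assumed to be non-numeric.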

32 changes: 28 additions & 4 deletions pandas/core/dtypes/common.py
@@ -11,6 +11,7 @@
DatetimeTZDtypeType, PeriodDtype, PeriodDtypeType, IntervalDtype,
IntervalDtypeType, PandasExtensionDtype, ExtensionDtype,
_pandas_registry)
from pandas.core.sparse.dtype import SparseDtype
from pandas.core.dtypes.generic import (
ABCCategorical, ABCPeriodIndex, ABCDatetimeIndex, ABCSeries,
ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex, ABCIndexClass,
@@ -152,8 +153,22 @@ def is_sparse(arr):
>>> is_sparse(bsr_matrix([1, 2, 3]))
False
"""
from pandas.core.sparse.array import SparseArray
from pandas.core.sparse.dtype import SparseDtype
from pandas.core.generic import ABCSeries
from pandas.core.internals import BlockManager, Block

return isinstance(arr, (ABCSparseArray, ABCSparseSeries))
if isinstance(arr, BlockManager):
# SparseArrays are only 1d
if arr.ndim == 1:
arr = arr.blocks[0]
else:
return False

if isinstance(arr, (ABCSeries, Block)):
Contributor

can you use getattr(arr, 'values', arr)?

Contributor Author

Don't think so, since that would densify a SparseArray, and then return False.

arr = arr.values

return isinstance(arr, (SparseArray, ABCSparseSeries, SparseDtype))
Member

Is it necessary for this function to accept dtype objects? (and also BlockManagers?)

Contributor Author

I think so, though if we work on the Series constructor, this could maybe be avoided.

This is so that

pd.DataFrame({"A": pd.SparseSeries([1, 2])})['A']

is a SparseSeries. If we exclude the block handling, we get back a Series[Sparse[float64]] (which is fine by me, but an API change that we could deprecate).

For including SparseDtype, I'm not sure why, but this changes the output of DataFrame.values. For a homogeneous sparse frame, .values is object dtype when SparseDtype is not included here. With SparseDtype included, it's float (dense).



def is_scipy_sparse(arr):
@@ -1608,8 +1623,9 @@ def is_bool_dtype(arr_or_dtype):
False
>>> is_bool_dtype(np.array([True, False]))
True
>>> is_bool_dtype(pd.SparseArray([True, False]))
True
"""

if arr_or_dtype is None:
return False
try:
@@ -1626,7 +1642,8 @@ def is_bool_dtype(arr_or_dtype):
# guess this
return (arr_or_dtype.is_object and
arr_or_dtype.inferred_type == 'boolean')

elif isinstance(arr_or_dtype, SparseDtype):
Contributor

what is this for?

Contributor Author

This is for is_bool_indexer(SparseArray). We support boolean indexing a SparseArray with a SparseArray of booleans.

return issubclass(arr_or_dtype.subdtype.type, np.bool_)
return issubclass(tipo, np.bool_)
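
The use case mentioned in the review thread, indexing a SparseArray with a SparseArray of booleans, might look like the following. This is a sketch against a released pandas, assuming the class is available as `pd.arrays.SparseArray`; the values are illustrative only.

```python
import numpy as np
import pandas as pd

values = pd.arrays.SparseArray([1.0, 0.0, 3.0, 0.0], fill_value=0.0)
mask = pd.arrays.SparseArray([True, False, True, False], fill_value=False)

# A sparse array of booleans is recognised as having a boolean dtype...
mask_is_bool = pd.api.types.is_bool_dtype(mask)

# ...so it can act as a boolean indexer on another SparseArray.
picked = values[mask]
picked_dense = np.asarray(picked).tolist()
```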


@@ -1706,6 +1723,8 @@ def is_extension_array_dtype(arr_or_dtype):
array interface. In pandas, this includes:

* Categorical
* Sparse
* Interval

Third-party libraries may implement arrays or types satisfying
this interface as well.
@@ -1828,7 +1847,8 @@ def _get_dtype(arr_or_dtype):
return PeriodDtype.construct_from_string(arr_or_dtype)
elif is_interval_dtype(arr_or_dtype):
return IntervalDtype.construct_from_string(arr_or_dtype)
elif isinstance(arr_or_dtype, (ABCCategorical, ABCCategoricalIndex)):
elif isinstance(arr_or_dtype, (ABCCategorical, ABCCategoricalIndex,
ABCSparseArray, ABCSparseSeries)):
return arr_or_dtype.dtype

if hasattr(arr_or_dtype, 'dtype'):
@@ -1876,6 +1896,10 @@ def _get_dtype_type(arr_or_dtype):
elif is_interval_dtype(arr_or_dtype):
return IntervalDtypeType
return _get_dtype_type(np.dtype(arr_or_dtype))
elif isinstance(arr_or_dtype, (ABCSparseSeries, ABCSparseArray,
SparseDtype)):
dtype = getattr(arr_or_dtype, 'dtype', arr_or_dtype)
return dtype.type
try:
return arr_or_dtype.dtype.type
except AttributeError:
70 changes: 17 additions & 53 deletions pandas/core/dtypes/concat.py
@@ -97,7 +97,9 @@ def _get_frame_result_type(result, objs):
otherwise, return 1st obj
"""

if result.blocks and all(b.is_sparse for b in result.blocks):
if (result.blocks and (
all(is_sparse(b) for b in result.blocks) or
Contributor

Related to my comment above: can is_sparse not simply check whether it's an EA and whether it has a SparseDtype?

then you simply need to pass the b.values here, yes?

Contributor Author

I'll give that a shot.

Contributor

Can you add a comment here? It's not obvious what you are doing.

Contributor

how can obj be a SparseFrame here? is this tested?

Contributor Author

I think a comment of mine may have been lost.

This is hit in several places (e.g. pandas/tests/sparse/test_combine_concat.py::TestSparseDataFrameConcat::test_concat).

What part can I clarify here?

all(isinstance(obj, ABCSparseDataFrame) for obj in objs))):
from pandas.core.sparse.api import SparseDataFrame
return SparseDataFrame
else:
@@ -554,61 +556,23 @@ def _concat_sparse(to_concat, axis=0, typs=None):
a single array, preserving the combined dtypes
"""

from pandas.core.sparse.array import SparseArray, _make_index
from pandas.core.sparse.array import SparseArray

def convert_sparse(x, axis):
# coerce to native type
if isinstance(x, SparseArray):
x = x.get_values()
else:
x = np.asarray(x)
x = x.ravel()
if axis > 0:
x = np.atleast_2d(x)
return x
fill_values = [x.fill_value for x in to_concat
if isinstance(x, SparseArray)]

if typs is None:
typs = get_dtype_kinds(to_concat)
if len(set(fill_values)) > 1:
raise ValueError("Cannot concatenate SparseArrays with different "
"fill values")

if len(typs) == 1:
# concat input as it is if all inputs are sparse
# and have the same fill_value
fill_values = {c.fill_value for c in to_concat}
if len(fill_values) == 1:
sp_values = [c.sp_values for c in to_concat]
indexes = [c.sp_index.to_int_index() for c in to_concat]

indices = []
loc = 0
for idx in indexes:
indices.append(idx.indices + loc)
loc += idx.length
sp_values = np.concatenate(sp_values)
indices = np.concatenate(indices)
sp_index = _make_index(loc, indices, kind=to_concat[0].sp_index)

return SparseArray(sp_values, sparse_index=sp_index,
fill_value=to_concat[0].fill_value)

# input may be sparse / dense mixed and may have different fill_value
# input must contain sparse at least 1
sparses = [c for c in to_concat if is_sparse(c)]
fill_values = [c.fill_value for c in sparses]
sp_indexes = [c.sp_index for c in sparses]

# densify and regular concat
to_concat = [convert_sparse(x, axis) for x in to_concat]
result = np.concatenate(to_concat, axis=axis)

if not len(typs - {'sparse', 'f', 'i'}):
# sparsify if inputs are sparse and dense numerics
# first sparse input's fill_value and SparseIndex is used
result = SparseArray(result.ravel(), fill_value=fill_values[0],
kind=sp_indexes[0])
else:
# coerce to object if needed
result = result.astype('object')
return result
fill_value = list(fill_values)[0]

# TODO: Fix join unit generation so we aren't passed this.
to_concat = [x if isinstance(x, SparseArray)
else SparseArray(x.squeeze(), fill_value=fill_value)
for x in to_concat]

return SparseArray._concat_same_type(to_concat)
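
The idea behind concatenating sparse arrays without densifying them can be sketched in plain Python. The fill-value check mirrors the new code in this hunk, and the index shifting mirrors the removed `idx.indices + loc` loop. Whether `SparseArray._concat_same_type` works exactly this way internally is an assumption; `ToySparse` and `concat_same_fill` are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySparse:
    """Toy sparse vector: only the positions in `indices` hold non-fill values."""
    length: int
    indices: List[int]
    values: List[float]
    fill_value: float = 0.0

    def to_dense(self) -> List[float]:
        dense = [self.fill_value] * self.length
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


def concat_same_fill(parts: List[ToySparse]) -> ToySparse:
    """Concatenate a non-empty list of toy sparse vectors without densifying."""
    fill_values = {p.fill_value for p in parts}
    if len(fill_values) > 1:
        raise ValueError("Cannot concatenate with different fill values")
    indices: List[int] = []
    values: List[float] = []
    offset = 0
    for p in parts:
        # Shift each part's positions by the total length seen so far.
        indices.extend(i + offset for i in p.indices)
        values.extend(p.values)
        offset += p.length
    return ToySparse(offset, indices, values, parts[0].fill_value)


a = ToySparse(3, [1], [5.0])
b = ToySparse(4, [0, 3], [7.0, 9.0])
combined = concat_same_fill([a, b])
```

Requiring a single shared fill value is what makes the shifted indices sufficient: every position not listed in `indices` means the same thing in each input and in the result.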


def _concat_rangeindex_same_dtype(indexes):
2 changes: 1 addition & 1 deletion pandas/core/internals/__init__.py
@@ -5,7 +5,7 @@
make_block, # io.pytables, io.packers
FloatBlock, IntBlock, ComplexBlock, BoolBlock, ObjectBlock,
TimeDeltaBlock, DatetimeBlock, DatetimeTZBlock,
CategoricalBlock, ExtensionBlock, SparseBlock, ScalarBlock,
CategoricalBlock, ExtensionBlock, ScalarBlock,
Block)
from .managers import ( # noqa:F401
BlockManager, SingleBlockManager,