SparseArray is an ExtensionArray #22325

Merged
merged 236 commits into from
Oct 13, 2018
Changes from 4 commits
Commits
ee187eb
wip
TomAugspurger Jul 12, 2018
32c1372
from scratch
TomAugspurger Jul 13, 2018
b265659
Updates
TomAugspurger Jul 13, 2018
8dfc898
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 13, 2018
9c57725
WIP
TomAugspurger Jul 13, 2018
13952ab
wip
TomAugspurger Jul 13, 2018
7a6e7fa
wip take
TomAugspurger Jul 13, 2018
1016af1
wip take
TomAugspurger Jul 16, 2018
072abec
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 22, 2018
0ad61cc
take
TomAugspurger Jul 22, 2018
5b0b524
take working
TomAugspurger Jul 22, 2018
224744a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 23, 2018
620b5fb
remove registry
TomAugspurger Jul 23, 2018
164c401
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 24, 2018
65f83d6
missing
TomAugspurger Jul 24, 2018
0b3c682
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 27, 2018
69a5d13
wip ops
TomAugspurger Jul 27, 2018
f2b5862
More ops wip
TomAugspurger Jul 27, 2018
fa80fc5
segfault!
TomAugspurger Jul 28, 2018
3f20890
wip
TomAugspurger Jul 28, 2018
484adb0
start docs
TomAugspurger Jul 28, 2018
1df1190
2 failing extension tests
TomAugspurger Jul 30, 2018
4246ac4
wip fillna
TomAugspurger Jul 30, 2018
a849699
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 1, 2018
c4da319
registry dtype, asarray
TomAugspurger Aug 1, 2018
a2f158f
astype interface
TomAugspurger Aug 1, 2018
26b671a
"passing" extension tests
TomAugspurger Aug 1, 2018
375e160
no sparse block
TomAugspurger Aug 1, 2018
0a37050
wip
TomAugspurger Aug 2, 2018
3c2cb0f
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 2, 2018
27c6378
wip
TomAugspurger Aug 3, 2018
e52dae9
a bit on concat
TomAugspurger Aug 3, 2018
b6d8430
revert concat changes
TomAugspurger Aug 3, 2018
640c4a5
passing again
TomAugspurger Aug 3, 2018
6b61597
More concat
TomAugspurger Aug 3, 2018
427234f
fillna...
TomAugspurger Aug 3, 2018
e055629
wip
TomAugspurger Aug 6, 2018
a79359c
wip
TomAugspurger Aug 6, 2018
de3aa71
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 6, 2018
21f4ee3
reductions, ufuncs
TomAugspurger Aug 6, 2018
c1e594a
failing on ufuncs
TomAugspurger Aug 6, 2018
dc7f93f
wipo
TomAugspurger Aug 6, 2018
eb09d21
concat is broken
TomAugspurger Aug 7, 2018
7dcf4b2
formatting failing
TomAugspurger Aug 7, 2018
b39658a
more wip
TomAugspurger Aug 7, 2018
a8b76bd
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 8, 2018
e041313
Extension test fixups
TomAugspurger Aug 8, 2018
595535e
some indexing, sparse string
TomAugspurger Aug 9, 2018
7700299
passing indexing
TomAugspurger Aug 9, 2018
f1ff7da
passing pivot
TomAugspurger Aug 9, 2018
33fa6f7
broken broken broken
TomAugspurger Aug 10, 2018
40c035e
sanitize
TomAugspurger Aug 10, 2018
1d49cc7
broken broken broken
TomAugspurger Aug 10, 2018
6f4b6b6
wip
TomAugspurger Aug 13, 2018
6f037b5
working through series
TomAugspurger Aug 13, 2018
7da220e
working through series
TomAugspurger Aug 13, 2018
bfbe4ab
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 13, 2018
c5666b6
series passing
TomAugspurger Aug 13, 2018
ff6037c
more tests
TomAugspurger Aug 13, 2018
5c362ef
wip
TomAugspurger Aug 13, 2018
55cac36
wip
TomAugspurger Aug 13, 2018
c4e8784
More test
TomAugspurger Aug 13, 2018
a00f987
skip internals tests
TomAugspurger Aug 13, 2018
a6d7eac
linting
TomAugspurger Aug 13, 2018
4b4f9bd
cleanup
TomAugspurger Aug 13, 2018
82801be
cleanup
TomAugspurger Aug 13, 2018
1a149dc
cleanup
TomAugspurger Aug 13, 2018
fde19d7
remove debug code
TomAugspurger Aug 13, 2018
a7ba8f6
API: dispatch to EA.astype
TomAugspurger Aug 13, 2018
5064217
API: ExtensionDtype._is_numeric
TomAugspurger Aug 14, 2018
e31e8aa
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 14, 2018
79c8e9c
update type
TomAugspurger Aug 14, 2018
26993fe
Merge remote-tracking branch 'upstream/master' into ea-astype-dispatch
TomAugspurger Aug 14, 2018
6eeec11
py2 compat
TomAugspurger Aug 14, 2018
50de326
fixed test
TomAugspurger Aug 14, 2018
5ef1747
test fill value
TomAugspurger Aug 14, 2018
f31970c
Test nbytes
TomAugspurger Aug 14, 2018
f1b860f
explainers
TomAugspurger Aug 14, 2018
5c44275
linting
TomAugspurger Aug 14, 2018
33bc8f8
Allow concatenating with different sparse dtypes
TomAugspurger Aug 14, 2018
9bf13ad
Linting
TomAugspurger Aug 14, 2018
de1fb5b
lint
TomAugspurger Aug 14, 2018
da580cd
Wip
TomAugspurger Aug 14, 2018
88b73c3
Merge branch 'ea-astype-dispatch' into ea-sparse-2
TomAugspurger Aug 14, 2018
afde64d
Merge branch 'ea-is-numeric' into ea-sparse-2
TomAugspurger Aug 14, 2018
e603d3d
fixup 33bc8f836
TomAugspurger Aug 15, 2018
ec5eb9a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 15, 2018
a72ee1a
Fixed DataFrame.__setitem__ for updating to sparse.
TomAugspurger Aug 15, 2018
f147635
try removing
TomAugspurger Aug 15, 2018
c35c7c2
Merge branch 'ea-astype-dispatch' into ea-sparse-2
TomAugspurger Aug 15, 2018
e159ef2
wip
TomAugspurger Aug 16, 2018
d48a8fa
Fixup
TomAugspurger Aug 16, 2018
3bcf57e
astype works
TomAugspurger Aug 16, 2018
31d401f
Squashed commit of the following:
TomAugspurger Aug 16, 2018
a4369c2
Squashed commit of the following:
TomAugspurger Aug 16, 2018
608b499
Fixed Series[sparse].to_sparse
TomAugspurger Aug 16, 2018
14e60c9
Shift works
TomAugspurger Aug 16, 2018
550f163
parametrize shift test
TomAugspurger Aug 16, 2018
821cc91
Removed bogus test
TomAugspurger Aug 16, 2018
e21ed21
Un-xfail more
TomAugspurger Aug 16, 2018
aeb8c8c
scalar take raises
TomAugspurger Aug 16, 2018
34c90ed
Move fill_value to dtyep
TomAugspurger Aug 17, 2018
2103959
Move fill_value to dtyep
TomAugspurger Aug 17, 2018
26af959
Merge branch 'ea-sparse-dtype-fill-value' into ea-sparse-2
TomAugspurger Aug 18, 2018
e5920c2
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 18, 2018
084a967
cleanup
TomAugspurger Aug 18, 2018
bb17760
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
dde7852
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
f1b4e6b
Setting fill value (but that's bad)
TomAugspurger Aug 20, 2018
6a31077
Explicit fill value
TomAugspurger Aug 20, 2018
02aa7f7
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
3a7ee2d
Fixed merge conflicts
TomAugspurger Aug 20, 2018
d6fe191
subdtype -> subtype
TomAugspurger Aug 20, 2018
b1ea874
subdtype -> subtype
TomAugspurger Aug 20, 2018
2213b83
Fixed pickle
TomAugspurger Aug 21, 2018
94664c4
test dtype
TomAugspurger Aug 21, 2018
e54160c
astype update
TomAugspurger Aug 21, 2018
04a2dbb
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 21, 2018
fb01d1a
more
TomAugspurger Aug 21, 2018
f78ae81
lint
TomAugspurger Aug 21, 2018
11d5b40
py2 compat
TomAugspurger Aug 21, 2018
ba70753
dtype tests
TomAugspurger Aug 21, 2018
82bab3c
explainer
TomAugspurger Aug 21, 2018
2990124
Delete things
TomAugspurger Aug 21, 2018
a9d0f17
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 22, 2018
0c52c37
NumPy 1.9 compat
TomAugspurger Aug 22, 2018
998f113
implement divmod
TomAugspurger Aug 22, 2018
38b0356
Fix broken fill value setting
TomAugspurger Aug 22, 2018
7206d94
compare with lists
TomAugspurger Aug 22, 2018
fe771b5
clean
TomAugspurger Aug 22, 2018
12e424c
fixed index ctor fail
TomAugspurger Aug 22, 2018
3bd567f
New xfail
TomAugspurger Aug 22, 2018
f816346
Handle sparse reindex
TomAugspurger Aug 22, 2018
1a1dcf4
concat mixed
TomAugspurger Aug 22, 2018
e3d9173
take note
TomAugspurger Aug 22, 2018
2715cdb
Remove test.
TomAugspurger Aug 22, 2018
4e40599
concat NA and empty
TomAugspurger Aug 22, 2018
0aa3934
dum
TomAugspurger Aug 22, 2018
a3becb6
Fix lost fill value
TomAugspurger Aug 22, 2018
5660b9a
override
TomAugspurger Aug 22, 2018
dd3cba5
Handle fill in unique
TomAugspurger Aug 23, 2018
cc65b8a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 23, 2018
06dce5f
Faster isna
TomAugspurger Aug 23, 2018
f7351d3
Support old numpy
TomAugspurger Aug 23, 2018
2055494
clean
TomAugspurger Aug 23, 2018
f310322
Simplified setter
TomAugspurger Aug 23, 2018
0008164
Inplace not supported.
TomAugspurger Aug 23, 2018
027f6d8
compat
TomAugspurger Aug 24, 2018
c0d9875
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 24, 2018
44b218c
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 28, 2018
47fa73a
32-bit compat
TomAugspurger Aug 28, 2018
c2c489f
Lint
TomAugspurger Aug 28, 2018
3729927
Test fixups
TomAugspurger Aug 28, 2018
9ba49e1
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 29, 2018
543ac7c
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 30, 2018
f66ef6f
CI passing
TomAugspurger Aug 30, 2018
ba8fc9d
Right numpy version
TomAugspurger Aug 30, 2018
9185e33
linting
TomAugspurger Aug 30, 2018
11799ab
Try intp
TomAugspurger Aug 31, 2018
73e7626
32-bit compat
TomAugspurger Aug 31, 2018
ebece16
Doc cleanup
TomAugspurger Aug 31, 2018
7db6990
Simplify is_sparse
TomAugspurger Aug 31, 2018
be21f42
Updated factorize
TomAugspurger Sep 4, 2018
e857363
Use ABC
TomAugspurger Sep 4, 2018
d0ee038
simplify interleave_dtype
TomAugspurger Sep 4, 2018
54f4417
docstring, simplify
TomAugspurger Sep 4, 2018
2082d86
fixup supers
TomAugspurger Sep 4, 2018
f846606
Linting
TomAugspurger Sep 4, 2018
ce8e0ac
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 4, 2018
1f6590e
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 5, 2018
b758469
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 6, 2018
f6b0924
move and fix conflict
TomAugspurger Sep 6, 2018
232518c
doc note
TomAugspurger Sep 6, 2018
e8b37da
ENH: is_homogenous
TomAugspurger Sep 20, 2018
0197e0c
BUG: Preserve dtype on homogeneous EA xs
TomAugspurger Sep 20, 2018
62326ae
asarray test
TomAugspurger Sep 20, 2018
f008c38
Fixed asarray
TomAugspurger Sep 20, 2018
88c6126
Merge remote-tracking branch 'upstream/master' into ea-xs
TomAugspurger Sep 20, 2018
5c8662e
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 20, 2018
78798cf
is_homogeneous -> is_homogeneous_type
TomAugspurger Sep 20, 2018
b051424
lint
TomAugspurger Sep 20, 2018
78979b6
Squashed commit of the following:
TomAugspurger Sep 20, 2018
2333db1
Merge followup
TomAugspurger Sep 20, 2018
b41d473
Followup from merge
TomAugspurger Sep 20, 2018
d6a2479
lint
TomAugspurger Sep 20, 2018
a23c27c
Merge remote-tracking branch 'origin/ea-xs' into ea-sparse-2
TomAugspurger Sep 20, 2018
7372eb3
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
cab8c54
handle unary ops
TomAugspurger Sep 26, 2018
52ae275
linting
TomAugspurger Sep 26, 2018
9c9b49e
compat, lint
TomAugspurger Sep 26, 2018
f5d7492
SparseSeries unary ops
TomAugspurger Sep 26, 2018
b4b4cbc
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
bf98b9d
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
f3d2681
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 29, 2018
7d4d3ba
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 4, 2018
57c03c2
splib
TomAugspurger Oct 4, 2018
0dbc33e
collections -> compat
TomAugspurger Oct 4, 2018
c217cf5
updates
TomAugspurger Oct 8, 2018
2ea7a91
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 8, 2018
8f2f228
Set dtype
TomAugspurger Oct 8, 2018
c83bed7
reveret
TomAugspurger Oct 8, 2018
53e494e
clarify fillna
TomAugspurger Oct 8, 2018
627b9ce
Remove old invert
TomAugspurger Oct 8, 2018
df0293a
some cleanup
TomAugspurger Oct 8, 2018
a590418
remove redundant whatsnew
TomAugspurger Oct 9, 2018
7821f19
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 9, 2018
ee26c52
Update hashing, eq
TomAugspurger Oct 9, 2018
40390f1
wip-comments
TomAugspurger Oct 11, 2018
15a164d
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 11, 2018
88432c8
hashing
TomAugspurger Oct 11, 2018
3e7ec90
dtype and datetime64
TomAugspurger Oct 11, 2018
7b0a179
Updates
TomAugspurger Oct 11, 2018
20d8815
index
TomAugspurger Oct 11, 2018
3e81c69
wip
TomAugspurger Oct 11, 2018
1098a7a
quantile test
TomAugspurger Oct 11, 2018
10d204a
merge conflict
TomAugspurger Oct 11, 2018
69075d8
use is_homogenous_type
TomAugspurger Oct 11, 2018
0764baa
use assert_frame_equal
TomAugspurger Oct 11, 2018
a4a47c5
merge exp construction
TomAugspurger Oct 11, 2018
a5b6c39
API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
70d8268
document and test map
TomAugspurger Oct 11, 2018
7aed79f
table formatting
TomAugspurger Oct 11, 2018
11e55aa
fixup! API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
11606af
Restore subclass test
TomAugspurger Oct 11, 2018
2f73179
Revert changes to test
TomAugspurger Oct 11, 2018
1b3058a
quote
TomAugspurger Oct 11, 2018
f4ec928
fixup! API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
8c67ca2
lint
TomAugspurger Oct 11, 2018
cc89ec7
COMPAT: NumPy 1.9 bool-like indexing
TomAugspurger Oct 12, 2018
3f713d4
misc. comments
TomAugspurger Oct 12, 2018
886fe03
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 12, 2018
75099af
asarray on bool key for numpy compat
TomAugspurger Oct 12, 2018
731fc06
Raise for non-default values
TomAugspurger Oct 12, 2018
f91141d
groupby / reduce compat
TomAugspurger Oct 12, 2018
37a4b57
lint
TomAugspurger Oct 12, 2018
4aad8e1
fix docs
jorisvandenbossche Oct 13, 2018
3 changes: 3 additions & 0 deletions pandas/core/sparse/array.py
@@ -659,6 +659,9 @@ def __getitem__(self, key):
                     key = np.asarray(key)

             if com.is_bool_indexer(key) and len(self) == len(key):
+                # TODO(numpy 1.11): Remove this asarray.
+                # Old NumPy didn't treat array-like as boolean masks.
+                key = np.asarray(key)
                 return self.take(np.arange(len(key), dtype=np.int32)[key])
             elif hasattr(key, '__len__'):
                 return self.take(key)
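
The branch above turns a boolean mask into integer positions and hands those to `take`. A minimal plain-Python sketch of that idea follows. The helpers `mask_to_positions` and `take` are hypothetical names written for illustration; they use lists in place of NumPy arrays and are not part of pandas.

```python
def mask_to_positions(mask, length):
    """Convert a boolean mask into the integer positions it selects."""
    mask = list(mask)  # accept any iterable, analogous to np.asarray(key)
    if len(mask) != length:
        raise IndexError("mask length must match the array length")
    if not all(isinstance(flag, bool) for flag in mask):
        raise TypeError("mask must contain only booleans")
    # Equivalent in spirit to np.arange(len(key))[key]
    return [i for i, flag in enumerate(mask) if flag]


def take(values, positions):
    """Select elements by integer position."""
    return [values[i] for i in positions]


values = [0, 1, 2]
positions = mask_to_positions([True, False, True], len(values))
selected = take(values, positions)
print(positions, selected)
```

Routing masks through positions means only one selection path (`take`) has to understand the sparse layout.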
30 changes: 22 additions & 8 deletions pandas/core/sparse/dtype.py
@@ -173,9 +173,10 @@ def construct_from_string(cls, string):
         'Sparse[int, 1]' SparseDtype[np.int64, 0]

Member

I think this one now needs to be updated as the above would raise an error. But make it 'Sparse[int, 0]' to show that default fill value is OK?

         ================ ============================

-        Notice that any "fill value" in `string` is ignored. The
-        fill from from `construct_from_string` will always be
-        the default fill value for the dtype.
+        It is not possible to specify non-default fill values
+        with a string. An argument like ``'SparseDtype[int, 1]'``
+        will raise a ``TypeError`` because the default fill value
+        for integers is 0.

         Returns
         -------
@@ -184,10 +185,19 @@ def construct_from_string(cls, string):
         msg = "Could not construct SparseDtype from '{}'".format(string)

TomAugspurger marked this conversation as resolved.

         if string.startswith("Sparse"):
             try:
-                sub_type = cls._parse_subtype(string)
-                return SparseDtype(sub_type)
+                sub_type, has_fill_value = cls._parse_subtype(string)
+                result = SparseDtype(sub_type)
             except Exception:
                 raise TypeError(msg)
+            else:
+                msg = ("Could not construct SparseDtype from '{}'.\n\nIt "
+                       "looks like the fill_value in the string is not "
+                       "the default for the dtype. Non-default fill_values "
+                       "are not supported. Use the 'SparseDtype()' "
+                       "constructor instead.")
+                if has_fill_value and str(result) != string:
+                    raise TypeError(msg.format(string))
+                return result
         else:
             raise TypeError(msg)

@@ -213,22 +223,26 @@ def _parse_subtype(dtype):
         ValueError
             When the subtype cannot be extracted.
         """
-        xpr = re.compile(r"Sparse\[(?P<subtype>[^,]*)(, )?(.*?)?\]$")
+        xpr = re.compile(
+            r"Sparse\[(?P<subtype>[^,]*)(, )?(?P<fill_value>.*?)?\]$"
+        )
         m = xpr.match(dtype)
+        has_fill_value = False
         if m:
             subtype = m.groupdict()['subtype']
+            has_fill_value = m.groupdict()['fill_value'] or has_fill_value
         elif dtype == "Sparse":
             subtype = 'float64'
         else:
             raise ValueError("Cannot parse {}".format(dtype))
-        return subtype
+        return subtype, has_fill_value

     @classmethod
     def is_dtype(cls, dtype):
         dtype = getattr(dtype, 'dtype', dtype)
         if (isinstance(dtype, compat.string_types) and
                 dtype.startswith("Sparse")):
-            sub_type = cls._parse_subtype(dtype)
+            sub_type, _ = cls._parse_subtype(dtype)
             dtype = np.dtype(sub_type)
         elif isinstance(dtype, cls):
             return True
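
To see what the new named group captures, the parsing step can be tried on its own. The sketch below reuses the regular expression from the diff in a standalone function; `parse_subtype` is an illustrative name, not the pandas method, and it coerces the fill-value capture to a real `bool`, which the code in the diff does not do.

```python
import re

# Same pattern as in the diff: an optional ", <fill_value>" after the subtype.
_SPARSE_RE = re.compile(
    r"Sparse\[(?P<subtype>[^,]*)(, )?(?P<fill_value>.*?)?\]$"
)


def parse_subtype(dtype):
    """Return (subtype, has_fill_value) for a 'Sparse[...]' string."""
    m = _SPARSE_RE.match(dtype)
    if m:
        # Unlike the diff, coerce to a real bool for easier testing.
        return m.group("subtype"), bool(m.group("fill_value"))
    if dtype == "Sparse":
        return "float64", False
    raise ValueError("Cannot parse {}".format(dtype))


examples = ["Sparse", "Sparse[int]", "Sparse[int, 1]",
            "Sparse[datetime64[ns], 0]"]
for s in examples:
    print(s, "->", parse_subtype(s))
```

Note that the function only reports whether a fill value was written; it never interprets the value itself.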
5 changes: 4 additions & 1 deletion pandas/tests/extension/arrow/bool.py
@@ -123,7 +123,10 @@ def _reduce(self, method, skipna=True, **kwargs):
         else:
             arr = self

-        op = getattr(arr, method)
+        try:
+            op = getattr(arr, method)
+        except AttributeError:
+            raise TypeError
         return op(**kwargs)

     def any(self, axis=0, out=None):
5 changes: 5 additions & 0 deletions pandas/tests/extension/arrow/test_bool.py
@@ -45,6 +45,11 @@ def test_from_dtype(self, data):


 class TestReduce(base.BaseNoReduceTests):
+    def test_reduce_series_boolean(self):
+        pass
+
+
+class TestReduceBoolean(base.BaseBooleanReduceTests):
     pass


6 changes: 6 additions & 0 deletions pandas/tests/sparse/test_array.py
@@ -568,6 +568,12 @@ def _checkit(i):
             _checkit(i)
             _checkit(-i)

+    def test_getitem_arraylike_mask(self):
+        arr = SparseArray([0, 1, 2])
+        result = arr[[True, False, True]]
+        expected = SparseArray([0, 2])
+        tm.assert_sp_array_equal(result, expected)
+
     def test_getslice(self):
         result = self.arr[:-3]
         exp = SparseArray(self.arr.values[:-3])
13 changes: 12 additions & 1 deletion pandas/tests/sparse/test_dtype.py
@@ -2,6 +2,7 @@
 import numpy as np

 import pandas as pd
+import pandas.util.testing as tm
 from pandas.core.sparse.api import SparseDtype


@@ -127,5 +128,15 @@ def test_hash_equal(a, b, expected):
     ('Sparse[datetime64[ns], 0]', 'datetime64[ns]'),
 ])
 def test_parse_subtype(string, expected):
-    subtype = SparseDtype._parse_subtype(string)
+    subtype, _ = SparseDtype._parse_subtype(string)
     assert subtype == expected

Member

But this is not testing whether the fill_value is parsed correctly?

Contributor Author

Correct. We don't even attempt to parse the fill value from a string like Sparse[...]. Using the string-form will always give you the default for that dtype.

Member

Ah, that was not clear to me. I think it can be very confusing that that part is ignored. Not sure what the best way is here. Then I maybe like the previous behaviour of raising an error for that case more, although that can also be annoying for the default dtype since this is the string repr of the dtype ...

Or maybe we could check that the str repr of the parsed dtype is equal to the passed string? (to allow cases with the default fill_value, but detect other fill_values that would be ignored?)

Contributor Author

Yeah, I see that it can be confusing. Checking that the repr matches seems reasonable.

Member

Is this something you plan to do here, or in a follow-up (to know if I can merge :-))

Contributor Author

Ah, I meant to do that but was sidetracked. We should do it here.

Contributor Author

Does the error message seem OK?

In [2]: pd.SparseDtype.construct_from_string("Sparse[int, 1]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-3d2316f339ca> in <module>
----> 1 pd.SparseDtype.construct_from_string("Sparse[int, 1]")

~/sandbox/pandas/pandas/core/sparse/dtype.py in construct_from_string(cls, string)
    196                        "constructor instead.")
    197                 if str(result) != string:
--> 198                     raise TypeError(msg.format(string))
    199                 return result
    200         else:

TypeError: Could not construct SparseDtype from 'Sparse[int, 1]'.

It looks like the fill_value in the string is not the default for the dtype. Non-default fill_values are not supported. Use the 'SparseDtype()' constructor instead.



+@pytest.mark.parametrize("string", [
+    "Sparse[int, 1]",
+    "Sparse[float, 0.0]",
+    "Sparse[bool, True]",
+])
+def test_construct_from_string_fill_value_raises(string):
+    with tm.assert_raises_regex(TypeError, 'fill_value in the string is not'):
+        SparseDtype.construct_from_string(string)
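
The round-trip check agreed on in the thread (compare the string form of the parsed dtype with the string that was passed in) can be sketched without pandas. Everything below is a toy under stated assumptions: `ToySparseDtype`, `_DEFAULT_FILL` and `raises_type_error` are hypothetical names, the default-fill table is illustrative, and the toy skips the subtype-name normalization that the real dtype performs through NumPy.

```python
import re

# Hypothetical default fill values, keyed by the subtype name as written.
# pandas derives these from the NumPy dtype; this table is illustrative only.
_DEFAULT_FILL = {"int": 0, "float": float("nan"), "bool": False}
_SPARSE_RE = re.compile(
    r"Sparse\[(?P<subtype>[^,]*)(, )?(?P<fill_value>.*?)?\]$"
)


class ToySparseDtype:
    """Stand-in for SparseDtype; it does not normalize subtype names."""

    def __init__(self, subtype):
        self.subtype = subtype
        self.fill_value = _DEFAULT_FILL[subtype]

    def __str__(self):
        return "Sparse[{}, {}]".format(self.subtype, self.fill_value)

    @classmethod
    def construct_from_string(cls, string):
        m = _SPARSE_RE.match(string)
        if m is None:
            raise TypeError("Could not construct from '{}'".format(string))
        try:
            result = cls(m.group("subtype"))
        except KeyError:
            raise TypeError("Could not construct from '{}'".format(string))
        # Round-trip check: a fill value was given, so the string must equal
        # the string form of the dtype built with the default fill value.
        if m.group("fill_value") and str(result) != string:
            raise TypeError("fill_value in the string is not the default")
        return result


def raises_type_error(string):
    """Report whether construction from `string` raises TypeError."""
    try:
        ToySparseDtype.construct_from_string(string)
    except TypeError:
        return True
    return False


print(str(ToySparseDtype.construct_from_string("Sparse[int, 0]")))
print([raises_type_error(s) for s in
       ["Sparse[int, 1]", "Sparse[float, 0.0]", "Sparse[bool, True]"]])
```

The three strings from the parametrized test are rejected, while a string that spells out the default fill value (or omits it) is accepted.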