Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SparseArray is an ExtensionArray #22325
SparseArray is an ExtensionArray #22325
Changes from all commits
ee187eb
32c1372
b265659
8dfc898
9c57725
13952ab
7a6e7fa
1016af1
072abec
0ad61cc
5b0b524
224744a
620b5fb
164c401
65f83d6
0b3c682
69a5d13
f2b5862
fa80fc5
3f20890
484adb0
1df1190
4246ac4
a849699
c4da319
a2f158f
26b671a
375e160
0a37050
3c2cb0f
27c6378
e52dae9
b6d8430
640c4a5
6b61597
427234f
e055629
a79359c
de3aa71
21f4ee3
c1e594a
dc7f93f
eb09d21
7dcf4b2
b39658a
a8b76bd
e041313
595535e
7700299
f1ff7da
33fa6f7
40c035e
1d49cc7
6f4b6b6
6f037b5
7da220e
bfbe4ab
c5666b6
ff6037c
5c362ef
55cac36
c4e8784
a00f987
a6d7eac
4b4f9bd
82801be
1a149dc
fde19d7
a7ba8f6
5064217
e31e8aa
79c8e9c
26993fe
6eeec11
50de326
5ef1747
f31970c
f1b860f
5c44275
33bc8f8
9bf13ad
de1fb5b
da580cd
88b73c3
afde64d
e603d3d
ec5eb9a
a72ee1a
f147635
c35c7c2
e159ef2
d48a8fa
3bcf57e
31d401f
a4369c2
608b499
14e60c9
550f163
821cc91
e21ed21
aeb8c8c
34c90ed
2103959
26af959
e5920c2
084a967
bb17760
dde7852
f1b4e6b
6a31077
02aa7f7
3a7ee2d
d6fe191
b1ea874
2213b83
94664c4
e54160c
04a2dbb
fb01d1a
f78ae81
11d5b40
ba70753
82bab3c
2990124
a9d0f17
0c52c37
998f113
38b0356
7206d94
fe771b5
12e424c
3bd567f
f816346
1a1dcf4
e3d9173
2715cdb
4e40599
0aa3934
a3becb6
5660b9a
dd3cba5
cc65b8a
06dce5f
f7351d3
2055494
f310322
0008164
027f6d8
c0d9875
44b218c
47fa73a
c2c489f
3729927
9ba49e1
543ac7c
f66ef6f
ba8fc9d
9185e33
11799ab
73e7626
ebece16
7db6990
be21f42
e857363
d0ee038
54f4417
2082d86
f846606
ce8e0ac
1f6590e
b758469
f6b0924
232518c
e8b37da
0197e0c
62326ae
f008c38
88c6126
5c8662e
78798cf
b051424
78979b6
2333db1
b41d473
d6a2479
a23c27c
7372eb3
cab8c54
52ae275
9c9b49e
f5d7492
b4b4cbc
bf98b9d
f3d2681
7d4d3ba
57c03c2
0dbc33e
c217cf5
2ea7a91
8f2f228
c83bed7
53e494e
627b9ce
df0293a
a590418
7821f19
ee26c52
40390f1
15a164d
88432c8
3e7ec90
7b0a179
20d8815
3e81c69
1098a7a
10d204a
69075d8
0764baa
a4a47c5
a5b6c39
70d8268
7aed79f
11e55aa
11606af
2f73179
1b3058a
f4ec928
8c67ca2
cc89ec7
3f713d4
886fe03
75099af
731fc06
f91141d
37a4b57
4aad8e1
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember if I asked before, but do we actually want this?
I don't think the above makes much sense, so not sure this is good to allow.
For me it seems logical to restrict the fill_value of the same dtype as the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The somewhat strange thing is that on master we do allow that in the SparseArray constructor
I don't have strong opinions here, other than that people shouldn't be setting
.fill_value
in the first place. The new way to do it is.astype(SparseDtype(self.dtype.subtype, fill_value))
. I'm happy to deprecate thisThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree the fill type should match the dtype but since missing value support is allowed here it is prob ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? we allow this. I agree this would be ok, but is reasonably tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define reasonable :)
I'm reasonably sure there are places in pandas where we assume we have an ndarray, but may get an ExtensionArray instead.
The common case of
.isna().any()
is well tested though, I think.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, if you create the masks in other ways than
.isna()
, eg with a comparison likedf['sparse_column'] == 1
, you get exactly the same issue. Which means we basically have to support indexing with boolean sparse masks anyway I think.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really, we have support for this? again i agree this is a nice feature, but we are decreasing support generally for sparse, so not anxious to advertise this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably have an Indexing EA mixin that implementes these as NotImplemented (so once can subclass)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such an indexing mixing might be useful, but how is this related to the line above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to my comment above. cannot is_sparse not simply check if its an EA and if it has a Sparse Dtype?
then you simply need to pass the
b.values
here, yes?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give that a shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here, its not obvious what you are doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can obj be a SparseFrame here? is this tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment of mine may have been lost.
This is hit in several places (e.g.
pandas/tests/sparse/test_combine_concat.py::TestSparseDataFrameConcat::test_concat
).What part can I clarify here?