ENH: Sparse int64 and bool dtype support enhancement #13849
Current coverage is 85.27% (diff: 98.63%)

@@            master   #13849   diff @@
==========================================
  Files          139      139
  Lines        50511    50523    +12
  Methods          0        0
  Messages         0        0
  Branches         0        0
==========================================
+ Hits         43071    43083    +12
  Misses        7440     7440
  Partials         0        0
@sinhrks getting tons of warnings compiling on windows....all the same
Sorry, not familiar with sparse. But: using object dtype, does it work well enough to use it for certain cases? If yes, I would not remove it.
I think object dtype can be used in some cases, but I'm not fully sure as it is not well tested. I won't remove it ATM and will add more tests to clarify (in another PR). #13110 should be closed. Added whatsnew.
@@ -17,6 +17,7 @@ Highlights include:
- ``.rolling()`` are now time-series aware, see :ref:`here <whatsnew_0190.enhancements.rolling_ts>`
- pandas development api, see :ref:`here <whatsnew_0190.dev_api>`
- ``PeriodIndex`` now has its own ``period`` dtype. see ref:`here <whatsnew_0190.api.perioddtype>`
- Sparse now supports other ``int`` and ``bool`` dtypes, see :ref:`here <whatsnew_0190.sparse>`
would leave out other
Disclaimer: I have never used sparse and am not familiar with the implementation (so my apologies if this is a stupid or naive question), but I quickly looked at the PR and have the following question. Previously, for integer and boolean Series, the 0 or False values were regarded as actual values, not as an indication of 'not a value' in the sparse series. Isn't this a big change? (I don't know how much you could use it before this PR for this to be a problem.)
OK, so my question should probably be filed in the naive category :-)
Sparse data should have the same dtype as its dense representation. Currently,
``float64``, ``int64`` and ``bool`` dtypes are supported. Depending on the original
dtype, ``fill_value`` default changes:
Can you add a note here somewhere that support for int and bool was only added in 0.19?
Joris, your example already works: you can have any values you want as actual values (both True and False). The fill value is the missing-value indicator used when I need to densify (it's the default). So this is not a conceptual change at all, just a change to keep dtype consistency.
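To illustrate the point about densifying, here is a minimal pure-Python sketch. It is not pandas code: `to_dense` and its parameters are hypothetical names made up for this example, and a plain list stands in for the dense array. It only shows that stored values may include both True and False, while `fill_value` is used solely for positions that have no stored value.

```python
def to_dense(length, indices, values, fill_value):
    """Expand stored (indices, values) pairs into a full list (hypothetical helper)."""
    # Start with every position set to the fill value ...
    dense = [fill_value] * length
    # ... then overwrite the positions that actually hold a stored value.
    for position, value in zip(indices, values):
        dense[position] = value
    return dense


# Positions 0, 2 and 3 hold real values (including a real False); position 1 is a hole.
all_bool = to_dense(length=4, indices=[0, 2, 3], values=[True, False, True], fill_value=False)

# With a NaN fill value the hole becomes a float, so the result is no longer purely bool.
mixed = to_dense(length=4, indices=[0, 2, 3], values=[True, False, True], fill_value=float("nan"))

print(all_bool)
print([type(x).__name__ for x in mixed])
```

With a `False` fill value every element of the dense result stays a `bool`, which is the dtype consistency being discussed; with a NaN fill value one element becomes a `float`.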
@jreback I was looking at the
@jreback Is this PR otherwise OK to merge for you, Jeff? (It's closing a lot of issues for 0.19.0 :-))
@sinhrks Can you update the docstrings for SparseDataFrame, SparseSeries and SparseArray? They all still mention the fact that only floats are supported or that nan is the default fill value.
@sinhrks Thanks a lot!
@sinhrks appveyor started failing (some int dtype issues):
@jorisvandenbossche thx for pointing out, will fix.
git diff upstream/master | flake8 --diff
Currently, sparse doesn't actually support `int64` and `bool` dtypes. When `int` or `bool` values are passed, they are coerced to `float64` unless the `dtype` keyword is explicitly specified.

on current master

after this PR

The created data should have the `dtype` of the passed values (the same as a normal `Series`).

Also, `fill_value` is automatically chosen according to the following rules (because `np.nan` cannot appear in `int` or `bool` dtype):

Basic rule: sparse `dtype` must not be changed when it is converted to dense.

- `sparse_index` is specified and data has a hole (missing values):
  - `fill_value` is `np.nan`
  - `dtype` is `float64` or `object` (which can store both `data` and `fill_value`)
- `sparse_index` is None (all values are provided via `data`, no missing values):
  - If `fill_value` is not explicitly passed, the following default is used depending on the dtype:
    - `float`: `np.nan`
    - `int`: `0`
    - `bool`: `False`
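The rules above can be sketched as a small pure-Python function. This is only an illustration under stated assumptions, not the code in this PR: `resolve_fill_value` and its `has_holes` flag are hypothetical names, Python element types stand in for NumPy dtypes, and `math.nan` stands in for `np.nan`.

```python
import math


def resolve_fill_value(values, has_holes=False, fill_value=None):
    """Pick a fill value following the rules listed above (hypothetical helper)."""
    # An explicitly passed fill_value always wins.
    if fill_value is not None:
        return fill_value
    # Data with holes needs NaN as the missing-value marker.
    if has_holes or not values:
        return math.nan
    # bool is checked before int because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return False
    if all(isinstance(v, int) for v in values):
        return 0
    # Everything else (floats here) falls back to NaN.
    return math.nan


int_fill = resolve_fill_value([1, 0, 2])
bool_fill = resolve_fill_value([True, False])
float_fill = resolve_fill_value([1.0, 2.5])
holes_fill = resolve_fill_value([1, 2], has_holes=True)
explicit_fill = resolve_fill_value([1, 2], fill_value=-1)
```

In this sketch the default for complete data never forces a dtype change on densifying, while data with holes gets NaN, which is why that case ends up as `float64` or `object`.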