BUG: Slicing subclasses of SparseDataFrames. #13787

Closed

Contributor

@sstanovnik sstanovnik commented Jul 25, 2016

  • 1 test added & passed
  • passes git diff upstream/master | flake8 --diff

This changes SparseDataFrame to use proper subclassing functionality so slicing of subclasses of SparseDataFrame works. Example of a failure that this PR fixes:

import pandas as pd

class DenseSubclassDF(pd.DataFrame):
    _constructor = property(lambda self: DenseSubclassDF)
    _constructor_sliced = property(lambda self: DenseSubclassS)

class DenseSubclassS(pd.Series):
    _constructor = property(lambda self: DenseSubclassS)
    _constructor_expanddim = property(lambda self: DenseSubclassDF)

class SparseSubclassDF(pd.SparseDataFrame):
    _constructor = property(lambda self: SparseSubclassDF)
    _constructor_sliced = property(lambda self: SparseSubclassS)

class SparseSubclassS(pd.SparseSeries):
    _constructor = property(lambda self: SparseSubclassS)
    _constructor_expanddim = property(lambda self: SparseSubclassDF)

ddf = DenseSubclassDF([[1,2,3], [4,5,6], [7,8,9]])
sdf = SparseSubclassDF([[1,2,3], [4,5,6], [7,8,9]])

print(type(ddf.iloc[0]))  # <class '__main__.DenseSubclassS'>
print(type(ddf.iloc[:2]))  # <class '__main__.DenseSubclassDF'>
print(type(ddf[:2]))  # <class '__main__.DenseSubclassDF'>

# sparse keeps the subclass for a single row, but not for multi-row slices
print(type(sdf.iloc[0]))  # <class '__main__.SparseSubclassS'>
print(type(sdf.iloc[:2]))  # <class 'pandas.sparse.frame.SparseDataFrame'>
print(type(sdf[:2]))  # <class 'pandas.sparse.frame.SparseDataFrame'>

@sstanovnik sstanovnik changed the title [FIX] Slicing subclasses of SparseDataFrames. BUG: Slicing subclasses of SparseDataFrames. Jul 25, 2016
@@ -373,7 +373,9 @@ def _slice(self, slobj, axis=0, kind=None):
 new_index = self.index
 new_columns = self.columns[slobj]

-return self.reindex(index=new_index, columns=new_columns)
+return self._constructor(data=self.reindex(index=new_index,
Contributor

try not to use \, instead:

return self._constructor(
          data=self.reindex(index=new_index, 
                                        columns=new_columns)).__finalize__(self)
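
As a rough illustration of the pattern suggested here (build the result through `self._constructor`, then call `__finalize__(self)`), here is a pure-Python sketch. `MiniFrame`, `TaggedFrame`, and `head_rows` are hypothetical stand-ins and not pandas classes; only the names `_constructor`, `_metadata`, and `__finalize__` mirror pandas' conventions.

```python
class MiniFrame:
    """Hypothetical stand-in for a DataFrame-like container (not pandas code)."""

    _metadata = []  # names of extra attributes to carry over to results

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def _constructor(self):
        # Subclasses override this so that results keep the subclass type.
        return MiniFrame

    def __finalize__(self, other):
        # Copy the declared metadata attributes from the source object.
        for name in self._metadata:
            setattr(self, name, getattr(other, name, None))
        return self

    def head_rows(self, n):
        # Build the result through _constructor, then propagate metadata.
        return self._constructor(self.rows[:n]).__finalize__(self)


class TaggedFrame(MiniFrame):
    _metadata = ["tag"]

    def __init__(self, rows, tag=None):
        super().__init__(rows)
        self.tag = tag

    @property
    def _constructor(self):
        return TaggedFrame


tf = TaggedFrame([[1, 2], [3, 4], [5, 6]], tag="demo")
sliced = tf.head_rows(2)
print(type(sliced).__name__, sliced.tag, sliced.rows)
```

Without the `__finalize__` call the result would still be a `TaggedFrame`, but its `tag` would fall back to the constructor default.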

Member

can't we fix .reindex to return subclass?

Contributor

jreback commented Jul 25, 2016

is there a specific issue for this?

@sinhrks pls have a look

@jreback jreback added the Sparse (Sparse Data Type) and Compat (pandas objects compatibility with Numpy or Python functions) labels Jul 25, 2016

codecov-io commented Jul 25, 2016

Current coverage is 85.25% (diff: 100%)

Merging #13787 into master will increase coverage by <.01%

@@             master     #13787   diff @@
==========================================
  Files           140        140          
  Lines         50455      50471    +16   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43014      43031    +17   
+ Misses         7441       7440     -1   
  Partials          0          0          

Powered by Codecov. Last update 59f2557...a7fb24f

@sstanovnik
Contributor Author

Thank you for your comments. I made the following changes in response:

  • .reindex didn't need to be adapted. The problem was _reindex_columns, which explicitly instantiated a SparseDataFrame. I changed all such calls to use _constructor.
  • One thing to check in testing.py: I added obj as a kwarg to the panel checks so they match the series and frame ones. Previously, the panel checks referenced obj, but the value they picked up was a leftover from the for loop around line 2600.
  • There is no issue that covers this problem specifically. The closest one is the sparse master issue, #10627 (BUG: Sparse master issue). Should I reference that one or create a new one?
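
To make the first point concrete, here is a small pure-Python sketch of why a hard-coded class name loses the subclass type while a call through `_constructor` keeps it. `SparseLike`, `SubSparseLike`, and both `take_*` methods are hypothetical names for illustration; this is not the pandas implementation.

```python
class SparseLike:
    """Hypothetical container standing in for a frame-like base class."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def _constructor(self):
        return SparseLike

    def take_hardcoded(self, n):
        # Mirrors the bug: the base class name is written out explicitly,
        # so subclasses always get the base type back.
        return SparseLike(self.values[:n])

    def take_via_constructor(self, n):
        # Mirrors the fix: defer to whatever class _constructor reports.
        return self._constructor(self.values[:n])


class SubSparseLike(SparseLike):
    @property
    def _constructor(self):
        return SubSparseLike


sub = SubSparseLike([1, 2, 3])
print(type(sub.take_hardcoded(2)).__name__)        # SparseLike
print(type(sub.take_via_constructor(2)).__name__)  # SubSparseLike
```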


Parameters
----------
left : DataFrame
Member

Nice docstring :) This should be SparseDataFrame.

@kawochen kawochen mentioned this pull request Jul 26, 2016
Member

sinhrks commented Jul 26, 2016

@sstanovnik thx for the update! I linked the PR from #10627, so a new issue is not needed unless there is something that is not fixed in this PR.

@sstanovnik
Contributor Author

Should I also squash the commits?

Member

sinhrks commented Jul 26, 2016

not mandatory on your side. will be done during merge.

@sstanovnik
Contributor Author

Typo fixed, rebased.

@sstanovnik
Contributor Author

Two old pickling tests failed: see this build. I didn't see a nice way out, so I made an exception for the deprecated SparseTimeSeries. This may not be what you want and needs review.

Member

sinhrks commented Jul 27, 2016

@sstanovnik yeah SparseTimeSeries is deprecated, you can switch validation option depending on the version, like:
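
As a rough sketch of what a version-dependent check could look like (an illustration only): the helper names `parse_version` and `should_check_exact_type` are hypothetical, the 0.17.0 cutoff is taken from the code comment in the diff under review, and the `ts` key is an assumption.

```python
import re

# Assumption: cutoff taken from the "deprecated in 0.17.0" code comment.
DEPRECATED_SINCE = (0, 17, 0)


def parse_version(version):
    """Turn '0.16.2' or '0.17.0rc1' into a tuple of ints such as (0, 16, 2)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def should_check_exact_type(typ, key, version):
    """Return False for legacy sparse time series pickles, True otherwise."""
    is_legacy_ts = (
        typ == "sp_series"
        and key == "ts"  # hypothetical pickle key for the time series entry
        and parse_version(version) < DEPRECATED_SINCE
    )
    return not is_legacy_ts


print(should_check_exact_type("sp_series", "ts", "0.16.2"))
print(should_check_exact_type("sp_series", "ts", "0.18.1"))
```

Restricting the relaxed check to one pickle key and to old versions keeps the strict type check in place for every other sparse series.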

@@ -210,3 +210,25 @@ def test_subclass_align_combinations(self):
 tm.assert_series_equal(res1, exp2)
 tm.assertIsInstance(res2, tm.SubclassedDataFrame)
 tm.assert_frame_equal(res2, exp1)
+
+def test_subclass_sparse_slice(self):
Contributor

There are lots of changes here, and ideally we would have a test for each. Can you audit these changes and add tests where needed?

@sstanovnik
Contributor Author

I added additional tests to check other subclassing changes I made.

Contributor

jreback commented Jul 29, 2016

lgtm, though I think the read_pickle assertion can be done in a clearer way (use a separate function). @sinhrks ?

@@ -44,8 +44,15 @@ def compare_element(self, result, expected, typ, version=None):
 return

 if typ.startswith('sp_'):
+# SparseTimeSeries deprecated in 0.17.0
+if (typ == "sp_series" and version and
Member

test_pickle can use an object-specific method if one is defined. Defining compare_sp_series_ts (like the other compare_xxx methods) should allow test_pickle to use that method for sparse time series. The current implementation skips the type check for all sparse series.

NOTE: method name is derived from pickle key:
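
A minimal sketch of the name-based dispatch described in this comment, assuming the method name is built as `compare_<typ>_<key>` and that a generic method is used when no specific one exists. `PickleComparator` and its method bodies are hypothetical and much simpler than any real test helper.

```python
class PickleComparator:
    """Hypothetical sketch of name-based dispatch; not pandas' test code."""

    def compare_element(self, result, expected, typ, key):
        # Generic fallback: exact types and values must both match.
        return type(result) is type(expected) and result == expected

    def compare_sp_series_ts(self, result, expected, typ, key):
        # Object-specific check: compare values only, ignoring the exact type.
        return list(result) == list(expected)

    def compare(self, result, expected, typ, key):
        # Look up compare_<typ>_<key>; fall back to the generic method.
        name = "compare_{}_{}".format(typ, key)
        method = getattr(self, name, self.compare_element)
        return method(result, expected, typ, key)


checker = PickleComparator()
print(checker.compare((1, 2), [1, 2], "sp_series", "ts"))  # specific method
print(checker.compare((1, 2), [1, 2], "series", "float"))  # generic fallback
```

With this shape, relaxing the check for one object only requires adding a method with the matching name; every other object still goes through the strict generic comparison.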

Contributor

jreback commented Jul 29, 2016

@sstanovnik looks good. @sinhrks anything further?

@jreback jreback added this to the 0.19.0 milestone Jul 29, 2016
Member

sinhrks commented Jul 29, 2016

None. thanks for all the effort, @sstanovnik !

Contributor

jreback commented Aug 1, 2016

@sstanovnik lgtm. pls add a whatsnew note (put in API changes). ping when green.

Use proper subclassing behaviour so subclasses work properly: this fixes
an issue where a multi-element slice of a subclass of SparseDataFrame
returned the SparseDataFrame type instead of the subclass type.
@sstanovnik
Contributor Author

ping

@jreback jreback closed this in a7f7e1d Aug 2, 2016
Contributor

jreback commented Aug 2, 2016

thanks @sstanovnik

@jreback jreback mentioned this pull request Aug 2, 2016
@sstanovnik sstanovnik deleted the fix-sparse-slice branch August 4, 2016 10:31