TST: more testing of indexing with sparse #4400

Closed
njheyu opened this issue Jul 29, 2013 · 6 comments
Labels: Indexing (Related to indexing on series/frames, not to indexes themselves); Sparse (Sparse Data Type); Testing (pandas testing functions or related to the test suite)
Comments

@njheyu

njheyu commented Jul 29, 2013

Related: #6076 (.loc does not work either).

It would be nice if common operations like indexing and slicing were available for sparse DataFrames, as they are for their dense counterparts.

Thanks!

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: A = np.zeros((3,4))

In [4]: A[:] = np.nan

In [5]: A[2,3] = 1

In [6]: 

In [6]: Y = pd.DataFrame(A)

In [7]: print Y.iloc[2,3]   # print 1.0
1.0

In [8]: sY = Y.to_sparse()

In [9]: print sY.iloc[2,3]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-0674f68ef869> in <module>()
----> 1 print sY.iloc[2,3]

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
    667     def __getitem__(self, key):
    668         if type(key) is tuple:
--> 669             return self._getitem_tuple(key)
    670         else:
    671             return self._getitem_axis(key, axis=0)

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    774                 continue
    775 
--> 776             retval = getattr(retval,self.name)._getitem_axis(key, axis=i)
    777 
    778         return retval

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
    803                 raise ValueError("Cannot index by location index with a non-integer key")
    804 
--> 805             return self._get_loc(key,axis=axis)
    806 
    807     def _convert_to_indexer(self, obj, axis=0):

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_loc(self, key, axis)
     61 
     62     def _get_loc(self, key, axis=0):
---> 63         return self.obj._ixs(key, axis=axis)
     64 
     65     def _slice(self, obj, axis=0, raise_on_error=False):

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in _ixs(self, i, axis, copy)
   1880                         new_values = self._data.fast_2d_xs(i, copy=copy)
   1881                     except:
-> 1882                         new_values = self._data.fast_2d_xs(i, copy=True)
   1883                     return Series(new_values, index=self.columns,
   1884                                   name=self.index[i])

AttributeError: '_SparseMockBlockManager' object has no attribute 'fast_2d_xs'
@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 15, 2014
@wholmgren
Contributor

I think that this was fixed at some point and can be closed.

In [4]: data = np.zeros((2,2))

In [5]: data[:] = np.nan

In [6]: data[1,1] = 1

In [7]: df = pd.DataFrame(data)

In [9]: df.ix[:,1]
Out[9]: 
0   NaN
1     1
Name: 1, dtype: float64

In [10]: df.to_sparse().ix[:,1]
Out[10]: 
0   NaN
1     1
Name: 1, dtype: float64
BlockIndex
Block locations: array([1], dtype=int32)
Block lengths: array([1], dtype=int32)

In [24]: df.to_sparse().iloc[1,1]
Out[24]: 1.0

In [25]: df.to_sparse().iloc[1,0]
Out[25]: nan
In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.17.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.20.2
numpy: 1.9.1
scipy: 0.14.1

@jreback
Contributor

jreback commented Jan 16, 2015

OK. I am not 100% sure that this is actually tested, though. Can you confirm? If it is, we want to add a reference to this issue; if not, put some tests in place. Thanks.

@wholmgren
Contributor

I searched test_sparse for indexers. I was surprised at how few cases I found, but maybe I should be looking elsewhere. Here's a summary:

.iloc is only applied to a DataFrame, not a SparseDataFrame.

.loc does not occur.

.ix is used on a SparseDataFrame in a few spots, e.g. L270, L1051, and L1112.

.isin is tested on SparseDataFrame.

.iat and .at do not occur.

Maybe inheriting from NDFrame makes this good enough? If needed, I can make a PR with some additional tests.
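
The kind of additional test discussed here could look roughly like the sketch below. It is only an illustration, not code from the pandas test suite: `check_scalar_indexers` and `same_scalar` are hypothetical helper names, and because `SparseDataFrame` and `to_sparse()` no longer exist in current pandas releases, the sketch assumes a regular DataFrame whose columns use `pd.SparseDtype` as the stand-in. It reuses the data from the original report and compares every cell through `.iloc`, `.loc`, `.iat` and `.at`.

```python
import numpy as np
import pandas as pd

# Same data as the original report: all NaN except a single 1.0 at (2, 3).
A = np.full((3, 4), np.nan)
A[2, 3] = 1.0
dense = pd.DataFrame(A)

# Assumption: sparse-backed columns stand in for the old SparseDataFrame.
sparse = dense.astype(pd.SparseDtype("float64", np.nan))


def same_scalar(left, right):
    """Hypothetical helper: equality that treats NaN as equal to NaN."""
    return (pd.isna(left) and pd.isna(right)) or left == right


def check_scalar_indexers(dense, sparse):
    """Hypothetical helper: compare each cell through the four scalar indexers."""
    for name in ("iloc", "loc", "iat", "at"):
        for row in range(dense.shape[0]):
            for col in range(dense.shape[1]):
                # Labels equal positions here because both axes are a default RangeIndex.
                expected = getattr(dense, name)[row, col]
                result = getattr(sparse, name)[row, col]
                assert same_scalar(result, expected), (name, row, col)
    return True


check_scalar_indexers(dense, sparse)
```

Looping over the indexer names keeps the dense frame as the single source of expected values, so the sparse result is never compared against a hand-written constant.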

@jreback
Contributor

jreback commented Jan 16, 2015

You should run the tests with -v and see, since it inherits tests from test_frame.py as well (and test_series).

@wholmgren
Contributor

It appears to me that the tests inherited from test_frame.SafeForSparse only execute .ix. TestSparseSeries only inherits test_series.CheckNameIntegration.

From what I can gather, it seems that the "do it right" solution might be to reuse some of the test_indexing machinery in test_libsparse. I don't think I'm the right person for that job at this point in time.
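
One way to picture the idea of reusing indexing-test machinery is a small table of (indexer, key) cases run against a dense frame and its sparse counterpart. The sketch below is only that, a sketch: `CASES` and `run_cases` are hypothetical names, it does not use the actual `test_indexing` helpers, and it assumes sparse-backed columns via `pd.SparseDtype` in a current pandas release rather than the `SparseDataFrame` class of the time.

```python
import numpy as np
import pandas as pd

A = np.full((3, 4), np.nan)
A[2, 3] = 1.0
dense = pd.DataFrame(A)
# Assumption: sparse-backed columns stand in for the old SparseDataFrame.
sparse = dense.astype(pd.SparseDtype("float64", np.nan))

# Hypothetical case table: each entry is (indexer name, key).
CASES = [
    ("iloc", (slice(None), 3)),   # whole column by position
    ("loc", (slice(None), 3)),    # whole column by label
    ("iloc", (slice(1, 3), 3)),   # part of a column
    ("iloc", (2, 3)),             # scalar by position
    ("loc", (2, 3)),              # scalar by label
]


def run_cases(dense, sparse, cases):
    """Hypothetical helper: apply each case to both frames and compare densified results."""
    checked = 0
    for indexer, key in cases:
        expected = getattr(dense, indexer)[key]
        result = getattr(sparse, indexer)[key]
        # assert_array_equal treats NaNs in matching positions as equal.
        np.testing.assert_array_equal(np.asarray(result), np.asarray(expected))
        checked += 1
    return checked


run_cases(dense, sparse, CASES)
```

Adding coverage for another indexer or key shape then means adding a row to the table rather than writing a new test body.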

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback added the Testing (pandas testing functions or related to the test suite) and Difficulty Intermediate labels Apr 3, 2016
@jreback jreback changed the title from "ENH: add indexing to sparse pandas DataFrame" to "TST: more testing of indexing with sparse" Apr 3, 2016
@jreback
Contributor

jreback commented Apr 3, 2016

After #12779, this issue is about adding more tests for .iat/.at.

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 3, 2016
@kawochen kawochen mentioned this issue Apr 3, 2016
18 tasks
jreback pushed a commit that referenced this issue Apr 7, 2016
related to #4400

Added more tests for sparse indexing.
``SparseArray.take`` has optimized logic to omit dense ``np.ndarray`` creation.
``SparseSeries.iloc`` can work with negative indices.
Made ``SparseArray.take`` handle negative indices by the same rule as ``Index`` (#12676)

Author: sinhrks <sinhrks@gmail.com>

Closes #12796 from sinhrks/sparse_test_at and squashes the following commits:

df1f056 [sinhrks] ENH/PERF SparseArray.take indexing
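
As a rough illustration of the behaviour this commit message describes, the sketch below takes negative positions from a sparse array and from an `Index` side by side. It assumes a current pandas release, where the array class is exposed as `pd.arrays.SparseArray` and a sparse-backed `Series` plays the role of the old `SparseSeries`; the sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# Made-up sample data: two stored values, two NaN fill positions.
arr = pd.arrays.SparseArray([np.nan, 1.0, np.nan, 2.0])

# Negative positions count from the end, following the same rule as Index.take.
taken = arr.take([-1, 1])
index_taken = pd.Index([10, 11, 12, 13]).take([-1, 1])

# .iloc on a sparse-backed Series accepts negative positions as well.
ser = pd.Series(arr)
last = ser.iloc[-1]

print(np.asarray(taken), index_taken.tolist(), last)
```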