Not possible to concatenate DataFrames with sparse data without copying #30316

MezentsevIlya · 2019-12-18T06:04:38Z

Code Sample

import pandas as pd

a = pd.DataFrame([0, 0, 1, 0, 2], columns=['a'])
a = a.astype(pd.SparseDtype('float', 0.0))
b = pd.DataFrame([0, 2, 3, 0, 2], columns=['b'])
b = b.astype(pd.SparseDtype('float', 0.0))

pd.concat([a, b], axis=1, copy=False)

Problem description

Hello! Since SparseArray is no longer a subclass of numpy.ndarray (pandas>=0.24.0), it is not possible to concatenate DataFrames with SparseArray dtype using pandas.concat with copy=False.
The problem is that SparseArray has no attribute view, which is called in the function concatenate_block_managers (pandas/core/internals/managers.py) when the parameter copy is False.
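As a possible workaround sketch: the failure is tied to the copy=False branch, so leaving copy at its default should avoid the call to view. This is an assumption based on the code path described here, not something confirmed by the maintainers. The frames are the same as in the code sample, and the result keeps its sparse dtype.

```python
import pandas as pd

# Same frames as in the code sample above.
dtype = pd.SparseDtype("float", 0.0)
a = pd.DataFrame([0, 0, 1, 0, 2], columns=["a"]).astype(dtype)
b = pd.DataFrame([0, 2, 3, 0, 2], columns=["b"]).astype(dtype)

# Assumption: with `copy` left at its default, concat does not go through
# the values.view() branch that raises on pandas 0.25.3.
result = pd.concat([a, b], axis=1)

# Both columns should still be sparse after the concatenation.
print(result.dtypes)
```

The cost is an extra copy of the underlying arrays, which is exactly what copy=False was meant to avoid.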

Actual Output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-d61b58c595ad> in <module>
----> 1 pd.concat([a, b], axis=1, copy=False)

~/.conda/envs/pandas_dev/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    256     )
    257 
--> 258     return op.get_result()
    259 
    260 

~/.conda/envs/pandas_dev/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
    471 
    472             new_data = concatenate_block_managers(
--> 473                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
    474             )
    475             if not self.copy:

~/.conda/envs/pandas_dev/lib/python3.7/site-packages/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   2044                 values = values.copy()
   2045             elif not copy:
-> 2046                 values = values.view()
   2047             b = b.make_block_same_class(values, placement=placement)
   2048         elif is_uniform_join_units(join_units):

AttributeError: 'SparseArray' object has no attribute 'view'

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.1.3.el7.x86_64
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.10.2
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.0
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None


jorisvandenbossche · 2019-12-18T07:46:33Z

This seems to be fixed on master (to be released as 1.0 shortly):

In [4]: a = pd.DataFrame([0, 0, 1, 0, 2], columns=['a']) 
   ...: a = a.astype(pd.SparseDtype('float', 0.0)) 
   ...: b = pd.DataFrame([0, 2, 3, 0, 2], columns=['b']) 
   ...: b = b.astype(pd.SparseDtype('float', 0.0)) 
   ...:  
   ...: pd.concat([a, b], axis=1, copy=False)   
Out[4]: 
     a    b
0  0.0  0.0
1  0.0  2.0
2  1.0  3.0
3  0.0  0.0
4  2.0  2.0

(I can also reproduce the error on 0.25.3.)

I am not sure whether this was fixed deliberately. If it was not, it would still be good to add a test for it.

jorisvandenbossche · 2019-12-18T07:52:22Z

Going to close this as a duplicate of #20756, which described the same issue but in general for extension arrays.

PRs to add tests are still welcome!
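A regression test for this could look roughly like the sketch below. The test name and its layout are hypothetical, not taken from the pandas test suite; it only uses the public pd.concat and pd.testing.assert_frame_equal functions, and it assumes a pandas version in which copy=False is accepted for sparse columns (as on master here).

```python
import pandas as pd


def test_concat_sparse_columns_copy_false():
    # Hypothetical test name; mirrors the reproducer from this issue.
    dtype = pd.SparseDtype("float", 0.0)
    a = pd.DataFrame([0, 0, 1, 0, 2], columns=["a"]).astype(dtype)
    b = pd.DataFrame([0, 2, 3, 0, 2], columns=["b"]).astype(dtype)

    # On 0.25.3 this call raised AttributeError for SparseArray.
    result = pd.concat([a, b], axis=1, copy=False)

    expected = pd.DataFrame(
        {"a": [0, 0, 1, 0, 2], "b": [0, 2, 3, 0, 2]}
    ).astype(dtype)
    pd.testing.assert_frame_equal(result, expected)
```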

MezentsevIlya changed the title ~~Not possible to concatenate DataFrames with sparse data~~ Not possible to concatenate DataFrames with sparse data without copying Dec 18, 2019

MezentsevIlya mentioned this issue Dec 18, 2019 in: SparseArray is an ExtensionArray #22325 (merged)

jorisvandenbossche added the ExtensionArray, Needs Tests, and Sparse labels Dec 18, 2019

jorisvandenbossche added this to the Contributions Welcome milestone Dec 18, 2019

jorisvandenbossche closed this as completed Dec 18, 2019

jorisvandenbossche removed this from the Contributions Welcome milestone Dec 18, 2019

jorisvandenbossche removed the Needs Tests label Dec 18, 2019
