DataFrame doesn't define density #19028

Closed
hexgnu opened this issue Jan 1, 2018 · 5 comments
Labels
Enhancement Sparse Sparse Data Type

Comments

@hexgnu
Contributor

hexgnu commented Jan 1, 2018

Code Sample, a copy-pastable example if possible

This is more of an open question about whether this should be implemented or not. If a DataFrame has a SparseSeries inside of it, shouldn't 'density' also be defined?

import pandas as pd

df = pd.DataFrame({'a': pd.SparseSeries([1, 0, 0, 1])})
df.a.density  # => 0.5
df.density    # => throws an error; I would think this should be 0.5

sdf = pd.SparseDataFrame({'a': pd.SparseSeries([1, 0, 0, 1])})
sdf.density    # => 0.5
sdf.a.density  # => 0.5

Problem description

Basically this is a consistency problem between SparseDataFrame and DataFrame. Since DataFrames can contain SparseSeries, DataFrame should probably define 'density' as well.

Expected Output

I would expect a DataFrame to have density defined. If it is dense it would just be 1.0.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.16-202.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.3.1
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 0.19.1
xarray: 0.10.0
IPython: 6.1.0
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: 1.1.15
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.2
pandas_gbq: None
pandas_datareader: None

@hexgnu
Contributor Author

hexgnu commented Jan 1, 2018

See issue #16874

@jreback
Contributor

jreback commented Jan 1, 2018

yeah suppose we could define this function on DataFrame itself. maybe want to rename this (e.g. deprecate .density()) to avoid namespace pollution, or better yet have a .sparse namespace

@jreback added the API Design, Numeric Operations, and Sparse labels Jan 1, 2018
@hexgnu
Contributor Author

hexgnu commented Jan 5, 2018

Another issue from #16874 is that to_coo won't be defined on a DataFrame either.

I feel a good piece of work is to go through all of the differences and at least document them or implement them if you can. to_coo doesn't seem like it should be implemented imho.

@TomAugspurger
Contributor

This should be an attribute of the .sparse accessor.

@mroeschke removed the API Design and Numeric Operations labels Jun 12, 2021
@lithomas1
Copy link
Member

lithomas1 commented Jul 30, 2021

This is implemented already in the sparse accessor.
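
For anyone landing here later, a minimal sketch of what that looks like. It assumes a pandas version that ships the `.sparse` accessor and `pd.arrays.SparseArray` (SparseSeries and SparseDataFrame from the original report no longer exist in current releases); the column name and values are simply the ones from the report.

```python
import pandas as pd

# A regular DataFrame whose only column is backed by a SparseArray.
# For integer data the default fill value is 0, so two of the four
# entries are actually stored.
df = pd.DataFrame({"a": pd.arrays.SparseArray([1, 0, 0, 1])})

# Series-level accessor: fraction of stored (non-fill) values in the column.
series_density = df["a"].sparse.density

# DataFrame-level accessor: density across the sparse columns of the frame.
frame_density = df.sparse.density

print(series_density, frame_density)
```

One difference from the expectation in the original report: as far as I can tell, the accessor is only available when every column is sparse, so a fully dense DataFrame does not report a density of 1.0 but refuses the `.sparse` accessor instead.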
