REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods #23060

Merged
merged 17 commits into pandas-dev:master from the failing2 branch
Oct 28, 2018

Conversation

jbrockmendel
Member

No description provided.

@pep8speaks

Hello @jbrockmendel! Thanks for submitting the PR.

@codecov

codecov bot commented Oct 9, 2018

Codecov Report

Merging #23060 into master will not change coverage.
The diff coverage is 95.65%.

@@           Coverage Diff           @@
##           master   #23060   +/-   ##
=======================================
  Coverage   92.16%   92.16%           
=======================================
  Files         166      166           
  Lines       51224    51224           
=======================================
  Hits        47212    47212           
  Misses       4012     4012
Flag         Coverage Δ
#multiple    90.6% <95.65%> (ø) ⬆️
#single      42.23% <26.08%> (ø) ⬆️

Impacted Files                 Coverage Δ
pandas/core/sparse/frame.py    94.86% <92.3%> (ø) ⬆️
pandas/core/ops.py             94.24% <97.67%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 62a15fa...c431373.

@jbrockmendel
Member Author

Whoops, accidentally pushed some unrelated commits collecting arithmetic tests.

Contributor

@jreback left a comment

looks good. just a couple of questions / comments.

"""
# Note: we use iloc to access columns for compat with cases
# with non-unique columns.
import pandas.core.computation.expressions as expressions
Contributor

can this be imported at the top?

Member Author

I'm not 100% sure, but I think this is a run-time import to make `import pandas as pd` faster.
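
The deferred-import pattern described in this reply can be sketched as follows. This is a minimal illustration and not the pandas code: the standard-library `statistics` module stands in for a heavier module such as `pandas.core.computation.expressions`, and `column_mean` is a hypothetical function name.

```python
import sys


def column_mean(values):
    """Return the mean of ``values``, importing the helper module lazily."""
    # Assumption for illustration: `statistics` plays the role of a module
    # that is costly to import. Importing it inside the function, rather
    # than at the top of the file, defers that cost until the first call.
    import statistics

    return statistics.mean(values)


result = column_mean([1.0, 2.0, 3.0])

# After the first call the module is cached in sys.modules, so repeated
# calls only pay for a dictionary lookup, not a fresh import.
assert "statistics" in sys.modules
```

The trade-off is that the import cost moves from package import time to the first call, and an import error would only surface when the function runs.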

if own_default == other_default:
    # TODO: won't this evaluate as False if both are np.nan?
    fill_value = own_default
elif np.isnan(own_default) and not np.isnan(other_default):
Contributor

should these be isna checks?

Member Author

At first I thought so, but the module-level docstring says only float64 is supported, so I kept the behavior as-is. I think the overall takeaway is that this isn't especially well-maintained, and we should all look forward to Sparse EA.

Contributor

can this be amended now that Sparse EA is here? (followup ok too)

Contributor

I would recommend holding off on changing it.

  1. SparseDataFrame may be going away, so why bother.
  2. We may have to change the default_fill_value if we want its type to match that of sp_values (see #23124, "Require the dtype of SparseArray.fill_value and sp_values.dtype to match").

Member Author

Sounds good, thanks.
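
The question raised in the TODO comment in this thread (whether `==` is False when both defaults are NaN) can be checked with a small standalone sketch. It is not the SparseDataFrame implementation: `resolve_fill_value` is a hypothetical helper, the error raised in the last branch is an assumption, and the standard-library `math.isnan` is used in place of `np.isnan` so the example has no dependencies. Like the code under review, it only handles floats.

```python
import math


def resolve_fill_value(own_default, other_default):
    """Pick a common fill value for two float defaults (hypothetical helper)."""
    both_nan = math.isnan(own_default) and math.isnan(other_default)
    if own_default == other_default or both_nan:
        # NaN != NaN under IEEE 754, so the equality check alone would
        # miss the case where both defaults are NaN.
        return own_default
    if math.isnan(own_default):
        # Only our side is NaN: prefer the other, concrete default.
        return other_default
    if math.isnan(other_default):
        # Only the other side is NaN: keep our concrete default.
        return own_default
    # Assumption: two different non-NaN defaults cannot be reconciled.
    raise ValueError("fill values differ and neither is NaN")


nan = float("nan")
print(nan == nan)  # False
```

With the explicit `both_nan` check, `resolve_fill_value(nan, nan)` returns NaN instead of falling through to the later branches.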

'floatcol': np.random.randn(10),
'stringcol': list(tm.rands(10))})
df.loc[np.random.rand(len(df)) > 0.5, 'dates2'] = pd.NaT
ops = {'gt': 'lt', 'lt': 'gt', 'ge': 'le', 'le': 'ge', 'eq': 'eq',
Contributor

should parameterize if you can

Member Author

Yah, the point of collecting these arithmetic tests is to parametrize/fixturize and especially de-duplicate them in an upcoming pass.
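
As a rough idea of what such a parametrized pass might look like, the sketch below turns a dict of reflected comparison names into one `pytest` test case per pair. It is an illustration under stated assumptions, not the follow-up that was actually written: it compares plain integers through the standard-library `operator` module rather than DataFrames, the test name is hypothetical, and the mapping contains only the entries visible in the snippet under review.

```python
import operator

import pytest

# Each comparison maps to its reflection: a > b is equivalent to b < a.
REFLECTED = {'gt': 'lt', 'lt': 'gt', 'ge': 'le', 'le': 'ge', 'eq': 'eq'}


@pytest.mark.parametrize('left_name, right_name', sorted(REFLECTED.items()))
def test_reflected_comparison(left_name, right_name):
    # Look up the operator functions by name, e.g. operator.gt / operator.lt.
    left_op = getattr(operator, left_name)
    right_op = getattr(operator, right_name)
    # Cover the less-than, greater-than, and equal cases.
    for a, b in [(1, 2), (2, 1), (3, 3)]:
        assert left_op(a, b) == right_op(b, a)
```

Compared with looping over the dict inside a single test, each pair is reported separately, so a failure for one operator does not hide the results for the others.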

# DataFrame
assert df.eq(df).values.all()
assert not df.ne(df).values.any()
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']:
Contributor

needs parametrization!

with tm.assert_raises_regex(ValueError, msg):
f(ndim_5)

# Series
Contributor

pull this out to a separate, parametrized test (a future PR is ok for these, though since you are moving things around, maybe better here)

lambda x: tm.makeFloatSeries(),
True)
])
@pytest.mark.parametrize('opname', ['add', 'sub', 'mul', 'floordiv',
Contributor

ideally switch to our operators fixture

@jreback added the Numeric Operations (Arithmetic, Comparison, and Logical operations), Sparse (Sparse Data Type), and Clean labels on Oct 11, 2018

# == and !=, inequalities should raise
result = x == y
expected = pd.DataFrame({col: x[col] == y[col]
for col in x.columns},
Contributor

can you parameterize this (next pass ok)

@jreback added this to the 0.24.0 milestone on Oct 24, 2018
@jbrockmendel
Member Author

If it will help, I can separate out the unrelated test parts of this. There is a bunch of test cleanup to do, and already a healthy number of test-touching PRs in play.

Contributor

@jreback left a comment

comment, but can be a followup, ping on green.

if own_default == other_default:
    # TODO: won't this evaluate as False if both are np.nan?
    fill_value = own_default
elif np.isnan(own_default) and not np.isnan(other_default):
Contributor

can this be amended now that Sparse EA is here? (followup ok too)

@jbrockmendel
Member Author

> can this be amended now that Sparse EA is here? (followup ok too)

@TomAugspurger any idea about this? I have no clue

@jreback
Contributor

jreback commented Oct 26, 2018

also happy to merge this and followup later on sparse refactorings (prob better).

@jbrockmendel
Member Author

Ping

@jreback
Contributor

jreback commented Oct 28, 2018

can you rebase. the isort is playing havoc :>

@jreback merged commit b9e2278 into pandas-dev:master on Oct 28, 2018
@jreback
Contributor

jreback commented Oct 28, 2018

thanks @jbrockmendel nice as always!

@jbrockmendel deleted the failing2 branch on October 28, 2018 16:17
thoo added a commit to thoo/pandas that referenced this pull request Oct 30, 2018
…y_tests

* repo_org/master: (52 commits)
  ENH: Allow rename_axis to specify index and columns arguments  (pandas-dev#20046)
  STY: proposed isort settings [ci skip] [skip ci] [ciskip] [skipci] (pandas-dev#23366)
  MAINT: Remove extraneous test.parquet file
  CLN: Follow-up comments to pandas-devgh-23392 (pandas-dev#23401)
  BUG GH23282 calling min on series of NaT returns NaT (pandas-dev#23289)
  unpin openpyxl (pandas-dev#23361)
  REF: collect ops dispatch functions in one place, try to de-duplicate SparseDataFrame methods (pandas-dev#23060)
  CLN: Remove pandas.tools module (pandas-dev#23376)
  CLN: Remove some dtype methods from API (pandas-dev#23390)
  CLN: Cleanup toplevel namespace shims (pandas-dev#23386)
  DOC: fixup whatsnew note for GH21394 (pandas-dev#23355)
  Fix import format at pandas/tests/extension directory (pandas-dev#23365)
  DOC: Remove Series.sortlevel from api.rst (pandas-dev#23395)
  API: Disallow dtypes w/o frequency when casting (pandas-dev#23392)
  BUG/TST/REF: Datetimelike Arithmetic Methods (pandas-dev#23215)
  STYLE: lint
  add np.nan* funcs to cython_table (pandas-dev#22109)
  Run Isort on tests/util single PR (pandas-dev#23347)
  BUG: Fix date_range overflow (pandas-dev#23345)
  Run Isort on tests/arrays single PR (pandas-dev#23346)
  ...
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Clean Numeric Operations Arithmetic, Comparison, and Logical operations Sparse Sparse Data Type
4 participants