
DEPR: dropping nuisance columns in DataFrame reductions #41480

Merged (13 commits) on May 21, 2021

@jbrockmendel (Member) commented May 15, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Discussed on this week's call

@jreback (Contributor) left a comment

looks good. can you add a sub-section in deprecations as this is fairly user visible. ping on green.

@@ -9800,6 +9800,21 @@ def _get_data() -> DataFrame:
    # Even if we are object dtype, follow numpy and return
    # float64, see test_apply_funcs_over_empty
    out = out.astype(np.float64)

if numeric_only is None and out.shape[0] != df.shape[1]:
    # columns have been dropped
Review comment (Contributor):

can you add the issue number here and below (this PR number is fine).
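The check added in this hunk is easier to follow outside pandas. Below is a minimal sketch, not pandas' actual implementation: `reduce_columns` and `mean` are hypothetical helpers, a dict of lists stands in for a DataFrame, and the warning text is a paraphrase rather than pandas' message. With `numeric_only=None`, a column whose reduction raises `TypeError` is dropped, and a `FutureWarning` is issued when fewer results come out than columns went in (the analogue of `out.shape[0] != df.shape[1]`).

```python
import warnings


def reduce_columns(columns, func, numeric_only=None):
    """Hypothetical sketch of a column-wise reduction (not pandas' code).

    columns: dict of column name -> list of values, standing in for a DataFrame.
    func: per-column reduction, standing in for .mean(), .sum(), ...
    """
    if numeric_only:
        # numeric_only=True: select numeric columns up front, so nothing is "dropped".
        columns = {
            name: values
            for name, values in columns.items()
            if all(isinstance(v, (int, float)) for v in values)
        }
    out = {}
    for name, values in columns.items():
        try:
            out[name] = func(values)
        except TypeError:
            if numeric_only is None:
                # Nuisance column: the reduction raised, so silently leave it out.
                continue
            # numeric_only=False: let the error propagate to the caller.
            raise
    # Same idea as the check in the diff: fewer outputs than input columns
    # means nuisance columns were dropped, so emit a deprecation warning.
    if numeric_only is None and len(out) != len(columns):
        warnings.warn(
            "dropping nuisance columns in reductions is deprecated; "
            "select valid columns before reducing",
            FutureWarning,
            stacklevel=2,
        )
    return out


def mean(values):
    # sum() raises TypeError on strings (0 + "x"), which marks them as nuisance.
    return sum(values) / len(values)


data = {"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]}
```

Calling `reduce_columns(data, mean)` returns `{"a": 2.0}` and emits the warning; `numeric_only=True` returns the same result without a warning, and `numeric_only=False` re-raises the `TypeError` from the string column.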

@jreback jreback added Deprecate Functionality to remove in pandas Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc. labels May 17, 2021
@jreback jreback added this to the 1.3 milestone May 17, 2021
@jbrockmendel (Member, Author)

added a whatsnew subsection. this is actually just the first half of the note I'm about to push for #41475.

the smart money says I've made a mess of the rst conventions regarding code-block:: ipython vs ipython:: python vs [...]


Deprecated Dropping Nuisance Columns in DataFrame Reductions and DataFrameGroupBy Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
Review comment (Contributor):
Suggested change
Before:
When calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
After:
The default of calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
``numeric_only=None`` will silently ignore and drop from the result nuisance columns, e.g. a string column in a .mean() reduction.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
``numeric_only=None`` (the default), columns on which the reduction raises ``TypeError``
are silently ignored and dropped from the result. This behavior is deprecated.
Review comment (Contributor):

Start a new paragraph with 'This behavior is deprecated'
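For user code, the way out of this deprecation is to state which columns take part instead of relying on the `numeric_only=None` default. A minimal sketch, assuming pandas is installed; the frame is made up for illustration. `numeric_only=True` and `DataFrame.select_dtypes` are existing pandas APIs:

```python
import pandas as pd

# Hypothetical frame with one numeric column and one string ("nuisance") column.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]})

# Option 1: ask the reduction to consider numeric columns only.
by_flag = df.mean(numeric_only=True)

# Option 2: select the valid columns yourself, then reduce.
by_selection = df.select_dtypes(include="number").mean()
```

The first form keeps the call short; the second makes the column selection visible and can be reused across several reductions.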

@jreback (Contributor) left a comment

lgtm. can you rebase once more and ping on green.

@jbrockmendel (Member, Author)

ping

@jreback jreback merged commit aa3bfc4 into pandas-dev:master May 21, 2021
@jbrockmendel jbrockmendel deleted the depr-ignore_only_none branch May 21, 2021 21:18
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
zhengruifeng pushed a commit to apache/spark that referenced this pull request Aug 21, 2023
…0 and enabling tests

### What changes were proposed in this pull request?

This PR proposes to match the behavior of pandas 2.0.0 and above for stat functions such as `sum`, `quantile`, `prod`, etc. See pandas-dev/pandas#41480 and pandas-dev/pandas#47500 for more detail.

### Why are the changes needed?

To match the behavior of the latest pandas.

### Does this PR introduce _any_ user-facing change?

Yes, the behavior of the stat functions now matches pandas 2.0.0 and above.

### How was this patch tested?

Enabled and updated the existing unit tests.

Closes #42526 from itholic/pandas_stat.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024

2 participants