-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: DataFrame with Int64 columns casts to float64 with .max()/.min() #34210
BUG: DataFrame with Int64 columns casts to float64 with .max()/.min() #34210
Conversation
It looks like #25777 needs to be fixed first, so closing to clear queue. |
pandas/core/frame.py
Outdated
if numeric_only is not None and axis in [0, 1]: | ||
if ( | ||
self._is_homogeneous_type and self._mgr.any_extension_types and axis == 0 | ||
) or (numeric_only is not None and axis in [0, 1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the axis in [0, 1] is superfluous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_any_all_np_func passes axis=None for reduction along both axes.
pandas/core/frame.py
Outdated
@@ -8419,7 +8419,9 @@ def _get_data(axis_matters): | |||
raise NotImplementedError(msg) | |||
return data | |||
|
|||
if numeric_only is not None and axis in [0, 1]: | |||
if ( | |||
self._is_homogeneous_type and self._mgr.any_extension_types and axis == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need all of these conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I don't think so, as this is hiding some bugs (or inconsistencies)
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1600.gdede2c7a1e'
>>>
>>> df = pd.DataFrame({"A": pd.Series([0, 1], dtype="category")})
>>> df
A
0 0
1 1
>>>
>>> df.all()
A False
dtype: bool
>>>
>>> df.A.all()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\simon\pandas\pandas\core\generic.py", line 11394, in logical_func
return self._reduce(
File "C:\Users\simon\pandas\pandas\core\series.py", line 4028, in _reduce
return delegate._reduce(name, skipna=skipna, **kwds)
File "C:\Users\simon\pandas\pandas\core\arrays\categorical.py", line 2076, in _reduce
raise TypeError(f"Categorical cannot perform the operation {name}")
TypeError: Categorical cannot perform the operation all
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will need to fix #21020 first
@jorisvandenbossche @jreback This PR in the current state would close #32651 BUT the issue would persist for DataFrames with non-numeric columns. I see the benefit of the changes here is consistency of the code path taken regardless of numeric_only parameter for numeric only DataFrames. I can't find an issue for this, but could potentially concoct one. There is overlap here with #32867 and #32867 would likely close #32651 anyway. I'll leave this as draft for now, or could close to clear queue if preferred. in the meantime, I assume investigating #21020 would also benefit #32867. There's also an issue with PeriodArray raising
I guess we want to maintain the same behaviour. or is inconsistent behaviour not desirable.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine, just some questions.
pandas/core/frame.py
Outdated
@@ -8441,6 +8443,10 @@ def blk_func(values): | |||
assert isinstance(res, dict) | |||
if len(res): | |||
assert len(res) == max(list(res.keys())) + 1, res.keys() | |||
elif not out_dtype: | |||
# The default dtype for empty Series will be 'object' instead of | |||
# 'float64' in a future version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have tests which currently hit this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was added because of test failures in tests/frame/test_analytics.py (maybe others)
========================================================================= short test summary info ==========================================================================
FAILED pandas/tests/frame/test_analytics.py::TestDataFrameAnalytics::test_stat_op_calc - AssertionError: Caused unexpected warning(s): [('DeprecationWarning', Deprecation...
FAILED pandas/tests/frame/test_analytics.py::TestDataFrameAnalytics::test_median - AssertionError: Caused unexpected warning(s): [('DeprecationWarning', DeprecationWarnin...
FAILED pandas/tests/frame/test_analytics.py::TestDataFrameAnalytics::test_mean_excludes_datetimes[None] - AssertionError: Caused unexpected warning(s): [('DeprecationWarn...
FAILED pandas/tests/frame/test_analytics.py::TestDataFrameAnalytics::test_mean_excludes_datetimes[UTC] - AssertionError: Caused unexpected warning(s): [('DeprecationWarni...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have tests which currently hit this case?
so I guess on master, no tests hit this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these cases were for empty DataFrames (after removing timestamp columns). using all for is_numeric was returning True for empty iterable (_mgr.blocks). changing is_numeric and this elif not out_dtype code block doesn't need to be added.
pandas/core/frame.py
Outdated
@@ -8419,7 +8419,9 @@ def _get_data(axis_matters): | |||
raise NotImplementedError(msg) | |||
return data | |||
|
|||
if numeric_only is not None and axis in [0, 1]: | |||
is_numeric = all(b.is_numeric for b in self._mgr.blocks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the added tests hit this sufficiently (e.g. some blocks numeric some blocks not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, but that is another issue #34520
can you merge master and update to comments |
I think we should rather fix this in #32867 (the test could still be useful though, so that could be merged here as an xfail, or I can move it to that PR) |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff