Inconsistent behaviour with min method on object dtype columns #18588

Open
turner3467 opened this issue Dec 1, 2017 · 3 comments
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc.

Comments

@turner3467

turner3467 commented Dec 1, 2017

xref #18021, #16832

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': ['a', 'b', None, 'd'], 'col3': ['e', 'f', None, 'h']})
df.min(axis=0, skipna=True)

Problem description

If I've read the documentation correctly, min should return a Series of the same length as the relevant axis of the DataFrame, and with skipna=True (which is the default) NA values should be ignored in the calculation.

Output

col1 0
dtype: int64

Expected Output

col1 0
col2 a
col3 e
dtype: object
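
Until the behaviour is changed, one possible workaround is to drop the missing values in each column explicitly before reducing. This is only a sketch of that idea, not a fix confirmed by the maintainers; it assumes every column still has at least one non-null value after dropping.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2, 3],
                   'col2': ['a', 'b', None, 'd'],
                   'col3': ['e', 'f', None, 'h']})

# Drop missing values column by column, then take the minimum of what remains.
result = df.apply(lambda col: col.dropna().min())
print(result)
```

Because each column is reduced on its own after dropna(), strings are never compared with a missing-value placeholder.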

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Dec 1, 2017

So, a couple of things are going on.

.min() uses numeric_only=None by default, so non-numeric columns are ignored if reducing them raises. These columns do raise if min is called directly on the column:

In [8]: df.col2.min()
TypeError: '<=' not supported between instances of 'str' and 'float'
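
The same kind of TypeError can be reproduced in plain Python, without pandas, by asking for the minimum of a list that mixes strings with a float. This is only an analogy for the failure above, not the actual pandas code path; the helper name try_min is made up for the illustration.

```python
def try_min(items):
    """Return ('ok', minimum) on success or ('error', exception name) on failure."""
    try:
        return ("ok", min(items))
    except TypeError as exc:
        # str and float cannot be ordered against each other in Python 3.
        return ("error", type(exc).__name__)

clean = ["a", "b", "d"]                 # strings only: comparable
mixed = ["a", "b", float("nan"), "d"]   # a float placeholder among the strings

print(try_min(clean))
print(try_min(mixed))
```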

So we are not skipping NaN here; the values are passed directly to numpy, which raises (nanmax would have to be patched as well). With the patch below applied:

In [1]: df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': ['a', 'b',None, 'd'], 'col3': ['e', 'f', None, 'h']})

In [2]: df.col2.min()
Out[2]: 'a'

In [3]: df.min()
Out[3]: 
col1    0
col2    a
col3    e
dtype: object

patch

(pandas) bash-3.2$ git diff 
diff --git a/pandas/core/nanops.py b/pandas/core/nanops.py
index e1c09947a..33377002e 100644
--- a/pandas/core/nanops.py
+++ b/pandas/core/nanops.py
@@ -478,6 +478,8 @@ def _nanminmax(meth, fill_value_typ):
             except:
                 result = np.nan
         else:
+
+            values = values[~mask]
             result = getattr(values, meth)(axis)
 
         result = _wrap_results(result, dtype)
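
The idea behind the patch, dropping the masked entries before calling the reduction, can be sketched with NumPy alone. The helper name masked_min and the hand-built mask are assumptions for illustration only; pandas computes its own mask internally, and this sketch covers only the 1-D case.

```python
import numpy as np

def masked_min(values, mask):
    """Minimum of a 1-D array after dropping the entries flagged in mask."""
    kept = values[~mask]  # same selection as `values[~mask]` in the diff
    return kept.min()

values = np.array(["a", "b", None, "d"], dtype=object)
# Stand-in for a missing-value mask: True where the entry is None.
mask = np.array([v is None for v in values], dtype=bool)

smallest = masked_min(values, mask)
print(smallest)
```

Note that boolean indexing flattens a multi-dimensional array, so applying the same selection to 2-D values and then reducing along an axis would need extra care.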

@jreback
Contributor

jreback commented Dec 1, 2017

So I'm open to fixing this (PR welcome!), though it might break some existing tests.

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions Difficulty Intermediate labels Dec 1, 2017
@jreback jreback added this to the Next Major Release milestone Dec 1, 2017
@jreback jreback modified the milestones: Contributions Welcome, 1.0.2 Feb 15, 2020
@TomAugspurger TomAugspurger modified the milestones: 1.0.2, Contributions Welcome Mar 10, 2020
@TomAugspurger
Contributor

#31757 was closed, so moving this off 1.0.2

@mroeschke mroeschke added Bug and removed Compat pandas objects compatability with Numpy or Python functions labels Apr 10, 2020
@jbrockmendel jbrockmendel added Numeric Operations Arithmetic, Comparison, and Logical operations Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc. labels Sep 21, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel jbrockmendel removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Mar 30, 2023