BUG: groupby first()/last() change datatype of all NaN columns to float; nth() preserves datatype #33591
Labels: Bug, Duplicate Report, Groupby, Missing-data
Comments
jdmarino added the Bug and Needs Triage labels (Apr 16, 2020)
I think this is fixed on master:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["a", "b", "b", "c"],
        "sym": ["ibm", "msft", "msft", "goog"],
        "sectype": ["E", "E", "E", "E"],
    }
)
df["osi"] = df.sym.where(df.sectype == "O")
print(df.dtypes)
print(df.groupby("id")["osi"].first())
print(df.groupby("id")["osi"].last())
print(pd.__version__)
```

```
id         object
sym        object
sectype    object
osi        object
dtype: object
id
a    NaN
b    NaN
c    NaN
Name: osi, dtype: object
id
a    NaN
b    NaN
c    NaN
Name: osi, dtype: object
1.1.0.dev0+1288.g3a5ae505b
```
dsaxton added the Groupby label and removed the Needs Triage label (Apr 17, 2020)
mroeschke added the good first issue and Needs Tests labels and removed the Bug and Groupby labels (Apr 17, 2020)
The code sample is different from the one in the OP. The issue in the OP does not appear to be fixed.
@simonjayhawkins Oops, yeah I think you're right
simonjayhawkins added the Bug, Groupby and Missing-data labels and removed the Needs Tests and good first issue labels (Apr 19, 2020)
If no one has taken this, I would like to help solve this issue.
pandas version 1.0.3, installed from the conda distribution.
Code Sample, a copy-pastable example
Problem description
Given a DataFrame with an all-NaN column of str/object dtype, groupby().first() converts that column to float. The same happens with .last(), but not with .nth(0) or .nth(-1), so those can serve as workarounds.
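A minimal reproduction sketch, assuming a frame like the one in the comment above: an `id` key, a string column `sym`, and an object-dtype column `osi` that is entirely NaN. It only reports the dtype of `osi` after each reduction, so what it prints depends on the installed pandas version; per this report, pandas 1.0.3 gives float for `first()`/`last()` and object for `nth()`. The helper `osi_dtypes` is hypothetical.

```python
import numpy as np
import pandas as pd

# Frame mirroring the comment's example: "osi" is object dtype but entirely NaN.
df = pd.DataFrame(
    {
        "id": ["a", "b", "b", "c"],
        "sym": ["ibm", "msft", "msft", "goog"],
        "osi": pd.Series([np.nan] * 4, dtype=object),
    }
)


def osi_dtypes(frame):
    """Return the dtype of the "osi" column after each groupby reduction."""
    g = frame.groupby("id")
    return {
        "input": frame["osi"].dtype,
        "first()": g.first()["osi"].dtype,
        "last()": g.last()["osi"].dtype,
        # nth() selects whole rows, so the column keeps its original dtype.
        "nth(0)": g["osi"].nth(0).dtype,
        "nth(-1)": g["osi"].nth(-1).dtype,
    }


for method, dtype in osi_dtypes(df).items():
    print(f"{method:8} {dtype}")
```

Note that `first()`/`last()` pick the first/last non-null value per column, while `nth()` takes whole rows, so `nth()` is only an exact substitute when the other columns have no missing values. Since pandas 2.0, `nth()` also returns the original row index rather than the group keys.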
This is a problem for me because the groupby().first() call sits in general-purpose code that is called repeatedly, and the results are either combined with pd.concat (producing a column with mixed types) or appended to an HDF file (where the mismatched dtype makes the write fail).
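Until `first()` preserves dtypes, one way to keep repeated results consistent before `pd.concat` or an HDF append is to cast each reduced frame back to the input's dtypes. This is a sketch, not a pandas feature: `first_with_input_dtypes` is a hypothetical helper, and it assumes every non-key column can be cast back to its input dtype (true for the all-NaN object case described here).

```python
import numpy as np
import pandas as pd


def first_with_input_dtypes(frame, by):
    """groupby(by).first(), then cast each column back to its input dtype."""
    result = frame.groupby(by).first()
    # result.columns excludes the group key, so only reduced columns are mapped.
    return result.astype(frame.dtypes[result.columns].to_dict())


# Two batches: in one, "osi" holds strings; in the other it is entirely NaN.
batch1 = pd.DataFrame({"id": ["a", "b"], "osi": ["x", "y"]})
batch2 = pd.DataFrame(
    {"id": ["c", "d"], "osi": pd.Series([np.nan, np.nan], dtype=object)}
)

# Every batch now yields an object "osi" column, so the concatenation is uniform.
combined = pd.concat([first_with_input_dtypes(b, "id") for b in (batch1, batch2)])
print(combined.dtypes)
print(combined)
```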
Expected Output
The DataFrame returned by groupby().first()/.last() should have the same metadata (column structure and dtypes) as the input DataFrame, i.e. the result of .first()/.last() should match that of .nth(0)/.nth(-1).
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.8.3
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.48.0