.last() does not perform as expected #20657

Closed
sursu opened this issue Apr 11, 2018 · 6 comments
Labels
Duplicate Report (duplicate issue or pull request), Groupby

Comments

@sursu
Contributor

sursu commented Apr 11, 2018

I find that .last() does not perform as expected.

Example:

import numpy as np
import pandas as pd

# Page views per session; some rows have no referrer (NaN)
df = pd.DataFrame(
    [[179293473, '2016-06-01 00:00:03.549745', 'http://www.dr.dk/nyheder/', 39169523],
     [179293473, '2016-06-01 00:04:22.346018', 'https://www.information.dk/indland/2016/05/hvert-tredje-offer-naar-anmelde-voldtaegt-tide', 39125224],
     [179773461, '2016-06-01 22:13:16.588146', 'https://www.google.dk', 31658124],
     [179773461, '2016-06-01 22:14:04.059781', 'https://www.google.dk', 31658124],
     [179773461, '2016-06-01 22:16:37.230587', np.nan, 31658124],
     [179773461, '2016-06-01 22:23:09.847149', 'https://www.google.dk', 32718401],
     [179773461, '2016-06-01 22:23:55.158929', np.nan, 32718401],
     [179773461, '2016-06-01 22:27:00.857224', np.nan, 32718401]],
    columns=['SessionID', 'PageTime', 'ReferrerURL', 'PageID'])

Problem description

When I run:
df.groupby('SessionID').last()
I get:

SessionID  PageTime                    ReferrerURL                                        PageID
179293473  2016-06-01 00:04:22.346018  https://www.information.dk/indland/2016/05/hve...  39125224
179773461  2016-06-01 22:27:00.857224  https://www.google.dk                              32718401

Expected Output

In fact, I was expecting the same result as the one obtained from:
df.groupby('SessionID').nth(-1)

SessionID  PageID    PageTime                    ReferrerURL
179293473  39125224  2016-06-01 00:04:22.346018  https://www.information.dk/indland/2016/05/hve...
179773461  32718401  2016-06-01 22:27:00.857224  NaN

And while we are at .nth(), why does it mix up my column order?
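
For readers following along, here is a minimal sketch of the difference on a smaller, made-up frame (the values are illustrative, not the data from this report). As the two outputs above suggest, .last() returns the last non-null value in each column, while .nth(-1) returns the last row of each group as stored. The index and column layout of .nth() results differ between pandas versions, so the sketch only inspects values.

```python
import numpy as np
import pandas as pd

# Illustrative data: session 2 ends with a missing referrer
df = pd.DataFrame(
    {
        "SessionID": [1, 1, 2, 2],
        "ReferrerURL": ["a", "b", "c", np.nan],
        "PageID": [10, 11, 20, 21],
    }
)

# last() skips NaN, so session 2 reports "c" (taken from an earlier row)
last_values = df.groupby("SessionID")["ReferrerURL"].last()

# nth(-1) takes the final row of each group unchanged, NaN included
nth_rows = df.groupby("SessionID").nth(-1)

print(last_values)
print(nth_rows)
```

Because .last() works column by column, the row it returns for a group can combine values from different original rows.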

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.3
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@GriffinHines

Working on this. Why not just have last() reference nth(-1)?

@sursu
Contributor Author

sursu commented Apr 14, 2018

It seems that this issue has been around for quite some time: see #6732 and #8427.

@GriffinHines If nth(-1) performs as fast as the intended .last(), why even have a .last() function?

@jreback
Contributor

jreback commented Apr 14, 2018

duplicate of #8427

jreback closed this as completed Apr 14, 2018
jreback added the Groupby and Duplicate Report labels Apr 14, 2018
jreback added this to the No action milestone Apr 14, 2018

@sursu
Contributor Author

sursu commented Apr 14, 2018

@jreback I see you have closed this issue, and not because it would be.. close to being closed.

#8427 has been open for nearly 4 years.
As a user I am curious:

  • Why hasn't it been resolved?
  • Wouldn't it be a good idea to at least have a warning about this in the documentation?

@jreback
Contributor

jreback commented Apr 14, 2018

@sursu Because it's a duplicate of an issue that is already open.

You are welcome to contribute a fix for either of these things. We have 2400 open issues. How would you prioritize things?

@sursu
Contributor Author

sursu commented Apr 19, 2018

@jreback I understand.
But, as @GriffinHines suggested, a temporary and very quick solution could be just to reference .nth(-1).
Yes, it's patching, but I believe people prefer that to bugs.
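
Until that is settled, one user-side way to get the true last row per group is sketched below. It uses .tail(1) rather than the .nth(-1) suggested in this thread, because .tail(1) returns the original rows with their columns in the original order. last_row_per_group is a hypothetical helper name, not a pandas function, and the data is made up for illustration.

```python
import numpy as np
import pandas as pd


def last_row_per_group(frame, key):
    """Return the final row of each group, keeping NaN values as they are."""
    # tail(1) filters the original frame, so index and column order survive
    return frame.groupby(key).tail(1)


# Illustrative data: session 2 ends with a missing referrer
df = pd.DataFrame(
    {
        "SessionID": [1, 1, 2, 2],
        "ReferrerURL": ["a", "b", "c", np.nan],
        "PageID": [10, 11, 20, 21],
    }
)

result = last_row_per_group(df, "SessionID")
print(result)
```

The grouping column stays a regular column here; call .set_index("SessionID") on the result if a key-indexed frame like the one from .last() is wanted.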
