.last() does not perform as expected #20657

sursu · 2018-04-11T14:00:16Z

I find that .last() does not perform as expected.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[179293473, '2016-06-01 00:00:03.549745', 'http://www.dr.dk/nyheder/', 39169523],
     [179293473, '2016-06-01 00:04:22.346018', 'https://www.information.dk/indland/2016/05/hvert-tredje-offer-naar-anmelde-voldtaegt-tide', 39125224],
     [179773461, '2016-06-01 22:13:16.588146', 'https://www.google.dk', 31658124],
     [179773461, '2016-06-01 22:14:04.059781', 'https://www.google.dk', 31658124],
     [179773461, '2016-06-01 22:16:37.230587', np.nan, 31658124],
     [179773461, '2016-06-01 22:23:09.847149', 'https://www.google.dk', 32718401],
     [179773461, '2016-06-01 22:23:55.158929', np.nan, 32718401],
     [179773461, '2016-06-01 22:27:00.857224', np.nan, 32718401]],
    columns=['SessionID', 'PageTime', 'ReferrerURL', 'PageID'])

Problem description

When I run:
df.groupby('SessionID').last()
I get:

SessionID	PageTime	ReferrerURL	PageID
179293473	2016-06-01 00:04:22.346018	https://www.information.dk/indland/2016/05/hve...	39125224
179773461	2016-06-01 22:27:00.857224	https://www.google.dk	32718401

Expected Output

When, in fact, I was expecting the same result as obtained from:
df.groupby('SessionID').nth(-1)

SessionID	PageID	PageTime	ReferrerURL
179293473	39125224	2016-06-01 00:04:22.346018	https://www.information.dk/indland/2016/05/hve...
179773461	32718401	2016-06-01 22:27:00.857224	NaN

And while we are at .nth(), why does it mix up my column order?
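
For readers of this thread, here is a minimal sketch of the difference on made-up toy data (the frame, its values, and the variable names are illustrative assumptions, not taken from the report). It relies on `last()` being evaluated column by column and skipping missing values, and it uses `tail(1)` as a stand-in for `nth(-1)` to select the final row of each group unchanged.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, much smaller than the frame in the report).
df = pd.DataFrame(
    {
        "SessionID": [1, 1, 2, 2],
        "ReferrerURL": ["a", "b", "c", np.nan],
        "PageID": [10, 11, 20, 21],
    }
)

grouped = df.groupby("SessionID")

# last() works per column and skips missing values, so one output row
# can combine values that come from different input rows.
last_non_null = grouped.last()

# tail(1) keeps the final row of each group as-is, NaN included.
last_row = grouped.tail(1)

print(last_non_null)
print(last_row)
```

For session 2 in this toy frame, `last()` pairs the referrer from the third row with the page ID from the fourth row, which is the same kind of mix the report shows.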

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.3
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


GriffinHines · 2018-04-14T00:25:28Z

Working on this. Why not just have last() reference nth(-1)?

sursu · 2018-04-14T12:05:11Z

It seems that this issue has been around for quite some time: see #6732 and #8427.

@GriffinHines If nth(-1) performs as fast as the intended .last() why even have a .last() function?

jreback · 2018-04-14T12:18:25Z

duplicate of #8427

sursu · 2018-04-14T14:07:11Z

@jreback I see you have closed this issue, and not because it was... close to being resolved.

#8427 has been open for nearly 4 years.
As a user I am curious:

Why hasn't it been resolved?
Wouldn't it be a good idea to at least add a warning about this to the documentation?

jreback · 2018-04-14T14:16:25Z

@sursu because it's a duplicate of an issue that is already open.

You are welcome to contribute a fix for either of these things. We have 2400 open issues. How would you prioritize things?

sursu · 2018-04-19T13:01:48Z

@jreback I understand.
But, as @GriffinHines suggested, a temporary and very quick solution could be just to reference .nth(-1).
Yes, it's patching, but I believe people prefer that to bugs.
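
Until the behavior changes, a user-side workaround could look like the sketch below. `last_row_per_group` is a hypothetical helper invented for this illustration, not part of pandas, and the toy frame is made up. It assumes that what is wanted is the final row of each group with missing values left in place.

```python
import numpy as np
import pandas as pd


def last_row_per_group(frame, key):
    """Return the final row of each group, keeping NaN values.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # tail(1) selects whole rows, so values are never taken from different rows.
    rows = frame.groupby(key).tail(1)
    # Index by the group key and sort, similar to the layout last() produces.
    return rows.set_index(key).sort_index()


# Toy data with interleaved sessions (hypothetical values).
sessions = pd.DataFrame(
    {
        "SessionID": [2, 1, 2, 1],
        "ReferrerURL": ["c", "a", np.nan, "b"],
        "PageID": [20, 10, 21, 11],
    }
)

result = last_row_per_group(sessions, "SessionID")
print(result)
```

Because the helper selects rows rather than aggregating columns, the original column order is kept and each output row corresponds to exactly one input row.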

sursu mentioned this issue Apr 11, 2018

groupby().first() much slower with a str column present in the data. #19283

Closed

jreback closed this as completed Apr 14, 2018

jreback added the Groupby and Duplicate Report labels Apr 14, 2018

jreback added this to the No action milestone Apr 14, 2018
