
BUG: .transform(...) with "first" and "last" fail when axis=1 #46074

Merged: 2 commits into pandas-dev:main on Feb 26, 2022

Conversation

rhshadrach (Member):

Part of #45986 (first/last)

This removes the warnings about dropping nuisance columns when using e.g. .transform("mean"), making transform consistent with .agg; in #46072 both transform and agg will then warn about the default switching from numeric_only=False to numeric_only=True. Doing this first will make #46072 slightly easier.

Not falling back to _transform_item_by_item is also better for performance. For the benchmark below, I disabled the warning emitted on main to make sure it wasn't skewing the results. A small sketch of the resulting nuisance-column behavior follows the benchmark.

import numpy as np
import pandas as pd

size = 1_000_000
df = pd.DataFrame(
    {
        "A": size * ["foo", "bar"],
        "B": "one",
        "C": np.random.randn(2 * size),
        "D": np.random.randn(2 * size),
    }
)
%timeit df.groupby("A").transform("mean")

# This PR
108 ms ± 966 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# main, but with size set to 100_000 (size=1_000_000 was taking far too long)
920 ms ± 7.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
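
For illustration, here is a minimal sketch of the behavioral change described above (not code from the PR; the frame and column names are made up). Non-numeric columns are simply excluded from the transform result, with no nuisance-column warning, matching what .agg already does.

import pandas as pd

small = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "num": [1.0, 2.0, 3.0],
        "txt": ["x", "y", "z"],  # non-numeric "nuisance" column
    }
)

# "txt" cannot be averaged, so it is dropped from the result. On main this
# emitted a FutureWarning about dropping nuisance columns; with this PR the
# column is excluded silently, the same as small.groupby("key").agg("mean").
result = small.groupby("key").transform("mean")
print(result)
#    num
# 0  1.5
# 1  1.5
# 2  3.0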

rhshadrach added the Bug, Groupby, Performance, Clean, and Nuisance Columns labels on Feb 19, 2022
@@ -418,45 +413,36 @@ def test_transform_select_columns(df):
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("duplicates", [True, False])
def test_transform_exclude_nuisance(df, duplicates):
rhshadrach (Member, Author):

In order to still hit the warning this was testing for, I needed to switch from a duplicated float column to a duplicated string column, as we no longer warn in the case of the duplicated float column. But using a string column would always fail with SeriesGroupBy, which is why duplicates=False was removed.
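
For concreteness, a rough sketch of that idea (not the actual test in this PR; the frame, the warning class, and the exact assertion are assumptions): duplicate a string column so the frame still contains a nuisance column that must be dropped, keeping the warning reachable for DataFrameGroupBy.

import pandas as pd
import pandas._testing as tm


def test_transform_exclude_nuisance_sketch():
    # Hypothetical frame: "B" is a string column that we duplicate on purpose.
    df = pd.DataFrame(
        {
            "A": ["foo", "bar", "foo", "bar"],
            "B": ["x", "y", "z", "w"],
            "C": [1.0, 2.0, 3.0, 4.0],
        }
    )
    df = pd.concat([df, df["B"]], axis=1)  # columns: A, B, C, B

    gb = df.groupby("A")
    # The duplicated string column still has to be dropped, so the
    # nuisance-column deprecation warning is still emitted here.
    with tm.assert_produces_warning(FutureWarning):
        result = gb.transform("mean")
    expected = df.groupby("A")[["C"]].transform("mean")
    tm.assert_frame_equal(result, expected)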

@jreback jreback added this to the 1.5 milestone Feb 26, 2022
@jreback jreback merged commit e932ec9 into pandas-dev:main Feb 26, 2022
jreback (Contributor) commented Feb 26, 2022:

thanks @rhshadrach

@rhshadrach rhshadrach deleted the transform_wrap_results branch February 27, 2022 15:30
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022