
PERF: first try inplace setitem for .at indexer #49772

Merged: 9 commits, merged on Dec 9, 2022


@jorisvandenbossche (Member) commented Nov 18, 2022

This change comes from a commit that I originally also tested in the PR that caused the perf regression: https://github.com/pandas-dev/pandas/pull/47074/files#r878902106. I eventually left it out there because it didn't seem needed for correct behaviour, but I didn't consider performance at the time.
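The idea in the PR title is to try the in-place setitem first and fall back to a slower path only when that fails. The toy model below sketches that pattern under stated assumptions: `ToyColumn`, `set_value` and the `_can_hold` rule are hypothetical stand-ins written for illustration, not pandas internals.

```python
from typing import Any


def _can_hold(dtype: type, value: Any) -> bool:
    # Hypothetical rule: a column holds values of its own type,
    # and a float column also accepts ints.
    if dtype is float:
        return isinstance(value, (int, float))
    return isinstance(value, dtype)


class ToyColumn:
    """Hypothetical single-column storage, not a pandas class."""

    def __init__(self, values: list, dtype: type) -> None:
        self.values = list(values)
        self.dtype = dtype
        self.rebuilds = 0  # how often the slow path replaced the storage

    def set_value(self, pos: int, value: Any) -> None:
        # Fast path: the value fits, so write into the existing storage in place.
        if _can_hold(self.dtype, value):
            self.values[pos] = value
            return
        # Slow path: the value does not fit, so build new storage
        # with a more general dtype and swap it in.
        new_values = list(self.values)
        new_values[pos] = value
        self.values = new_values
        self.dtype = object
        self.rebuilds += 1


col = ToyColumn([0, 0, 0], int)
storage = col.values
col.set_value(1, 3)    # fits in an int column: in-place write
col.set_value(2, "x")  # does not fit: falls back and rebuilds
print(col.values, col.dtype, col.rebuilds, col.values is storage)
# [0, 3, 'x'] <class 'object'> 1 False
```

Once the toy column has been generalized to `object`, later writes take the fast path again, so the rebuild cost is only paid when a value actually does not fit.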

Using the example from #49729:

```python
import pandas as pd

def foo(df):
    # One scalar setitem through .at per row label
    for idx in df.index:
        df.at[idx, "bar"] = 3

df = pd.DataFrame(range(10000))
df["bar"] = 0

%timeit foo(df)
# 442 ms ± 16.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- main
# 202 ms ± 3.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- PR
```

This is not yet as fast as on 1.4.x, but #49771 helps further. With both PRs together, I get down to 119 ms (vs 50 ms on 1.4.3).
The remaining overhead comes mostly from `Manager.iget`, which is called to get the `SingleBlockManager` for the column.
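`%timeit` is IPython magic. Outside IPython, a roughly equivalent measurement of the `foo` loop from #49729 can be taken with the standard-library `timeit` module. This is a sketch; absolute numbers depend on the machine and the pandas version.

```python
import statistics
import timeit

import pandas as pd


def foo(df: pd.DataFrame) -> None:
    # One scalar setitem through .at per row label
    for idx in df.index:
        df.at[idx, "bar"] = 3


df = pd.DataFrame(range(10000))
df["bar"] = 0

# Like the %timeit output: 7 runs of 1 loop each, reported as mean ± std. dev.
times = timeit.repeat(lambda: foo(df), repeat=7, number=1)
mean_ms = statistics.mean(times) * 1000
std_ms = statistics.stdev(times) * 1000
print(f"{mean_ms:.0f} ms ± {std_ms:.1f} ms per loop (mean ± std. dev. of 7 runs, 1 loop each)")
```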

@jorisvandenbossche modified the milestones: 1.5.3, 1.5.2 Nov 18, 2022
@jorisvandenbossche added the Indexing and Performance labels Nov 18, 2022
@jorisvandenbossche marked this pull request as ready for review November 18, 2022 21:11
@datapythonista modified the milestones: 1.5.2, 1.5.3 Nov 21, 2022
@jorisvandenbossche (Member, Author)

@jbrockmendel any other comment, or OK to merge?

@jreback (Contributor) left a comment

Can you add a benchmark for the case that regressed?

@jorisvandenbossche (Member, Author)

Good point. Unless I am missing something, it seems we actually don't have any benchmarks at all for `.at`.
(The underlying `_get_value` used to be benchmarked indirectly through the `lookup` benchmarks, but those were removed together with the `lookup` method itself.)

@jbrockmendel (Member)

@jorisvandenbossche (Member, Author)

Added benchmark cases for `.at` (both getitem and setitem) to the existing DataFrame indexing benchmark class. Timing a code snippet that mimics the benchmark setup:

```python
import numpy as np
import pandas._testing as tm
from pandas import DataFrame

# 1000 x 30 float frame with random string labels
index = tm.makeStringIndex(1000)
columns = tm.makeStringIndex(30)
df = DataFrame(np.random.randn(1000, 30), index=index, columns=columns)
idx_scalar = index[100]
col_scalar = columns[10]
```

```
In [3]: %timeit df.at[idx_scalar, col_scalar] = 0.0
100 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <-- main
25.7 µs ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <-- PR
```
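An asv-style benchmark covering both `.at` cases could look roughly like the sketch below. The class and method names are hypothetical, not necessarily the ones added in this PR, and plain string labels stand in for the private `tm.makeStringIndex` helper. asv calls `setup()` before timing the `time_*` methods.

```python
import numpy as np
import pandas as pd


class DataFrameAtIndexing:
    """Hypothetical asv benchmark class for scalar access with .at."""

    def setup(self):
        # 1000 x 30 float frame with string labels, mirroring the snippet above
        index = pd.Index([f"row-{i}" for i in range(1000)])
        columns = pd.Index([f"col-{j}" for j in range(30)])
        self.df = pd.DataFrame(
            np.random.randn(1000, 30), index=index, columns=columns
        )
        self.idx_scalar = index[100]
        self.col_scalar = columns[10]

    def time_at_getitem(self):
        # Scalar read through .at
        self.df.at[self.idx_scalar, self.col_scalar]

    def time_at_setitem(self):
        # Scalar write through .at: the case that regressed
        self.df.at[self.idx_scalar, self.col_scalar] = 0.0
```

Comparing the main and PR branches with asv (e.g. via `asv continuous`) would then show whether `time_at_setitem` improved.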

```
@@ -17,6 +17,7 @@ Fixed regressions
- Fixed regression in :meth:`DataFrameGroupBy.transform` when used with ``as_index=False`` (:issue:`49834`)
- Enforced reversion of ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
- Fixed regression in :meth:`SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
- Fixed performance regression in the :meth:`~DataFrame.at` indexer (:issue:`49771`)
```
Review comment (Member):

Can you clarify that it is setitem and not getitem?

@jorisvandenbossche (Member, Author):

Done

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Dec 9, 2022
@jorisvandenbossche deleted the perf-at-loop branch December 9, 2022 09:00
jorisvandenbossche added a commit that referenced this pull request Dec 9, 2022
…or .at indexer) (#50142)

Backport PR #49772: PERF: first try inplace setitem for .at indexer

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
4 participants