PERF: regression in DataFrame reduction ops performance #37081 #37118

ukarroum · 2020-10-14T18:22:06Z

closes PERF: regression in DataFrame reduction ops performance #37081
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff

Made the change proposed by @jorisvandenbossche in #35881 (comment)

Did a very quick comparison:

With self.dtypes (old version):

In [8]: values = np.random.randn(100000, 4)   
   ...: df = pd.DataFrame(values).astype("int") 
   ...: %timeit df.sum() 
714 µs ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

With self._iter_column_arrays() (new version):

In [4]: values = np.random.randn(100000, 4)   
   ...: df = pd.DataFrame(values).astype("int") 
   ...: %timeit df.sum() 
477 µs ± 8.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
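The comparison above relies on IPython's %timeit. Outside IPython, roughly the same measurement can be sketched with the standard-library timeit module. The helper name time_sum and the repeat/number values are assumptions for illustration, not part of the PR, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same frame as in the comparison above: 100000 rows, 4 integer columns.
values = np.random.randn(100000, 4)
df = pd.DataFrame(values).astype("int")


def time_sum(frame, repeat=7, number=50):
    """Return the best observed per-call time of frame.sum() in seconds.

    Hypothetical helper: `repeat` and `number` are illustrative defaults.
    """
    timings = timeit.repeat(frame.sum, repeat=repeat, number=number)
    return min(timings) / number


per_call = time_sum(df)
print(f"{per_call * 1e6:.0f} µs per df.sum() call (best of 7 runs)")
```

Running this once on a build with the old code path and once with the new one gives numbers comparable in spirit to the two sessions above, although %timeit reports mean ± std. dev. rather than the best run.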

jreback

do we have an asv that covers this case? if not, can you add one?

this doesn't need a note as it's on master.

ukarroum · 2020-10-14T19:09:46Z

I believe we do:

https://pandas.pydata.org/speed/pandas/#stat_ops.FrameOps.time_op?p-op='sum'&p-dtype='int'
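For readers unfamiliar with asv, a parametrized benchmark of the kind linked above has roughly the following shape. This is a sketch, not the actual pandas benchmark file: the class name FrameOpsSketch and the parameter lists are assumptions, and only the op='sum', dtype='int' combination is taken from the link.

```python
import numpy as np
import pandas as pd


class FrameOpsSketch:
    # asv runs every combination of these parameters.
    # The lists here are illustrative, not the real benchmark's values.
    params = [["sum", "mean"], ["int", "float"]]
    param_names = ["op", "dtype"]

    def setup(self, op, dtype):
        # Build the frame outside the timed method so only the
        # reduction itself is measured.
        values = np.random.randn(100000, 4)
        df = pd.DataFrame(values).astype(dtype)
        self.df_func = getattr(df, op)

    def time_op(self, op, dtype):
        # asv times methods whose names start with "time_".
        self.df_func()


# Manual invocation, mimicking what asv does for one parameter combination.
bench = FrameOpsSketch()
bench.setup("sum", "int")
bench.time_op("sum", "int")
```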

jorisvandenbossche · 2020-10-14T19:11:34Z

Indeed. Jeff, see the issue: it was actually caught thanks to our asv suite.

jbrockmendel · 2020-10-14T20:25:06Z

pandas/core/frame.py

+        any_object = np.array(
+            [is_object_dtype(values.dtype) for values in self._iter_column_arrays()],
+            dtype=bool,
+        ).any()


let's only find the dtypes once (i.e. share with dtype_is_dt above):

own_dtypes = [arr.dtype for arr in self._iter_column_arrays()] # or own_dtypes = [blk.dtype for blk in self._mgr.blocks]

done in 41827fb
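The idea behind this review comment can be sketched outside pandas internals as follows. This is an illustration under stated assumptions, not the code in frame.py: the function name dtype_flags is hypothetical, the dtypes are collected through the public df.dtypes rather than the private self._iter_column_arrays() used in the PR, and is_datetime64_any_dtype is assumed as the check behind dtype_is_dt.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_object_dtype


def dtype_flags(df):
    """Derive both flags from a single pass over the column dtypes.

    Hypothetical helper: inside pandas, the PR collects the dtypes from
    the internal column arrays instead of going through df.dtypes.
    """
    # Collect the dtypes once...
    own_dtypes = list(df.dtypes)
    # ...then reuse the same list for both checks.
    dtype_is_dt = np.array(
        [is_datetime64_any_dtype(dtype) for dtype in own_dtypes], dtype=bool
    )
    any_object = any(is_object_dtype(dtype) for dtype in own_dtypes)
    return dtype_is_dt, any_object


df = pd.DataFrame(
    {
        "a": [1, 2],
        # Explicit object dtype so the example does not depend on how
        # a given pandas version infers string columns.
        "b": pd.Series(["x", "y"], dtype=object),
        "c": pd.to_datetime(["2020-10-14", "2020-10-17"]),
    }
)
dtype_is_dt, any_object = dtype_flags(df)
```

Here dtype_is_dt marks only column "c" as datetime-like, and any_object is true because of column "b".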

jorisvandenbossche

Thanks @ukarroum, looks good!

ukarroum · 2020-10-17T09:57:48Z

Should I do something about the 2 failed Azure pipelines?
Looks like they're failing on master too, and ./ci/code_checks.sh returns no error locally.

jreback · 2020-10-17T13:50:01Z

thanks @ukarroum

[PERF] Fixed issue pandas-dev#37081

6357ac2

jreback changed the title ~~[PERF] Fixed issue #37081~~ PERF: regression in DataFrame reduction ops performance #37081 Oct 14, 2020

jreback added the Performance Memory or execution speed performance label Oct 14, 2020

jreback added this to the 1.1.4 milestone Oct 14, 2020

jreback added the Regression Functionality that used to work in a prior pandas version label Oct 14, 2020

jreback modified the milestones: 1.1.4, 1.2 Oct 14, 2020

jreback requested changes Oct 14, 2020

jbrockmendel reviewed Oct 14, 2020

PERF : pandas-dev#37081 Compute dtypes once for 'any_object' and 'dty…

41827fb

…pe_is_dt'

jorisvandenbossche approved these changes Oct 15, 2020

ukarroum requested a review from jreback October 17, 2020 09:55

jreback approved these changes Oct 17, 2020

jreback merged commit 9fed16c into pandas-dev:master Oct 17, 2020

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020

PERF: regression in DataFrame reduction ops performance pandas-dev#37081

a4e08f6

(pandas-dev#37118)

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

PERF: regression in DataFrame reduction ops performance pandas-dev#37081

ee353ce

(pandas-dev#37118)
