PERF: fix performance regression in memory_usage(deep=True) for object dtype #33102

neilkg · 2020-03-28T16:02:20Z

closes PERF: Performance regression with memory_usage(deep=True) on object columns #33012
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This pull request changes the argument passed to lib.memory_usage_of_objects from self.array to self._values. An ASV benchmark is included that covers frames with and without object-dtype columns.
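For readers unfamiliar with ASV, the benchmark described above could look roughly like the sketch below. This is not the code added in this PR: the class name, frame shape, and column names are assumptions chosen for illustration. Only the `setup` / `time_*` method naming follows the ASV convention.

```python
import numpy as np
import pandas as pd


class MemoryUsageSketch:
    """Hypothetical ASV-style benchmark; names and sizes are illustrative only."""

    def setup(self):
        n_rows = 10_000
        rng = np.random.default_rng(0)
        # Frame with numeric columns only.
        self.df = pd.DataFrame(
            rng.standard_normal((n_rows, 4)), columns=list("ABCD")
        )
        # Same frame plus one column that is explicitly object dtype.
        self.df_obj = self.df.copy()
        self.df_obj["E"] = pd.Series(["row"] * n_rows, dtype=object)

    def time_memory_usage(self):
        # Baseline: no object-dtype columns, so deep=True adds no per-object work.
        self.df.memory_usage(deep=True)

    def time_memory_usage_object_dtype(self):
        # The case affected by the regression: deep=True sizes every Python object.
        self.df_obj.memory_usage(deep=True)
```

ASV calls `setup` before timing each `time_*` method, so the frame construction is excluded from the measurement.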

Before:

After:

WillAyd · 2020-03-28T16:41:10Z

@jbrockmendel thoughts?

jbrockmendel · 2020-03-28T16:46:53Z

LGTM. @jorisvandenbossche has stronger opinions than I do on when to use .array

jorisvandenbossche · 2020-03-30T11:20:21Z

@jbrockmendel This is not a matter of opinion about when to use .array. The Cython routine needs a NumPy array, not an ExtensionArray (EA).
(So the alternative to _values would be extract_array(... , extract_numpy=True), but since we already know we are object dtype here, it should be fine to use _values.)
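The distinction can be checked interactively. The snippet below is a minimal sketch, assuming a Series that is explicitly object dtype. Variable names are illustrative, and `_values` is a private attribute whose behaviour may differ across pandas versions.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# Explicit dtype=object so the example does not depend on string-dtype inference.
ser = pd.Series(["a", "bb", None], dtype=object)

# Public accessor: returns an ExtensionArray wrapper, even for NumPy-backed data.
wrapped = ser.array

# Private accessor: for object dtype this is the underlying numpy.ndarray,
# which is what a Cython routine typed for ndarrays can consume directly.
raw = ser._values

is_wrapper = isinstance(wrapped, ExtensionArray)
is_ndarray = isinstance(raw, np.ndarray)

# Public route to an ndarray, for code outside pandas internals.
public_raw = ser.to_numpy()
```

Outside pandas internals, `Series.to_numpy()` is the safer choice, since private attributes carry no stability guarantee.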

jorisvandenbossche

Thanks for the PR @neilkg !
Looks good to me

ShaharNaveh · 2020-03-30T22:26:27Z

@neilkg Can you please merge master?
This should solve the failing test.

https://dev.pandas.io/docs/development/contributing.html#updating-your-pull-request

neilkg · 2020-03-30T23:27:21Z

> @neilkg Can you please merge master?
> This should solve the failing test.

@MomIsBestFriend done

mroeschke · 2020-03-31T00:11:59Z

Thanks @neilkg!

jreback · 2020-03-31T00:25:23Z

This was tagged for 1.0.4? Hmm.

Not sure if we are doing that, but no harm I guess (though the release note will need to be moved).

updated to _values and added ASV

0622947

jorisvandenbossche changed the title ~~updated to _values and added ASV~~ PERF: fix performance regression in memory_usage(deep=True) for object dtype Mar 30, 2020

jorisvandenbossche added the Performance (Memory or execution speed performance) label Mar 30, 2020

jorisvandenbossche added this to the 1.0.4 milestone Mar 30, 2020

jorisvandenbossche reviewed Mar 30, 2020

Merge remote-tracking branch 'upstream/master' into perf-regression-mem-usage

6c936dc

mroeschke approved these changes Mar 30, 2020

ShaharNaveh approved these changes Mar 31, 2020

mroeschke merged commit 30724b9 into pandas-dev:master Mar 31, 2020

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 31, 2020

Backport PR pandas-dev#33102: PERF: fix performance regression in memory_usage(deep=True) for object dtype

b56b25a

meeseeksmachine mentioned this pull request Mar 31, 2020

Backport PR #33102 on branch 1.0.x (PERF: fix performance regression in memory_usage(deep=True) for object dtype) #33157

Merged

simonjayhawkins pushed a commit that referenced this pull request May 5, 2020

Backport PR #33102: PERF: fix performance regression in memory_usage(deep=True) for object dtype (#33157) Co-authored-by: neilkg <33635204+neilkg@users.noreply.github.com>

1f87931

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request May 5, 2020

release note for pandas-dev#33102

3950580

simonjayhawkins added a commit that referenced this pull request May 6, 2020

release note for #33102 (#34005)

8923fd2
