Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: fix performance regression in memory_usage(deep=True) for object dtype #33102

Merged
merged 2 commits into from
Mar 31, 2020

Conversation

neilkg
Copy link
Contributor

@neilkg neilkg commented Mar 28, 2020

The pull request is to update lib.memory_usage_of_objects from taking self.arrays to self._values. An ASV included to benchmark with and without object-dtype columns.

Before:
Screen Shot 2020-03-28 at 11 57 22 AM

After:
Screen Shot 2020-03-28 at 11 56 42 AM

@WillAyd
Copy link
Member

WillAyd commented Mar 28, 2020

@jbrockmendel thoughts?

@jbrockmendel
Copy link
Member

LGTM. @jorisvandenbossche has stronger opinions than i do on when to use .array

@jorisvandenbossche
Copy link
Member

@jbrockmendel This is not about an opinion on when to use .array or not, this is about a cython routine needing a numpy array and not an EA.
(so the alternative to _values would be extract_array(... , extract_numpy=True), but since we already know we are object dtype here, it should be fine to use _values)

@jorisvandenbossche jorisvandenbossche changed the title updated to _values and added ASV PERF: fix performance regression in memory_usage(deep=True) for object dtype Mar 30, 2020
@jorisvandenbossche jorisvandenbossche added the Performance Memory or execution speed performance label Mar 30, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.4 milestone Mar 30, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @neilkg !
Looks good to me

@ShaharNaveh
Copy link
Member

ShaharNaveh commented Mar 30, 2020

@neilkg Can you please merge master?
This should solve the failing test.


https://dev.pandas.io/docs/development/contributing.html#updating-your-pull-request

@neilkg
Copy link
Contributor Author

neilkg commented Mar 30, 2020

@neilkg Can you please merge master?
This should solve the failing test.

@MomIsBestFriend done

@mroeschke mroeschke merged commit 30724b9 into pandas-dev:master Mar 31, 2020
@mroeschke
Copy link
Member

Thanks @neilkg!

@jreback
Copy link
Contributor

jreback commented Mar 31, 2020

this was tagged for 1.04? hmm

not sure if we are doing that but no harm
i guess (though the release note will need to be moved)

simonjayhawkins pushed a commit that referenced this pull request May 5, 2020
…deep=True) for object dtype (#33157)

Co-authored-by: neilkg <33635204+neilkg@users.noreply.github.com>
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request May 5, 2020
simonjayhawkins added a commit that referenced this pull request May 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: Performance regression with memory_usage(deep=True) on object columns
7 participants