Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use memoryviews instead of ndarrays #22147

Merged
merged 3 commits into from
Aug 1, 2018
Merged

Conversation

jbrockmendel
Copy link
Member

For lookups and slicing memoryviews are supposed to be more performant than ndarrays. I haven't been able to detect any difference. @xhochy IIRC you've looked at this in some depth; are memoryviews unambiguously better for our purposes?

Getting rid of from numpy cimport [...] is necessary (but not sufficient) for getting rid of numpy 1.7 deprecation warnings.

@xhochy
Copy link
Contributor

xhochy commented Aug 1, 2018

I also had the feeling that they are more performant and flexible than using ndarray but they don't support const object[:] at the moment which was problematic in my case. I actually have the feeling that this is a bug in the type parsing of Cython as const object is not possible but const object[:] should be. Asked for clarification at cython/cython#2485 but got no answer yet and I did not fully understand the Cython codebase to answer it myself.

@codecov
Copy link

codecov bot commented Aug 1, 2018

Codecov Report

Merging #22147 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22147   +/-   ##
=======================================
  Coverage   92.07%   92.07%           
=======================================
  Files         170      170           
  Lines       50694    50694           
=======================================
  Hits        46674    46674           
  Misses       4020     4020
Flag Coverage Δ
#multiple 90.47% <ø> (ø) ⬆️
#single 42.29% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57c7daa...4cfe9e8. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Aug 1, 2018

so this is not faster, but not slower? is it worth it at all?

@jbrockmendel
Copy link
Member Author

so this is not faster, but not slower? is it worth it at all?

Based on %timeit results, the memoryviews do appear to be every-so-slightly faster.

The other upsides I have in mind are:

  • weens us off of being numpy-specific (think: arrow)
  • easier to lint (though we're a few steps away from that)
  • getting rid of from numpy cimport ... is prerequisite to getting rid of deprecation warnings

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Aug 1, 2018
@jreback jreback added this to the 0.24.0 milestone Aug 1, 2018
@jreback jreback merged commit d010469 into pandas-dev:master Aug 1, 2018
@jreback
Copy link
Contributor

jreback commented Aug 1, 2018

ok thanks

@jbrockmendel jbrockmendel deleted the buffers3 branch August 1, 2018 22:16
dberenbaum pushed a commit to dberenbaum/pandas that referenced this pull request Aug 3, 2018
minggli added a commit to minggli/pandas that referenced this pull request Aug 5, 2018
* master: (47 commits)
  Run tests in conda build [ci skip] (pandas-dev#22190)
  TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165)
  CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184)
  Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179)
  DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175)
  DOC: updated Series.str.contains see also section (pandas-dev#22176)
  0.23.4 whatsnew (pandas-dev#22177)
  fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  BUG: Fix get dummies unicode error (pandas-dev#22131)
  Fixed py36-only syntax [ci skip] (pandas-dev#22167)
  DEPR: pd.read_table (pandas-dev#21954)
  DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119)
  BUG: Matplotlib scatter datetime (pandas-dev#22039)
  CLN: Use public method to capture UTC offsets (pandas-dev#22164)
  implement tslibs/src to make tslibs self-contained (pandas-dev#22152)
  Fix categorical from codes nan 21767 (pandas-dev#21775)
  BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125)
  use memoryviews instead of ndarrays (pandas-dev#22147)
  Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155)
  API: Default to_* methods to compression='infer' (pandas-dev#22011)
  ...
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
@WillAyd WillAyd mentioned this pull request Jul 12, 2019
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants