Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: NDArrayBacked in cython #40054

Closed

Conversation

jbrockmendel
Copy link
Member

Context: looking at backing DatetimeBlock/TimedeltaBlock by DatetimeArray/TimedeltaArray directly (as opposed to the status quo where we back by ndarrays) and profiling, the constructors stand out as a major source of overhead.

Idea: put the ndarray-wrapping methods in cython

Upsides: performance! stats currently pasted into the NDArrayBacked docstring:

import pandas as pd
from pandas._libs.lib import NDArrayBacked as cls

dti = pd.date_range("2016-01-01", periods=3)
dta = dti._data
arr = dta._ndarray

obj = cls._simpler_new(arr, arr.dtype)

# for foo in [arr, dta, obj]: ...

%timeit foo.copy()
299 ns ± 30 ns per loop     # <-- arr underlying ndarray (for reference)
530 ns ± 9.24 ns per loop   # <-- dta with cython NDArrayBacked
1.66 µs ± 46.3 ns per loop  # <-- dta without cython NDArrayBacked  <-- i.e. master
328 ns ± 5.29 ns per loop   # <-- obj with NDArrayBacked.__cinit__
371 ns ± 6.97 ns per loop   # <-- obj with NDArrayBacked._simpler_new

%timeit foo.T
125 ns ± 6.27 ns per loop   # <-- arr underlying ndarray (for reference)
226 ns ± 7.66 ns per loop   # <-- dta with cython NDArrayBacked
911 ns ± 16.6 ns per loop   # <-- dta without cython NDArrayBacked  <-- i.e. master
215 ns ± 4.54 ns per loop   # <-- obj with NDArrayBacked._simpler_new

i.e. dta.copy() is 3x faster, and within 2x of numpy. With further tweaks we could get within 10% of numpy (the 328 ns number). dta.T is 4x faster and within 2x of numpy.

(T is more-improved than copy bc we override copy in python to copy .freq)

Downsides: 1) pickle is a PITA, 2) constructor gymnastics

Future:

  • lib.pyx is likely not the best place for this
  • can squeeze out more perf if we use __cinit__; requires more constructor gymnatics
  • can use in PeriodArray, Categorical, PandasArray with some effort
  • more methods could be optimized

@jreback
Copy link
Contributor

jreback commented Feb 26, 2021

does this change real world benchmarks by a non-trivial amount?

@jbrockmendel
Copy link
Member Author

does this change real world benchmarks by a non-trivial amount?

havent run the asvs yet, have been asv-ing the crap out of another branch (which led to #40066)

@jorisvandenbossche
Copy link
Member

I think you could apply several of the "tricks" that you now do in cython also in the current python code to get some speed-up already without using cython (of course not as fast as with cython, but it can certainly improve compared to master, I think): add a _from_backing_data to DatetimeArray that doesn't go through simple_new (which checks the dtype, asserts the array type), or in copy use a direct reference to the method (i.e. NDArrayBackedExtensionArray.copy(self)) instead of using super().

Secondly, this mostly matters for tiny arrays. Doing your ndarray vs EA copy comparison from above with a larger array gives:

In [19]: dti = pd.date_range("2016-01-01", freq='H', periods=100_000)
    ...: dta = dti._data
    ...: arr = dta._ndarray

In [20]: %timeit dta.copy()
27 µs ± 320 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [21]: %timeit arr.copy()
25 µs ± 359 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Which is not to say that micro-performance for small arrays cannot be important for certain use cases. But as Jeff says, I would first look for some more real-world uses cases where this matters, before considering using cython be worth the added complexity.

@jbrockmendel
Copy link
Member Author

that doesn't go through simple_new

good idea. ill give some of these a go.

@jbrockmendel
Copy link
Member Author

mothballing

@jbrockmendel jbrockmendel added the Mothballed Temporarily-closed PR the author plans to return to label Mar 4, 2021
@jbrockmendel jbrockmendel deleted the perf-ndarray-backed-2 branch April 15, 2021 02:28
@jbrockmendel jbrockmendel removed the Mothballed Temporarily-closed PR the author plans to return to label Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants