PERF: support parallel calculation of nancorr #24795
Conversation
Hello @noamher! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on January 16, 2019 at 08:57 UTC
Force-pushed from 2f182a9 to 04a384c
@noamher: Thanks for opening this PR! While the performance increase sounds appealing at the outset, a couple of points could improve this proposal:
I don't think we're quite ready for this. We need some more discussion on when and how we'll use parallelism.
from libc.stdlib cimport malloc, free
from libc.string cimport memmove
from libc.math cimport fabs, sqrt
from cpython cimport bool
does cython treat this differently from built-in bool?
I believe this is the type Cython uses to declare a Python boolean (an object, which means many operations on it can only be performed while holding the GIL).
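For illustration only (my reading of the Cython semantics, not code from this PR), the difference between the cimported bool and a plain bint:

```cython
# Illustrative sketch, not code from this PR: cpython.bool declares a Python
# bool object (operations on it need the GIL), while bint is a plain C integer
# that coerces to/from True/False and is usable without the GIL.
from cpython cimport bool

def takes_python_bool(bool flag):
    # flag is a Python object here; using it inside a nogil block would be
    # a compile-time error.
    return flag

def takes_bint(bint flag):
    cdef bint flipped
    with nogil:
        flipped = not flag  # pure C operation, no GIL needed
    return flipped
```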
int64_t nobs = 0
float64_t vx, vy, sumx, sumy, sumxx, sumyy, meanx, meany, divisor
int64_t blah = 0
what's this?
Unnecessary, will be removed.
    nancorr_single_row(mat, N, K, result, xi, mask, minpv, cov)
else:
    with nogil:
        for xi in range(K):
could this be collapsed with something like range_func = range(K) if not parallel else prange(K, schedule='dynamic')?
AFAIK no, because prange is not really a function; it is converted into a #pragma of sorts in the generated C code.
OK. I think there is an issue on Cython's GH about making a maybe_nogil that would be semantically similar to a maybe_prange. Might be worth commenting there.
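To illustrate the point about prange above (a standalone sketch, not code from this PR): prange has to be written directly in the loop header so Cython can emit the OpenMP pragma at compile time, and building it requires the same OpenMP flags discussed below.

```cython
# Standalone sketch, not code from this PR: prange is expanded at compile time
# into an OpenMP '#pragma omp parallel for', so it cannot be picked at runtime
# the way an ordinary range object could.
cimport cython
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def row_sums(double[:, :] mat, double[:] out):
    cdef Py_ssize_t i, j
    for i in prange(mat.shape[0], nogil=True, schedule='dynamic'):
        out[i] = 0.0
        for j in range(mat.shape[1]):
            out[i] = out[i] + mat[i, j]
```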
@@ -677,10 +678,11 @@ def srcpath(name=None, suffix='.pyx', subdir='src'):
    obj = Extension('pandas.{name}'.format(name=name),
                    sources=sources,
                    depends=data.get('depends', []),
                    include_dirs=include,
                    include_dirs=include + [numpy.get_include()],
is this different from the pkg_resources version that is already in there?
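For comparison (this assumes the existing setup.py resolves the NumPy headers via pkg_resources, which I have not verified here):

```python
# Sketch for comparison only. On a regular NumPy install both calls usually
# resolve to the same directory.
import numpy
import pkg_resources

print(numpy.get_include())                                       # .../numpy/core/include
print(pkg_resources.resource_filename('numpy', 'core/include'))
```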
                    language=data.get('language', 'c'),
                    define_macros=data.get('macros', macros),
                    extra_compile_args=extra_compile_args)
                    extra_compile_args=['-fopenmp'] + extra_compile_args,
                    extra_link_args=['-fopenmp'])
is this platform-dependent in any way? i.e. do we need to check that it is available?
I believe this is the GCC flag to compile and link against OpenMP, and I think Clang uses the same flag, but I'm not sure.
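For what it's worth, a hypothetical sketch of how the flag could be made platform-aware (not what this PR does): GCC and Clang-with-libomp accept -fopenmp, while MSVC spells it /openmp and needs no extra link flag.

```python
# Hypothetical sketch, not the approach taken in this PR: pick the OpenMP
# flags per platform instead of hard-coding '-fopenmp'.
import sys

if sys.platform == 'win32':
    openmp_compile_args = ['/openmp']   # MSVC
    openmp_link_args = []
else:
    openmp_compile_args = ['-fopenmp']  # GCC / Clang built with OpenMP support
    openmp_link_args = ['-fopenmp']

# These lists would then be passed to Extension(...,
#     extra_compile_args=openmp_compile_args + extra_compile_args,
#     extra_link_args=openmp_link_args)
```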
@@ -230,14 +232,15 @@ def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric:

@cython.boundscheck(False)
@cython.wraparound(False)
def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
def nancorr(float64_t[:, :] mat, bint cov=0, minp=None, bool parallel=False):
+1 on changing to memoryviews where viable. If we don't alter mat in place, might want to add the const modifier.
👍
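A sketch of what the suggested const memoryview signature could look like (my guess, not the final code; IIRC read-only memoryviews need a reasonably recent Cython):

```cython
# Hypothetical signature sketch, not the final code: the 'const' qualifier
# documents that nancorr never writes to mat and also lets callers pass
# read-only arrays.
from numpy cimport float64_t

def nancorr(const float64_t[:, :] mat, bint cov=0, minp=None):
    cdef Py_ssize_t N = mat.shape[0]
    cdef Py_ssize_t K = mat.shape[1]
    # ... body unchanged ...
```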
                          bint minpv,
                          bint cov=0) nogil:
    for yi in range(xi + 1):
        nancorr_single(mat, N, K, result, xi, yi, mask, minpv, cov)
if nancorr_single isn't used elsewhere, I'd inline it here
👍
I agree, this is out of scope for now w/o further API discussion. We don't have a generic parallelism story ATM. This would have to be threaded down to the function, with the ability to generically turn parallelism on/off at a much higher level for other frameworks. @noamher, if you would, open an issue and try to reference a couple of open issues (pls do a search) for discussion.
@noamher the memory view changes are ok, would take a PR on those.
@jreback I'll open an issue and see if I can find other relevant ones. I'll do the memory view changes.
This is a proposal to use OpenMP to speed up the nancorr function (used by pd.DataFrame.corr).
If this proves useful, it can probably be extended to the other Cython algorithms implemented in algos.pyx.
The interface also has to be decided: how to choose whether to use parallelization, how many CPUs to use, the schedule strategy for the prange, etc.
I am not sure what the portability implications are of adding OpenMP to compilation and linkage.
Using 4 CPUs I got a ~60% speedup on pd.DataFrame.corr for a 20000 x 1300 DataFrame.
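For anyone wanting to reproduce the measurement, a rough benchmark sketch (the parallel keyword is the one proposed in this diff and does not exist in released pandas; the numbers will vary with machine and thread count):

```python
# Rough benchmark sketch against the signature proposed in this PR; the
# 'parallel' keyword is not part of released pandas.
import time
import numpy as np
from pandas._libs import algos

mat = np.random.randn(20000, 1300)

for parallel in (False, True):
    start = time.perf_counter()
    algos.nancorr(mat, parallel=parallel)
    print('parallel=%s: %.2f s' % (parallel, time.perf_counter() - start))
```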