Extend sum reduction kernel, add argmin reduction kernel #46
Conversation
Force-pushed from 01e12bc to 48190dd.
Force-pushed from ef3975f to e2ebdfe.
Force-pushed "…d argmin reduction kernel" from e2ebdfe to ee0897f.
Thanks @fcharras. Here are a few comments.
TODO: add a couple of tests and merge.
As discussed on a call with @fcharras, this is ready to merge modulo a few tests on the new common kernels.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Tests added. LGTM?
Here is my review.
```python
kernels_and_empty_tensors_pairs.append((kernel, result))

def sum_reduction(summands):
    # TODO: manually dispatch the kernels with a SyclQueue
    for kernel, result in kernels_and_empty_tensors_pairs:
        kernel(summands, result)
        summands = result
```
Would it be more efficient to do the calls to `dpt.empty(result_shape, dtype=dtype, device=device)` lazily inside this loop, instead of allocating all the result buffers ahead of time? Or maybe the Python GC would add sequential overhead that would kill the performance?
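For illustration, a minimal sketch of the lazy-allocation variant being suggested (hypothetical: it assumes the precomputed pairs are replaced by `(kernel, result_shape)` tuples, here called `kernels_and_shapes`):

```python
import dpctl.tensor as dpt

def sum_reduction(summands):
    # Hypothetical lazy variant: allocate each intermediate buffer right
    # before the kernel that writes to it runs, instead of ahead of time.
    for kernel, result_shape in kernels_and_shapes:
        result = dpt.empty(
            result_shape, dtype=summands.dtype, device=summands.device
        )
        kernel(summands, result)
        # The previous buffer becomes unreachable here; freeing it is left
        # to the Python GC, which is where the overhead concern comes from.
        summands = result
    return summands
```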
By allocating ahead of time we also keep the buffers allocated and re-use them across all future calls to this instance of `sum_reduction`. The buffers will only be garbage collected when the instance of `sum_reduction` is garbage collected. A given instance of `sum_reduction` in our loops is going to be called once per iteration, so I think it's more sensible this way.
Alright!
Maybe add an inline comment to explain this!
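For instance, the comment could capture the buffer re-use rationale discussed above (a sketch, not the final wording):

```python
def sum_reduction(summands):
    # The result buffers are allocated once, ahead of time, and re-used
    # across all calls to this sum_reduction instance; they are only
    # garbage collected along with the instance itself.
    # TODO: manually dispatch the kernels with a SyclQueue
    for kernel, result in kernels_and_empty_tensors_pairs:
        kernel(summands, result)
        summands = result
```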
LGTM, @ogrisel already outlined everything.
I guess `make_sum_reduction_2d_axis0_kernel` can be implemented when needed, and those tests generalised if that is the case. What do you think?
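For reference, the semantics the two kernels map to, illustrated with plain numpy (the real kernels operate on device arrays):

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)

np.sum(x, axis=1)  # axis1: one sum per row, shape (2,) -- what this PR adds
np.sum(x, axis=0)  # axis0: one sum per column, shape (3,) -- what the
                   # hypothetical make_sum_reduction_2d_axis0_kernel would do
```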
```python
@pytest.mark.parametrize("dtype", float_dtype_params)
def test_argmin_reduction_1d(dtype):
    n_items = 4
```
Would it make sense to define `n_items` based on the length of `array_in`?
This comment also applies in other tests.
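For illustration, a sketch of the suggested change (the contents of `array_in` and the rest of the test body are placeholders):

```python
@pytest.mark.parametrize("dtype", float_dtype_params)
def test_argmin_reduction_1d(dtype):
    array_in = dpt.asarray([3.0, 1.0, 0.0, 2.0], dtype=dtype)
    # Derive n_items from the data instead of hardcoding it.
    n_items = len(array_in)
```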
What do you mean? All the aspects in this PR are required for k-means++ (edit: indeed `_axis0` is not required, only `axis1`).
I meant that if …
Let's not worry about …
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
LGTM!
Merged! Thanks for the review.
Extend sum reduction kernel, add argmin reduction kernel (#46)
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* Add a module for rng kernels and kernel funcs
* Test the RNG kernels against a reference implementation (randomgen)
* Ensure that the float32 rng is equivalent to the float64 rng casted to float32
* Mimic https://prng.di.unimi.it/xoroshiro128plusplus.c implementation rather than `randomgen` and document the issue
* Extend sum reduction kernel to axis1 reduction on 2d arrays and add 1d argmin reduction kernel
* Working k-means++
* Port kmeansplusplus tests from sklearn test_k_means module
* Reactivate sklearn relocation cluster unit test
* Clarity and commenting (Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>)
* Fix variable name
* Overall cleaning kmeansplusplus (Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>)
* test fix centers.dtype attr error
* test fix centers.dtype attr error
* test float32 and float64 rng, assert same rng, and commenting
* Overall commenting and nits (Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>)
* Clarity in tests for rng (Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>)
* Extend sum reduction kernel, add argmin reduction kernel (#46)
* Extend sum reduction kernel to axis1 reduction on 2d arrays and add 1d argmin reduction kernel (Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>, Olivier Grisel <olivier.grisel@ensta.org>)
* Commenting + add a test for rng quality (Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>)
* Apply comment suggestions (Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>)
* fix: subsequence_start
* centroid -> candidate
* Apply docstring suggestions (Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>)
* Apply comment suggestions & minor fixes
* minor fix
* Add a quality test for kmeans plusplus
* Comment highlighting the equivalence of the results with engine and vanilla kmeans++ when evaluated with more iterations
* Fix kmeans plusplus test
* Enable k-means++ support of daal4py

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Two things in this PR: extend the sum reduction kernel to axis1 reduction on 2d arrays, and add a 1d argmin reduction kernel.

Not sure what is better, using those kernels or using `dpnp` functions (see the sketch after the lists below).

Pros for using those kernels:

- no dependency on `dpnp` (which can especially be a good thing if `numba_dpex` gets interoperability to nvidia/amd gpus before `dpnp` does)
- it's not sure that `dpnp` can run well on gpus

Cons for using those kernels:

- better implementations (in `dpnp`?) will be available and the kernels will be obsolete anyway.
- getting the best performance would require `numba_dpex` to use async dispatching, like described in IntelPython/numba-dpex#769 ("On making task configuration and task args available to the user without executing").

Those pros/cons can also be weighed for all other kernels in this file; implementations for most of those are already available in `dpnp`.
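For context, a minimal sketch of what the `dpnp` alternative mentioned above would look like (assumes `dpnp` is installed; shapes and values are arbitrary):

```python
import dpnp

x = dpnp.ones((1000, 100), dtype=dpnp.float32)

row_sums = dpnp.sum(x, axis=1)  # would replace the axis1 sum reduction kernel
best_idx = dpnp.argmin(x[0])    # would replace the 1d argmin reduction kernel
```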