
Extend sum reduction kernel, add argmin reduction kernel #46

Merged

@fcharras merged 5 commits into rng_kernels from extend_sum_kernel_to_dim2 on Nov 15, 2022

Conversation

@fcharras (Collaborator) commented on Oct 27, 2022:

Two things in this PR:

  • extend the sum reduction kernel, written for 1d arrays, to 2d arrays reduced over axis 1
  • add a similar argmin kernel (the intended semantics of both are sketched below)

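For reference, a minimal NumPy sketch of the semantics these kernels implement (shapes and dtypes are illustrative, and NumPy stands in for the device arrays; the actual kernels are numba_dpex JIT kernels):

```python
import numpy as np

# 2d sum reduction over axis 1: one scalar per row.
x2d = np.random.rand(1000, 256).astype(np.float32)
row_sums = x2d.sum(axis=1)  # shape (1000,)

# 1d argmin reduction: index of the smallest element.
x1d = np.random.rand(4096).astype(np.float32)
best_idx = np.argmin(x1d)  # scalar index into x1d
```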
I'm not sure which is better: using these kernels or using dpnp functions.

Pros for using those kernels:

  • limit the dependency on dpnp (which could be an especially good thing if numba_dpex gets interoperability with NVIDIA/AMD GPUs before dpnp does)
  • more confidence in the strategy for dispatching work to threads in a way that fits more GPU devices; it's not clear at what point dpnp will run well on GPUs
  • fun

Cons for using those kernels:

  • it's verbose, and if we want to optimize other use cases (depending on whether axis 0 is "big enough", or on which of axis 0 and axis 1 is bigger, ...) it would require yet more kernels.
  • it doesn't seem like a good idea to JIT-compile such simple operations; the added JIT time might get annoying.
  • at some point, we can assume that a reliable tensor library (probably dpnp?) will be available, and these kernels will be obsolete anyway.
  • even if the implementation fits the GPU model, I think the performance is not that great; it might become competitive if numba_dpex exposes a public API for async dispatching, as described in IntelPython/numba-dpex#769 ("On making task configuration and task args available to the user without executing").
  • in fact, the performance doesn't really matter, because so far I don't think these operations contribute much to total execution time anyway.

These pros and cons can also be weighed for all the other kernels in this file; implementations for most of them are already available in dpnp.

@fcharras requested review from jjerphan and ogrisel on October 27, 2022 at 15:24.
@fcharras force-pushed the extend_sum_kernel_to_dim2 branch from 01e12bc to 48190dd on October 27, 2022 at 16:53.
@fcharras force-pushed the extend_sum_kernel_to_dim2 branch 2 times, most recently from ef3975f to e2ebdfe, on October 27, 2022 at 17:13.
@jjerphan (Member) left a comment:

Thanks @fcharras. Here are a few comments.

[6 review comments on sklearn_numba_dpex/common/kernels.py — outdated, resolved]
@fcharras (Collaborator, Author) commented:

TODO: add a couple of tests and merge.

@jjerphan (Member) left a comment:

As discussed on a call with @fcharras, this is ready to merge modulo a few tests on the new common kernels.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@fcharras (Collaborator, Author) commented:

Tests added. LGTM?

@fcharras requested a review from jjerphan on November 14, 2022 at 08:57.
@ogrisel (Collaborator) left a comment:

Here is my review.

[3 review comments on sklearn_numba_dpex/common/tests/test_kernels.py — outdated, resolved]
[3 review comments on sklearn_numba_dpex/kmeans/drivers.py — outdated, resolved]
[1 review comment on sklearn_numba_dpex/common/kernels.py — outdated, resolved]
```python
        kernels_and_empty_tensors_pairs.append((kernel, result))

    def sum_reduction(summands):
        # TODO: manually dispatch the kernels with a SyclQueue
        for kernel, result in kernels_and_empty_tensors_pairs:
            kernel(summands, result)
            summands = result
```
@ogrisel (Collaborator) commented on this snippet:

Wouldn't it be more efficient to make the calls to dpt.empty(result_shape, dtype=dtype, device=device) lazily inside this loop, instead of allocating all the result buffers ahead of time?

Or maybe the Python GC would add sequential overhead that would kill the performance?

@fcharras (Collaborator, Author) replied on Nov 14, 2022:

By allocating ahead of time, we also keep the buffers allocated and reuse them across all future calls to this instance of sum_reduction. The buffers will only be garbage collected when the instance of sum_reduction itself is. A given instance of sum_reduction in our loops is going to be called once per iteration, so I think it's more sensible this way.
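As a minimal sketch of the pattern described above — a hypothetical make_sum_reduction factory, assuming the kernels follow the (input, output) calling convention shown in the snippet, with dpt being dpctl.tensor:

```python
import dpctl.tensor as dpt

def make_sum_reduction(kernels_and_result_shapes, dtype, device):
    # Allocate every intermediate result buffer once, at factory time.
    # The buffers live as long as the returned closure does, so repeated
    # calls to sum_reduction reuse them instead of re-allocating.
    kernels_and_empty_tensors_pairs = []
    for kernel, result_shape in kernels_and_result_shapes:
        result = dpt.empty(result_shape, dtype=dtype, device=device)
        kernels_and_empty_tensors_pairs.append((kernel, result))

    def sum_reduction(summands):
        # Each kernel reduces `summands` into the next, smaller, buffer.
        for kernel, result in kernels_and_empty_tensors_pairs:
            kernel(summands, result)
            summands = result
        return summands

    return sum_reduction
```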

@ogrisel (Collaborator) replied:

Alright!

@ogrisel (Collaborator) replied:

Maybe add an inline comment to explain this!

[1 review comment on sklearn_numba_dpex/common/kernels.py — resolved]
[1 review comment on sklearn_numba_dpex/common/tests/test_kernels.py — outdated, resolved]
@jjerphan (Member) left a comment:

LGTM, @ogrisel has already outlined everything.

I guess make_sum_reduction_2d_axis0_kernel can be implemented when needed, and those tests generalised if that happens. What do you think?


```python
@pytest.mark.parametrize("dtype", float_dtype_params)
def test_argmin_reduction_1d(dtype):
    n_items = 4
```
@jjerphan (Member) commented on this snippet:

Would it make sense to define n_items based on the length of array_in?

This comment also applies in other tests.
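For instance, the first lines of the test could become (hypothetical; the array values are illustrative):

```python
import dpctl.tensor as dpt

# Derive n_items from the input array instead of hard-coding it.
array_in = dpt.asarray([3.0, 1.0, 0.0, 2.0], dtype="float32")
n_items = array_in.shape[0]  # == 4
```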

@fcharras (Collaborator, Author) commented on Nov 14, 2022:

@jjerphan

> make_sum_reduction_2d_axis0_kernel can be implemented when needed and those tests generalised if that happens.

What do you mean? All the aspects in this PR are required for kmeans++.

(edit: indeed, the _axis0 variant is not required, only _axis1)

@jjerphan (Member) commented on Nov 14, 2022:

I meant that make_sum_reduction_2d_axis1_kernel is implemented because it is needed for kmeans++, while make_sum_reduction_2d_axis0_kernel is not needed at the moment but might be in the future.

If make_sum_reduction_2d_axis0_kernel is implemented later, the tests could be adapted accordingly: to me, nothing needs to change in this PR.

@ogrisel (Collaborator) commented on Nov 14, 2022:

Let's not worry about make_sum_reduction_2d_axis0_kernel before we ever need it, and refactor only then if needed.

fcharras and others added 2 commits November 14, 2022 15:40
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@fcharras requested a review from ogrisel on November 14, 2022 at 17:33.
@ogrisel (Collaborator) left a comment:

LGTM!

@fcharras merged commit 5235fb3 into rng_kernels on Nov 15, 2022.
@fcharras deleted the extend_sum_kernel_to_dim2 branch on November 15, 2022 at 07:31.
@fcharras (Collaborator, Author) commented:

Merged! Thanks for the reviews!

fcharras added a commit that referenced this pull request Nov 15, 2022
Extend sum reduction kernel, add argmin reduction kernel (#46)

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel added a commit that referenced this pull request Nov 17, 2022
* Add a module for rng kernels and kernel funcs

* Test the RNG kernels against a reference implementation (randomgen)

* Ensure that the float32 rng is equivalent to the float64 rng cast to float32

* Mimic the https://prng.di.unimi.it/xoroshiro128plusplus.c implementation rather than `randomgen` and document the issue

* Extend sum reduction kernel to axis1 reduction on 2d arrays and add 1d argmin reduction kernel

* Working k-means++

* Port kmeansplusplus tests from sklearn test_k_means module

* Reactivate sklearn relocation cluster unit test

* Clarity and commenting

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

* Fix variable name

* Overall cleaning kmeansplusplus

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

* test fix centers.dtype attr error

* test fix centers.dtype attr error

* test float32 and float64 rng, assert same rng, and commenting

* Overall commenting and nits

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

* Clarity in tests for rng

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* Extend sum reduction kernel, add argmin reduction kernel (#46)

* Extend sum reduction kernel to axis1 reduction on 2d arrays and add 1d argmin reduction kernel

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* Commenting + add a test for rng quality.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* Apply comment suggestions

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* fix: subsequence_start

* centroid -> candidate

* Apply docstring suggestions

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* Apply comment suggestions & minor fixes

* minor fix

* Add a quality test for kmeans plusplus

* Comment highlighting the equivalence of the results with engine and vanilla kmeans++ when evaluated with more iterations

* Fix kmeans plusplus test

* Enable k-means++ support of daal4py

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>