
Eager jitting does not work with numba_dpex #816

Closed

fcharras opened this issue Nov 8, 2022 · 12 comments

fcharras commented Nov 8, 2022

Following IntelPython/dpctl#894, I want to use numba_dpex with eager compilation, so I can access attributes such as preferred_work_group_size_multiple attached to the corresponding compiler.Kernel object that is available in the JitKernel.definitions dictionary.

This partially works: as long as the signature passed to dpex.kernel uses only numba.Array types, some compilation actually takes place and I can access the attributes.

However, when the kernel is actually run later on, a new signature is generated with USMNdArrayType types, a new compilation step happens, and a new Kernel is added to the definitions, with the same signature but using USMNdArrayType instead of numba.Array.

As a result, the eager compilation is mostly pointless: the eagerly compiled kernel is never used and an additional lazy compilation takes place instead.

A solution might be to relax npytypes_array_to_dpex_array to also accept USMNdArrayType arrays?
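
For illustration, a minimal sketch of the use case (assuming the dpex.kernel signature syntax and the JitKernel.definitions / preferred_work_group_size_multiple attributes described above; exact names may differ across numba_dpex versions):

import numba
import numba_dpex as dpex

# Eager compilation: passing an explicit signature to dpex.kernel.
@dpex.kernel(numba.void(numba.float32[:], numba.float32[:]))
def copy_kernel(src, dst):
    i = dpex.get_global_id(0)
    dst[i] = src[i]

# The definitions dictionary should already hold a compiler.Kernel, so
# device-tuning attributes can be read before the kernel is ever launched.
for sig, kernel in copy_kernel.definitions.items():
    print(sig, kernel.preferred_work_group_size_multiple)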

@diptorupd
Collaborator

@fcharras I am addressing the issue in my ongoing refactoring of the dpex kernel decorator internals. In my current set of changes, I am deprecating support for NumPy ndarray as a kernel argument. Instead, we will only support USM-based array types: dpctl.tensor.usm_ndarray, dpnp.ndarray, or any other type that provides a properly defined __sycl_usm_array_interface__ attribute. Thus, the compute-follows-data programming model is the prescribed option going forward.

Coming to the specialization issue: dpex does not yet define its own types for specialization and we depend on the Numba types such as void, int32, etc. to specialize a kernel. For this reason, there is no clean way to tell dpex that a kernel will only be called with a __sycl_usm_array_interface__-compliant array.

Till we have a cleaner way to improve type support for __sycl_usm_array_interface__-based arrays, I propose the following detour for kernel specialization:

  1. Type specialization of kernels continues to use the numba array notation.
  2. Although Numba will internally infer the type as numba.core.types.Array, dpex specializes the kernel on USMNdArrayType and, at the point of dispatch, checks whether the arguments are of USMNdArrayType.
  3. Two new keyword args will be required when specializing: device_type and backend.
import dpctl
from numba import void, int32
import numba_dpex as dpex

@dpex.kernel(
    func_or_sig=void(int32[:], int32[:], int32[:]), 
    device_type=dpctl.device_type.gpu, 
    backend=dpctl.backend.level_zero
)
def vecadd(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]

import numpy as np 

a = np.arange(100)
b = np.arange(100)
c = np.zeros_like(a)

vecadd[100](a,b,c) # Exception raised

import dpnp

a = dpnp.arange(100)
b = dpnp.arange(100)
c = dpnp.zeros_like(a)

vecadd[100](a,b,c) # Works provided the device and backend specializations match

Point 2 above is a hack to get around missing proper type support in dpex for USM arrays. I am going to start that work next. Till then, what do you think of the above approach?
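
For illustration, a minimal sketch of the dispatch-time check in point 2 (the function names are hypothetical, not dpex internals):

def has_sycl_usm_array_interface(obj):
    # A USM-backed array exposes __sycl_usm_array_interface__.
    return hasattr(obj, "__sycl_usm_array_interface__")

def check_dispatch_args(args):
    # Reject anything that is not USM-backed, even though the specialization
    # itself was written with the numba array notation.
    for arg in args:
        if not has_sycl_usm_array_interface(arg):
            raise TypeError(
                f"{type(arg)} does not expose __sycl_usm_array_interface__"
            )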

@fcharras
Author

For me that seems to work. One minor remark: why not pass a device argument that would be a SyclDevice rather than device_type + backend?

To make sure that I understand correctly, with those changes:

  • the JitKernel.definitions dictionary will never contain keys containing USMNdArrayType anymore, only numba.core.types.Array types (a good thing, I think)
  • In this dictionary, the value mapped to a key with numba.core.types.Array types will in fact be the result of two steps:
    • step 1: map the key to the corresponding signature with USMNdArrayType arrays
    • step 2: compile the kernel using the signature with USMNdArrayType arrays

I think it's good 👌

Also, as I understand it, USMNdArrayType has more potential for specialization, with possible parameterizations that do not make sense for numba types (such as the type of memory), but those parameters are not really effective at the moment. So relying only on numba types for now seems right.
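
For illustration, a minimal sketch of that two-step mapping (to_usm_signature and compile_usm_kernel are hypothetical stand-ins, not numba_dpex internals; the tuple-based USM signature representation is illustrative only):

from numba import void, int32
from numba.core import types

def to_usm_signature(numba_sig, usm_type="device"):
    # step 1: translate each numba.core.types.Array argument into a
    # description of the equivalent USMNdArrayType (usm_type defaulted)
    return tuple(
        ("usm_ndarray", arg.dtype, arg.ndim, arg.layout, usm_type)
        if isinstance(arg, types.Array)
        else arg
        for arg in numba_sig.args
    )

def compile_usm_kernel(usm_sig):
    # step 2: stand-in for compiling the kernel against the USM signature
    return ("compiled_kernel_for", usm_sig)

numba_sig = void(int32[:], int32[:], int32[:])
definitions = {numba_sig.args: compile_usm_kernel(to_usm_signature(numba_sig))}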

@diptorupd
Collaborator

@fcharras

One minor remark: why not pass a device argument that would be a SyclDevice rather than device_type + backend?

Ok. Let us use device instead.

the JitKernel.definitions dictionary will never contain keys containing USMNdArrayType anymore, only numba.core.types.Array types (a good thing, I think)
In this dictionary, the value mapped to a key with numba.core.types.Array types will in fact be the result of two steps:
step 1: map the key to the corresponding signature with USMNdArrayType arrays
step 2: compile the kernel using the signature with USMNdArrayType arrays

Yes, that is what I was proposing.

After some more thinking and analyzing the existing code, I have identified gaps in the above approach. As you stated, my proposed approach needs to map the numba.core.types.Array key to the corresponding signature with USMNdArrayType arrays. However, USMNdArrayType requires the usm_type to be specified, so we would either have to specialize for a specific default usm_type (say device), or else keep the specialization agnostic of the usm_type and allow any USMNdArrayType to be used in the signature, ignoring the usm_type.

Instead, how about the following alternative:

import dpctl
from numba import void, int32
from numba_dpex import usm_array
import numba_dpex as dpex

usmarrty = usm_array(dtype=int32, ndim=2, layout="C", usm_type="device", device="gpu")

@dpex.kernel(
    func_or_sig=void(usmarrty, usmarrty, usmarrty)
)
def vecadd(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]

import numpy as np 

a = np.arange(100)
b = np.arange(100)
c = np.zeros_like(a)

vecadd[100](a,b,c) # Exception raised

import dpnp

a = dpnp.arange(100)
b = dpnp.arange(100)
c = dpnp.zeros_like(a)

vecadd[100](a,b,c) # Works provided the device and backend specializations match

@fcharras
Author

I also think it's fine. I haven't encountered use cases requiring a usm_type other than usm_type=device, so I can't provide more feedback on how much usm_type would need to be exposed.

There might also be a middle-ground solution where both interfaces are supported: the first syntax would create two entries in the definitions dictionary, one key defined using the numba types and the other defined using numba_dpex.usm_array types where usm_type always defaults to device, with both keys mapping to the same kernel object. The second syntax would create only one entry in the dictionary.

When I explored this API, the features I was looking for were being able to retrieve the kernel object that was eagerly jitted using the signature I provided, and having the kernel reused later on if the inputs match the signature; both are possible with this middle ground.

@diptorupd
Collaborator

@fcharras I have pushed changes to main that address the eager compilation issue for kernels. A separate PR that addresses the same issue for the func decorator is in the works.

The main changes are:

  • Kernels can be specialized using the usm_ndarray type only; I have removed support for specializing using NumPy arrays.
  • Specialization using strings is not supported yet.
  • Instead of JIT compiling a new version, an exception is raised if a specialized kernel is not called with the right arguments.

The docs have not yet been updated, but you can refer to https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/examples/kernel/kernel_specialization.py.
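
For illustration, a hedged sketch of what specialization looks like with these changes, following the usm_array-style constructor proposed earlier in this thread (the exact constructor name and arguments in the merged API may differ; the linked example is authoritative):

import dpnp
import numpy as np
from numba import void, int32
import numba_dpex as dpex
from numba_dpex import usm_array

arrty = usm_array(dtype=int32, ndim=1, layout="C", usm_type="device", device="gpu")

@dpex.kernel(void(arrty, arrty, arrty))
def vecadd(a, b, c):
    i = dpex.get_global_id(0)
    c[i] = a[i] + b[i]

a = dpnp.arange(100, dtype=dpnp.int32)
b = dpnp.arange(100, dtype=dpnp.int32)
c = dpnp.zeros_like(a)
vecadd[100](a, b, c)  # works when the argument types match the specialization

a_np = np.arange(100, dtype=np.int32)
vecadd[100](a_np, a_np, np.zeros_like(a_np))  # raises instead of JIT compiling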

Will be good to get your feedback.

@fcharras
Author

fcharras commented Jan 23, 2023

👍 for the API and usage, this is excellent progress, TY!

But unfortunately, for me numba_dpex doesn't work well after the merge of the refactoring PR. There is a huge slowdown after this merge. It's not related to caching. I can't really say what's going on :-/ I'm trying to get a minimal reproducer now.

I'd be curious to know if the new code (which I haven't read yet) would make the following usage easier:

  • letting the user manually pass a kernel to a SyclQueue and manually call wait()? (suggested in "On making task configuration and task args available to the user without executing" #769)
  • AOT compilation

@fcharras
Author

I also see segfaults, makes me think that something is wrong with local memory.

@diptorupd
Collaborator

I also see segfaults, makes me think that something is wrong with local memory.

The problem most likely is that, without realizing the impact, I changed the way the global and local range args are passed to __getitem__. We are working on reverting my changes and fixing the issue.

@diptorupd
Collaborator

I'd be curious to know if the new code (haven't read yet) would make easier the following usage:

* letting the user manually pass a kernel to a `SyclQueue`

Possibly. I would like to look at your use case to be able to answer definitively.

and manually call wait()? (suggested in "On making task configuration and task args available to the user without executing" #769)

Not yet. The problem is that NumPy ndarrays can still be passed to a kernel, in which case we copy from NumPy to USM prior to the call and copy back after the wait. I have deprecated NumPy support for now. Let me enquire with another set of dpex users whether they will be impacted. If there is no impact, then I will not wait and will just remove the NumPy copies (the work was already started in #866). Once that is done, we can start returning the SyclEvent object from the __call__ function of JitKernel.

* AOT compilation

Yes, AOT is fixed now. See these examples: https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/examples/kernel/kernel_specialization.py.

@fcharras
Author

fcharras commented Jan 24, 2023

AOT

By AOT I also meant being able to distribute pre-compiled code in Python packages (see the relevant numba section). Even better if it could be distributed without a dependency on numba_dpex.

I'm not sure this is something we would really want for our project, and I wouldn't see it as a short-term priority, but if it's something you have in mind for the future it would be interesting to know.

wait()

If you decide to keep support for NumPy ndarrays in the future (something we indeed don't use for sklearn-numba-dpex), do you think a keyword to dpex.kernel could be considered that disables it and instead makes exposing the SyclEvent easier?

@diptorupd
Collaborator

If you decide to keep support for NumPy ndarrays in the future (something we indeed don't use for sklearn-numba-dpex), do you think a keyword to dpex.kernel could be considered that disables it and instead makes exposing the SyclEvent easier?

NumPy support is deprecated and I will drop it as soon as 0.20 is out the door. As for allowing kernels to execute asynchronously, let me track it in a separate issue. Once NumPy arrays are no longer allowed as kernel arguments, we should be able to return dpctl.SyclEvent objects from a kernel instead of always calling wait.
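
For illustration, a hypothetical sketch of what that could look like from the caller's side once the kernel call returns the event instead of waiting internally (this is not current numba_dpex API):

import dpnp
import numba_dpex as dpex

@dpex.kernel
def scale(a, factor):
    i = dpex.get_global_id(0)
    a[i] = a[i] * factor

a = dpnp.arange(100, dtype=dpnp.float32)
event = scale[100](a, 2.0)  # hypothetically returns a dpctl.SyclEvent
# ... other host-side work could overlap with the kernel here ...
event.wait()                # synchronize explicitly when the result is needed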

@diptorupd
Collaborator

Closing this issue, as eager compilation of kernels now works as expected in main.
