
Request for additional attribute preferred_work_group_size_multiple #886

Closed
fcharras opened this issue Aug 17, 2022 · 10 comments · Fixed by #894

Comments

@fcharras (Contributor)

dpctl.SyclDevice exposes a number of useful attributes for scheduling kernels, but preferred_work_group_size_multiple, an important piece of information used when choosing the kernel local size, seems to be missing.

It is available with clinfo:

> clinfo
...
Device Name                                     Intel(R) UHD Graphics [0x9a60]
...
Device Type                                     GPU
...
Max work group size                             512
Preferred work group size multiple              64
...

It is also exposed in pyopencl.
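A typical use of this value is to round a candidate local size down to a multiple of preferred_work_group_size_multiple, capped by the device's maximum work-group size. A minimal sketch, using the two numbers from the clinfo output above (512 and 64):

```python
def pick_local_size(candidate: int, preferred_multiple: int, max_wg_size: int) -> int:
    """Round `candidate` down to a multiple of `preferred_multiple`,
    clamped to the device's maximum work-group size."""
    capped = min(candidate, max_wg_size)
    rounded = (capped // preferred_multiple) * preferred_multiple
    # Fall back to the capped value when it is smaller than one multiple.
    return rounded if rounded > 0 else capped

# Using the clinfo values above: max work group size 512, preferred multiple 64.
print(pick_local_size(500, 64, 512))   # 448
print(pick_local_size(1000, 64, 512))  # 512
print(pick_local_size(37, 64, 512))    # 37
```

This is only illustrative scheduling arithmetic; the actual attribute names and where they live (device vs. kernel) are exactly what this issue is about.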

@fcharras (Contributor, author)

(global and local cache sizes would be a useful addition too)

@oleksandr-pavlyk (Collaborator)

The preferred_work_group_size_multiple is exposed in SYCL as device-specific information about a kernel; see Table 136 in the SYCL 2020 spec, rev. 5.

Querying it requires a kernel, which can only be extracted from a kernel bundle in the executable state.

@oleksandr-pavlyk (Collaborator)

The Device object could definitely expose sycl::info::device::global_mem_cache_type (which can be none, read_only, or read_write), sycl::info::device::global_mem_cache_line_size, and sycl::info::device::global_mem_cache_size.

There does not seem to be any descriptor for querying a local cache size; only local_mem_size, which is already exposed as the dpctl.SyclDevice.local_mem_size property.

@fcharras Could you please clarify what you mean by "local cache sizes" and perhaps point to it in the clinfo output?

@fcharras (Contributor, author)

> The preferred_work_group_size_multiple is exposed in SYCL as device-specific information about a kernel; see Table 136 in the SYCL 2020 spec, rev. 5.

Is there a way to access this information elsewhere from Python (maybe in SyclKernel or numba_dpex.compiler.Kernel attributes)?

@oleksandr-pavlyk (Collaborator)

> The preferred_work_group_size_multiple is exposed in SYCL as device-specific information about a kernel; see Table 136 in the SYCL 2020 spec, rev. 5.
>
> Is there a way to access this information elsewhere from Python (maybe in SyclKernel or numba_dpex.compiler.Kernel attributes)?

Not yet, but perhaps exposing it for dpctl.SyclKernel is the right thing to do. I will make it happen and tag you in the PR.

@oleksandr-pavlyk (Collaborator)

BTW, notice that in OpenCL the property is also specific to a kernel and a device; see CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.

clinfo simply queries it (https://github.com/Oblomov/clinfo/blob/master/src/clinfo.c#L1584) for a simple kernel (https://github.com/Oblomov/clinfo/blob/master/src/clinfo.c#L395-L402) amounting to out[i] = in1[i] + in2[i], where out, in1, and in2 are float *.

@fcharras (Contributor, author) commented Aug 25, 2022

Thank you very much.

> Querying it requires a kernel, which can only be extracted from a kernel bundle in the executable state.

If I understand correctly, the information on the preferred work group size is only accessible once the kernel has been compiled (which makes sense, because the compiler could have extra information affecting this value). Then, if it is exposed on dpctl.SyclKernel, the correct order of instructions with numba_dpex would be:

  • explicitly pass a signature to numba_dpex.kernel to trigger compilation immediately rather than delaying it to the first call (I don't know yet if this works, but I think it's a common use case in numba, so it's probably a good practice I've missed)
  • query the preferred_work_group_size_multiple from the kernel it returns
  • use this information to choose the work_group_size accordingly
  • start the compute

edit: that's probably the right workflow considering https://github.com/IntelPython/numba-dpex/blob/12cbcf80f09da38bad23cfc7327266da6a4fc5e1/numba_dpex/decorators.py#L28
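The four steps above can be sketched in plain Python. Note that `CompiledKernel` here is a stand-in for whatever object numba_dpex would return after eager compilation, and the `preferred_work_group_size_multiple` / `max_work_group_size` attribute names are assumptions for illustration, not a confirmed API:

```python
# Hypothetical sketch of the workflow above; `CompiledKernel` stands in for
# the compiled-kernel object, and both attribute names are assumptions.
class CompiledKernel:
    preferred_work_group_size_multiple = 64
    max_work_group_size = 512

def choose_work_group_size(kernel, n_items: int) -> int:
    multiple = kernel.preferred_work_group_size_multiple
    cap = kernel.max_work_group_size
    # Smallest multiple of `multiple` covering n_items, clamped to the cap.
    size = ((n_items + multiple - 1) // multiple) * multiple
    return min(size, cap)

kernel = CompiledKernel()       # step 1: compile eagerly (mocked here)
print(choose_work_group_size(kernel, 200))   # steps 2-3 -> 256
print(choose_work_group_size(kernel, 4096))  # steps 2-3 -> 512
# step 4: launch the kernel with the chosen work_group_size
```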

@fcharras (Contributor, author) commented Aug 25, 2022

> Could you please clarify what you mean by "local cache sizes"

You're right, that doesn't seem to exist in clinfo; sorry for the misdirection. The point then only holds for global_mem_cache_line_size, global_mem_cache_size, and global_mem_cache_type.

FYI, in soda-inria/sklearn-numba-dpex#2 we've been using global_mem_cache_size to adjust the strategy ("privatization of updated arrays") used to avoid collisions on atomic updates in global memory. The idea is to create a temporary write structure that is larger but is expected to be cached well. (We've observed that the strategy is very effective: it almost completely alleviates the bottleneck on atomics.)
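The sizing arithmetic behind such a privatization strategy can be sketched as follows. The function name, the `max_copies` bound, and the example sizes are illustrative assumptions, not taken from sklearn-numba-dpex:

```python
# Sketch of the privatization sizing described above: spread atomic updates
# over n private copies of the target array, choosing n so that the copies
# together still fit in the device's global memory cache.
def n_private_copies(cache_size_bytes: int, array_nbytes: int, max_copies: int = 64) -> int:
    if array_nbytes <= 0:
        raise ValueError("array_nbytes must be positive")
    fit = cache_size_bytes // array_nbytes  # copies that fit in the cache
    return max(1, min(fit, max_copies))

# e.g. a 1 MiB global memory cache and a 16 KiB histogram-like array:
print(n_private_copies(1 << 20, 16 << 10))  # 64
print(n_private_copies(1 << 20, 64 << 10))  # 16
print(n_private_copies(8 << 10, 16 << 10))  # 1 (array larger than the cache)
```

More copies mean fewer threads contending on any one atomic slot, at the cost of a final reduction over the copies; bounding the total footprint by the cache size is what keeps the extra traffic cheap.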

@fcharras (Contributor, author)

> Not yet, but perhaps exposing it for dpctl.SyclKernel is the right thing to do. I will make it happen and tag you in the PR.

Thank you, that would be awesome.

oleksandr-pavlyk added a commit that referenced this issue Aug 26, 2022
These are DPCTLDevice_GetGlobalMemCacheSize, DPCTLDevice_GlobalMemCacheLineSize,
and DPCTLDevice_GetGlobalMemCacheType.

To support the latter, introduced DPCTLGlobalMemCacheType enum in dpctl_sycl_enum_types.h

Tests are added to test_capi target.
@fcharras (Contributor, author)

Thank you for the PR @oleksandr-pavlyk

I've also come across papers (e.g. this one for top-k) where the authors use other device information for the finest-grained optimizations, like bank size or the size of SIMD units. This is also exposed in pyopencl, albeit through a different API, but I don't think it's covered by the SYCL spec at the moment.

oleksandr-pavlyk added a commit that referenced this issue Sep 4, 2022
gh-886: Added 3 new device attributes and kernel's device-specific attributes
ndgrigorian pushed a commit to ndgrigorian/dpctl that referenced this issue Sep 8, 2022