Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency on pyopencl when dpctl==0.14.0 is out #52

Closed
oleksandr-pavlyk opened this issue Nov 14, 2022 · 5 comments · Fixed by #49
Closed

Remove dependency on pyopencl when dpctl==0.14.0 is out #52

oleksandr-pavlyk opened this issue Nov 14, 2022 · 5 comments · Fixed by #49

Comments

@oleksandr-pavlyk
Copy link
Contributor

As discussed in IntelPython/dpctl#886 preferred_wor_groups_size_multiple is a device-specific property of a kernel, rather than the property of dpctl.SyclDevice.

clinfo retrieves exactly as such. The following Python code uses SPIR-V from simple kernel taken from dpctl example (https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/use_dpctl_sycl_kernel/resource):

import dpctl
import dpctl.program as dpp
import dpctl.tensor._device as dpt_d

double_it_spv = (
    b"\x03\x02#\x07\x00\x04\x01\x00\x0e\x00\x06\x00\x14\x00\x00\x00\x00"
    b"\x00\x00\x00\x11\x00\x02\x00\x04\x00\x00\x00\x11\x00\x02\x00\x05\x00"
    b"\x00\x00\x11\x00\x02\x00\x06\x00\x00\x00\x0b\x00\x05\x00\x01\x00\x00\x00"
    b"OpenCL.std\x00\x00\x0e\x00\x03\x00\x01\x00\x00\x00\x02\x00\x00\x00\x0f"
    b"\x00\x07\x00\x06\x00\x00\x00\t\x00\x00\x00double_it\x00\x00\x00\x05\x00"
    b"\x00\x00\x03\x00\x03\x00\x03\x00\x00\x00p\x8e\x01\x00\x05\x00\x0b\x00\x05"
    b"\x00\x00\x00__spirv_BuiltInGlobalInvocationId\x00\x00\x00\x05\x00\x03\x00"
    b"\n\x00\x00\x00x\x00\x00\x00\x05\x00\x03\x00\x0b\x00\x00\x00y\x00\x00\x00"
    b"\x05\x00\x04\x00\x0c\x00\x00\x00entry\x00\x00\x00\x05\x00\x04\x00\x0e\x00"
    b"\x00\x00call\x00\x00\x00\x00\x05\x00\x05\x00\x0f\x00\x00\x00arrayidx\x00\x00"
    b"\x00\x00\x05\x00\x03\x00\x12\x00\x00\x00mul\x00\x05\x00\x05\x00\x13\x00\x00"
    b"\x00arrayidx1\x00\x00\x00G\x00\r\x00\x05\x00\x00\x00)\x00\x00\x00"
    b"__spirv_BuiltInGlobalInvocationId\x00\x00\x00\x01\x00\x00\x00G\x00\x03\x00"
    b"\x05\x00\x00\x00\x16\x00\x00\x00G\x00\x04\x00\x05\x00\x00\x00\x0b\x00\x00\x00"
    b"\x1c\x00\x00\x00G\x00\x04\x00\n\x00\x00\x00&\x00\x00\x00\x05\x00\x00\x00G\x00"
    b"\x04\x00\n\x00\x00\x00&\x00\x00\x00\x06\x00\x00\x00G\x00\x04\x00\n\x00\x00\x00,"
    b"\x00\x00\x00\x04\x00\x00\x00G\x00\x04\x00\x0b\x00\x00\x00&\x00\x00\x00\x05\x00"
    b"\x00\x00G\x00\x04\x00\x0b\x00\x00\x00,\x00\x00\x00\x04\x00\x00\x00G\x00\x03\x00"
    b"\x12\x00\x00\x00u\x11\x00\x00\x15\x00\x04\x00\x02\x00\x00\x00 \x00\x00\x00\x00"
    b"\x00\x00\x00+\x00\x04\x00\x02\x00\x00\x00\x11\x00\x00\x00\x01\x00\x00\x00\x17"
    b"\x00\x04\x00\x03\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00 \x00\x04\x00\x04"
    b"\x00\x00\x00\x01\x00\x00\x00\x03\x00\x00\x00\x13\x00\x02\x00\x06\x00\x00\x00 "
    b"\x00\x04\x00\x07\x00\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00!\x00\x05\x00\x08"
    b"\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x07\x00\x00\x00;\x00\x04\x00\x04"
    b"\x00\x00\x00\x05\x00\x00\x00\x01\x00\x00\x006\x00\x05\x00\x06\x00\x00\x00\t\x00"
    b"\x00\x00\x00\x00\x00\x00\x08\x00\x00\x007\x00\x03\x00\x07\x00\x00\x00\n\x00\x00"
    b"\x007\x00\x03\x00\x07\x00\x00\x00\x0b\x00\x00\x00\xf8\x00\x02\x00\x0c\x00\x00"
    b"\x00=\x00\x06\x00\x03\x00\x00\x00\r\x00\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00"
    b"\x10\x00\x00\x00Q\x00\x05\x00\x02\x00\x00\x00\x0e\x00\x00\x00\r\x00\x00\x00\x00"
    b"\x00\x00\x00F\x00\x05\x00\x07\x00\x00\x00\x0f\x00\x00\x00\n\x00\x00\x00\x0e\x00"
    b"\x00\x00=\x00\x06\x00\x02\x00\x00\x00\x10\x00\x00\x00\x0f\x00\x00\x00\x02\x00"
    b"\x00\x00\x04\x00\x00\x00\xc4\x00\x05\x00\x02\x00\x00\x00\x12\x00\x00\x00\x10\x00"
    b"\x00\x00\x11\x00\x00\x00F\x00\x05\x00\x07\x00\x00\x00\x13\x00\x00\x00\x0b\x00"
    b"\x00\x00\x0e\x00\x00\x00>\x00\x05\x00\x13\x00\x00\x00\x12\x00\x00\x00\x02\x00"
    b"\x00\x00\x04\x00\x00\x00\xfd\x00\x01\x008\x00\x01\x00"
)

def get_preferred_work_group_size_multiple(o):
    if isinstance(o, dpctl.SyclQueue):
        q = o
    elif isinstance(o, dpctl.SyclDevice):
        q = dpt_d.normalize_queue_device(device=o)
    else:
        raise TypeError(
            f"Expected dpctl.SyclQueue or dpctl.SyclDevice, got {type(o)}"
        )
    pr = dpp.create_program_from_spirv(q, double_it_spv, "")
    krn = pr.get_sycl_kernel("double_it")
    return krn.preferred_work_group_size_multiple

Using it:

Python 3.9.12 (main, Jun  1 2022, 11:38:51)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import dpctl

In [2]: import get_preferred_work_group_multiple as pwgsm

In [3]: d_gpu = dpctl.SyclDevice("gpu")

In [4]: d_cpu = dpctl.SyclDevice("cpu")

In [5]: pwgsm.get_preferred_work_group_size_multiple(d_gpu)
Out[5]: 8

In [6]: pwgsm.get_preferred_work_group_size_multiple(d_cpu)
Out[6]: 128

In [7]: d_gpu.global_mem_cache_size
Out[7]: 1048576

In [8]: d_cpu.global_mem_cache_size
Out[8]: 262144

Incorporating this snippet into "sklean-numba-dpex" allows to drop "pyopencl" as dependency.

In the interest of full disclose, clinfo uses simple add kernel (c[i] = a[i] + b[i] for float32 input) instead of "double_it".

Once can compile such a kernel using clang as indicated in README of reference example and then save its content as byte object in the used example.

@jjerphan
Copy link
Member

Thank you for this proposal, @oleksandr-pavlyk.

While I think having get_preferred_work_group_multiple makes sense, I am uncomfortable storing inlined raw bytes (representing SPIR-V kernel) (i.e. the value of double_it_spv here) in a program for various reasons (especially maintenance and safety).

Would it be possible to define a SPIR-V kernel and have it compiled and loaded instead?
What do you think, @fcharras?

@fcharras
Copy link
Collaborator

fcharras commented Nov 15, 2022

TY for the suggestion @oleksandr-pavlyk , on my side I was also able to reproduce this using dpctl==0.14.0dev1 and accessing the attributes of the a[i] = b[i] + c[i] kernel created with numba_dpex, and also observed that the value can be different for different kernels (e.g preferred_work_group_size_multiple is set to 8 by opencl for ou lloyd kernel).

But in fact we were after sub_group_size rather that preferred_work_group_size_multiple and I think that we mistakenly used the latter, there's already a PR #49 that replace the use of preferred_group_size_multiple with a better approximation of sub_group_size using max_work_group_size // dpctl_device.max_num_sub_groups. Related : IntelPython/dpctl#975 , We will probably rather use min(sub_group_sizes) after 0.14 ? or maybe use max_sub_group_size attribute from kernel but there are issues related to that in numba_dpex see IntelPython/numba-dpex#816 .

Also, global_mem_cache_size will be available in next dpctl release.

Those two missing attributes being available, certainly pyopencl will be removed after 0.14 release.

@fcharras
Copy link
Collaborator

Also, the way we use sub_group_sizes might improve in the future too, personally I'm not understanding it fully yet. Why is there plural sub_group_size**s**, which actually contains several value for intel gpus (e.g for my hardware clinfo gives 8, 16, 32) but nvidia/amd have the warp size / wavefront size ? It also seems that with dpctl / numba_dpex there's no way neither to set the sub_group_size when a kernel is ran, nor to check what sub_group_size is actually used. Is it always set to max_sub_group_size ?

@fcharras
Copy link
Collaborator

Maybe we can rename this issue to "remove dependency to pyopencl when dpctl==0.14.0 is out" ?

@oleksandr-pavlyk oleksandr-pavlyk changed the title Suggestion to incorporate preferred_work_groups_size_multiple Remove dependency on pyopencl when dpctl==0.14.0 is out Nov 15, 2022
fcharras added a commit that referenced this issue Nov 18, 2022
…dpex into performance_fixes_and_improvements

* Apply same optimization to kmeans++

* Remove use of now useless Device module

* Remove dependency to pyopencl ( Fixes #52 )
@fcharras
Copy link
Collaborator

Dependency on pyopencl has been removed. main now requires dpctl>=0.14.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants