[Specification] oneMKL lapack to allow asynchronous functions #589

Open
JackAKirk opened this issue Oct 11, 2024 · 5 comments
Labels
help wanted Tasks, issues or features that could be implemented and contributed to the project

Comments

@JackAKirk
Contributor

Summary

Linear algebra operators in oneMKL lapack that can fail computationally (e.g. matrix operations such as inversion via getri, which may not have a solution) report this failure via an exception ([oneapi::mkl::lapack::computation_error](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemkl/source/architecture/architecture#onemkl-lapack-exception-computation-error)). This imposes an implementation constraint: functions such as getri must be synchronous, since they generally don't know the error code until completion. So even if a programmer inputs a matrix that does have a valid solution for the given operation (e.g. a non-singular matrix for an inverse operation), all work is forced to wait on the return of this synchronous operation to check for an error code that is irrelevant. This affects a large proportion (maybe most?) of oneMKL lapack's most computationally intensive functions. Any workload using these functions will be severely bottlenecked with respect to asynchronous performance.

However, native libraries such as cusolver (which oneMKL uses) can return this "computation error" information via an info value that is returned asynchronously. Therefore a change to the oneMKL specification would fix this issue.

Problem statement

Provide asynchronous oneMKL interfaces for Linear algebra operators that currently return "computation error" exceptions.

Details

oneMKL will need to remove the [oneapi::mkl::lapack::computation_error](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemkl/source/architecture/architecture#onemkl-lapack-exception-computation-error) exception, and replace it with either:

  • Probably the only sensible solution: an extra parameter for each function that currently throws this exception, which instead returns "SomethingInfo" asynchronously and provides the computation error info, mapping one-to-one with e.g. cusolver.
  • Some kind of solution with SYCL asynchronous exceptions: I'm not sure if this is possible but it could be looked into. AFAIK SYCL asynchronous exceptions are currently completely unused.
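
To make the first option concrete, here is a minimal sketch in plain standard C++ (no SYCL, no oneMKL) contrasting the two reporting styles. Everything in it is hypothetical: `toy_getrf`, `getrf_throwing`, and `getrf_async` are illustration-only names, `std::future` merely stands in for a SYCL event, and `std::runtime_error` stands in for the computation error exception. It is not the oneMKL or cusolver API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy in-place LU factorization with partial pivoting on a row-major n x n
// matrix. Returns 0 on success, or the 1-based index of the first exactly
// zero pivot (the LAPACK-style `info` convention). Pivot indices are not
// recorded because this is only an illustration.
inline std::int64_t toy_getrf(std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::int64_t info = 0;
    for (std::size_t k = 0; k < n; ++k) {
        // Pick the largest-magnitude entry in column k as the pivot.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(a[i * n + k]) > std::fabs(a[p * n + k])) p = i;
        if (a[p * n + k] == 0.0) {
            if (info == 0) info = static_cast<std::int64_t>(k + 1);
            continue;  // singular column: remember it and move on
        }
        if (p != k)
            for (std::size_t j = 0; j < n; ++j)
                std::swap(a[k * n + j], a[p * n + j]);
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i * n + k] /= a[k * n + k];
            for (std::size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return info;
}

// Style 1 (hypothetical): the error is reported by throwing, so the call
// cannot return before the factorization has finished.
inline void getrf_throwing(std::vector<double>& a, std::size_t n) {
    if (toy_getrf(a, n) != 0) throw std::runtime_error("computation error");
}

// Style 2 (hypothetical): the call returns immediately. `*info` is written
// before the returned future becomes ready, so the caller decides when, or
// whether, to look at it. The caller must keep `a` and `info` alive until then.
inline std::future<void> getrf_async(std::vector<double>& a, std::size_t n,
                                     std::int64_t* info) {
    return std::async(std::launch::async,
                      [&a, n, info] { *info = toy_getrf(a, n); });
}
```

With the second style, a caller who knows the input is non-singular can enqueue further work and only inspect `info` at a later synchronization point, or never.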
@JackAKirk JackAKirk added the RFC A proposal to add new API label Oct 11, 2024
@ericlars
Contributor

Hi @JackAKirk, thanks for the RFC.

The oneMKL LAPACK team has had an ongoing discussion on the issue you raise which I'll summarize here. We agree with your assessment of the blocking nature of using exceptions for computation errors and find it entirely reasonable to replace them with info variables (or arrays in the batch case).
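
As a rough illustration of the "arrays in the batch case" idea, the following plain C++ sketch fills one info entry per matrix of a batch. The names `toy_potrf` and `potrf_batch_async` are hypothetical, the Cholesky-style routine is just a convenient example of an operation that can fail, and `std::future` again stands in for a SYCL event. It is an assumption about what such an interface could look like, not an existing oneMKL signature.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Toy in-place Cholesky factorization (lower triangle, row-major n x n).
// Returns 0 on success, or the 1-based order of the first leading minor
// that is not positive definite (LAPACK-style `info`).
inline std::int64_t toy_potrf(std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
        if (!(d > 0.0)) return static_cast<std::int64_t>(j + 1);
        d = std::sqrt(d);
        a[j * n + j] = d;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
            a[i * n + j] = s / d;
        }
    }
    return 0;
}

// Hypothetical batch interface: returns immediately, and info[b] holds the
// status of matrix b once the returned future is ready. One failing matrix
// does not prevent the others from being factorized or reported.
// The caller must keep `batch` and `info` alive until the future is ready.
inline std::future<void> potrf_batch_async(
    std::vector<std::vector<double>>& batch, std::size_t n,
    std::vector<std::int64_t>& info) {
    info.assign(batch.size(), 0);
    return std::async(std::launch::async, [&batch, n, &info] {
        for (std::size_t b = 0; b < batch.size(); ++b)
            info[b] = toy_potrf(batch[b], n);
    });
}
```

Unlike a single exception, the info array lets a caller tell exactly which matrices in the batch failed, and at which step, after a single synchronization.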

> Some kind of solution with SYCL asynchronous exceptions: I'm not sure if this is possible but could be looked into. AFAIK currently sycl asynchronous exceptions are completely unused.

SYCL does not allow exceptions to be thrown in kernel scope; we're only aware of the possibility of throwing asynchronous exceptions from host_tasks, which limits their usefulness.

> Provide asynchronous oneMKL interfaces for Linear algebra operators that currently return "computation error" exceptions

Exception handling of computation errors is not the only blocker for asynchronous behavior. As we understand it, SYCL provides host_task for scheduling CPU tasks with device tasks. A limitation of host_task is that it is undefined behavior to capture queues or events, so even if a kernel updates an info variable it is not possible to asynchronously schedule a task conditioned on the outcome of a prior kernel within the SYCL framework.

Furthermore, several oneMKL LAPACK functions do not lend themselves to performant GPU-only implementations and so perform some critical sections on the CPU. While the GPU portions are bound to the context provided by the SYCL queue, the CPU portions generally assume they have unfettered access to CPU resources. For these routines the benefit of asynchronicity is unclear to us.

@JackAKirk
Contributor Author

JackAKirk commented Oct 11, 2024

Thanks for the quick reply!

> Exception handling of computation errors is not the only blocker for asynchronous behavior. As we understand it, SYCL provides host_task for scheduling CPU tasks with device tasks. A limitation of host_task is that it is undefined behavior to capture queues or events, so even if a kernel updates an info variable it is not possible to asynchronously schedule a task conditioned on the outcome of a prior kernel within the SYCL framework.

oneMKL is a library and does not have to restrict itself to the existing SYCL 2020 specification. In fact, we have already solved this issue for the two backends that it affects via the enqueue_native_command DPC++ extension: please see #572. As I understand it, this completely resolves the issue you raise here.

> Furthermore, several oneMKL LAPACK functions do not lend themselves to performant GPU-only implementations and so perform some critical sections on the CPU. While the GPU portions are bound to the context provided by the SYCL queue, the CPU portions generally assume they have unfettered access to CPU resources. For these routines the benefit of asynchronicity is unclear to us.

Sure, I understand that certain functions (and/or certain backends) may not be able to take advantage of this. However, the cusolver and rocsolver backends have a large number of functions to which such limitations do not currently apply; it also sounds like the Intel backends have at least a few cases that could take advantage of such an improved interface? And I expect that future generations of Intel implementations will improve on the current situation?

@ericlars
Contributor

Glad to hear the host_task issues have been worked around, if at least for some backends. We support this change; do you plan on driving the spec update over on https://github.com/uxlfoundation/oneAPI-spec?

@JackAKirk
Contributor Author

> Glad to hear the host_task issues have been worked around, if at least for some backends. We support this change; do you plan on driving the spec update over on https://github.com/uxlfoundation/oneAPI-spec?

@Ruyk could I work on this? These linear algebra operators are used in pytorch, and they are already hooked up to Intel Python's numpy implementation: https://github.com/IntelPython/dpnp

@Rbiessy
Contributor

Rbiessy commented Oct 16, 2024

Thanks for the issue, Jack. We won't have time to work on this at Codeplay, but external contributions to improve this are welcome!

@Rbiessy Rbiessy added help wanted Tasks, issues or features that could be implemented and contributed to the project and removed RFC A proposal to add new API labels Oct 16, 2024

3 participants