Implemented GPU OpenCL runtime #343

AndreyPavlenko · 2024-09-14T21:04:00Z

How to use:

  // Create a builder. The 'module' argument is an MLIR module
  // with a single function to be executed.
  OclModuleBuilder builder(module);
  // Build the module with one of the build() methods, which take
  // either a runtime (preferred), an OpenCL device/context pair, or a queue.
  // The module is built once per device/context pair and cached.
  auto mod = gcGetOrReport(builder.build(device, context));
  // Create an execution context. The 'queue' argument is an OpenCL queue.
  OclContext ctx(mod->runtime, queue);
  // Create an executor.
  if (mod->isStatic) {
    // If all the function arguments are memrefs with static shapes,
    // use this executor.
    StaticExecutor exec(mod);
    // Add the function arguments - aligned memory buffers.
    exec.arg(buf0);
    exec.arg(buf1);
    exec.arg(buf2);
    // Execute the function.
    exec(ctx);

    // Or in a single line
    exec(ctx, buf0, buf1, buf2);
  } else {
    // Dynamic shapes are not currently supported.
    DynamicExecutor exec(mod);
    exec.arg(buf0, 2, shape, strides);
    exec.arg(buf1, 2, shape, strides);
    exec.arg(buf2, 2, shape, strides);
    exec(ctx);
  }
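
The two call styles above (adding arguments one by one, or passing them all in a single call) could be layered roughly as in the sketch below. It is only an illustration under assumptions: `SketchContext` and `SketchStaticExecutor` are hypothetical stand-ins, not the PR's `OclContext` and `StaticExecutor`, and resetting the argument list after each launch is a guess, not something the PR states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an execution context; it only counts launches.
struct SketchContext {
  std::size_t launches = 0;
};

// Hypothetical executor that collects buffer arguments and then "runs".
class SketchStaticExecutor {
public:
  explicit SketchStaticExecutor(std::size_t expectedArgs)
      : expected(expectedArgs) {}

  // Append one buffer argument.
  void arg(void *buf) {
    assert(args.size() < expected && "too many arguments");
    args.push_back(buf);
  }

  // Run with the arguments collected so far.
  void operator()(SketchContext &ctx) {
    assert(args.size() == expected && "argument count mismatch");
    lastArgCount = args.size();
    ++ctx.launches; // A real executor would enqueue the kernel here.
    args.clear();   // Assumption: arguments are reset after each launch.
  }

  // Single-line form: add every buffer in order, then run.
  template <typename... Bufs>
  void operator()(SketchContext &ctx, Bufs *...bufs) {
    int expand[] = {0, (arg(bufs), 0)...};
    (void)expand;
    (*this)(ctx);
  }

  // Number of arguments used by the most recent launch.
  std::size_t lastArgs() const { return lastArgCount; }

private:
  std::size_t expected;
  std::size_t lastArgCount = 0;
  std::vector<void *> args;
};
```

In this sketch the variadic overload is just a convenience wrapper around `arg()` followed by the one-argument call, so both styles go through the same validation.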

See the unit test.
Depends on #333 and #329
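
The usage snippet notes that a module is built for each device/context pair and cached. A minimal sketch of that idea, assuming a simple map keyed by the pair, might look like the following. `DeviceHandle`, `ContextHandle`, `BuiltModule`, and `ModuleCache` are hypothetical names for illustration (plain integers stand in for `cl_device_id` and `cl_context`), and the PR's actual cache may be organized differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical opaque handles standing in for cl_device_id / cl_context.
using DeviceHandle = std::uintptr_t;
using ContextHandle = std::uintptr_t;

// Hypothetical stand-in for a module compiled for one device/context pair.
struct BuiltModule {
  DeviceHandle device;
  ContextHandle context;
};

class ModuleCache {
public:
  // Returns the cached module for (device, context), building it on first use.
  std::shared_ptr<const BuiltModule> getOrBuild(DeviceHandle device,
                                                ContextHandle context) {
    assert(device != 0 && context != 0 && "handles must be valid");
    const auto key = std::make_pair(device, context);
    auto it = cache.find(key);
    if (it != cache.end())
      return it->second; // Cache hit: no rebuild.
    ++buildCount;        // A real builder would compile the kernels here.
    auto mod =
        std::make_shared<const BuiltModule>(BuiltModule{device, context});
    cache.emplace(key, mod);
    return mod;
  }

  // Number of builds performed so far.
  int builds() const { return buildCount; }

private:
  std::map<std::pair<DeviceHandle, ContextHandle>,
           std::shared_ptr<const BuiltModule>>
      cache;
  int buildCount = 0;
};
```

Requesting the same pair twice returns the same object, while a different context for the same device triggers a separate build.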

include/gc/ExecutionEngine/GPURuntime/GpuOclRuntime.h

lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp

test/mlir/test/gc/gpu-runner/lit.local.cfg

dchigarev · 2024-09-30T12:29:46Z

test/mlir/unittests/ExecutionEngine/GPU/GpuOclRuntimeTest.cpp

+
+constexpr char matmulAddStatic[] = R"mlir(
+module @fragment_name attributes {"#dlti.sys_spec" = #dlti.target_system_spec<"CPU" : #dlti.target_device_spec<#dlti.dl_entry<"tile_size", 32 : i32>>>} {
+  func.func @entry(%arg0: memref<64x128xf32>, %arg1: memref<128x128xf32>, %arg2: memref<64x128xf32>) {


Lowering to xegpu.dpas is not supported for f32; let's change this test to f16.

src/gc-opt/CMakeLists.txt

lib/gc/Transforms/GPU/GpuToGpuOcl.cpp

dchigarev · 2024-09-30T14:47:00Z

I think I'm more or less good with the changes (OV integration works with this runtime). The only thing that keeps me from merging this PR is that we have to temporarily disable XeGPU tests in GC until the gpu-runner is merged.

@AndreyPavlenko is there a way to merge this PR and keep both gc-cpu-runner and GPURuntime tests working? Maybe some temporary option in gc-gpu-pipeline that disables new passes (something like legacyOCLRuntime=true)?

Also, @kurapov-peter, what do you think about merging this one without a tedious review? I think it's the last puzzle piece that keeps us from claiming that OV integration works on the GC main branch (technically we also need this one, but we already have an approval there).

kurapov-peter

I'm generally OK with the change. I wonder if we really need another logger along with LLVM's one. I'd also prefer not to drop tests. Can we merge the two changes back to back?

AndreyPavlenko · 2024-09-30T14:59:47Z

The runner is already implemented and the tests pass with it. There are not many changes - 9adeab8

AndreyPavlenko · 2024-09-30T15:40:30Z

I wonder if we really need another logger along with LLVM's one

To be honest, I don't like LLVM's logger. This one is easier to use:

gcLogD("This is a debug message");

VS

LLVM_DEBUG(llvm::dbgs() << "This is a debug message\n");

Also, in debug builds it prints the file and line number, which is convenient for in-IDE navigation: a single click on a log message navigates to the corresponding line.

[DEBUG] [/path/to/graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:432] Created new OpenCL context: 0x560643946d20
[DEBUG] [/path/to/graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:507] Created new OpenCL command queue: 0x560642affff0
[DEBUG] [/path/to/graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:523] Allocated 16384 bytes of device USM memory: 0xff00fffffffe0000

But, if required, I could integrate this logger with LLVM's. It's quite simple.
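
For illustration, the file-and-line prefix described above can be produced with a small macro that captures the call site. This is a sketch under assumptions, not the PR's logger: `formatDebugLog`, `SKETCH_LOG_WITH_LOCATION`, and `SKETCH_LOG_D` are hypothetical names, and dropping the location prefix when `NDEBUG` is defined is a guess about the release-build behavior.

```cpp
#include <cstdio>
#include <string>

// Builds one log line; the "[file:line]" prefix is optional.
inline std::string formatDebugLog(const char *file, int line,
                                  const std::string &msg, bool withLocation) {
  std::string out = "[DEBUG] ";
  if (withLocation)
    out += "[" + std::string(file) + ":" + std::to_string(line) + "] ";
  return out + msg;
}

// Assumption: only debug builds (NDEBUG not defined) include the location.
#ifndef NDEBUG
#define SKETCH_LOG_WITH_LOCATION true
#else
#define SKETCH_LOG_WITH_LOCATION false
#endif

// Hypothetical macro: captures the call site and writes the line to stderr.
#define SKETCH_LOG_D(msg)                                                     \
  std::fputs((formatDebugLog(__FILE__, __LINE__, (msg),                       \
                             SKETCH_LOG_WITH_LOCATION) +                      \
              "\n")                                                           \
                 .c_str(),                                                    \
             stderr)
```

A call such as `SKETCH_LOG_D("This is a debug message");` keeps the call site as short as the `gcLogD` example, with the macro supplying `__FILE__` and `__LINE__` so the caller does not have to.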

kurapov-peter

We'll need to remove imex from the default GPU passes target.

AndreyPavlenko · 2024-09-30T16:29:47Z

Maybe some temporary option in gc-gpu-pipeline that disables new passes (something like legacyOCLRuntime=true)?

Added the use-gpu-ocl pipeline option.

AndreyPavlenko force-pushed the gc-ocl branch 5 times, most recently from 7b81633 to 48e6f1d · September 15, 2024 01:47

AndreyPavlenko marked this pull request as ready for review September 15, 2024 02:16

AndreyPavlenko requested review from kurapov-peter and dchigarev September 15, 2024 02:16

AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 7630169 to 048f66d · September 15, 2024 19:09

AndreyPavlenko added the GPU label Sep 15, 2024

dchigarev reviewed Sep 16, 2024

kurapov-peter linked an issue Sep 16, 2024 that may be closed by this pull request

Support external queue for kernel submission #230

Closed

AndreyPavlenko force-pushed the gc-ocl branch 5 times, most recently from eba9a73 to 2a2e058 · September 17, 2024 00:06

dchigarev mentioned this pull request Sep 17, 2024

OV GPU integration #207

Closed

AndreyPavlenko force-pushed the gc-ocl branch 6 times, most recently from d8daf57 to 8d10a7a · September 20, 2024 17:01

AndreyPavlenko mentioned this pull request Sep 23, 2024

Convert a subset of GPU dialect ops to the OpenCL GPU runtime calls #333

Merged

AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 0ebb9d7 to b93b5d9 · September 24, 2024 00:00

AndreyPavlenko force-pushed the gc-ocl branch 8 times, most recently from 44af344 to 37d306f · September 25, 2024 00:09

dchigarev reviewed Sep 25, 2024

test/mlir/test/gc/gpu-runner/lit.local.cfg (outdated, resolved)

AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 48dcdc2 to 4efc410 · September 25, 2024 21:33

AndreyPavlenko mentioned this pull request Sep 25, 2024

Implemented GPU runner #362

Merged

AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 3ed8730 to 66d8694 · September 27, 2024 22:30

dchigarev reviewed Sep 30, 2024

kurapov-peter reviewed Sep 30, 2024

kurapov-peter approved these changes Sep 30, 2024

AndreyPavlenko added 2 commits September 30, 2024 16:23

Implemented GPU OpenCL runtime

d6f583f

Changed f32 to f16

b6ed40b

AndreyPavlenko force-pushed the gc-ocl branch from c055fad to b6ed40b · September 30, 2024 16:24

AndreyPavlenko added 2 commits September 30, 2024 19:04

Added option, that allows to disable GpuToGpuOcl path

d801150

Added debug logs configuration

3325a1b

dchigarev approved these changes Oct 1, 2024

dchigarev merged commit ba5cd5d into intel:main Oct 1, 2024
6 checks passed
