Implemented GPU OpenCL runtime #343

Merged (4 commits) on Oct 1, 2024


@AndreyPavlenko (Contributor) commented Sep 14, 2024:

How to use:

  // Create a builder. The 'module' argument is an MLIR module
  // with a single function to be executed.
  OclModuleBuilder builder(module);
  // Build the module with one of the build() overloads, which take either
  // a runtime (preferred), an OpenCL device/context pair, or a queue.
  // The module is built for each device/context pair and cached.
  auto mod = gcGetOrReport(builder.build(device, context));
  // Create an execution context. The 'queue' argument is an OpenCL queue.
  OclContext ctx(mod->runtime, queue);
  // Create an executor.
  if (mod->isStatic) {
    // If all the function arguments are memrefs with static shapes,
    // use this executor.
    StaticExecutor exec(mod);
    // Add the function arguments (aligned memory buffers).
    exec.arg(buf0);
    exec.arg(buf1);
    exec.arg(buf2);
    // Execute the function.
    exec(ctx);

    // Alternatively, instead of the arg() calls above, pass the
    // arguments and execute in a single call:
    exec(ctx, buf0, buf1, buf2);
  } else {
    // Dynamic shapes are not currently supported.
    DynamicExecutor exec(mod);
    exec.arg(buf0, 2, shape, strides);
    exec.arg(buf1, 2, shape, strides);
    exec.arg(buf2, 2, shape, strides);
    exec(ctx);
  }
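The two executors differ in how much information must accompany each buffer. Under MLIR's standard memref-to-LLVM lowering, a ranked memref is passed as a descriptor holding an allocated pointer, an aligned pointer, an offset, and per-dimension sizes and strides. When every shape is static, the sizes and strides are compile-time constants, which is presumably why `StaticExecutor` only needs the aligned buffers. The sketch below illustrates that descriptor and how a `shape`/`strides` pair, like the one passed to `DynamicExecutor::arg()`, locates an element. `MemRefDesc`, `rowMajor`, and `elementAt` are hypothetical names, and reading the second `arg()` parameter as the rank is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the ranked memref descriptor produced by MLIR's
// standard LLVM lowering: allocated ptr, aligned ptr, offset, sizes, strides.
template <typename T, std::size_t Rank>
struct MemRefDesc {
  T *allocated;
  T *aligned;
  std::int64_t offset;
  std::array<std::int64_t, Rank> sizes;
  std::array<std::int64_t, Rank> strides;

  // Locate an element from runtime sizes/strides, which is what code must do
  // when the shape is not known at compile time.
  T &elementAt(const std::array<std::int64_t, Rank> &idx) const {
    std::int64_t linear = offset;
    for (std::size_t d = 0; d < Rank; ++d) {
      assert(idx[d] >= 0 && idx[d] < sizes[d] && "index out of bounds");
      linear += idx[d] * strides[d];
    }
    return aligned[linear];
  }
};

// Build a descriptor for a contiguous row-major buffer, deriving the strides
// from the shape (innermost dimension has stride 1).
template <typename T, std::size_t Rank>
MemRefDesc<T, Rank> rowMajor(T *data,
                             const std::array<std::int64_t, Rank> &shape) {
  MemRefDesc<T, Rank> desc{data, data, 0, shape, {}};
  std::int64_t stride = 1;
  for (std::size_t d = Rank; d-- > 0;) {
    desc.strides[d] = stride;
    stride *= shape[d];
  }
  return desc;
}
```

For example, a 2x3 row-major buffer gets strides {3, 1}, so element (1, 2) sits at linear index 1*3 + 2 = 5.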

See the unit test.
Depends on #333 and #329
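The usage comment on `build()` says the module is built for each device/context pair and cached. A minimal sketch of such a cache follows, assuming the OpenCL handles can be treated as opaque pointers compared by identity. `ModuleCache`, `CompiledModule`, and the handle aliases are hypothetical stand-ins, not the runtime's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins for cl_device_id / cl_context: opaque handles.
using DeviceHandle = const void *;
using ContextHandle = const void *;

// Hypothetical compiled-module type; a real one would own kernels, etc.
struct CompiledModule {
  std::string tag;
};

// Caches one compiled module per (device, context) pair so that repeated
// builds for the same pair reuse the existing module instead of recompiling.
class ModuleCache {
public:
  using Key = std::pair<DeviceHandle, ContextHandle>;
  using Compiler = std::function<std::shared_ptr<CompiledModule>(
      DeviceHandle, ContextHandle)>;

  explicit ModuleCache(Compiler compile) : compile_(std::move(compile)) {}

  std::shared_ptr<CompiledModule> getOrBuild(DeviceHandle dev,
                                             ContextHandle ctx) {
    std::lock_guard<std::mutex> lock(mutex_);
    const Key key{dev, ctx};
    if (auto it = cache_.find(key); it != cache_.end())
      return it->second; // cache hit: no recompilation
    auto mod = compile_(dev, ctx);
    assert(mod && "compiler must return a module");
    cache_.emplace(key, mod);
    return mod;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

private:
  // Orders keys with std::less, which gives a total order even for
  // unrelated pointers (built-in < does not guarantee that).
  struct KeyLess {
    bool operator()(const Key &a, const Key &b) const {
      std::less<const void *> lt;
      if (lt(a.first, b.first)) return true;
      if (lt(b.first, a.first)) return false;
      return lt(a.second, b.second);
    }
  };

  Compiler compile_;
  mutable std::mutex mutex_;
  std::map<Key, std::shared_ptr<CompiledModule>, KeyLess> cache_;
};
```

The mutex makes concurrent `getOrBuild()` calls from several host threads safe, at the cost of serializing compilation; a real runtime might prefer finer-grained locking.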

@AndreyPavlenko force-pushed the gc-ocl branch 5 times, most recently from 7b81633 to 48e6f1d (September 15, 2024 01:47)
@AndreyPavlenko marked this pull request as ready for review (September 15, 2024 02:16)
@AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 7630169 to 048f66d (September 15, 2024 19:09)
@AndreyPavlenko force-pushed the gc-ocl branch 8 times, most recently from 44af344 to 37d306f (September 25, 2024 00:09)
@AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 48dcdc2 to 4efc410 (September 25, 2024 21:33)
@AndreyPavlenko force-pushed the gc-ocl branch 3 times, most recently from 3ed8730 to 66d8694 (September 27, 2024 22:30)

constexpr char matmulAddStatic[] = R"mlir(
module @fragment_name attributes {"#dlti.sys_spec" = #dlti.target_system_spec<"CPU" : #dlti.target_device_spec<#dlti.dl_entry<"tile_size", 32 : i32>>>} {
func.func @entry(%arg0: memref<64x128xf32>, %arg1: memref<128x128xf32>, %arg2: memref<64x128xf32>) {
Contributor

Lowering to xegpu.dpas is not supported for f32; let's change this test to f16.

Contributor Author

Done
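Switching the test to f16 presumably means its host-side buffers must hold half-precision data. Since C++ before C++23 has no portable half type, a test could store f16 values as `uint16_t` bit patterns and convert its f32 reference data. The following is a minimal sketch of that conversion (IEEE 754 binary32 to binary16, round to nearest, ties to even). `floatToHalf` and `convertBuffer` are hypothetical helpers, not part of the project, and the sketch assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Convert an IEEE 754 binary32 value to binary16 bits, rounding to nearest
// with ties to even. Overflow becomes infinity; NaNs become a quiet NaN.
std::uint16_t floatToHalf(float f) {
  static_assert(sizeof(float) == 4, "expects a 32-bit float");
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  const std::uint32_t sign = (x >> 16) & 0x8000u;
  const std::uint32_t absx = x & 0x7FFFFFFFu;

  if (absx >= 0x7F800000u) // Inf or NaN
    return static_cast<std::uint16_t>(sign | 0x7C00u |
                                      (absx > 0x7F800000u ? 0x0200u : 0u));
  if (absx >= 0x477FF000u) // >= 65520 rounds past the max half (65504)
    return static_cast<std::uint16_t>(sign | 0x7C00u);

  const std::uint32_t fexp = absx >> 23;
  std::uint32_t h, rem, halfway;
  if (fexp >= 113) { // normal half: |f| >= 2^-14
    h = ((fexp - 112) << 10) | ((absx & 0x7FFFFFu) >> 13);
    rem = absx & 0x1FFFu;
    halfway = 0x1000u;
  } else if (fexp >= 102) { // subnormal half (may round up to min normal)
    const std::uint32_t m = (absx & 0x7FFFFFu) | 0x800000u;
    const std::uint32_t shift = 126 - fexp; // 14..24
    h = m >> shift;
    rem = m & ((1u << shift) - 1u);
    halfway = 1u << (shift - 1);
  } else {
    return static_cast<std::uint16_t>(sign); // too small: signed zero
  }
  if (rem > halfway || (rem == halfway && (h & 1u)))
    ++h; // a carry into the exponent field is still the correct result
  return static_cast<std::uint16_t>(sign | h);
}

// Fill an f16 test buffer from f32 reference data.
void convertBuffer(const float *src, std::uint16_t *dst, std::size_t n) {
  assert(src != nullptr && dst != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = floatToHalf(src[i]);
}
```

For example, `floatToHalf(1.0f)` yields 0x3C00 and `floatToHalf(65504.0f)`, the largest finite half, yields 0x7BFF.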

src/gc-opt/CMakeLists.txt (resolved)
lib/gc/Transforms/GPU/GpuToGpuOcl.cpp (resolved)
@dchigarev (Contributor):

I think I'm more or less good with the changes (OV integration works with this runtime). The only thing that keeps me from merging this PR is that we have to temporarily disable XeGPU tests in GC until the gpu-runner is merged.

@AndreyPavlenko is there a way to merge this PR while keeping both the gc-cpu-runner and GPURuntime tests working? Maybe a temporary option in gc-gpu-pipeline that disables the new passes (something like legacyOCLRuntime=true)?

Also, @kurapov-peter, what do you think about merging this one without a lengthy review? I think it's the last puzzle piece keeping us from claiming that OV integration works on the GC main branch (technically we also need this one, but it has already been approved).

@kurapov-peter (Contributor) left a comment:

I'm generally OK with the change. I wonder if we really need another logger alongside LLVM's. I'd also prefer not to drop tests. Can we merge the two changes back to back?

@AndreyPavlenko (Contributor, Author):

The runner is already implemented and the tests pass with it. There are not many changes: 9adeab8.

@AndreyPavlenko (Contributor, Author) commented Sep 30, 2024:

> I wonder if we really need another logger along with the llvm's one

To be honest, I don't like LLVM's logger. This one is easier to use:

gcLogD("This is a debug message");

vs.

LLVM_DEBUG(llvm::dbgs() << "This is a debug message\n");

Also, in debug builds it prints the file and line number, which makes in-IDE navigation convenient: a single click on a log message navigates to the corresponding line.

[DEBUG] [/path/to/graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:432] Created new OpenCL context: 0x560643946d20
[DEBUG] [/path/to/graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:507] Created new OpenCL command queue: 0x560642affff0
[DEBUG] [/path/to/graph-compiler/lib/gc/ExecutionEngine/GPURuntime/ocl/GpuOclRuntime.cpp:523] Allocated 16384 bytes of device USM memory: 0xff00fffffffe0000

If required, though, I could integrate this logger with LLVM's. It's quite simple.
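The two properties described in this comment, terse call sites and a clickable `file:line` prefix in debug builds, can be had with a small variadic helper plus a macro that captures `__FILE__` and `__LINE__`. This is a hypothetical sketch, not the project's `gcLogD` implementation; `formatLog` and `SKETCH_LOG_D` are illustrative names, and dropping the location when `NDEBUG` is defined is an assumption about release builds.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical sketch: concatenate any streamable arguments and, in builds
// without NDEBUG, prefix the message with "[file:line]" so that IDE consoles
// can turn it into a link to the source line.
template <typename... Args>
std::string formatLog(const char *level, const char *file, int line,
                      const Args &...args) {
  assert(level != nullptr && file != nullptr);
  std::ostringstream os;
  os << '[' << level << "] ";
#ifndef NDEBUG
  os << '[' << file << ':' << line << "] ";
#else
  (void)file; // location is dropped in release builds (assumption)
  (void)line;
#endif
  (os << ... << args); // C++17 fold: streams every argument in order
  return os.str();
}

// Call-site macro: captures the caller's file and line automatically.
#define SKETCH_LOG_D(...)                                                      \
  (std::cerr << formatLog("DEBUG", __FILE__, __LINE__, __VA_ARGS__) << '\n')
```

A call such as `SKETCH_LOG_D("Allocated ", bytes, " bytes of device USM memory");` produces a line shaped like the samples above. Routing the same string through `llvm::dbgs()` inside `LLVM_DEBUG` would be one way to integrate with LLVM's facility; that variant is omitted because it needs LLVM headers.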

@kurapov-peter (Contributor) left a comment:

We'll need to remove imex from the default GPU passes target.

@AndreyPavlenko (Contributor, Author) commented Sep 30, 2024:

> Maybe some temporary option in gc-gpu-pipeline that disables new passes (something like legacyOCLRuntime=true)?

Added the use-gpu-ocl pipeline option.

@dchigarev merged commit ba5cd5d into intel:main on Oct 1, 2024
6 checks passed

Successfully merging this pull request may close these issues.

Support external queue for kernel submission
3 participants