Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add macOS support #602

Draft
wants to merge 19 commits into
base: main
Choose a base branch
from
Draft

Add macOS support #602

wants to merge 19 commits into from

Conversation

JayFoxRox
Copy link

@JayFoxRox JayFoxRox commented Aug 27, 2023

Introduction

This adds macOS support in chipStar and its underlying infrastructure (LLVM, HIP, HIPCC).
To make this even more interesting, this is on M1 (Apple Silicon), e.g. ARM.

I assume this will take a while to get merged.
For now, I hope that others can help to fix Windows / Linux things that I break in this macOS port.

OpenCL / SPIR-V on macOS

Obviously there isn't a metal backend currently (and I don't plan to write one), although I theorize it might be much easier to write once this has been merged.

This probably won't work with the Apple OpenCL implementation due to the lack of SPIR-V support.
Instead, I added support for the OpenCL ICD loader and I use a hacked version of pocl from brew for which I enabled SPIR-V support.
I don't care about performance right now either - all I want is to run some CUDA and HIP projects on macOS.. so chipStar it is!

Installation

Once everything is merged, I plan to create a Formula for brew (in homebrew-core), so this can be installed more easily (brew install chipstar).
However, I expect this can only happen once more of the chipStar changes in HIP, HIPCC and LLVM have been integrated in their respective upstreams (so that we don't need the forks in brew).
We could probably have a CHIP-SPV/homebrew-chipStar or similar though, to distribute this to macOS users.

For now, if you want to test this (on macOS):

Cloning

Follow steps in README for structure, but clone this branch.
For each submodule, check the TODO section below and checkout the corresponding branches.

Also see the note about pocl in the next section.

Install dependencies

# Install OpenCL base
brew install opencl-headers
brew install opencl-icd-loader

# Install pocl with SPIR-V support
brew install spirv-llvm-translator
brew install pocl # Needs SPIR-V support, see below

Set up some config:

export J=8
export CHIPSTAR_LLVM_INSTALL=/tmp/chipstar-llvm-install/
export CHIPSTAR_INSTALL=/tmp/chipstar-install/
export SDK_PATH=$(xcrun --show-sdk-path)
export BUILD_TYPE=Debug
export BUILD_NAME=build-debug

Build LLVM:

cmake -S llvm -B $BUILD_NAME \
  -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
  -DLLVM_ENABLE_PROJECTS="clang;openmp" \
  -DLLVM_TARGETS_TO_BUILD="host" \
  -DDEFAULT_SYSROOT=$SDK_PATH \
  -DCMAKE_INSTALL_PREFIX=$CHIPSTAR_LLVM_INSTALL
make -C $BUILD_NAME -j $J all install

Build chipStar:

mkdir $BUILD_NAME && cd $BUILD_NAME
cmake .. \
    -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
    -DLLVM_CONFIG_BIN=$CHIPSTAR_LLVM_INSTALL/bin/llvm-config \
    -DCMAKE_INSTALL_PREFIX=$CHIPSTAR_INSTALL \
    "-DCMAKE_PREFIX_PATH=$HOMEBREW_PREFIX/opt/opencl-icd-loader/share/cmake;$HOMEBREW_PREFIX/opt/opencl-headers/share/cmake/"
make all build_tests install -j $J

Testing:

These are mostly notes to myself, but this is how to test:

CHIP_LOGLEVEL=trace $CHIPSTAR_INSTALL/bin/chip_spv_samples/hipmath
make CHIP && bin/hipcc.bin ../samples/hipmath/hipmath.cc && POCL_DEBUG=all POCL_LEAVE_KERNEL_COMPILER_TEMP_FILES=1 CHIP_DUMP_SPIRV=1 ./a.out

Status

  • Compiles
  • Tests run, (but results not verified yet)
  • Samples run, (but results not verified yet)
  • Never used in a real application

TODO

#define CL_DEPRECATED(...)
#define GCL_API_SUFFIX__VERSION_1_1
//#include <OpenCL/opencl.h>
#include <CL/opencl.h>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably be changed back; I believe ICD Loader also overloads <OpenCL/opencl.h>

list(APPEND CHIP_SPV_DEFINITIONS HAVE_OPENCL)
list(PREPEND CHIP_INTERFACE_LIBS ${OpenCL_LIBRARY})
list(PREPEND CHIP_INTERFACE_LIBS
#OpenCL::Headers
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deadcode

@@ -114,10 +140,10 @@ if(NOT DEFINED LevelZero_LIBRARY)
endif()
endif()

message(STATUS "OpenCL_LIBRARY: ${OpenCL_LIBRARY}")
message(STATUS "OpenCL_FOUND: ${OpenCL_FOUND}")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously ugly; I'll have to figure out how to show the path with import targets.
Maybe it's also enough to report YES / NO for each of these.

find_package(OpenCLHeaders)
find_package(OpenCLICDLoader)

#add_library(OpenCL ALIAS OpenCLICDLoader)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code

#add_library(OpenCL ALIAS OpenCLICDLoader)
message("OpenCL_FOUND: ${OpenCL_FOUND}")
set(OpenCL_FOUND ${OpenCLICDLoader_FOUND})
message("OpenCL_FOUND: ${OpenCL_FOUND}")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug leftover

set(ARCH "x64")
else()
set(ARCH "unknown")
endif()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire block is super ugly. I'm not sure how to do this best

add_compile_options(-mf16c)
elseif("${CMAKE_C_COMPILER_ARCHITECTURE_ID}" STREQUAL "x64")
add_compile_options(-mf16c)
endif()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How important is -mf16c for x86? Could it be removed? It's in a "temporary" block, too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall, the -mf16c is required for host side native half type and operations (hip_fp16.h).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: the -mf16c is used for working around an undefined reference issue. However, the same issue is seen here too despite the workaround so I’m not sure if the option is actually effective.

@@ -52,4 +52,7 @@
typedef _Float16 api_half;
typedef _Float16 api_half2 __attribute__((ext_vector_type(2)));

typedef unsigned int uint;
typedef unsigned long ulong;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary on macOS? Does it still work for other platforms or will it report duplicate identifier?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might cause troubles. On my system (Ubuntu 22.04) <stdlib.h> includes <sys/types.h> which defines typedefs for u{short,int,long}.

/usr/include/x86_64-linux-gnu/sys/types.h:

...
#ifdef __USE_MISC
/* Old compatibility names for C types.  */
typedef unsigned long int ulong;
typedef unsigned short int ushort;
typedef unsigned int uint;
#endif
...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's not a problem as long as the duplicate typedefs aliases the same type.

target_link_libraries(LLVMHipDynMem "-undefined dynamic_lookup")
target_link_libraries(LLVMHipStripUsedIntrinsics "-undefined dynamic_lookup")
target_link_libraries(LLVMHipDefrost "-undefined dynamic_lookup")
target_link_libraries(LLVMHipPasses "-undefined dynamic_lookup")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should potentially be set_target_properties(LLVMHipDynMem PROPERTIES LINK_FLAGS "-undefined dynamic_lookup) etc.; I'll have to read up on it.

I'm not sure why this is necessary in the first place (I think it should be the default for MODULE libs)

Works for now.

InverseCallGraphNode *N = popAny(WorkList);
if (N == nullptr) {
printf("N was nullptr! skipping\n");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets triggered in the abort sample. Why?
Is this expected or a macOS issue?

@@ -129,8 +129,14 @@ set(SAMPLES
ccompat
hipComplex
hipHostMallocSample
)
# We skip this test on macOS, because macho does not support EXCLUDE section flags (used by clang-offload-bundler)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to fix clang-offload-bundler eventually, but I'm new to most of LLVM, mach-o, HIP, OpenCL and chipStar.
So I'll have to understand why this is necessary in the first place / what exactly it does.

@@ -1257,7 +1257,12 @@ void chipstar::Backend::waitForThreadExit() {
*
* So we just wait for 0.5 seconds before starting to check for thread exit.
*/
#if defined(__APPLE__) || defined(__MACOSX)
sched_yield();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untested

@@ -77,7 +77,7 @@ void CHIPReadEnvVarsCallOnce() {

CHIPDeviceTypeStr = readEnvVar("CHIP_DEVICE_TYPE");
if (CHIPDeviceTypeStr.size() == 0)
CHIPDeviceTypeStr = "gpu";
CHIPDeviceTypeStr = "default";
Copy link
Author

@JayFoxRox JayFoxRox Aug 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After spending half an hour until realizing pocl was missing SPIR-V, I've spent another half an hour trying to figure out why chipStar still didn't work, until I realized it was filtering out my CPU device.
I've tried to find a good solution but the CL spec was lacking. Would "default" type still work for most people? (or will they be surprised if their OpenCL platform suddenly kicks them from CPU to GPU after a chipStar update?)

launchUnaryFn<float>([] __device__(auto x) { return std::abs(x); });
#endif
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting warnings about this being ambiguous, I can add a log with these errors later.
Why isn't this an issue for other platforms?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a log with these errors later.

Please, send us the log.

@pvelesko
Copy link
Collaborator

cool! I will try this on my m1 this week

@pjaaskel
Copy link
Collaborator

Nice! Thanks for working on this. For the "Metal backend" -- you might do more good contributing that backend to PoCL instead. You'd get at least SYCL support (via DPC++ or OpenSYCL) too that way as a side effect.

@Kerilk
Copy link
Contributor

Kerilk commented Aug 28, 2023

@JayFoxRox Thank you so much for this contribution! We discussed it extensively this morning,and we would very much like to merge it when it is ready. I order to do so we will need CI support.
Our plan here is to start leveraging GitHub action to test CPU pocl for Ubuntu, and then we should be able to extend it for MacOS. This should prevent things from bit-rotting without adding additional burden on the few GPU machines we have available.

@JayFoxRox
Copy link
Author

JayFoxRox commented Aug 28, 2023

For the "Metal backend" -- you might do more good contributing that backend to PoCL instead. You'd get at least SYCL support (via DPC++ or OpenSYCL) too that way as a side effect.

Yes, pocl integration was also what I was thinking.
I also toyed around with spirv-cross and spirv-tools earlier today and noticed that they recently added support for SPIR-V compute to metal compute. This means that even much of the SPIR-V stuff is already done!

However, for chipStar, it might be interesting to add a Vulkan backend, because that would probably work with MoltenVK, so you'd get the rest of the metal API "for-free" + you'd also get support for a bunch of Vulkan devices which lack OpenCL (or proper drivers).
Even then you might still be able to do SYCL (unless I'm missing something - didn't read much into SYCL yet):

  • SYCL application > OpenSYCL > OpenSYCL-HIP-Backend > chipStar > chipStar-Vulkan-Backend > Vulkan intermediate > MoltenVK > Metal

I probably won't find time to ever work on that. I'm fine with having CPU-only (for now). For me, it's mostly about getting a bit into HIP/CUDA + I want to run some projects which are CUDA only.

cool! I will try this on my m1 this week

Note that it doesn't really work yet, but patches are welcome.
I still have to figure out how to resolve (or even reproduce with OpenCL) the mangling issues in pocl/pocl#1288.

Our plan here is to start leveraging GitHub action to test CPU pocl for Ubuntu, and then we should be able to extend it for MacOS.

Sounds good.

@pjaaskel
Copy link
Collaborator

However, for chipStar, it might be interesting to add a Vulkan backend, because that would probably work with MoltenVK, so you'd get the rest of the metal API "for-free" + you'd also get support for a bunch of Vulkan devices which lack OpenCL (or proper drivers). Even then you might still be able to do SYCL (unless I'm missing something - didn't read much into SYCL yet):

I forgot that there's already also a starting point PoCL-Vulkan backend (quite early and primitive) that could be improved for MacOS needs if it supports Vulkan.

* SYCL application > OpenSYCL > OpenSYCL-HIP-Backend > chipStar > chipStar-Vulkan-Backend > Vulkan intermediate > MoltenVK > Metal

Why not... To me HIP is a higher level than OpenCL though so I'd simplify this to SYCL -> DPC++/OpenSYCL -> OpenCL -> X instead of hopping through HIP for SYCL support.

@isuruf
Copy link

isuruf commented Sep 7, 2023

I've tried the pocl-vulkan backend with macos-m1, but couldn't get it to work. I also tried writing a Metal backend for PoCL, and made some progress, but the Metal documentation was very hard to digest. (For eg: how to use events)

@pjaaskel
Copy link
Collaborator

pjaaskel commented Sep 7, 2023

@isuruf is Metal backend starting point usable at all? I suggest to contribute it to the main if it runs any examples so it's easier for other volunteers to contribute to make it better.

@JayFoxRox
Copy link
Author

I've tried the pocl-vulkan backend with macos-m1, but couldn't get it to work. I also tried writing a Metal backend for PoCL

So far, I've only tried the CPU backend, but I ran into a bunch of issues with PoCL (partially issues in the brew package, but also in PoCL itself) which is also why there hasn't been much progress here. Until there's a solution for pocl/pocl#1288 I can't really continue on the chipStar side either - I can't really know if the issues I'm seeing are from PoCL or from chipStar.
However, I don't have time to also work on on PoCL and am not sure how to continue.

@isuruf
Copy link

isuruf commented Sep 7, 2023

@isuruf is Metal backend starting point usable at all?

No, it's not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants