
mlir_tblgen is broken for cross compile #1094

Closed · powderluv opened this issue Jul 21, 2022 · 36 comments

@powderluv (Collaborator) commented Jul 21, 2022

Upstream MLIR has some quirks when exporting MLIR_TBLGEN into the CMake PARENT_SCOPE: https://github.com/llvm/llvm-project/blob/07b749800c5cd4105d49ab46be5f0a2079dd709a/mlir/CMakeLists.txt#L151-L156

Some of the tools, like mlir-linalg-ods-yaml-gen, do the right thing for cross compilation with https://github.com/llvm/llvm-project/blob/27945f9282030136cb8b043b91b229ea2758c9ed/mlir/tools/mlir-linalg-ods-gen/CMakeLists.txt#L23-L35

We have some hacks on top of the "directly include tools/ in the top level mlir/CMakeLists.txt" hack here:

# In-tree build with LLVM_EXTERNAL_PROJECTS=torch-mlir
# FIXME: This should really be inherited from the LLVM tree. In particular,
# it's going to change when cross-compiling.
set(MLIR_TABLEGEN_EXE mlir-tblgen)
if (TORCH_MLIR_ENABLE_MHLO)
  set(MLIR_PDLL_TABLEGEN_EXE mlir-pdll)
endif()

This causes the arm64 cross compile for Apple M1 to fail on the macOS GitHub Actions runner, e.g. https://github.com/llvm/torch-mlir/runs/7441128393?check_suite_focus=true

cd /Users/runner/work/torch-mlir/torch-mlir/build && /Users/runner/work/torch-mlir/torch-mlir/build/bin/mlir-tblgen -gen-pass-decls -DTORCH_MLIR_ENABLE_MHLO -I /Users/runner/work/torch-mlir/torch-mlir/include/torch-mlir/Conversion -I/Users/runner/work/torch-mlir/torch-mlir/build/include -I/Users/runner/work/torch-mlir/torch-mlir/externals/llvm-project/llvm/include -I/Users/runner/work/torch-mlir/torch-mlir/externals/mlir-hlo/include -I/Users/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/mlir-hlo/include -I/Users/runner/work/torch-mlir/torch-mlir/externals/llvm-project/llvm/../mlir/include -I/Users/runner/work/torch-mlir/torch-mlir/build/tools/mlir/include -I/Users/runner/work/torch-mlir/torch-mlir/include -I/Users/runner/work/torch-mlir/torch-mlir/build/tools/torch-mlir/include /Users/runner/work/torch-mlir/torch-mlir/include/torch-mlir/Conversion/Passes.td --write-if-changed -o tools/torch-mlir/include/torch-mlir/Conversion/Passes.h.inc -d tools/torch-mlir/include/torch-mlir/Conversion/Passes.h.inc.d
/bin/sh: /Users/runner/work/torch-mlir/torch-mlir/build/bin/mlir-tblgen: Bad CPU type in executable 
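
For illustration, here is a rough sketch of how that hack could be made cross-compile aware. This is not actual torch-mlir code; the CMAKE_CROSSCOMPILING guard, LLVM_BINARY_DIR, and the NATIVE/bin layout are assumptions that would need checking against the LLVM build:

# Hypothetical sketch only: prefer whatever the LLVM/MLIR build already
# exported and fall back to the in-tree target names for native builds.
if(NOT CMAKE_CROSSCOMPILING)
  set(MLIR_TABLEGEN_EXE mlir-tblgen)
  if(TORCH_MLIR_ENABLE_MHLO)
    set(MLIR_PDLL_TABLEGEN_EXE mlir-pdll)
  endif()
elseif(NOT MLIR_TABLEGEN_EXE)
  # Assumption: with -DLLVM_USE_HOST_TOOLS=ON the host copies end up under
  # <build>/NATIVE/bin; the exact layout may differ between LLVM revisions.
  set(MLIR_TABLEGEN_EXE "${LLVM_BINARY_DIR}/NATIVE/bin/mlir-tblgen")
endif()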
@jpienaar (Member) commented:

And locally on an M1 this should repro with

mkdir build
  cd build
  cmake $GITHUB_WORKSPACE/externals/llvm-project/llvm -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_LINKER=lld \
    -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
    -DPython3_EXECUTABLE=$(which python) \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects" \
    -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$GITHUB_WORKSPACE" \
    -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR="${GITHUB_WORKSPACE}/external/llvm-external-projects/torch-mlir-dialects" \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF \
    -DCMAKE_OSX_ARCHITECTURES=arm64 \
    -DMACOSX_DEPLOYMENT_TARGET=12.0 \
    -DLLVM_TARGETS_TO_BUILD=AArch64 \
    -DLLVM_USE_HOST_TOOLS=ON \
    -DLLVM_TARGETS_TO_BUILD=host
  ninja

?

@powderluv (Collaborator, Author) commented:

On an M1 you would want to build x86_64 then :D Let me try on my M1 and post a command.

@jpienaar (Member) commented:

Thanks!

@powderluv (Collaborator, Author) commented:

If you don't have Rosetta you can do:

(mlir_venv) anush@MacBook-Pro build % cmake externals/llvm-project/llvm -B build -GNinja \                                                
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_LINKER=lld \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
    -DPython3_EXECUTABLE=$(which python) \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects" \
    -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR=`pwd` \
    -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR=`pwd`/external/llvm-external-projects/torch-mlir-dialects \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF \
    -DCMAKE_OSX_ARCHITECTURES=x86_64 \
    -DMACOSX_DEPLOYMENT_TARGET=12.0 \
    -DLLVM_TARGETS_TO_BUILD=X86 \
    -DLLVM_USE_HOST_TOOLS=ON

Disabling Rosetta 2 can be complicated, I think (from https://developer.apple.com/forums/thread/669486):

1. Obtain a list of files/directories and LaunchAgents with: pkgutil --files com.apple.pkg.RosettaUpdateAuto
2. Save them in a way that you can access them in recovery.
3. Boot into recovery.
4. Open a terminal in recovery (btw, to load recovery on M1 Macs you long-press the power button instead of holding CMD+R).
5. Run csrutil disable and confirm (temporarily disables SIP).
6. Reboot.
7. Delete the files listed in step 1 (in my case it was enough to delete /Library/Apple/usr/share/rosetta and /Library/Apple/usr/libexec with all their contents).
8. Reboot back into the recovery terminal.
9. Run csrutil enable and confirm.

@powderluv (Collaborator, Author) commented:

Looks like the native tools are being built:

(mlir_venv) anush@MacBook-Pro build % file NATIVE/bin/mlir-tblgen 
NATIVE/bin/mlir-tblgen: Mach-O 64-bit executable arm64
(mlir_venv) anush@MacBook-Pro build % file bin/mlir-tblgen 
bin/mlir-tblgen: Mach-O 64-bit executable x86_64

So we may just need to point at the right binary when -DLLVM_USE_HOST_TOOLS=ON is set.

@powderluv (Collaborator, Author) commented:

@jpienaar I made an easier reproducer for you on Ubuntu / Linux.

Install ARM cross-compile toolchain:

sudo apt-get install gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf

Build torch-mlir with:

ubuntu:~/github/torch-mlir$ cmake -GNinja -Bbuild   -DCMAKE_BUILD_TYPE=Release   -DCMAKE_C_COMPILER=arm-linux-gnueabihf-gcc   -DCMAKE_CXX_COMPILER=arm-linux-gnueabihf-g++   -DPython3_FIND_VIRTUALENV=ONLY   -DLLVM_ENABLE_PROJECTS=mlir   -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects"   -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR=`pwd`   -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR=`pwd`/externals/llvm-external-projects/torch-mlir-dialects   -DMLIR_ENABLE_BINDINGS_PYTHON=OFF -DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF  -DLLVM_TARGETS_TO_BUILD=AArch64 -DLLVM_USE_HOST_TOOLS=ON externals/llvm-project/llvm && cmake --build build --target check-torch-mlir-all

It will fail with:

[464/2371] Building passes.h.inc...
FAILED: tools/torch-mlir/mlir-hlo/include/mlir-hlo/Dialect/gml_st/transforms/passes.h.inc /home/anush/github/torch-mlir/build/tools/torch-mlir/mlir-hlo/include/mlir-hlo/Dialect/gml_st/transforms/passes.h.inc 
cd /home/anush/github/torch-mlir/build && /home/anush/github/torch-mlir/build/bin/mlir-tblgen -gen-pass-decls -name GmlSt -I /home/anush/github/torch-mlir/externals/mlir-hlo/include/mlir-hlo/Dialect/gml_st/transforms -I/home/anush/github/torch-mlir/build/include -I/home/anush/github/torch-mlir/externals/llvm-project/llvm/include -I/home/anush/github/torch-mlir/externals/llvm-project/llvm/../mlir/include -I/home/anush/github/torch-mlir/build/tools/mlir/include -I/home/anush/github/torch-mlir/externals/llvm-project/llvm/../mlir/include -I/home/anush/github/torch-mlir/build/tools/mlir/include -I/home/anush/github/torch-mlir/externals/mlir-hlo/include -I/home/anush/github/torch-mlir/build/tools/torch-mlir/mlir-hlo/include -I/home/anush/github/torch-mlir/build/tools/torch-mlir/mlir-hlo /home/anush/github/torch-mlir/externals/mlir-hlo/include/mlir-hlo/Dialect/gml_st/transforms/passes.td --write-if-changed -o tools/torch-mlir/mlir-hlo/include/mlir-hlo/Dialect/gml_st/transforms/passes.h.inc -d tools/torch-mlir/mlir-hlo/include/mlir-hlo/Dialect/gml_st/transforms/passes.h.inc.d
/bin/sh: 1: /home/anush/github/torch-mlir/build/bin/mlir-tblgen: Exec format error
ninja: build stopped: subcommand failed.

You can see that the NATIVE binary is not built:

ubuntu:~/github/torch-mlir$ find build  -name mlir-tblgen  | xargs file
build/tools/mlir/tools/mlir-tblgen:        directory
build/bin/mlir-tblgen:                     ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=d15ff4888c68e3084c3e26e9c1961595954bb549, for GNU/Linux 3.2.0, not stripped
build/NATIVE/tools/mlir/tools/mlir-tblgen: directory

@jpienaar (Member) commented:

Great, I'm in a PC meeting today but will try checking soon. I looked a bit yesterday and did see that add_tablegen considers cross compilation, so it seems likely the problem is in the "nesting", where a different setting is needed.

@marbre (Member) commented Jul 22, 2022

Seems related to what is described in https://llvm.org/docs/HowToCrossCompileLLVM.html:

The TableGen options are required to compile it with the host compiler, so you’ll need to compile LLVM (or at least llvm-tblgen) to your host platform before you start.

When cross-compiling IREE for bare-metal Arm, we need a two-stage approach as well: first compile the host tools, then compile for the target platform.

@powderluv (Collaborator, Author) commented:

@marbre yeah, but it should be automatic for LLVM projects with -DLLVM_USE_HOST_TOOLS=ON. I think if we fix the "directly include tools/ in the top level mlir/CMakeLists.txt" hack it should automatically work.

@marbre (Member) commented Jul 26, 2022

Build torch-mlir with:

ubuntu:~/github/torch-mlir$ cmake -GNinja -Bbuild   -DCMAKE_BUILD_TYPE=Release   -DCMAKE_C_COMPILER=arm-linux-gnueabihf-gcc   -DCMAKE_CXX_COMPILER=arm-linux-gnueabihf-g++   -DPython3_FIND_VIRTUALENV=ONLY   -DLLVM_ENABLE_PROJECTS=mlir   -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects"   -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR=`pwd`   -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR=`pwd`/externals/llvm-external-projects/torch-mlir-dialects   -DMLIR_ENABLE_BINDINGS_PYTHON=OFF -DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF  -DLLVM_TARGETS_TO_BUILD=AArch64 -DLLVM_USE_HOST_TOOLS=ON externals/llvm-project/llvm && cmake --build build --target check-torch-mlir-all

I am quite certain that this one has to fail, since you're passing -DCMAKE_C_COMPILER=arm-linux-gnueabihf-gcc -DCMAKE_CXX_COMPILER=arm-linux-gnueabihf-g++.

@marbre yeah, but it should be automatic for LLVM projects with -DLLVM_USE_HOST_TOOLS=ON.

I would need to take a closer look at what is behind this. I might have some time this evening.

I think if we fix the "directly include tools/ in the top level mlir/CMakeLists.txt" hack it should automatically work.

Can you elaborate on this? Do you suggest shuffling https://github.com/llvm/llvm-project/blob/28e665fa054d62d4e2c777774cc83dea533dfe6e/mlir/CMakeLists.txt#L154 around?
Edit: I think you mean a fix similar to https://github.com/llvm/llvm-project/blob/27945f9282030136cb8b043b91b229ea2758c9ed/mlir/tools/mlir-linalg-ods-gen/CMakeLists.txt#L23-L35 for mlir-tblgen.
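
For context, a rough sketch of the pattern that mlir-linalg-ods-gen's CMakeLists uses, transposed onto mlir-tblgen. This is paraphrased rather than the verbatim upstream file; the exact build_native_tool() arguments and the *_EXE/*_TARGET variable and target names should be checked against the linked source:

if(NOT CMAKE_CROSSCOMPILING)
  set(MLIR_TABLEGEN_EXE mlir-tblgen PARENT_SCOPE)
  set(MLIR_TABLEGEN_TARGET mlir-tblgen PARENT_SCOPE)
else()
  # Build a host copy of the generator and export its path so that tablegen
  # rules run the native binary instead of the cross-compiled one.
  build_native_tool(mlir-tblgen MLIR_TABLEGEN_NATIVE_EXE DEPENDS mlir-tblgen)
  add_custom_target(mlir-tblgen-host DEPENDS ${MLIR_TABLEGEN_NATIVE_EXE})
  set(MLIR_TABLEGEN_EXE ${MLIR_TABLEGEN_NATIVE_EXE} PARENT_SCOPE)
  set(MLIR_TABLEGEN_TARGET mlir-tblgen-host PARENT_SCOPE)
endif()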

@powderluv (Collaborator, Author) commented:

Yes, the latter.

I think you mean a fix similar to https://github.com/llvm/llvm-project/blob/27945f9282030136cb8b043b91b229ea2758c9ed/mlir/tools/mlir-linalg-ods-gen/CMakeLists.txt#L23-L35 for mlir-tblgen.

@jpienaar (Member) commented:

https://reviews.llvm.org/D130350 relevant here.

@powderluv (Collaborator, Author) commented:

I don't think that fixes the issue. With the ARM reproducer above you still get:

FAILED: include/llvm/IR/IntrinsicsR600.h /home/anush/github/torch-mlir/build/include/llvm/IR/IntrinsicsR600.h 
cd /home/anush/github/torch-mlir/build && /home/anush/github/torch-mlir/build/NATIVE/bin/llvm-tblgen -gen-intrinsic-enums -intrinsic-prefix=r600 -I /home/anush/github/torch-mlir/externals/llvm-project/llvm/include/llvm/IR -I/home/anush/github/torch-mlir/build/include -I/home/anush/github/torch-mlir/externals/llvm-project/llvm/include /home/anush/github/torch-mlir/externals/llvm-project/llvm/include/llvm/IR/Intrinsics.td --write-if-changed -o include/llvm/IR/IntrinsicsR600.h -d include/llvm/IR/IntrinsicsR600.h.d
/bin/sh: 1: /home/anush/github/torch-mlir/build/NATIVE/bin/llvm-tblgen: Exec format error
ninja: build stopped: subcommand failed.
1 anush@nod-shared-a100-ubuntu:~/github/torch-mlir$ file /home/anush/github/torch-mlir/build/NATIVE/bin/llvm-tblgen
/home/anush/github/torch-mlir/build/NATIVE/bin/llvm-tblgen: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=eddee09609986fae38bfca74602ec8def8646241, for GNU/Linux 3.2.0, not stripped

@marbre (Member) commented Jul 28, 2022

I don't think that fixes the issue. With the ARM reproducer above you still get:

Do you still pass -DCMAKE_C_COMPILER=arm-linux-gnueabihf-gcc -DCMAKE_CXX_COMPILER=arm-linux-gnueabihf-g++?

Anyway, I can offer to take over and look into the issue in more detail next week (OOO tomorrow). I already dug into the cross-compiling mechanism LLVM relies on and into add_tablegen().

@powderluv (Collaborator, Author) commented:

Yes, I did pass that and still no luck.

@powderluv (Collaborator, Author) commented:

Thank you for offering to take over.

@jpienaar (Member) commented:

Indeed, thanks!

marbre self-assigned this Jul 28, 2022
@marbre (Member) commented Jul 28, 2022

Yes, I did pass that and still no luck.

Well, one thing is that arm-none-linux-gnueabihf-g++ won't be able to produce the native binaries for x86. It only supports Arm targets. So you will need to use a compiler that has target support for Arm + x86 (e.g. a Clang build with multiple targets enabled) or you'll need a multi-stage compilation.

Looking into solving this issue next week :)

@powderluv (Collaborator, Author) commented:

With the host-tools flag set, LLVM builds the native tools; mlir-linalg-ods-yaml-gen handles this well.

@powderluv (Collaborator, Author) commented:

@marbre any luck with this issue?

@u99127 (Contributor) commented Aug 4, 2022

arm-linux-gnueabihf-gcc is a cross compiler that produces AArch32 code suitable for executing in the AArch32 ISA state under a Linux environment with glibc. Is that what you are looking for here?

Ramana

@powderluv (Collaborator, Author) commented:

I think the arm-linux setup was just an easy repro case for folks to try out without requiring a macOS install. In all cases MLIR_TABLEGEN doesn't seem to respect -DLLVM_USE_HOST_TOOLS=ON the way mlir-linalg-ods-yaml-gen does. That requires us to build the host tools once (~3500+ files) and then use the host tools to cross compile (another ~3500 files).

A fix was attempted with https://reviews.llvm.org/D130350 but that doesn't fix it.

@u99127 (Contributor) commented Aug 4, 2022

Ah I see - sorry about the noise.

Ramana

@marbre (Member) commented Aug 5, 2022

@marbre any luck with this issue ?

I am still on it.

In all cases MLIR_TABLEGEN doesn't seem to respect -DLLVM_USE_HOST_TOOLS=ON like mlir-linalg-ods-yaml-gen does.

@powderluv What behavior exactly do you expect when setting LLVM_USE_HOST_TOOLS to ON? Also mlir-linalg-ods-yaml-gen calls build_native_tool and therefore should build the native tool, so I am not sure I fully understand what you think the expected behavior should look like.

@powderluv (Collaborator, Author) commented Aug 5, 2022

When -DLLVM_USE_HOST_TOOLS=ON is set, any required host tools should be built with build_native_tool(), and the cross compile here (#1094 (comment)) should just work and not fail. The mlir-linalg-ods-yaml-gen tool is built correctly and works as expected, using the host tool during the cross compile. However, mlir-tblgen sets up the NATIVE tools correctly, but somehow it is not built correctly.

So, the expected outcome of

ubuntu:~/github/torch-mlir$ cmake -GNinja -Bbuild   -DCMAKE_BUILD_TYPE=Release   -DCMAKE_C_COMPILER=arm-linux-gnueabihf-gcc   -DCMAKE_CXX_COMPILER=arm-linux-gnueabihf-g++   -DPython3_FIND_VIRTUALENV=ONLY   -DLLVM_ENABLE_PROJECTS=mlir   -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects"   -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR=`pwd`   -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR=`pwd`/externals/llvm-external-projects/torch-mlir-dialects   -DMLIR_ENABLE_BINDINGS_PYTHON=OFF -DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF  -DLLVM_TARGETS_TO_BUILD=AArch64 -DLLVM_USE_HOST_TOOLS=ON externals/llvm-project/llvm && cmake --build build --target check-torch-mlir-all

should be that it passes and builds for ARM, but instead it fails trying to run the ARM mlir-tblgen on x86_64.

@powderluv (Collaborator, Author) commented:

OK, I did some digging into it:

-DLLVM_USE_HOST_TOOLS=ON does the right thing for OSX and builds the NATIVE tools. It doesn't do the right thing on the linux recreate I posted above. I think that is because it expects a toolchain cmake file like: https://github.com/llvm/llvm-project/blob/main/llvm/cmake/platforms/Android.cmake. We can revisit the linux_x86_64 --> linux_arm64 later and test it with the correct toolchain.cmake.
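
For reference, a minimal sketch of what such a toolchain file could look like for a linux_x86_64 --> linux_arm64 cross compile (illustrative only; the compiler names and find-root settings are assumptions and depend on the installed toolchain):

# hypothetical aarch64-linux.cmake, passed via -DCMAKE_TOOLCHAIN_FILE=...
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
# Search the target environment for libraries/headers, but keep host programs.
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)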

On OSX (the original issue reported here): I think it just comes down to MLIR_TABLEGEN being exposed to downstream projects. Setting the CACHE will help if we set it for all tools (like PDLL etc). I will give it a try and if it is easy will post a PR.

@jpienaar (Member) commented Aug 7, 2022

There is also a patch under review that sets the install directories of these more correctly; it could be unrelated, but it seems many folks are hitting related pains here. Setting pdll in the cache SGTM.

@powderluv (Collaborator, Author) commented:

OK, more debug info:

When we run cmake, we set the correct values:

...
-- Setting MLIR_TABLEGEN_EXE to /Users/anush/github/torch-mlir/build/NATIVE/bin/mlir-tblgen                                  
-- Setting MLIR_TABLEGEN_TARGET to MLIR-tablegen-host                                                                        
-- Setting MLIR_PDLL_TABLEGEN_EXE to /Users/anush/github/torch-mlir/build/NATIVE/bin/mlir-pdll   

and then, when we are building the NATIVE tools, they get set again to:

-- Setting MLIR_TABLEGEN_EXE to mlir-tblgen                                                                                                                                                                                                               
-- Setting MLIR_TABLEGEN_TARGET to mlir-tblgen                                                                                                                                                                                                            
-- Setting MLIR_PDLL_TABLEGEN_EXE to mlir-pdll          

@stephenneuendorffer (Contributor) commented:

I've seen things like this happen with toolchain files, where information discovered by the outer build doesn't get passed to the inner build, including the toolchain file itself. As a result, the inner build can discover a different set of information than the outer build. This is an area where I think CMake recursion is very subtle and it's easy to shoot yourself in the foot. I've tended to lean toward making all information used by a sub-build explicit, but this can be challenging when there is a lot of information to be passed.
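
As a generic illustration of making sub-build inputs explicit (this is not how LLVM's NATIVE build is actually wired up; the target name and the HOST_* variables are made up for the example):

include(ExternalProject)
# Forward the host toolchain and any settings discovered by the outer build
# explicitly, so the inner (native) build cannot silently re-discover
# different values.
ExternalProject_Add(host-tools
  SOURCE_DIR ${CMAKE_SOURCE_DIR}
  CMAKE_ARGS
    -DCMAKE_C_COMPILER=${HOST_C_COMPILER}
    -DCMAKE_CXX_COMPILER=${HOST_CXX_COMPILER}
    -DCMAKE_BUILD_TYPE=Release
  INSTALL_COMMAND ""
)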

@marbre (Member) commented Aug 8, 2022

So, the expected outcome of

ubuntu:~/github/torch-mlir$ cmake -GNinja -Bbuild   -DCMAKE_BUILD_TYPE=Release   -DCMAKE_C_COMPILER=arm-linux-gnueabihf-gcc   -DCMAKE_CXX_COMPILER=arm-linux-gnueabihf-g++   -DPython3_FIND_VIRTUALENV=ONLY   -DLLVM_ENABLE_PROJECTS=mlir   -DLLVM_EXTERNAL_PROJECTS="torch-mlir;torch-mlir-dialects"   -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR=`pwd`   -DLLVM_EXTERNAL_TORCH_MLIR_DIALECTS_SOURCE_DIR=`pwd`/externals/llvm-external-projects/torch-mlir-dialects   -DMLIR_ENABLE_BINDINGS_PYTHON=OFF -DTORCH_MLIR_USE_INSTALLED_PYTORCH=OFF  -DLLVM_TARGETS_TO_BUILD=AArch64 -DLLVM_USE_HOST_TOOLS=ON externals/llvm-project/llvm && cmake --build build --target check-torch-mlir-all

should be that it passes and builds for ARM, but instead it fails trying to run the ARM mlir-tblgen on x86_64.

  • Thanks for clarifying. First of all, I don't know if the target check-torch-mlir-all can ever pass when cross-compiling (at least not the way it is right now). The necessary torch-mlir-opt is cross-compiled for the target (from a quick look, a native version won't be built) and thus the tests cannot be executed. Anyway, we should be able to cross-compile those tools for the target :)

-DLLVM_USE_HOST_TOOLS=ON does the right thing for OSX and builds the NATIVE tools. It doesn't do the right thing on the linux recreate I posted above.

I think that is because it expects a toolchain cmake file like: https://github.com/llvm/llvm-project/blob/main/llvm/cmake/platforms/Android.cmake. We can revisit the linux_x86_64 --> linux_arm64 later and test it with the correct toolchain.cmake.

  • You don't need a toolchain file to cross-compile. You can pass all the necessary flags via the command line or pass them via a script. I think the failure is rather related to wrong build args passed via CMake, especially for Linux to Linux cross-compiling (with Apple clang this might be different due to flags like MACOSX_DEPLOYMENT_TARGET). It is really tricky to get this done right and the Linux reproducer definitely misses CMAKE_SYSTEM_NAME to do a correct cross-compile. As promised, I am looking further into this.

and then when we are building the NATIVE tools it gets set again to

-- Setting MLIR_TABLEGEN_EXE to mlir-tblgen                                                                                                                                                                                                               
-- Setting MLIR_TABLEGEN_TARGET to mlir-tblgen                                                                                                                                                                                                            
-- Setting MLIR_PDLL_TABLEGEN_EXE to mlir-pdll          

@powderluv (Collaborator, Author) commented:

-DLLVM_USE_HOST_TOOLS=ON does the right thing for OSX and builds the NATIVE tools. It doesn't do the right thing on the linux recreate I posted above.

So on my x86_64 macOS, building with -DCMAKE_OSX_ARCHITECTURES=arm64, if -DLLVM_USE_HOST_TOOLS=ON is not set it doesn't attempt build_native_tool(), so I have to set it explicitly. Maybe that is the root cause?

  • Further, I think it does the correct thing with D130350 applied. I played around with mlir-emitc and had success getting a native mlir-tblgen for x86_64 Linux and a cross-compiled one for Arm Linux.
    Unfortunately, I messed up my build script and cannot share exactly what I did last Friday. Trying to reproduce, and I will afterwards test with torch-mlir.

D130350 is a good start, but ideally we want to avoid pushing these into the cache, which has unintended consequences if we recompile / change flags etc. Ideally we keep exporting into PARENT_SCOPE until the top level has the variables we care about. That said, add_tablegen() itself is pushing stuff into the CACHE.

I think that is because it expects a toolchain cmake file like: https://github.com/llvm/llvm-project/blob/main/llvm/cmake/platforms/Android.cmake. We can revisit the linux_x86_64 --> linux_arm64 later and test it with the correct toolchain.cmake.

  • You don't need a toolchain file to cross-compile. You can pass all the necessary flags via the command line or pass them via a script. I think the failure is rather related to wrong build args passed via CMake, especially for Linux to Linux cross-compiling (with Apple clang this might be different due to flags like MACOSX_DEPLOYMENT_TARGET). It is really tricky to get this done right and the Linux reproducer definitely misses CMAKE_SYSTEM_NAME to do a correct cross-compile. As promised, I am looking further into this.

OK, thank you.

and then when we are building the NATIVE tools it gets set again to

-- Setting MLIR_TABLEGEN_EXE to mlir-tblgen                                                                                                                                                                                                               
-- Setting MLIR_TABLEGEN_TARGET to mlir-tblgen                                                                                                                                                                                                            
-- Setting MLIR_PDLL_TABLEGEN_EXE to mlir-pdll          

Yes, I removed it locally. I will send a PR for this now that D130350 has landed.

@ashay FYI, since you updated add_tablegen() recently.

@marbre (Member) commented Aug 9, 2022

So on my x86_64 macOS, building with -DCMAKE_OSX_ARCHITECTURES=arm64, if -DLLVM_USE_HOST_TOOLS=ON is not set it doesn't attempt build_native_tool(), so I have to set it explicitly. Maybe that is the root cause?

Honestly, IDK and I currently don't have a Mac available to test :/

D130350 is a good start, but ideally we want to avoid pushing these into the cache, which has unintended consequences if we recompile / change flags etc. Ideally we keep exporting into PARENT_SCOPE until the top level has the variables we care about. That said, add_tablegen() itself is pushing stuff into the CACHE.

Yeah, I am not really sure if we should go with a cache variable or with explicitly exporting to parent scopes. And yes, you're right, add_tablegen sets a cache variable for ${project}_TABLEGEN (here). However, it also pushes to the parent scope...
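
As a generic illustration of the two mechanisms being discussed (not code from LLVM; the path is a placeholder):

# Cache variable: stored in CMakeCache.txt, persists across configure runs,
# and is visible everywhere unless shadowed by a normal variable.
set(MLIR_TABLEGEN_EXE "/path/to/NATIVE/bin/mlir-tblgen" CACHE STRING "native mlir-tblgen")

# Parent-scope export: only hops one directory level up, so every intermediate
# CMakeLists.txt has to re-export it for the top level to see the value.
set(MLIR_TABLEGEN_EXE "${MLIR_TABLEGEN_EXE}" PARENT_SCOPE)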

Anyway, I was able to cross-compile torch-mlir-opt for Arm. This requires the patches #1196, #1197 and tensorflow/tensorflow#57054 (I locally modified the mlir-hlo submodule to make the build pass).

@powderluv (Collaborator, Author) commented:

What is your local command to cross-compile?

@powderluv (Collaborator, Author) commented:

Magically, I am able to cross-compile on the GHA macos-12 runner (#1204) but not on my iMac Pro. If this works on GHA, we can close this issue or make it low priority.

@marbre (Member) commented Aug 10, 2022

Let me try to get a working cross compile from x86_64 Linux to Arm Linux with Clang (I have used GCC so far). But I agree, we can then deprioritize.

@powderluv (Collaborator, Author) commented:

This is now fixed.
