Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fails to build on macOS with disabled cuda #270

Closed
davydden opened this issue Mar 16, 2019 · 16 comments · Fixed by #278
Closed

fails to build on macOS with disabled cuda #270

davydden opened this issue Mar 16, 2019 · 16 comments · Fixed by #278
Assignees
Labels
is:bug Something looks wrong. is:confirmed Someone confirmed this issue. reg:benchmarking This is related to benchmarking. reg:build This is related to the build system.

Comments

@davydden
Copy link

davydden commented Mar 16, 2019

I disable cuda

'-DBUILD_CUDA=OFF'

but build fails with undefined symbols that suggest the presence of cuda:

[  2%] Linking CXX shared library libginkgo_cuda.dylib
cd /Users/davydden/spack/var/spack/stage/ginkgo-develop-zeye3v3wrsei5bnsicjkcjkljrbhnriq/ginkgo/spack-build/core/device_hooks && /Users/davydden/spack/opt/spack/darwin-mojave-x86_64/clang-10.0.0-apple/cmake-3.14.0-qpuv2tznpy7mu7zko425s24qbkxcizty/bin/cmake -E cmake_link_script CMakeFiles/ginkgo_cuda.dir/link.txt --verbose=1
/Users/davydden/spack/lib/spack/env/clang/clang++ -O3 -DNDEBUG -dynamiclib -Wl,-headerpad_max_install_names  -o libginkgo_cuda.dylib -install_name @rpath/libginkgo_cuda.dylib CMakeFiles/ginkgo_cuda.dir/cuda_hooks.cpp.o ../devices/cuda/CMakeFiles/ginkgo_cuda_device.dir/executor.cpp.o 
Undefined symbols for architecture x86_64:
  "gko::CudaExecutor::mutex", referenced from:
      gko::CudaExecutor::CudaExecutor(int, std::__1::shared_ptr<gko::Executor>) in cuda_hooks.cpp.o
      gko::CudaExecutor::~CudaExecutor() in executor.cpp.o
  "gko::CudaExecutor::num_execs", referenced from:
      gko::CudaExecutor::CudaExecutor(int, std::__1::shared_ptr<gko::Executor>) in cuda_hooks.cpp.o
      gko::CudaExecutor::~CudaExecutor() in executor.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [core/device_hooks/libginkgo_cuda.dylib] Error 1
make[1]: *** [core/device_hooks/CMakeFiles/ginkgo_cuda.dir/all] Error 2
make: *** [all] Error 2

@hartwiganzt
Copy link
Collaborator

@davydden can you please add the CMakeCache.txt? Thanks!

@davydden
Copy link
Author

sure, here you go CMakeCache.txt

@davydden
Copy link
Author

davydden commented Mar 16, 2019

wait a second, should that be GINKGO_BUILD_CUDA instead of BUILD_CUDA? but default is still disabled...

EDIT: supposedly CMake interface changed recently, as @pratikvn documented this as BUILD_CUDA in dealii/dealii#7368

EDIT2: using GINKGO_BUILD_CUDA does not make a difference.

@pratikvn
Copy link
Member

@davydden , thank you for identifying this. Yes, we had to add GINKGO_ prefixes to conform to xSDK requirements. I will create a PR with the updated docs on deal.ii soon. I thought I would add it in the next wave when I add a few more of Ginkgo's solvers as well.

Disabling -DGINKGO_BUILD_CUDA actually does disable cudabut not the CudaExecutor paradigm, The philosophy of ginkgo is to have a modular setup and hence the CudaExecutor module in the disabled state just contains some hooks and will consequently have a GKO_NOT_COMPILED/IMPLEMENTED error when one calls something on the cuda executor.

So, it seems that you are having an issue with std::mutex from the C++ standard library. Could you tell us what clang version you are using ? It is not clear to me from the CMakeCache file. Additionally, if you could try running it with gcc instead of clang, that would be great.

@davydden
Copy link
Author

I will create a PR with the updated docs on deal.ii soon

thanks 👍

it seems that you are having an issue with std::mutex from the C++ standard library.

Should not be the case, deal.II also uses std::mutex and I have no issues with it there on macOS.

Could you tell us what clang version you are using ?

$ clang --version
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

if you could try running it with gcc instead of clang, that would be great.

Unfortunately that's not an option. I build the whole stack of things, and too many things fail to build with gcc on macOS.

@davydden
Copy link
Author

davydden commented Mar 16, 2019

it seems that you are having an issue with std::mutex from the C++

btw, how is std::mutex related to

Undefined symbols for architecture x86_64:
"gko::CudaExecutor::mutex"

?

EDIT:

Oh, I see, you have

CudaExecutor {
   static std::mutex mutex[max_devices];
}

and in .cpp:

std::mutex CudaExecutor::mutex[max_devices];

Still, I would not really call this "issues with std::mutex", it's rather a compiler has some issues with initialization of static variables in this case (no idea why).

@pratikvn
Copy link
Member

pratikvn commented Mar 16, 2019

btw, how is std::mutex related to

Sorry, I wrote something that was misleading. We use std::mutex within CudaExecutor to manage when we have multiple CudaExecutors such that we have a count of the number of cuda devices that we have. We had to add this due to an issue with cudaDeviceReset(). So, for some reason, it seems that there is an issue with the variable within the CudaExecutor class gko::CudaExecutor::mutex.

EDIT:

Still, I would not really call this "issues with std::mutex", it's rather a compiler has some issues with initialization of static variables in this case (no idea why).

Yes, you are right.

@hartwiganzt
Copy link
Collaborator

@davydden I am using clang 7 and I encounter problems with some circular dependency that we have not yet resolved. I circumvent that problem by disabling the shared libs. Maybe you can give it a try? (it is an open issue #203 , sure not a long-term solution)

@thoasm
Copy link
Member

thoasm commented Mar 18, 2019

@davydden You can disable shared libraries with the cmake option -DBUILD_SHARED_LIBS=OFF. Instead of shared libraries (.lib), it will build static libraries .a. It should not make a difference performance wise, but I hope we have a better solution for that in the near future.

@davydden
Copy link
Author

@hartwiganzt @thoasm static gives other issues while building:

   906    [ 76%] Built target core_test_solver_cgs
     907    In file included from /Users/davydden/spack/var/spack/stage/ginkgo-develop-alk3r26k56kgsu6an7o3w6ofzfkuw5pg/ginkgo/examples/asynchronous_stopping_criterion/asynchronous_stopping_criterion.cpp:67:
  >> 908    /Users/davydden/spack/var/spack/stage/ginkgo-develop-alk3r26k56kgsu6an7o3w6ofzfkuw5pg/ginkgo/include/ginkgo/ginkgo.hpp:41:10: fatal error: 'ginkgo/**/*.hpp' file not found
     909    #include <ginkgo/**/*.hpp>
     910             ^~~~~~~~~~~~~~~~~
     911    [ 76%] Built target core_test_solver_fcg
     912    1 error generated.
  >> 913    make[2]: *** [examples/asynchronous_stopping_criterion/CMakeFiles/asynchronous_stopping_criterion.dir/asynchronous_stopping_criterion.cpp.o] Error 1
  >> 914    make[1]: *** [examples/asynchronous_stopping_criterion/CMakeFiles/asynchronous_stopping_criterion.dir/all] Error 2

@thoasm
Copy link
Member

thoasm commented Mar 18, 2019

Interesting, it looks like your shell does not know the ** expression that we use to update our global header file.
Temporarily, you can disable the check if any header was put into include/ginkgo/ by removing the following lines from the global CMakeLists.txt:

add_custom_target(generate_ginkgo_header ALL
        COMMAND ${Ginkgo_SOURCE_DIR}/dev_tools/scripts/update_ginkgo_header.sh
        WORKING_DIRECTORY ${Ginkgo_SOURCE_DIR})

After removing this part, it should work fine because we have the global header file include/ginkgo/ginkgo.hpp in git already.

I also created the issue #272 to tackle this problem, so this temporary solution will no longer necessary.

@davydden
Copy link
Author

@thoasm

Temporarily, you can disable the check

unfortunately, that leads to other errors:

    1002    /Applications/Xcode.app/Contents/Developer/usr/bin/make -f benchmark/solver/CMakeFiles/solver.dir/build.make benchmark/solver/CMakeFiles/solver.dir/build
     1003    [ 92%] Building CXX object benchmark/solver/CMakeFiles/solver.dir/solver.cpp.o
     1004    cd /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/spack-build/benchmark/solver && /Users/davydden/spack/lib/spack/env/clang/clang++   -I/Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl7
             53pipvjq2u5/ginkgo/spack-build/include -I/Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/include -I/Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo -I/Users/davydden/s
             pack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/spack-build/third_party/rapidjson/src/include -isystem /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/spack-build/third_party/gflags/b
             uild/include  -O3 -DNDEBUG   -std=gnu++11 -o CMakeFiles/solver.dir/solver.cpp.o -c /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/benchmark/solver/solver.cpp
     1005    In file included from /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/benchmark/matrix_statistics/matrix_statistics.cpp:44:
     1006    In file included from /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/benchmark/utils/general.hpp:54:
  >> 1007    /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/spack-build/third_party/rapidjson/src/include/rapidjson/document.h:872:22: error: call to constructor of 'rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson:
             :MemoryPoolAllocator<rapidjson::CrtAllocator> >' is ambiguous
     1008            GenericValue v(value);
     1009                         ^ ~~~~~
     1010    /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/benchmark/utils/general.hpp:115:22: note: in instantiation of function template specialization 'rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoo
             lAllocator<rapidjson::CrtAllocator> >::operator=<unsigned long>' requested here
     1011            object[name] = std::forward<T>(value);
     1012                         ^
     1013    /Users/davydden/spack/var/spack/stage/ginkgo-develop-nhrsjtztfvapqtaiyqnl753pipvjq2u5/ginkgo/benchmark/matrix_statistics/matrix_statistics.cpp:123:5: note: in instantiation of function template specialization 'add_or_set_member<const unsigned long &, char
             const (&)[4], rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> &>' requested here

@tcojean tcojean added is:bug Something looks wrong. is:confirmed Someone confirmed this issue. labels Mar 18, 2019
@tcojean tcojean added this to the Ginkgo 1.0.0 release milestone Mar 18, 2019
@tcojean tcojean added reg:build This is related to the build system. reg:benchmarking This is related to benchmarking. labels Mar 18, 2019
@thoasm
Copy link
Member

thoasm commented Mar 18, 2019

@davydden OK, now it seems that it does not accept the rapitjson package. For some reason, this is also only present on Mac systems and should be fixed in the next couple of days (when we update to a newer package of rapidjson).
For now, disabling the benchmarks in cmake (-DGINKGO_BUILD_BENCHMARKS=OFF) should be the last thing that you need to do in order for Ginkgo to work on your system.

@davydden
Copy link
Author

disabling the benchmarks in cmake (-DGINKGO_BUILD_BENCHMARKS=OFF) should be the last thing that you need to do in order for Ginkgo to work on your system

thanks, that allowed me to build ginkgo indeed.

tcojean added a commit that referenced this issue Mar 27, 2019
`gtest` and `gflags` are updated to their latest release. For `RapidJSON`, try to find a correct version as the last release is over two years old and there is a lot of recent pipeline failures.

This PR also ensures for our usage RapidJSON that there is no problem with some compilers when building benchmarks (i.e., building Ginkgo with `-DGINKGO_BUILD_BENCHMARKS=ON`). The problem was that on some compilers, no constructor can be recognized for the RapidJSON `GenericValue` object with a parameter of type `std::size_t`. The fix used is to add an explicit version of our main function `add_or_set_member` for `std::size_t` which uses a `static_cast` to `std::uint64_t`.

This is related to #270.
Associated pull request: #274.
@tcojean tcojean self-assigned this Mar 27, 2019
@tcojean
Copy link
Member

tcojean commented Mar 28, 2019

Thanks a lot for your report @davydden. All the issues you found should now be fixed after the following PRs:
#278
#274
#272

@davydden
Copy link
Author

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:bug Something looks wrong. is:confirmed Someone confirmed this issue. reg:benchmarking This is related to benchmarking. reg:build This is related to the build system.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants