Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SLES 15.x][RHEL 8.x] Build failure 'filesystem' file not found after #2657 #2687

Closed
junliume opened this issue Jan 21, 2024 · 10 comments · Fixed by #2690
Closed

[SLES 15.x][RHEL 8.x] Build failure 'filesystem' file not found after #2657 #2687

junliume opened this issue Jan 21, 2024 · 10 comments · Fixed by #2690

Comments

@junliume
Copy link
Collaborator

[Observation]
SLES Error:

[2024-01-19T11:45:06.793Z] /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/addkernels/source_file_desc.hpp:29:10: fatal error: 'filesystem' file not found
[2024-01-19T11:45:06.793Z]    29 | #include <filesystem>
[2024-01-19T11:45:06.793Z]       |          ^~~~~~~~~~~~

RHEL Error:

[2024-01-19T11:50:12.560Z] [  8%] Linking CXX executable ../bin/addkernels
[2024-01-19T11:50:12.560Z] cd /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/build_hip/addkernels && /usr/local/lib64/python3.6/site-packages/cmake/data/bin/cmake -E cmake_link_script CMakeFiles/addkernels.dir/link.txt --verbose=1
[2024-01-19T11:50:12.560Z] /opt/rocm-6.1.0-425/llvm/bin/clang++ -O3 -DNDEBUG -s -Wl,--enable-new-dtags,--rpath,$ORIGIN/../lib CMakeFiles/addkernels.dir/include_inliner.cpp.o CMakeFiles/addkernels.dir/addkernels.cpp.o -o ../bin/addkernels 
[2024-01-19T11:50:12.560Z] ld.lld: error: undefined symbol: std::filesystem::weakly_canonical(std::filesystem::__cxx11::path const&)
[2024-01-19T11:50:12.560Z] >>> referenced by include_inliner.cpp
[2024-01-19T11:50:12.560Z] >>>               CMakeFiles/addkernels.dir/include_inliner.cpp.o:(IncludeInliner::ProcessCore(std::istream&, std::ostream&, std::filesystem::__cxx11::path const&, std::filesystem::__cxx11::path const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, bool, bool))
[2024-01-19T11:50:12.560Z] 
[2024-01-19T11:50:12.560Z] ld.lld: error: undefined symbol: std::filesystem::status(std::filesystem::__cxx11::path const&)
[2024-01-19T11:50:12.560Z] >>> referenced by include_inliner.cpp
[2024-01-19T11:50:12.560Z] >>>               CMakeFiles/addkernels.dir/include_inliner.cpp.o:(IncludeInliner::ProcessCore(std::istream&, std::ostream&, std::filesystem::__cxx11::path const&, std::filesystem::__cxx11::path const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, bool, bool))

The issue is likely introduced by #2657

[Proposal]:
Within #2657:

At the same time, MIOpen is using C++17 standard, and std::filesystem has most of the functionality Boost offers.
Later, we will submit PRs for other components in MIOpen, changing std::string and boost::filesystem::path to std::filesystem::path. Currently, the Linux-specific path operations block Windows MIOpen from executing specific scenarios.

Likely we would need the other way around, replace std::filesystem with boost?

@junliume
Copy link
Collaborator Author

@atamazov @JehandadKhan FYI

@dmikushin
Copy link
Contributor

dmikushin commented Jan 21, 2024

I would like to explain how this error can possibly occur.

Clang++ has two possibilities to be provided with std:: support.

  1. One possibility is its own libc++ library: it's an independent feature-complete implementation of STL, which includes std::filesystem. Therefore, in this mode the error above does NOT occur.
  2. Another option is that Clang relies on GNU's STL library by binding to its include files. You can include '-v' to the compile line of /opt/bin/llvm/bin/clang++ to see the following:
"/opt/rocm-5.7.0/llvm/bin/clang-17" -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name filesystem.cpp -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu x86-64 -tune-cpu generic -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/marcusmae/amd/filesystem -resource-dir /opt/rocm-5.7.0/llvm/lib/clang/17.0.0 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/backward -internal-isystem /opt/rocm-5.7.0/llvm/lib/clang/17.0.0/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -fdebug-compilation-dir=/home/marcusmae/amd/filesystem -ferror-limit 19 -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/filesystem-13f403.o -x c++ filesystem.cpp

That is, ROCm's clang++ uses -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/x86_64-linux-gnu/c++/12 path during the compilation, which belongs to the host system's GCC compiler.

Therefore, the error is caused by the fact that ROCm goes for host's gcc compiler, and it's too old to provide std::filesystem support.

This build mode of clang++ is the default, libc++ needs to be added separately. However, in Windows things could be different, build team may choose to provide libc++, as there is no other alternative.

IMO, each toolchain is better be made self-containing to avoid distro-specific headaches. One reason why nvcc compiler is not self-containing is that host-device source code separation is made to use host compiler (gcc, msvc) for the host code, and use cicc (LLVM) for device code only. BUT for ROCm I believe it does not have to be so: both host and device code could be handled by ROCm clang++ entirely.

To my best knowledge, ROCm clang++ can be made into a self-containing distro-independent compiler. Why it is not done so (as the error indicates) is a big question to the ROCm compiler devops team. I'd double-check with them what is their reasoning and do they understand the impact of their choice not to use libc++.

Likely we would need the other way around, replace std::filesystem with boost?

The best solution would be to use https://github.com/gulrak/filesystem . It's a unification package used to provide std::filesystem on top of uneven compiler support. I used it few years ago and not anymore, but it still should solve this problem very well.

@dmikushin
Copy link
Contributor

dmikushin commented Jan 21, 2024

And a side not about RHEL. RHEL chooses to keep a very old gcc compiler as the system default. RHEL uses what they call toolsets to optionally provide newer compilers, but they are installed into /opt, and the apps are not made aware of their availability by default.

On Rocky Linux 8.8 the default GCC is 8.5.0, which is 5 years old, but it already supports C++17. So another way to avoid the problem is to raise RHEL OS version requirement.

@dmikushin
Copy link
Contributor

dmikushin commented Jan 21, 2024

@junliume , since the error in RHEL only occurs for you during linking, I believe it could be fixed. Please try to add -lstdc++fs or stdc++fs to the linker, like this:

iff --git a/addkernels/CMakeLists.txt b/addkernels/CMakeLists.txt
index f904146ae..55a1d8c85 100644
--- a/addkernels/CMakeLists.txt
+++ b/addkernels/CMakeLists.txt
@@ -27,5 +27,6 @@
 set(ADD_KERNELS_SOURCE include_inliner.cpp addkernels.cpp)
 
 add_executable(addkernels EXCLUDE_FROM_ALL ${ADD_KERNELS_SOURCE})
+target_link_libraries(addkernels stdc++fs)
 
 clang_tidy_check(addkernels)
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 8feef9375..8743edec8 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -683,7 +683,7 @@ endif()
 
 target_include_directories(MIOpen SYSTEM PUBLIC $<BUILD_INTERFACE:${HALF_INCLUDE_DIR}>)
 target_include_directories(MIOpen SYSTEM PRIVATE ${BZIP2_INCLUDE_DIR})
-target_link_libraries(MIOpen PRIVATE ${CMAKE_THREAD_LIBS_INIT} ${BZIP2_LIBRARIES} ${MIOPEN_CK_LINK_FLAGS})
+target_link_libraries(MIOpen PRIVATE ${CMAKE_THREAD_LIBS_INIT} ${BZIP2_LIBRARIES} ${MIOPEN_CK_LINK_FLAGS} stdc++fs)
 generate_export_header(MIOpen
     EXPORT_FILE_NAME ${PROJECT_BINARY_DIR}/include/miopen/export.h
 )

This library should be automatically added to linker by CMake, but only if CMake is new enough.

@apwojcik
Copy link
Collaborator

Clang++ for providing STL uses the GNU standard library on Linux and the MSFT standard library on Windows. For SLES 15, the filesystem is experimental. So instead of hard coding the library name in CMake (which will not work on Windows), we should follow the MIGraphX approach: https://github.com/ROCm/AMDMIGraphX/blob/develop/src/include/migraphx/filesystem.hpp.
I will push the appropriate PR tomorrow for detecting FS correctly, including the required changes in the code.

@apwojcik
Copy link
Collaborator

The same is true for RHEL.

@apwojcik
Copy link
Collaborator

PR #2690 submitted

@dmikushin
Copy link
Contributor

The PR looks great!

Still what's about SLES? Does a missing filesystem header mean that some package needs to be added into the build environment?

@apwojcik
Copy link
Collaborator

apwojcik commented Jan 23, 2024

No, all required packages are there. It is just a matter of using std::filesystem versus std::experimental::filesystem. I successfully compiled MIOpen on freshly installed SLES/RHEL from DVD ISOs. The only missing piece is weakly_canonical, which is not present on SLES.

@dmikushin
Copy link
Contributor

dmikushin commented Jan 23, 2024

OK, likely means SLES has an older compiler version. Oldest among all target Linux distros. Therefore, it should be our first target to check when introducing new C++ features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants