
Unable to compile Libtorch with Plumed #1077

Open
Esenmira opened this issue May 15, 2024 · 5 comments

Comments

@Esenmira

Hello,
I am trying to use Plumed 2.9.0 with Pytorch/Libtorch (CPU version), with a view to using it with CPMD to run enhanced-sampling MD with machine-learned collective variables (generated by mlcolvars using Pytorch). When I run ./configure, I get different failures depending on the choice of C++ compiler. I have tried both the 2.0.0 and 2.3.0 versions of Pytorch/Libtorch, with and without the C++11 ABI, to no avail.
The machine on which I am trying to install this runs under Rocky Linux release 9.2 (Blue Onyx).

I also tried installing Plumed, Libtorch and Pytorch using conda (which works), but as I understand it, Plumed installed this way comes precompiled and cannot be set up with optional modules. Is compiling from source the only way to get the optional modules?

I have set the environment variables as indicated in the installation guide. Here is the ./configure command that I run (based on recommendations in earlier issues):

./configure --prefix=/home/ac276447/mimic_sources/plumed-2.9.0/plumed-install/ --enable-libtorch LDFLAGS="-L/home/ac276447/mimic_sources/other-libtorch/libtorch-2.3.0-cxx11/lib"  CPPFLAGS='-I/home/ac276447/mimic_sources/other-libtorch/libtorch-2.3.0-cxx11/include -I/home/ac276447/mimic_sources/other-libtorch/libtorch-2.3.0-cxx11/include/torch/csrc/api/include/'
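For completeness, the environment-variable setup before running ./configure looks roughly like the sketch below. The LIBTORCH path is the one from my configure command; the variable names follow what the installation guide recommends, so treat the exact list and paths as assumptions to adapt.

```shell
# Sketch of the environment setup before running ./configure.
# LIBTORCH points at the unpacked libtorch from the command above; adjust as needed.
LIBTORCH=/home/ac276447/mimic_sources/other-libtorch/libtorch-2.3.0-cxx11
export CPATH="${LIBTORCH}/include/torch/csrc/api/include:${LIBTORCH}/include:${LIBTORCH}/include/torch${CPATH:+:$CPATH}"
export LIBRARY_PATH="${LIBTORCH}/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="${LIBTORCH}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```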

Adding the --enable-modules=pytorch option does not change anything.

The following commands have all been run with Libtorch 2.3.0, C++11 ABI, CPU version.

With the default compiler (mpiicpc) and mpiicc:

mpiicpc is set as the CXX environment variable because it is what I found to work with other programs; setting the compiler explicitly in ./configure does not change anything. The following lines appear when running ./configure but pass quickly:

checking whether mpiicpc accepts -std=c++14... yes
checking libtorch without extra libs... no
checking libtorch with  -ltorch_cpu -lc10... no
configure: WARNING: cannot enable __PLUMED_HAS_LIBTORCH

plus a bunch of errors looking like this:

In file included from /usr/local/install/gcc-13.1.0/include/c++/13.1.0/cwchar(44),
                 from /usr/local/install/gcc-13.1.0/include/c++/13.1.0/bits/postypes.h(40),
                 from /usr/local/install/gcc-13.1.0/include/c++/13.1.0/iosfwd(42),
                 from /usr/local/install/gcc-13.1.0/include/c++/13.1.0/ios(40),
                 from /usr/local/install/gcc-13.1.0/include/c++/13.1.0/ostream(40),
                 from /usr/local/install/gcc-13.1.0/include/c++/13.1.0/iostream(41),
                 from conftest.cpp(1):
/usr/include/wchar.h(397): error: identifier "_Float32" is undefined
  extern _Float32 wcstof32 (const wchar_t *__restrict __nptr,

I am guessing that Plumed tries to use gcc at some point (?).
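One way to see the real failure is to re-run configure's libtorch probe by hand, outside of configure. This is only a sketch: the compiler, standard, flags, and paths are assumptions to adapt (they mirror the CPPFLAGS/LDFLAGS above), and the command is printed rather than executed so it can be inspected first.

```shell
# Reproduce configure's libtorch link test by hand to get the full error message.
LIBTORCH=/home/ac276447/mimic_sources/other-libtorch/libtorch-2.3.0-cxx11  # assumed path
cat > conftest.cpp <<'EOF'
#include <torch/torch.h>
int main() {
  torch::Tensor t = torch::rand({2, 2});  // trivial libtorch call, enough to force a link
  return t.numel() == 4 ? 0 : 1;
}
EOF
# Print the compile command instead of running it (libtorch may not be installed here):
CMD="g++ -std=c++17 conftest.cpp \
  -I${LIBTORCH}/include -I${LIBTORCH}/include/torch/csrc/api/include \
  -L${LIBTORCH}/lib -ltorch -ltorch_cpu -lc10 -Wl,-rpath,${LIBTORCH}/lib -o conftest"
echo "$CMD"
```

Running the printed command directly shows the compiler's full diagnostics instead of configure's terse "no".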

There is also this error when running it with Libtorch 2.3.0:

/home/ac276447/mimic_sources/other-libtorch/libtorch-2.3.0-cxx11/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch. 
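That error is likely the key one for recent libtorch: PyTorch's headers require C++17 from version 2.1 onward, while the configure output above shows the probe being built with -std=c++14, so with libtorch 2.3.0 the check fails regardless of the compiler. Two untested options: fall back to a libtorch in the 1.8.x–2.0.0 range that the PLUMED docs describe as tested, or try forcing the dialect, e.g.:

```shell
# Untested sketch: ask configure for C++17 instead of the default dialect
# (keep the same CPPFLAGS/LDFLAGS as in the original command).
./configure --enable-libtorch CXXFLAGS="-O3 -std=c++17" CPPFLAGS=... LDFLAGS=...
```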

Invoking the compiler with -v gives:

mpiicc for the Intel(R) MPI Library 2021.10 for Linux*
Copyright Intel Corporation.
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
icc version 2021.10.0 (gcc version 13.1.0 compatibility)

With gcc

When setting CXX=gcc: the "checking libtorch" lines give the same output but take much longer to pass (around 10 s each), the gcc-related errors disappear, and the config.log file is much bigger (32 MB compared to 650 kB).
Of course, Plumed then says that it will not be configured with MPI.

gcc -v gives:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/install/gcc-13.1.0/libexec/gcc/x86_64-pc-linux-gnu/13.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/usr/local/install/gcc-13.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.1.0 (GCC) 

With mpic++, mpiCC, mpicxx, mpigxx, mpicc, mpigcc

Same as above, except for the MPI configuration message.

Invoking them with -v gives (mpic++, mpiCC):

Using built-in specs.
COLLECT_GCC=/usr/local/install/gcc-13.1.0/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/install/gcc-13.1.0/libexec/gcc/x86_64-pc-linux-gnu/13.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/usr/local/install/gcc-13.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.1.0 (GCC) 

Or this (mpicxx, mpigxx):

mpigxx for the Intel(R) MPI Library 2021.10 for Linux*
Copyright Intel Corporation.
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/local/install/gcc-13.1.0/libexec/gcc/x86_64-pc-linux-gnu/13.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/usr/local/install/gcc-13.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.1.0 (GCC)

Or this (mpicc, mpigcc):

mpigcc for the Intel(R) MPI Library 2021.10 for Linux*
Copyright Intel Corporation.
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/local/install/gcc-13.1.0/libexec/gcc/x86_64-pc-linux-gnu/13.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/usr/local/install/gcc-13.1.0 --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.1.0 (GCC)

With mpiicx and mpiicpx

Same output as above; the "checking libtorch" lines pass much more quickly, but the config.log file is much smaller (only 225 kB now).

I believe these compilers do not work for me at all, as invoking them with -v gives:

Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230622)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/bin-llvm
Configuration file: /usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/bin-llvm/../bin/icpx.cfg
Found candidate GCC installation: /usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0
Selected GCC installation: /usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/ld" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /lib/../lib64/crt1.o /lib/../lib64/crti.o /usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0/crtbegin.o -L/usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0/lib/release -L/usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0/lib -L/usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/compiler/lib/intel64_lin -L/usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/bin-llvm/../lib -L/usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/compiler/lib/intel64_lin -L/usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0 -L/usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0/../../.. -L/usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/bin-llvm/../lib -L/lib -L/usr/lib -L/usr/local/install/intel/intel-hpc-2023.2.0/tbb/2021.10.0/env/../lib/intel64/gcc4.8 -L/usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0//libfabric/lib -L/usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0//lib/release -L/usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0//lib -L/usr/local/install/intel/intel-hpc-2023.2.0/mkl/2023.2.0/lib/intel64 -L/usr/local/install/intel/intel-hpc-2023.2.0/ippcp/2021.8.0/lib/intel64 -L/usr/local/install/intel/intel-hpc-2023.2.0/ipp/2021.9.0/lib/intel64 -L/usr/local/install/intel/intel-hpc-2023.2.0/dnnl/2023.2.0/cpu_dpcpp_gpu_dpcpp/lib -L/usr/local/install/intel/intel-hpc-2023.2.0/dal/2023.2.0/lib/intel64 -L/usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/compiler/lib/intel64_lin -L/usr/local/install/intel/intel-hpc-2023.2.0/compiler/2023.2.0/linux/lib -L/usr/local/install/intel/intel-hpc-2023.2.0/ccl/2021.10.0/lib/cpu_gpu_dpcpp 
-L/home/ac276447/mimic_sources/other-libtorch/libtorch-2.0.0-precxx11/lib -L/home/ac276447/.local/lib -L. --enable-new-dtags -rpath /usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0/lib/release -rpath /usr/local/install/intel/intel-hpc-2023.2.0/mpi/2021.10.0/lib -lmpicxx -lmpifort -lmpi -ldl -lrt -lpthread -Bstatic -lsvml -Bdynamic -Bstatic -lirng -Bdynamic -lstdc++ -Bstatic -limf -Bdynamic -lm -lgcc_s -lgcc -Bstatic -lirc -Bdynamic -ldl -lgcc_s -lgcc -lc -lgcc_s -lgcc -Bstatic -lirc_s -Bdynamic /usr/local/install/gcc-13.1.0/lib/gcc/x86_64-pc-linux-gnu/13.1.0/crtend.o /lib/../lib64/crtn.o
/usr/bin/ld: /lib/../lib64/crt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
icpx: error: linker command failed with exit code 1 (use -v to see invocation)

Other potentially concerning errors that I systematically see in the ./configure output are these (just before the checking libtorch lines):

checking fftw3.h usability... no
checking fftw3.h presence... no
checking for fftw3.h... no
configure: WARNING: cannot enable __PLUMED_HAS_FFTW
checking for python... python
configure: Python executable is python
checking support for required python modules (python3, setuptools, cython)... no
configure: WARNING: cannot enable python interface

Does anyone have ideas about how to solve this? Is it due to a missing option, or a wrong compiler version?
Please let me know what further tests I could run or details I could provide to help solve this.

Attached: the config.log files obtained with mpic++, mpiicx, and mpiicc with Libtorch 2.3.0 CPU version.
plumed_conf_issues.zip

@andleb

andleb commented Jul 18, 2024

I can answer for the last FFTW part: installing it from your package manager should do the trick; otherwise here's the link to the binaries.
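For example, on the systems mentioned in this thread (package names assumed, not verified):

```shell
# Rocky Linux / RHEL family:
sudo dnf install fftw-devel
# Debian / Ubuntu:
sudo apt-get install libfftw3-dev
```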

@coparks2012

Were you able to resolve this issue? I am also unable to install with libtorch. I receive

configure: WARNING: cannot enable __PLUMED_HAS_LIBTORCH

when following the install directions directly (I think). Here are the commands I ran:

#===create environment
conda create -n openmm_plumed_ml python=3.12.5
source activate openmm_plumed_ml

#===install plumed w/libtorch
cd /home/ubuntu/opt/
mkdir libtorch
mkdir libtorch/download
cd ./libtorch/download/
wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.13.1%2Bcpu.zip 
unzip libtorch-cxx11-abi-shared-with-deps-1.13.1+cpu.zip
LIBTORCH=${PWD}/libtorch

echo "export CPATH=${LIBTORCH}/include/torch/csrc/api/include/:${LIBTORCH}/include/:${LIBTORCH}/include/torch:$CPATH" >> ${LIBTORCH}/sourceme.sh
echo "export INCLUDE=${LIBTORCH}/include/torch/csrc/api/include/:${LIBTORCH}/include/:${LIBTORCH}/include/torch:$INCLUDE" >> ${LIBTORCH}/sourceme.sh
echo "export LIBRARY_PATH=${LIBTORCH}/lib:$LIBRARY_PATH" >> ${LIBTORCH}/sourceme.sh
echo "export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH" >> ${LIBTORCH}/sourceme.sh
. ${LIBTORCH}/sourceme.sh

(open a new terminal to ensure the environment is updated)

cd /home/ubuntu/opt/
mkdir /home/ubuntu/opt/plumed
mkdir /home/ubuntu/opt/plumed/download
cd /home/ubuntu/opt/plumed/download
wget https://github.com/plumed/plumed2/releases/download/v2.9.1/plumed-2.9.1.tgz
wget https://github.com/plumed/plumed2/releases/download/v2.9.1/plumed-doc-2.9.1.tgz
wget https://github.com/plumed/plumed2/releases/download/v2.9.1/plumed-src-2.9.1.tgz
wget https://github.com/plumed/plumed2/releases/download/v2.9.1/plumed-test-2.9.1.tgz
tar xfz plumed-2.9.1.tgz
cd plumed-2.9.1

sudo ./configure --enable-libtorch --enable-modules=all --prefix=/usr/local

@danesnick

danesnick commented Oct 10, 2024

Hi all,

I've also been experiencing similar issues trying to build this on the following configurations:

  1. Rocky Linux 9 machine, GCC 11.4.1, Libtorch CUDA 12.4
  2. Ubuntu 22.04 + CUDA 12.4 container (Apptainer/Singularity) + Libtorch CUDA 12.4

Has anyone been able to correctly link Libtorch to Plumed 2.9?

Can provide more details in a follow-up comment if requested.

@danesnick

@Esenmira A colleague of mine pointed out that Libtorch versions newer than 2.0.0 are not supported with Plumed 2.9, as stated on this page:
https://www.plumed.org/doc-master/user-doc/html/_installation.html#installation-libtorch

in the following warning:

Warning
Libtorch APIs are still in beta phase, so there might be breaking changes in newer versions. Currently, versions between 1.8.* and 2.0.0 have been tested. Please note that if you want to link a different version it might be necessary to manually specify the required libraries within LIBS in configure.

I've made further progress in the build steps by switching to an older libtorch, but still trying to resolve other issues.

@j0cross

j0cross commented Nov 5, 2024

Hi guys,

OS: CentOS 7.6
GCC: 11.5.0
OMPI: 4.0.3
CUDA: 11.6
LIBTORCH: libtorch-static-with-deps-1.13.1%2Bcu116.zip

We were able to compile Plumed with proper libtorch support. In case you need it, this is the command line we used:

./configure --enable-libtorch --enable-modules=all --prefix=/install-path/plumed/2.9.2--gnu--11.5.0 --enable-mpi CXX=mpicxx --enable-debug CXXFLAGS="-O3 -D_GLIBCXX_USE_CXX11_ABI=0" CPPFLAGS="-I/install-path/libtorch/1.13.1-static_CUDA_11.6/include/torch/csrc/api/include/ \
-I/install-path/libtorch/1.13.1-static_CUDA_11.6/include/ \
-I/install-path/gsl/2.6--gnu--11.5.0/include \
-I/install-path/openblas/0.3.13/include/openblas/ \
-I/install-path/libtorch/1.13.1-static_CUDA_11.6/include/torch/ \
-I/install-path/fftw/3.3.8--gnu--11.5.0/include/" LDFLAGS="-L/install-path/libtorch/1.13.1-static_CUDA_11.6/lib \
-L/install-path/fftw/3.3.8--gnu--11.5.0/lib/ \
-L/install-path/fftw/3.3.8--gnu--11.5.0/lib64/ \
-L/install-path/gsl/2.6--gnu--11.5.0/lib \
-L/install-path/openblas/0.3.13/lib64/ \
-ltorch -lc10 -Wl,-rpath,/install-path/libtorch/1.13.1-static_CUDA_11.6/lib" 2>&1 | tee configure.log

We noticed this error related to the fftw3 libraries; we haven't solved it yet:

checking fftw3.h usability... yes
checking fftw3.h presence... yes
checking for fftw3.h... yes
checking for library containing fftw_execute... no
configure: WARNING: cannot enable __PLUMED_HAS_FFTW
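Since the header checks pass but the fftw_execute link check fails, configure is finding fftw3.h but not linking libfftw3 itself. One untested sketch is to name the library explicitly so the probe links against it, and to check config.log for the underlying link error:

```shell
# Untested sketch: add the FFTW library explicitly for the fftw_execute probe
# (keep the rest of the configure arguments as before).
./configure ... LIBS="-lfftw3"
# Then inspect the actual link failure:
grep -A5 'fftw_execute' config.log
```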

Plumed 2.9.x doesn't look for libtorch_cuda; it checks only the CPU libraries:

checking libtorch with -ltorch_cpu -lc10... yes

2.10a/b and 2.11-dev also check libtorch_cuda.

Hope this can help.
