Installation fails on Cheyenne with GNU+MPT combination #7

Closed
uturuncoglu opened this issue Dec 4, 2019 · 76 comments
Assignees
Labels
bug Something isn't working

Comments

@uturuncoglu
Collaborator

uturuncoglu commented Dec 4, 2019

I am trying to install NCEP LIBS on Cheyenne with the following module combination, but

  1. ncarenv/1.3
  2. gnu/9.1.0
  3. mpt/2.19
  4. netcdf-mpi/4.7.1
  5. pnetcdf/1.11.1
  6. ncarcompilers/0.5.0
  7. esmf-8.0.0-ncdfio-mpt-O
  8. cmake/3.14.4

it fails with the following error:

$ make VERBOSE=1
[  0%] Built target netcdf-fortran
[  5%] Built target NCEPLIBS-landsfcutil
[  5%] Built target hdf5
[  9%] Built target NCEPLIBS-g2
[ 10%] Performing build step for 'NCEPLIBS-nemsio'
[ 16%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o
/glade/u/apps/ch/opt/ncarcompilers/0.5.0/gnu/9.1.0/gfortran  -I/glade/u/apps/ch/opt/mpt/2.19/include -I/glade/u/apps/ch/opt/mpt/2.19/include/../lib -I/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build/include  -O2 -fconvert=big-endian -ffree-form -fbacktrace  -O3 -DNDEBUG -O3 -Jinclude   -O2 -fconvert=big-endian -ffree-form -fbacktrace  -c /glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/NCEPLIBS-nemsio/src/nemsio_module_mpi.f90 -o CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o
f951: Fatal Error: Reading module ‘mpi’ at line 1 column 2: Unexpected EOF
compilation terminated.
CMakeFiles/nemsio_v2.2.3.dir/build.make:75: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o' failed
make[5]: *** [CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o] Error 1
make[5]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build'
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/all' failed
make[4]: *** [CMakeFiles/nemsio_v2.2.3.dir/all] Error 2
make[4]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build'
Makefile:129: recipe for target 'all' failed
make[3]: *** [all] Error 2
make[3]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build'
CMakeFiles/NCEPLIBS-nemsio.dir/build.make:111: recipe for target 'NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build' failed
make[2]: *** [NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build] Error 2
make[2]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all'
CMakeFiles/Makefile2:666: recipe for target 'CMakeFiles/NCEPLIBS-nemsio.dir/all' failed
make[1]: *** [CMakeFiles/NCEPLIBS-nemsio.dir/all] Error 2
make[1]: Leaving directory '/glade/work/turuncu/UFS/NCEP_LIBS_ALL_GNU/build-all'
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

The commands used to install the libraries are as follows:

module purge
module load ncarenv/1.3
module load gnu/9.1.0
module load mpt/2.19
module load netcdf-mpi/4.7.1
module load pnetcdf/1.11.1
module load ncarcompilers/0.5.0
module load cmake
module use /glade/work/turuncu/PROGS/modulefiles/esmfpkgs/gnu/9.1.0
module load esmf-8.0.0-ncdfio-mpt-O
export ESMF_LIB=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/gnu/9.1.0/lib/libO/Linux.gfortran.64.mpt.default
export ESMF_INC=/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/gnu/9.1.0/mod/modO/Linux.gfortran.64.mpt.default

git clone https://github.com/NOAA-EMC/NCEPLIBS.git NCEP_LIBS_ALL
cd NCEP_LIBS_ALL
git checkout origin/full-stack
git submodule init
git submodule sync
git submodule update --recursive
git submodule foreach git submodule init
git submodule foreach git submodule sync
git submodule foreach git submodule update

mkdir build-all
cd build-all
cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make -j 20
@climbfuji
Collaborator

Is anyone working on this? Otherwise I'll take a look.

@mark-a-potts
Collaborator

I just looked at this and I think it is because the mpi.mod module under mpt is not compatible with gnu 9.1.0. I was able to get the project to build using openmpi and gnu 9.1.0 instead of mpt, however.
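
One way to check this theory is to look at the header of the mpi.mod file that the compiler picks up. The following is only a sketch under assumptions not stated in this thread: gfortran 4.9 and later write module files gzip-compressed, with a first line that begins with "GFORTRAN module version", and a file in any other format (an older gfortran, or another vendor's compiler) is simply reported as such. The function name mod_header is hypothetical.

```shell
#!/bin/sh
# Print the first line of a Fortran module file, or a note if the file is
# not in the gzip-compressed format that gfortran >= 4.9 writes.
# mod_header is a hypothetical helper name, not part of any toolchain.
mod_header() {
    # Read through stdin so gzip does not care about the file suffix.
    header=$(gzip -dc < "$1" 2>/dev/null | head -n 1)
    if [ -n "$header" ]; then
        printf '%s\n' "$header"
    else
        printf '%s\n' "not a gzip-compressed gfortran module: $1"
    fi
}
```

It could be run on any mpi.mod found in the include directories of the failing compile line, and the reported module version compared with one from a module compiled by the loaded gfortran.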

@uturuncoglu
Collaborator Author

@mark-a-potts We use MPT as the default MPI on Cheyenne. Do you think the build will still fail with MPT?

@climbfuji
Collaborator

@uturuncoglu @mark-a-potts We have been building and testing the UFS successfully with GNU 8.3.0 and MPT. Do you want to check whether that combination works for you? Note also the discussion about compiler versions in issue #13.

@mark-a-potts
Collaborator

mark-a-potts commented Dec 17, 2019 via email

@mvertens
Collaborator

mvertens commented Dec 17, 2019 via email

@mark-a-potts
Collaborator

mark-a-potts commented Dec 17, 2019 via email

@arunchawla-NOAA
Collaborator

Hello

Has this workaround been tested? If it is documented, can we close this ticket?

@arunchawla-NOAA
Collaborator

@climbfuji Can this ticket be closed?

@climbfuji
Collaborator

I haven't tested this myself yet.

@uturuncoglu
Collaborator Author

uturuncoglu commented Dec 30, 2019

@mark-a-potts I am planning to test the model on another platform, but I get the following error when I try to install NCEPLIBS on Stampede2.

Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:NOAA-EMC/netcdf-c.git' into submodule path '/scratch/01118/tg803972/PROGS/NCEPLIBS.dec30/netcdf' failed

I think it is related to the entry in the .gitmodules file. The netcdf submodule is configured to use SSH, while the others use HTTPS and work fine.

...
[submodule "NCEPLIBS-post"]
        path = NCEPLIBS-post
        url = https://github.com/climbfuji/EMC_post
        branch = update_ufs_release_1p0_macos_gnu
[submodule "netcdf"]
        path = netcdf
        url = git@github.com:NOAA-EMC/netcdf-c.git
        branch = update_ufs_release_1p0_macos_gnu
[submodule "UFS_UTILS"]
        path = UFS_UTILS
        url = https://github.com/climbfuji/UFS_UTILS.git
        branch = update_ufs_release_1p0_macos_gnu
...

So I think netcdf also needs to use HTTPS so that it can be cloned without a Git SSH setup.
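
Until the entry is changed upstream, a local workaround could look like the sketch below. It rewrites the URL of one submodule in .gitmodules from the SSH form to the HTTPS form using `git config -f`. The helper names to_https and fix_submodule_url are hypothetical, and the rewrite only handles URLs of the form git@github.com:owner/repo.git.

```shell
#!/bin/sh
# Rewrite an SSH-style GitHub URL (git@github.com:owner/repo.git) to HTTPS.
# URLs in any other form are printed unchanged.
to_https() {
    printf '%s\n' "$1" | sed 's#^git@github\.com:#https://github.com/#'
}

# Update one submodule's url entry in a .gitmodules file.
# $1: path to the .gitmodules file, $2: submodule name (for example netcdf)
# fix_submodule_url is a hypothetical helper name.
fix_submodule_url() {
    old_url=$(git config -f "$1" --get "submodule.$2.url") || return 1
    git config -f "$1" "submodule.$2.url" "$(to_https "$old_url")"
}
```

From the top of the clone, `fix_submodule_url .gitmodules netcdf` followed by `git submodule sync` and `git submodule update --init netcdf` should then fetch the submodule over HTTPS.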

@uturuncoglu
Collaborator Author

uturuncoglu commented Dec 30, 2019

@mark-a-potts NCEPLIBS-bufr also gives the following error. The hash might be wrong.

Submodule path 'NCEPLIBS-bacio': checked out 'bf2f2261e9f425e04874205fc106ae6a52bb5bb8'
error: no such remote ref 0c5aaf0efc7b2562ba5b3d8ed3473db8921f95f8
Fetched in submodule path 'NCEPLIBS-bufr', but it did not contain 0c5aaf0efc7b2562ba5b3d8ed3473db8921f95f8. Direct fetching of that commit failed.

@mark-a-potts
Collaborator

mark-a-potts commented Dec 30, 2019 via email

@mark-a-potts
Collaborator

mark-a-potts commented Dec 30, 2019 via email

@Hang-Lei-NOAA
Collaborator

Hang-Lei-NOAA commented Dec 30, 2019 via email

@uturuncoglu
Collaborator Author

@mark-a-potts NCEPLIBS-bufr seems to use the develop branch. Do I need to use update_ufs_release_1p0_macos_gnu instead?

  * develop
    remotes/origin/HEAD -> origin/develop
    remotes/origin/develop
    remotes/origin/macos_gnu_build
    remotes/origin/master
    remotes/origin/spack-build
    remotes/origin/update_ufs_release_1p0_macos_gnu

@uturuncoglu
Collaborator Author

The last commit on develop is

commit afaa8a002a366ebadc74db9c469255c125cec309
Author: Dexin.Zhang <dexin.zhang@noaa.gov>
Date:   Fri Oct 4 19:49:54 2019 +0000

    Unified build (20191004) script/makefile bufr

@llpcarson
Collaborator

llpcarson commented Dec 30, 2019 via email

uturuncoglu reopened this Jan 10, 2020

@climbfuji
Collaborator

@uturuncoglu this is not an error but a misleading warning imo. The compilation and installation should proceed just fine. The only thing you need to set is ESMFMKFILE. And the installation needs to be compatible with the standard install on the NOAA platforms, i.e. instead of having lib/libO/Linux.intel.64.mpt.default, mod/modO/Linux.intel.64.mpt.default and bin/binO/Linux.intel.64.mpt.default, you just have bin, mod and lib sitting next to each other. This can be achieved by setting

export ESMF_INSTALL_BINDIR=bin
export ESMF_INSTALL_LIBDIR=lib
export ESMF_INSTALL_MODDIR=mod

when you compile ESMF.

@uturuncoglu
Collaborator Author

@climbfuji With this configuration, I am getting an error in the chgres installation. It seems that it cannot find the correct ESMF installation or its module files.

[ 18%] Building Fortran object sorc/chgres_cube.fd/CMakeFiles/chgres_cube.exe.dir/model_grid.F90.o
/glade/work/turuncu/UFS/NCEPLIBS_ALL.jan10/UFS_UTILS/sorc/chgres_cube.fd/model_grid.F90(58): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [ESMF]
 use esmf
-----^
/glade/work/turuncu/UFS/NCEPLIBS_ALL.jan10/UFS_UTILS/sorc/chgres_cube.fd/model_grid.F90(59): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [ESMF_LOGPUBLICMOD]
 use ESMF_LogPublicMod

That could be related to my previous concern about the mod directory. In this case, if I look at CMakeFiles/UFS_UTILS.dir/build.make, ESMF_LIB is fine but ESMF_INC is wrong. The file points to /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/../mod, but this directory does not exist in my installation. So I think it expects to find mod/ at the same level as lib/, which is not always the case. I think that if ESMF_INC is set, the build system needs to pick it up the same way it picks up ESMF_LIB. If there is no ESMF_INC environment variable, then it could fall back to an assumed directory. What do you think? If you want to try, let me know and I can send instructions for reproducing my failed case.

@uturuncoglu
Collaborator Author

I could also try the following options

export ESMF_INSTALL_BINDIR=bin
export ESMF_INSTALL_LIBDIR=lib
export ESMF_INSTALL_MODDIR=mod

but this is not how we generally install ESMF, and it would require reinstalling all the ESMF modules from scratch.

---------------------------------------------------------------------------------- /glade/work/turuncu/PROGS/modulefiles -----------------------------------------------------------------------------------
   esmfpkgs/gnu/9.1.0/esmf-8.0.0-ncdfio-mpt-g           esmfpkgs/intel/18.0.5/esmf-8.0.0-ncdfio-mpt-O    (D)    esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpiuni-g
   esmfpkgs/gnu/9.1.0/esmf-8.0.0-ncdfio-mpt-O           esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpiuni-g        esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpiuni-O
   esmfpkgs/gnu/9.1.0/esmf-8.1.0b05-ncdfio-mpt-g        esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpiuni-O        esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpt-g
   esmfpkgs/gnu/9.1.0/esmf-8.1.0b05-ncdfio-mpt-O (D)    esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpt-g           esmfpkgs/intel/19.0.2/esmf-8.1.0b05-ncdfio-mpt-O    (D)
   esmfpkgs/intel/18.0.5/esmf-8.0.0-ncdfio-mpt-g        esmfpkgs/intel/19.0.2/esmf-8.0.0-ncdfio-mpt-O

@climbfuji
Collaborator

@uturuncoglu we could try to build in some mechanism that searches for mod directories underneath the ESMF top-level installation directory, or make use of the other cmake variables that are set by ESMF's own FindESMF.cmake module. Do you want me to try that?

It is unfortunate that the default installation paths for ESMF have all these nested directories with compiler and library name in it instead of following the linux standard installation tree.

@climbfuji
Collaborator

Add-on: the only variable that is and needs to be set is ESMFMKFILE. This file is picked up and searched by ESMF's FindESMF.cmake macro. No other variable should be set or searched, because ESMFMKFILE is the standard recommended by the ESMF team and is also what is used by the UFS weather model.
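
With the nested default layout, the esmf.mk file is not at a fixed relative path, so locating it is the first step in setting ESMFMKFILE. A minimal sketch, assuming only that the file is named esmf.mk and sits somewhere under the directory passed in; the helper name find_esmf_mk is hypothetical.

```shell
#!/bin/sh
# Print the path of the first esmf.mk found under an ESMF install directory.
# $1: directory to search. find_esmf_mk is a hypothetical helper name.
find_esmf_mk() {
    find "$1" -type f -name esmf.mk 2>/dev/null | head -n 1
}
```

A possible use is `export ESMFMKFILE=$(find_esmf_mk /path/to/esmf/lib/libO)`. When several builds (for example libO and libg) live under the same prefix, the order in which find returns them is not defined, so it is safer to pass the narrowest directory that contains the intended build.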

@uturuncoglu
Collaborator Author

If I look at the ESMFMKFILE, the ESMF_F90COMPILEPATHS variable shows the correct path to the ESMF modules.

/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/mod/modO/Linux.intel.64.mpt.default

Is it possible to query this variable and set ESMF_INC from it? That would probably fix the issue and would be a more generic solution, since it would not require a special ESMF installation configured through environment variables.
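
As a rough illustration of that idea outside the build system, the module directory can be pulled out of esmf.mk with standard tools. This sketch assumes the variable appears on a single line of the form `ESMF_F90COMPILEPATHS=-I<dir> -I<dir> ...` and that the first -I entry is the module directory, as in the path shown above; the helper name esmf_mod_dir is hypothetical, and directories containing spaces are not handled.

```shell
#!/bin/sh
# Print the first -I directory listed in ESMF_F90COMPILEPATHS of an esmf.mk.
# $1: path to esmf.mk (for example "$ESMFMKFILE")
# esmf_mod_dir is a hypothetical helper name.
esmf_mod_dir() {
    sed -n 's/^ESMF_F90COMPILEPATHS[[:space:]]*=[[:space:]]*//p' "$1" |
        tr ' ' '\n' |
        sed -n 's/^-I//p' |
        head -n 1
}
```

For example, `export ESMF_INC=$(esmf_mod_dir "$ESMFMKFILE")` would set ESMF_INC without reinstalling ESMF.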

@uturuncoglu
Collaborator Author

It might also be possible to pass the ESMF_F90COMPILEPATHS variable as ESMF_INC, but I am not sure.

@climbfuji
Collaborator

Let me try - I will use /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libg/Linux.intel.64.mpt.default/esmf.mk because your lib0 directory is empty.

@uturuncoglu
Collaborator Author

Thanks. Let me know if you need help. BTW, I can see the files in

/glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/

esmf.mk  libesmf.a  libesmf_fullylinked.so  libesmf.so  libesmftrace_preload.so  libesmftrace_static.a  preload.sh

@uturuncoglu
Collaborator Author

uturuncoglu commented Jan 13, 2020

I am able to install NCEPLIBS on Stampede2. I have not tested it with the model yet, but I'll do it today and let you know.

@climbfuji
Collaborator

climbfuji commented Jan 14, 2020

@uturuncoglu The fix in NOAA-EMC/NCEPLIBS#19 should work for you on Cheyenne/Stampede/.... Please test. Thanks!

@arunchawla-NOAA
Collaborator

@uturuncoglu and @climbfuji once confirmed that this is working can we close this ticket?

@uturuncoglu
Collaborator Author

It is tested and works on Stampede. I have not installed it on Cheyenne yet.

@uturuncoglu
Collaborator Author

@climbfuji I have a problem with GNU on Cheyenne with the latest version of the library (hash for superbuild is 2458bc2),

-- Build files have been written to: /glade/work/turuncu/UFS/NCEP_LIBS_ALL.jan22/build_all_gnu/NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-build
[ 43%] Performing build step for 'NCEPLIBS-nemsio'
Scanning dependencies of target nemsio_v2.2.3
[ 16%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_openclose.f90.o
[ 33%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_read.f90.o
[ 50%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_write.f90.o
[ 66%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module.f90.o
[ 83%] Building Fortran object CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o
f951: Fatal Error: Reading module ‘mpi’ at line 1 column 2: Unexpected EOF
compilation terminated.
CMakeFiles/nemsio_v2.2.3.dir/build.make:75: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o' failed
make[5]: *** [CMakeFiles/nemsio_v2.2.3.dir/src/nemsio_module_mpi.f90.o] Error 1
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/nemsio_v2.2.3.dir/all' failed
make[4]: *** [CMakeFiles/nemsio_v2.2.3.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/NCEPLIBS-nemsio.dir/build.make:111: recipe for target 'NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build' failed
make[2]: *** [NCEPLIBS-nemsio/src/NCEPLIBS-nemsio-stamp/NCEPLIBS-nemsio-build] Error 2
CMakeFiles/Makefile2:667: recipe for target 'CMakeFiles/NCEPLIBS-nemsio.dir/all' failed
make[1]: *** [CMakeFiles/NCEPLIBS-nemsio.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

This is my environment,

Currently Loaded Modules:
  1) ncarenv/1.3   2) cmake/3.14.4   3) esmf-8.0.0-ncdfio-mpt-O   4) gnu/9.1.0   5) openblas/0.3.6   6) mpt/2.19   7) netcdf/4.7.3   8) ncarcompilers/0.5.0

@uturuncoglu
Collaborator Author

@climbfuji I think I solved it. I just need to use the following:

CC=mpicc FC=mpif90 CXX=mpicxx cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
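
CMake reads the CC, FC and CXX environment variables when it first configures a build directory, so the command above makes it compile with the MPI wrappers instead of the bare compilers. Because the compiler choice is cached, it should be run in a fresh build directory. A small guard like the sketch below can confirm that the wrappers are on PATH before configuring; the helper name require_tools is hypothetical.

```shell
#!/bin/sh
# Return 0 only if every named command is found on PATH.
# Missing commands are reported on stderr.
# require_tools is a hypothetical helper name.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_tools mpicc mpif90 mpicxx` can be chained with `&&` in front of the cmake command above, so that configuration only starts when all three wrappers are available.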

@climbfuji
Collaborator

Yes, I think this was the solution/conclusion we came up with when the initial build failed on Cheyenne. Should be part of the documentation (do we have a place for machine-specific instructions)?

@uturuncoglu
Collaborator Author

You mean, in the application or NCEPLIBS documentation?

@climbfuji
Collaborator

I think NCEPLIBS - there should be general instructions followed by some machine-specific instructions. We will work on that.

@kgerheiser

What is the problem? Compiling with the regular compiler rather than mpicc/mpif90? nemsio includes MPI_Fortran_INCLUDE_DIRS, so it seems strange that it doesn't find the module.

@uturuncoglu
Collaborator Author

Yes, that would be great. Thanks for your help!

@climbfuji
Collaborator

On Cheyenne, CISL use their own in-house MPI wrappers around MPT and other MPI implementations. I am not exactly sure why they are doing this (maybe because their MPT installation requires them to fix issues similar to what Ufuk was reporting), but definitely there is something weird with the system.

@uturuncoglu
Collaborator Author

@climbfuji BTW, are you testing NCEPLIBS on Mac? If so, could you share how you install the libraries? We would also like to test it on our side.

@climbfuji
Collaborator

> @climbfuji BTW, are you testing NCEPLIBS on Mac? If so, could you share how you install the libraries? We would also like to test it on our side.

Yes, I am testing them. I need to ask you to hold off for a few days - I am still ironing out differences and best ways, looking at issues with the post-processor and the different macOS versions (Mojave versus Catalina). If you are looking for work, we also need to have folks working on various Linux distributions: Redhat/CentOS (7 and 8), Ubuntu (which versions?), possibly others.

@uturuncoglu
Collaborator Author

We just need to understand the configuration on Mac and Linux so that we can define those platforms in CIME as far as possible. If you have documentation about those installations, that would be great for us.

@rsdunlapiv
Collaborator

Closing and opening a new ticket for the Mac/Linux configuration.


9 participants