Installation fails on Cheyenne with GNU+MPT combination #7
Comments
Is anyone working on this? Otherwise I'll take a look. |
I just looked at this and I think it is because the mpi.mod module under mpt is not compatible with gnu 9.1.0. I was able to get the project to build using openmpi and gnu 9.1.0 instead of mpt, however. |
@mark-a-potts We are using MPT on Cheyenne as the default MPI. Do you think that it still fails with MPT? |
@uturuncoglu @mark-a-potts We have been building and testing the UFS successfully with GNU 8.3.0 and MPT. Do you want to check whether this works for you? Note also the discussion about compiler versions in issue #13. |
I was in the midst of testing MPT with gnu 8.3.0 when I got booted off
of Cheyenne for taking up too much of the head node. I think that the
problem is with the MPT module on Cheyenne not being compatible with gnu
9.1.0, though.
-Mark
|
@mark-a-potts - you need to use *qcmd* on Cheyenne in order not to be
booted off. See
https://dailyb.cisl.ucar.edu/bulletins/cisl-adds-qcmd-script-launching-resource-intensive-compilation-jobs
|
Okay, I think I have figured out the fix/workaround. To get the right
includes, the build needs to use the mpif90 wrapper from mpt rather than
gfortran. So, if you set the following environment variables before
running the cmake command, things seem to work (for gnu 8.3.0, at least):
export FC=mpif90
export CC=mpicc
export CXX=mpicxx
cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
make
-M
|
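The workaround above depends on the MPT wrappers actually being on PATH when cmake runs. A minimal sketch of a pre-flight check follows, assuming the relevant compiler and MPT modules are already loaded; the helper name wrapper_on_path is hypothetical, and the wrapper names are the ones given in the comment above.

```shell
#!/bin/sh
# Sketch: export FC/CC/CXX only for MPI wrappers that are found on PATH.
# wrapper_on_path is a hypothetical helper, not part of NCEPLIBS or MPT.

wrapper_on_path() {
    # Succeeds if $1 resolves to a command in the current environment.
    command -v "$1" >/dev/null 2>&1
}

for pair in FC:mpif90 CC:mpicc CXX:mpicxx; do
    var=${pair%%:*}       # variable name, e.g. FC
    wrapper=${pair#*:}    # wrapper name, e.g. mpif90
    if wrapper_on_path "$wrapper"; then
        export "$var=$wrapper"
        echo "$var=$wrapper"
    else
        echo "warning: $wrapper not found on PATH; $var left unchanged" >&2
    fi
done

# With all three variables set, the build from the comment above would be:
#   cmake -DMPITYPE=mpt -DCMAKE_INSTALL_PREFIX=$PWD/install ..
#   make
```

If any warning is printed, the module environment is probably incomplete and cmake would fall back to whatever compiler it detects on its own.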
Hello. Has this workaround been tested? If it is documented, can we close this ticket? |
@climbfuji Can this ticket be closed? |
I haven't tested this myself yet. |
@mark-a-potts I am planning to test the model on another platform, but I get the following error when I try to install NCEPLIBS on Stampede2:
    Permission denied (publickey).
    fatal: Could not read from remote repository.
    Please make sure you have the correct access rights and the repository exists.
    fatal: clone of ***@***.***:NOAA-EMC/netcdf-c.git' into submodule path '/scratch/01118/tg803972/PROGS/NCEPLIBS.dec30/netcdf' failed
I think it is related to the entry in the .gitmodules file. The netcdf submodule is configured to use ssh, while the others are fine and use https:
    ...
    [submodule "NCEPLIBS-post"]
        path = NCEPLIBS-post
        url = https://github.com/climbfuji/EMC_post
        branch = update_ufs_release_1p0_macos_gnu
    [submodule "netcdf"]
        path = netcdf
        url = ***@***.***:NOAA-EMC/netcdf-c.git
        branch = update_ufs_release_1p0_macos_gnu
    [submodule "UFS_UTILS"]
        path = UFS_UTILS
        url = https://github.com/climbfuji/UFS_UTILS.git
        branch = update_ufs_release_1p0_macos_gnu
    ...
So I think netcdf also needs to use https to allow cloning without a Git ssh setup. |
@mark-a-potts NCEPLIBS-bufr also gives the following error. The hash might be wrong.
    Submodule path 'NCEPLIBS-bacio': checked out 'bf2f2261e9f425e04874205fc106ae6a52bb5bb8'
    error: no such remote ref 0c5aaf0efc7b2562ba5b3d8ed3473db8921f95f8
    Fetched in submodule path 'NCEPLIBS-bufr', but it did not contain 0c5aaf0efc7b2562ba5b3d8ed3473db8921f95f8. Direct fetching of that commit failed.
|
We should decide on using either ssh or https for the submodules (I
prefer ssh), but you should be able to pull the submodule if you upload
your id_rsa.pub key from $HOME/.ssh on Stampede to your GitHub account.
Alternatively, you can change the url in .gitmodules to use the https://
nomenclature instead of ssh.
-M
|
Try running "git remote update" from NCEPLIBS-bufr. I just pushed that
commit up to GitHub earlier today, so it is probably not in your local repo.
-M
|
For all libraries, you can use the ufs release branch rather than a single
head. I suggest that you directly check out the ufs_release_v1.0 branch under
the NCEPLIBS-bufr repo.
Kyle has recently checked that the ufs_release branches under each NCEPLIBS
repo are working identically with the latest operational updates.
|
@mark-a-potts NCEPLIBS-bufr seems to use the develop branch. Do I need to use update_ufs_release_1p0_macos_gnu?
|
Last commit for develop is
|
Mark - My recommendation would be to use the https access, so that
anonymous users can clone the submodules without requiring a GitHub
account. Makes it much simpler to teach beginning students, as well!
Laurie
|
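The switch from ssh to https discussed above is a mechanical rewrite of the submodule url. A minimal sketch follows, assuming the usual scp-like form user@host:owner/repo.git for the ssh URL. The thread's copy of the URL is redacted, so the git@github.com address in the usage note is an assumption, and ssh_to_https is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rewrite an scp-style ssh git URL into its https equivalent.
# ssh_to_https is a hypothetical helper; URLs that are already https
# pass through unchanged.

ssh_to_https() {
    # user@host:owner/repo.git -> https://host/owner/repo.git
    printf '%s\n' "$1" | sed 's#^[^@/]*@\([^:/]*\):#https://\1/#'
}
```

For example, `ssh_to_https git@github.com:NOAA-EMC/netcdf-c.git` prints `https://github.com/NOAA-EMC/netcdf-c.git`. After editing the url in .gitmodules by hand, `git submodule sync` copies the new URL into the local repository configuration before the next `git submodule update`.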
@uturuncoglu this is not an error but a misleading warning imo. The compilation and installation should proceed just fine. The only thing you need to set is ESMFMKFILE. And the installation needs to be compatible with the standard install on the NOAA platforms, i.e. instead of having lib/libO/Linux.intel.64.mpt.default, mod/modO/Linux.intel.64.mpt.default and bin/binO/Linux.intel.64.mpt.default, you just have bin, mod and lib sitting next to each other. This can be achieved by setting
when you compile ESMF. |
@climbfuji With this configuration, I am getting an error in the chgres installation. It seems that it could not find the correct ESMF installation or its module files.
That could be related to my previous concern about the mod directory. In this case, if I look at CMakeFiles/UFS_UTILS.dir/build.make, ESMF_LIB is fine but ESMF_INC is wrong. The file points to /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/../mod, but this directory does not exist in my installation. So I think it expects to find the mod/ directory at the same level as lib/, but this is not always the case. I think that if ESMF_INC is set, the build system needs to pick it up like ESMF_LIB. If there is no ESMF_INC environment variable, then it could assume some directory. What do you think? If you want to try, let me know and I can send the instructions for my failed case. |
I could also try with the following options
but this is not the way that we generally follow, and it requires reinstalling all the ESMF modules from scratch.
|
@uturuncoglu we could try to build in some mechanism that searches for mod directories underneath the ESMF top-level installation directory, or make use of the other cmake variables that are set by ESMF's own FindESMF.cmake module. Do you want me to try that? It is unfortunate that the default installation paths for ESMF have all these nested directories with compiler and library name in it instead of following the linux standard installation tree. |
Add-on: the only variable that is and needs to be set is ESMFMKFILE. This file is picked up and searched by ESMF's FindESMF.cmake macro. No other variable should be set or searched, because ESMFMKFILE is the standard recommended by the ESMF team and is also what is used by the UFS weather model. |
If I look at the ESMFMKFILE, the ESMF_F90COMPILEPATHS variable shows the correct path of the ESMF modules.
Is it possible to query this variable and set ESMF_INC? That would probably fix the issue and would be a more generic solution that does not need a special ESMF installation configured through environment variables. |
It may also be possible to pass the ESMF_F90COMPILEPATHS variable as ESMF_INC, but I am not sure. |
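The idea of querying ESMF_F90COMPILEPATHS can be sketched in a few lines of shell. This assumes esmf.mk holds a single line of the form ESMF_F90COMPILEPATHS=-I<dir> -I<dir> ..., with no spaces inside the directory names; esmf_include_dirs is a hypothetical helper, and this is not necessarily how the NCEPLIBS build system handles it.

```shell
#!/bin/sh
# Sketch: list the include directories named in an esmf.mk file.
# esmf_include_dirs is a hypothetical helper name.

esmf_include_dirs() {
    # $1: path to esmf.mk. Prints one directory per line, without "-I".
    sed -n 's/^ESMF_F90COMPILEPATHS[[:space:]]*=[[:space:]]*//p' "$1" |
        tr ' ' '\n' |
        sed -n 's/^-I//p'
}

# If ESMFMKFILE points at a readable file, use the first listed
# directory as ESMF_INC; otherwise do nothing.
if [ -n "${ESMFMKFILE:-}" ] && [ -r "$ESMFMKFILE" ]; then
    ESMF_INC=$(esmf_include_dirs "$ESMFMKFILE" | head -n 1)
    export ESMF_INC
    echo "ESMF_INC=$ESMF_INC"
fi
```

Taking only the first entry is a simplification; a build system would more likely pass every listed directory to the compiler.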
Let me try - I will use /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libg/Linux.intel.64.mpt.default/esmf.mk because your lib0 directory is empty. |
Thanks. Let me know, if you need help. BTW, I could see the files in /glade/work/turuncu/PROGS/esmf/8.0.0/mpt/2.19/intel/19.0.2/lib/libO/Linux.intel.64.mpt.default/
|
I was able to install NCEPLIBS on Stampede2. I have not tested it with the model yet, but I'll do it today and let you know. |
@uturuncoglu The fix in NOAA-EMC/NCEPLIBS#19 should work for you on Cheyenne/Stampede/.... Please test. Thanks! |
@uturuncoglu and @climbfuji once confirmed that this is working can we close this ticket? |
It is tested and works on Stampede. I have not installed it on Cheyenne yet. |
@climbfuji I have a problem with GNU on Cheyenne with the latest version of the library (hash for superbuild is 2458bc2),
This is my environment,
|
@climbfuji I think I solved it. I just need to use the following,
|
Yes, I think this was the solution/conclusion we came up with when the initial build failed on Cheyenne. Should be part of the documentation (do we have a place for machine-specific instructions)? |
You mean, in the application or NCEPLIBS documentation? |
I think NCEPLIBS - there should be general instructions followed by some machine-specific instructions. We will work on that. |
What is the problem? Compiling using the regular compiler and not |
Yes, that would be great. Thanks for your help! |
On Cheyenne, CISL use their own in-house MPI wrappers around MPT and other MPI implementations. I am not exactly sure why they are doing this (maybe because their MPT installation requires them to fix issues similar to what Ufuk was reporting), but definitely there is something weird with the system. |
@climbfuji BTW, are you testing NCEPLIBS on Mac? If yes, could you share how you install the libs? We would also like to test it on our side. |
Yes, I am testing them. I need to ask you to hold off for a few days - I am still ironing out differences and best ways, looking at issues with the post-processor and the different macOS versions (Mojave versus Catalina). If you are looking for work, we also need to have folks working on various Linux distributions: Redhat/CentOS (7 and 8), Ubuntu (which versions?), possibly others. |
We just need to understand the configuration on Mac and Linux to define those platforms in CIME as much as possible. If you have documentation about those installations, that would be great for us. |
Closing and opening a new ticket for the Mac/Linux configuration. |
I am trying to install NCEPLIBS with the following module combination on Cheyenne, but
it fails with the following error.
The commands used to install the library are as follows.