
Port of WW3 to S4 #424

Closed · DavidHuber-NOAA opened this issue Jun 25, 2021 · 11 comments · Fixed by #458

Labels
enhancement New feature or request

Comments

@DavidHuber-NOAA
Contributor

The WW3 model should be ported to the S4 cluster to support GFS and GDAS development efforts. A port of the production/GFS.v16 version is underway and will be tested soon.

@DavidHuber-NOAA
Contributor Author

I've completed an initial port of WW3 to S4. Almost all of the regression tests pass; the exceptions are a few tests that use parmetis, which suggests a problem with the build of that library. Since WW3 is built with -march=ivybridge, I suspect parmetis must be built with that flag as well. I will attempt that later today and rerun the regression tests.

Similarly, the GFS tests all pass, with the exception of control_atmwav, which crashes during the WW3 portion of that test. More details can be found in the GFS port issue (ufs-community/ufs-weather-model#738).

@aliabdolali
Contributor

aliabdolali commented Aug 26, 2021

@DavidHuber-NOAA Thanks for the update. The metis and parmetis for WW3 on our RDHPCs are compiled following
https://github.com/NOAA-EMC/WW3/wiki/FAQs-page#how-to-install-Metis-and-Parmetis
How did you compile them on S4?

@DavidHuber-NOAA
Contributor Author

@aliabdolali Thanks for the link. I had compiled both without declaring CFLAGS or specifying cc or cxx in the make command, so it looks like they were built with mpicc instead of mpiicc. I will give that a try.
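
For reference, a minimal sketch of a rebuild along the lines of the linked FAQ; the version numbers and install prefix are placeholders, and passing -march=ivybridge through a CFLAGS environment variable is an assumption that may need adjusting for a given metis/parmetis release:

# Placeholder install prefix; adjust for S4.
export PREFIX=/path/to/ww3-libs
# Assumption: forward the target architecture so it matches the WW3 build.
export CFLAGS="-march=ivybridge"

cd metis-5.1.0
make config cc=mpiicc prefix=$PREFIX
make install

cd ../parmetis-4.0.3
make config cc=mpiicc cxx=mpiicpc prefix=$PREFIX
make install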

DavidHuber-NOAA added a commit to DavidHuber-NOAA/WW3 that referenced this issue Sep 1, 2021
@DavidHuber-NOAA
Contributor Author

DavidHuber-NOAA commented Sep 1, 2021

After fixing the install of parmetis, those tests now pass.

However, I'm having issues with the OASIS tests and two of the ww3_multi tests.

I believe the issues with OASIS are partly related to #440, but on S4 we are also limited to running with srun. This causes an additional issue since the OASIS calls appear to be multiple-program-multiple-data (MPMD) invocations (i.e. run_test#L1690), and I was not sure how to mimic that with srun. I first thought this could be achieved just by calling srun twice, but the tests still did not pass that way. Correction: this is achieved by running srun --multi-prog, and these tests are now completing correctly.
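
For anyone hitting the same limitation, a minimal sketch of an MPMD launch with srun --multi-prog; the executable names, rank split, and configuration file name are hypothetical and not taken from the actual ww3_tp2.14 setup:

# mpmd.conf: maps MPI rank ranges to executables (hypothetical layout)
# ranks 0-11 run the wave model, ranks 12-23 run the coupled toy model
0-11   ./ww3_shel
12-23  ./toy_model

# launch both programs under one Slurm job step
srun -n 24 --multi-prog mpmd.conf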

The two ww3_multi tests that fail are called with
run_test -b slurm -c s4.intel -S -T -s MPI -s NO_PDLIB -w work_ma1 -m grdset_a1 -f -p srun -n 24 -o all ../model11 ww3_tp2.17
run_test -b slurm -c s4.intel -S -T -s MPI -s PDLIB -w work_mc1 -m grdset_c1 -f -p srun -n 24 -o all ../model11 ww3_tp2.17
Both fail when attempting to read the input file. The former attempts to read regtests/ww3_tp2.17/input/ww3_multi_grdset_a1.inp, but fails to read line 9 at this line of code. The first 10 lines of ww3_multi_grdset_a1.inp are

$
$ Input file to run with Inlet grid
$
1 0 F 1 T T
$
$'points'
$
$
 'inla'  'native' 'native' 'native' 'no' 'no' 'no' 'no'   1  1  0.00 1.00  F
$

The read fails when it attempts to read in a total of 10 strings, but there are only 7. I'm not sure where this input file is generated or copied from, nor do I know if this is an issue with the port or the input file. Could you advise if this input is correct or if there is a bug here?
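
One way to trace where that file comes from, sketched here under the assumption that the test inputs and driver scripts live under regtests/ (as the paths above suggest), is to search the test tree for the grdset_a1 name:

# Look for anything that generates, copies, or references the grdset_a1 input.
cd regtests
grep -rn "grdset_a1" ww3_tp2.17/ bin/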

@DavidHuber-NOAA
Contributor Author

@aliabdolali As a sanity check, I ran the test mentioned above (run_test -b slurm -c s4.intel -S -T -s MPI -s NO_PDLIB -w work_ma1 -m grdset_a1 -f -p srun -n 24 -o all ../model11 ww3_tp2.17) on Hera and I am seeing the same failure. The log file can be found here: /scratch1/NESDIS/nesdis-rdo2/David.Huber/gw_dev/sorc/fv3gfs.fd/WW3/regtests/bin/matrix11.out, with the errors occurring on lines 1078-1084.

Since this seems to be an issue with the test itself, I'd like to go ahead and push forward with a PR to port WW3 to S4 so that porting of the UFS and the global workflow can continue. I can then open a new issue to continue testing at a later time.

@aliabdolali
Contributor

Thanks @DavidHuber-NOAA. Let me check it; I will get back to you in 30 minutes.

@aliabdolali
Contributor

Hi @DavidHuber-NOAA
This is a known issue and the fix has been identified: #442.
Please go ahead with your PR.
What about the OASIS tests? Have you managed to get them working?

@DavidHuber-NOAA
Contributor Author

@aliabdolali All but one of the OASIS tests passed:
run_test -b slurm -c s4.intel -S -T -s OASACM3 -w work_OASACM3 -f -p srun -n 24/2 -o netcdf ../model11 ww3_tp2.14

When -n 24/2 is passed to srun, it returns an error: Invalid numeric value "24/2" for number of tasks. Is the intention of this test to run with 12 tasks?

@aliabdolali
Contributor

Yes, on some platforms it is accepted, while on others I have seen it rejected.
If you change it to 12, does it work?

@DavidHuber-NOAA
Contributor Author

@aliabdolali Yes, the test passes using 12. Interestingly, on Hera, the same line (with -n 24/2) runs 24 tasks. I could change the line in matrix.base to
$rtst -s OASACM3 -w work_OASACM3 -f -p $mpi -n $(( $np / 2 )) -o netcdf $ww3 ww3_tp2.14
which should come out as 12 (confirmed on Hera and S4).
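
A small sketch of the difference, using placeholder values; the point is that the arithmetic expansion hands srun a plain integer, whereas the literal string "24/2" is only handled by launchers that do their own parsing:

np=24
mpi=srun

# Literal string: srun receives "24/2" and rejects it as a task count.
echo "$mpi -n 24/2"            # -> srun -n 24/2

# Arithmetic expansion: the shell evaluates the division before srun runs.
echo "$mpi -n $(( np / 2 ))"   # -> srun -n 12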

DavidHuber-NOAA added a commit to DavidHuber-NOAA/WW3 that referenced this issue Sep 3, 2021
@aliabdolali
Contributor

@DavidHuber-NOAA I am going to open an issue, and we will fix matrix.base in a separate PR. Thanks for checking on Hera.
