
b4b issue on mww3_test_03 with grdset_d2 #144

Open
mickaelaccensi opened this issue Jan 13, 2020 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@mickaelaccensi
Collaborator

mickaelaccensi commented Jan 13, 2020

it is not b4b when you change the number of MPI procs
you can reproduce it by running this regtest with MPI task counts that are multiples of the number of CPUs per node
(here I have 28 CPUs per node), so I ran with -n $N for N = 28, 56, 84, 112

bin/run_test -c datarmor_intel_debug -m grdset_d2 -n $N -p $MPI_LAUNCH -s PR3_UQ_MPI -w work_d2 -N -o netcdf ../model mww3_test_03

the differences are on all the out_grd.*, out_pnt.* and netcdf output files
this is due to load balancing
otherwise it's always b4b if you run the same test twice
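The sweep over task counts can be scripted; a minimal sketch, shown as a dry run that only prints each invocation (the `MPI_LAUNCH` value is a hypothetical placeholder, it is site-specific in practice):

```shell
#!/bin/sh
# Dry-run sketch: print the run_test invocation for each MPI task count
# that is a multiple of the 28 CPUs per node. Paths and switches are the
# ones from the report above; MPI_LAUNCH is an assumed placeholder.
MPI_LAUNCH="mpirun -np"
for N in 28 56 84 112; do
  echo "bin/run_test -c datarmor_intel_debug -m grdset_d2 -n $N -p $MPI_LAUNCH" \
       "-s PR3_UQ_MPI -w work_d2 -N -o netcdf ../model mww3_test_03"
done
```

Removing the `echo` turns the dry run into the actual sweep; comparing the resulting out_grd.*, out_pnt.* and netcdf files across the four runs exposes the differences.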

@mickaelaccensi mickaelaccensi changed the title b4b issue on mww3_test_03 with grdset_d2 b4b issue on mww3_test_03 with grdset_d2 Jan 13, 2020
@mickaelaccensi mickaelaccensi added the help wanted Extra attention is needed label Jan 13, 2020
@JessicaMeixner-NOAA
Collaborator

Moving comments from @ajhenrique from VLab issue 45652 on 10/5/18:

We have enough evidence on how to achieve bit reproducibility via the shortcut of grid redesign.
This does not fully fix the issues in the code when using more complex grid configurations and
load-balancing strategies, but for now it solves the short-term problem of achieving the required
bit reproducibility for our operational code.

In the process of investigating options to modify the WW3 code to achieve bit reproducibility, MPI barriers
were attempted in the parts of the code that manage the exchange of boundary data between grids.
The logic in these sections does not enforce a fixed sequential processing of boundaries, so independent
identical runs may use different exchange sequences, which can lead to results that are not bit-identical.
The MPI barriers did improve bit reproducibility, but did not completely solve the issue, indicating a
potential path to follow if this pursuit becomes relevant again. The caveat is that the barriers
substantially increased run time, something that would not fly in operations.
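The underlying mechanism is floating-point non-associativity: when load balancing changes the order in which the same contributions are combined, the rounded result can differ in the last bits even though the mathematical sum is identical. A minimal, generic Python illustration (not WW3 code; the values are chosen to make the effect visible at full magnitude):

```python
# Floating-point addition is not associative: reordering the same
# contributions changes the rounded result. Any code path whose
# combination order depends on the MPI task count or on load balancing
# therefore cannot be bit-for-bit reproducible unless an order is
# enforced (e.g. via barriers or a fixed reduction tree).
a, b, c = 1.0e16, 1.0, -1.0e16

left_to_right = (a + b) + c   # 1.0 is below the ulp of 1e16 and is lost
reordered     = (a + c) + b   # exact cancellation first, then + 1.0

print(left_to_right)               # 0.0
print(reordered)                   # 1.0
print(left_to_right == reordered)  # False
```

In the WW3 case the discrepancies are far smaller per operation, but they accumulate over time steps and grid exchanges, which is why the out_grd.* and out_pnt.* files end up differing.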

MPI barriers were added successfully to reduce the bit reproducibility issue in the code wmwavemd.ftn, 
sections 6.a.2 and 6.b as follows:

!
! 6.a.2 Point reached, set flag for all in group and cycle
!
                      IF ( FLAGOK ) THEN
!
! Add barriers to ensure that all grids are ready and all the data has
! been disseminated to all processes
!
                          CALL MPI_BARRIER ( MPI_COMM_MWAVE, IERR_MPI )
!
                          DO JJJ=1, INGRP(J,0)
                            FLEQOK(INGRP(J,JJJ)) = .TRUE.
                            END DO
                          DONE      = .TRUE.
!
                          IF ( INGRP(J,0) .GT. 1 ) GOTO 1111
                        END IF
!
                  END IF
!
! 6.b Call gathering routine, reset FLEQOK and cycle
!
                IF ( .NOT.FLEQOK(I) .AND. .NOT.PREGTE(I) ) THEN
                    IF ( MPI_COMM_GRD.NE.MPI_COMM_NULL )    &
                         CALL WMIOEG (I,FLAG)
                    PREGTE(I) = .TRUE.
                  END IF
!
                  CALL MPI_BARRIER ( MPI_COMM_MWAVE, IERR_MPI )
!
                IF ( FLEQOK(I) ) THEN
                    IF ( MPI_COMM_GRD.NE.MPI_COMM_NULL )    &
                         CALL WMIOEG ( I )
                    PREGTE(I) = .FALSE.
                    GRSTAT(I) = 5
                    FLEQOK(I) = .FALSE.
                    DONE      = .TRUE.
                  END IF
!
                  CALL MPI_BARRIER ( MPI_COMM_MWAVE, IERR_MPI )
!
! 6.c Stage data

@JessicaMeixner-NOAA JessicaMeixner-NOAA mentioned this issue Jan 23, 2024