ice_dyn_vp: allow for bit-for-bit reproducibility under bfbflag (#774)
* doc: fix typo in index (bfbflag)

* doc: correct default value of 'maxits_nonlin'

The "Table of namelist options" in the user guide lists 'maxits_nonlin'
as having a default value of 1000, whereas its actual default is 4, both
in the namelist and in 'ice_init.F90'. This has been the case since the
original implementation of the implicit solver in f7fd063 (dynamics: add
implicit VP solver (#491), 2020-09-22).

Fix the documentation.

* doc: VP solver is validated with OpenMP

When the implicit VP solver was added in f7fd063 (dynamics: add implicit
VP solver (#491), 2020-09-22), it had not yet been tested with OpenMP
enabled.

The OpenMP implementation was carefully reviewed and then fixed in
d1e972a (Update OMP (#680), 2022-02-18), which led to all runs of the
'decomp' suite completing and all restart tests passing. The 'bfbcomp'
tests are still failing, but this is due to the code not using the CICE
global sum implementation correctly, which will be fixed in the next
commits.

Update the documentation accordingly.

* ice_dyn_vp: activate OpenMP in 'dyn_prep2' loop

When the OpenMP implementation was reviewed and fixed in d1e972a (Update
OMP (#680), 2022-02-18), the 'PRIVATE' clause of the OpenMP directive
for the loop where 'dyn_prep2' is called in 'implicit_solver' was
corrected in line with what was done in 'ice_dyn_evp', but OpenMP was
left unactivated for this loop (the 'TCXOMP' was not changed to a real
'OMP' directive).

Activate OpenMP for this loop. All runs and restart tests of the
'decomp_suite' still pass with this change.
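
As a rough illustration only (stand-in names, not the actual 'ice_dyn_vp' loop or its full PRIVATE list), the pattern is simply a real '!$OMP' sentinel around the per-block loop, with the loop-local variables declared PRIVATE:

    program omp_block_loop_sketch
       implicit none
       integer, parameter :: nblocks = 8          ! stand-in for the local block count
       integer :: iblk, ilo, ihi, jlo, jhi        ! per-block bounds, must be PRIVATE
       real    :: work(nblocks)

       !$OMP PARALLEL DO PRIVATE(iblk, ilo, ihi, jlo, jhi)
       do iblk = 1, nblocks
          ilo = 1 ; ihi = 10                      ! stand-ins for this_block%ilo, %ihi
          jlo = 1 ; jhi = 10
          work(iblk) = real((ihi-ilo+1)*(jhi-jlo+1))  ! stands in for 'call dyn_prep2(...)'
       enddo
       !$OMP END PARALLEL DO

       print *, 'per-block work:', work
    end program omp_block_loop_sketch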

* machines: eccc: add ICE_MACHINE_MAXRUNLENGTH to ppp[56]

* machines: eccc: use PBS-enabled OpenMPI for 'ppp6_gnu'

The system installation of OpenMPI at /usr/mpi/gcc/openmpi-4.1.2a1/ is
not compiled with support for PBS. This leads to failures as the MPI
runtime does not have the same view of the number of available processors
as the job scheduler.

Use our own build of OpenMPI, compiled with PBS support, for the
'ppp6_gnu' environment.

* machines: eccc: set I_MPI_FABRICS=ofi

Intel MPI 2021.5.1, which comes with oneAPI 2022.1.2, seems to have an
intermittent bug where a call to 'MPI_Waitall' fails with:

    Abort(17) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code

and no core dump is produced. This affects at least these cases of the
'decomp' suite:

- *_*_restart_gx3_16x2x1x1x800_droundrobin
- *_*_restart_gx3_16x2x2x2x200_droundrobin

This was reported to Intel and they suggested setting the variable
'I_MPI_FABRICS' to 'ofi' (the default being 'shm:ofi' [1]). This
disables shared memory transport and indeed fixes the failures.

Set this variable for all ECCC machine files using Intel MPI.

[1] https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/environment-variables-for-fabrics-control/communication-fabrics-control.html

* machines: eccc: set I_MPI_CBWR for BASEGEN/BASECOM runs

Intel MPI, in contrast to OpenMPI (as far as I was able to test, and see
[1], [2]), does not (by default) guarantee that repeated runs of the same
code on the same machine with the same number of MPI ranks yield the
same results when collective operations (e.g. 'MPI_ALLREDUCE') are used.

Since the VP solver uses MPI_ALLREDUCE in its algorithm, this leads to
repeated runs of the code giving different answers, and to baseline
comparisons between runs built from the same commit failing.

When generating a baseline or comparing against an existing baseline,
set the environment variable 'I_MPI_CBWR' to 1 for ECCC machine files
using Intel MPI [3], so that (processor) topology-aware collective
algorithms are not used and results are reproducible.

Note that we do not need to set this variable on robert or underhill, on
which jobs have exclusive node access and thus job placement (on
processors) is guaranteed to be reproducible.

[1] https://stackoverflow.com/a/45916859/
[2] https://scicomp.stackexchange.com/a/2386/
[3] https://www.intel.com/content/www/us/en/develop/documentation/mpi-developer-reference-linux/top/environment-variable-reference/i-mpi-adjust-family-environment-variables.html#i-mpi-adjust-family-environment-variables_GUID-A5119508-5588-4CF5-9979-8D60831D1411

* ice_dyn_vp: fgmres: exit early if right-hand-side vector is zero

If starting a run with "ice_ic='none'" (no ice), the linearized
problem for the ice velocity A x = b will have b = 0, since all terms in
the right hand side vector will be zero:

- strint[xy] is zero because the velocity is zero
- tau[xy] is zero because the ocean velocity is also zero
- [uv]vel_init is zero
- strair[xy] is zero because the concentration is zero
- strtlt[xy] is zero because the ocean velocity is zero

We thus have a linear system A x = b with b=0, so we
must have x=0.

In the FGMRES linear solver, this special case is not taken into
account, and so we end up with an all-zero initial residual since
workspace_[xy] is also zero because of the all-zero initial guess
'sol[xy]', which corresponds to the initial ice velocity. This then
leads to a division by zero when normalizing the first Arnoldi vector.

Fix this special case by computing the norm of the right-hand-side
vector before starting the iterations, and exiting early if it is zero.
This is in line with the GMRES implementation in SciPy [1].

[1] https://github.com/scipy/scipy/blob/651a9b717deb68adde9416072c1e1d5aa14a58a1/scipy/sparse/linalg/_isolve/iterative.py#L620-L628

Close: phil-blain#42
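
A minimal, hedged sketch of this early-exit guard (stand-in names, not the actual 'fgmres' routine): with b = 0 the solution is x = 0, and normalizing the first Arnoldi vector would otherwise divide by zero.

    program fgmres_zero_rhs_guard
       implicit none
       integer, parameter :: dbl_kind = selected_real_kind(13)
       integer, parameter :: n = 4
       real(dbl_kind) :: bx(n), by(n), solx(n), soly(n), rhs_norm

       bx = 0.0_dbl_kind ; by = 0.0_dbl_kind         ! no ice: all forcing terms vanish
       rhs_norm = sqrt(sum(bx**2) + sum(by**2))      ! stands in for the global norm of b

       if (rhs_norm == 0.0_dbl_kind) then
          solx = 0.0_dbl_kind ; soly = 0.0_dbl_kind  ! exact solution of A x = 0
          print *, 'zero RHS: returning x = 0 without iterating'
       else
          print *, 'nonzero RHS: would start the FGMRES iterations here'
       endif
    end program fgmres_zero_rhs_guard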

* ice_dyn_vp: add global_norm, global_dot_product functions

The VP solver uses a linear solver, FGMRES, as part of the non-linear
iteration. The FGMRES algorithm involves computing the norm of a
distributed vector field, thus performing global sums.

These norms are computed by first summing the squared X and Y components
of a vector field in subroutine 'calc_L2norm_squared', summing these
over the local blocks, and then doing a global (MPI) sum using
'global_sum'.

This approach does not lead to reproducible results when the MPI
distribution, or the number of local blocks, is changed, for reasons
explained in the "Reproducible sums" section of the Developer Guide
(mostly, floating point addition is not associative). This was partly
pointed out in [1] but I failed to realize it at the time.
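
A tiny standalone example of the non-associativity at play (not CICE code): the order in which partial sums are accumulated changes the result, which is why the block / MPI layout affects non-reproducible global sums.

    program fp_assoc
       implicit none
       integer, parameter :: dbl_kind = selected_real_kind(13)
       real(dbl_kind) :: a, b, c
       a = 1.0e16_dbl_kind
       b = -1.0e16_dbl_kind
       c = 1.0_dbl_kind
       print *, '(a+b)+c =', (a+b)+c   ! 1.0
       print *, 'a+(b+c) =', a+(b+c)   ! 0.0 (c is absorbed when added to b first)
    end program fp_assoc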

Introduce a new function, 'global_dot_product', to encapsulate the
computation of the dot product of two grid vectors, each split into two
arrays (for the X and Y components).

Compute the reduction locally as is done in 'calc_L2norm_squared', but
when 'bfbflag' is active, throw away that result and instead use the
existing 'global_sum' function, passing it the temporary array used to
compute the element-by-element product.

This approach avoids a performance regression from the added work done
in 'global_sum', such that non-bfbflag runs are as fast as before.

Note that since 'global_sum' loops over the whole array (and not just
over ice points, as 'global_dot_product' does), we make sure to
zero-initialize the 'prod' local array.

Also add a 'global_norm' function implemented using
'global_dot_product'. Both functions will be used in subsequent commits
to ensure bit-for-bit reproducibility.
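
A hedged, serial sketch of this logic (the mask and the plain 'sum' calls are stand-ins for the CICE ice mask and the reproducible 'global_sum'; names are illustrative): 'prod' is zero-initialized, the local sum over blocks is always computed, and the reproducible path discards it in favour of summing 'prod' itself.

    program global_dot_product_sketch
       implicit none
       integer, parameter :: dbl_kind = selected_real_kind(13)
       integer, parameter :: nx = 4, ny = 4, nblocks = 2
       real(dbl_kind) :: u1(nx,ny,nblocks), v1(nx,ny,nblocks)
       real(dbl_kind) :: u2(nx,ny,nblocks), v2(nx,ny,nblocks)
       real(dbl_kind) :: prod(nx,ny,nblocks), local_sum, dot
       logical        :: icemask(nx,ny,nblocks), bfb
       integer        :: iblk

       call random_number(u1) ; call random_number(v1)
       call random_number(u2) ; call random_number(v2)
       icemask = .true. ; icemask(1,1,:) = .false.   ! pretend some points are ice-free
       bfb = .true.                                  ! i.e. 'bfbflag' is active

       prod      = 0.0_dbl_kind   ! zero-init: the global sum loops over ALL points
       local_sum = 0.0_dbl_kind
       do iblk = 1, nblocks
          where (icemask(:,:,iblk))
             prod(:,:,iblk) = u1(:,:,iblk)*u2(:,:,iblk) + v1(:,:,iblk)*v2(:,:,iblk)
          end where
          local_sum = local_sum + sum(prod(:,:,iblk))   ! local reduction over blocks
       enddo

       if (bfb) then
          dot = sum(prod)      ! stands in for the reproducible 'global_sum(prod, ...)'
       else
          dot = local_sum      ! stands in for the fast sum of per-task partial sums
       endif
       print *, 'dot product =', dot, '  norm =', sqrt(dot)
    end program global_dot_product_sketch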

* ice_dyn_vp: use global_{norm,dot_product} for bit-for-bit output reproducibility

Make the results of the VP solver reproducible if desired by refactoring
the code to use the subroutines 'global_norm' and 'global_dot_product'
added in the previous commit.

The same pattern appears in the FGMRES solver (subroutine 'fgmres'), the
preconditioner 'pgmres' which uses the same algorithm, and the
Classical and Modified Gram-Schmidt algorithms in 'orthogonalize'.

These modifications do not change the number of global sums in the
fgmres and pgmres solvers or in the MGS algorithm. For the CGS algorithm, there is
(in theory) a slight performance impact as 'global_dot_product' is
called inside the loop, whereas previously we called
'global_allreduce_sum' after the loop to compute all 'initer' sums at
the same time.

To keep that optimization, we would have to implement a new interface
'global_allreduce_sum' which would take an array of shape
(nx_block,ny_block,max_blocks,k) and sum over their first three
dimensions before performing the global reduction over the k dimension.

We choose not to go that route for now, mostly because the CGS
algorithm is (by default) only used for the PGMRES preconditioner, so
the cost should be relatively low: 'initer' corresponds to 'dim_pgmres'
in the namelist, which should be kept low for efficiency (default 5).

These changes lead to bit-for-bit reproducibility (the decomp_suite
passes) when using 'precond=ident' and 'precond=diag' along with
'bfbflag=reprosum'. 'precond=pgmres' is still not bit-for-bit because
some halo updates are skipped for efficiency. This will be addressed in
a following commit.

[1] #491 (comment)
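
A hedged illustration of the CGS trade-off discussed above (serial stand-in; 'dot_product' and 'matmul' play the role of the global reductions, names are not the CICE ones): one reduction per basis vector inside the loop, versus batching all 'initer' reductions into a single call after the loop.

    program cgs_reduction_sketch
       implicit none
       integer, parameter :: dbl_kind = selected_real_kind(13)
       integer, parameter :: n = 16, initer = 5      ! 'initer' ~ dim_pgmres (default 5)
       real(dbl_kind) :: basis(n,initer), w(n), coeffs(initer)
       integer :: it

       call random_number(basis) ; call random_number(w)

       ! Refactored CGS: one 'global_dot_product'-style reduction per iteration.
       do it = 1, initer
          coeffs(it) = dot_product(basis(:,it), w)   ! would be one global (MPI) sum each
       enddo
       print *, 'per-iteration reductions:', coeffs

       ! Previous pattern: local partial sums for all vectors, then one batched
       ! reduction of a length-'initer' vector (the removed 'global_allreduce_sum').
       coeffs = matmul(transpose(basis), w)          ! one reduction over the k dimension
       print *, 'batched reduction:      ', coeffs
    end program cgs_reduction_sketch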

* ice_dyn_vp: do not skip halo updates in 'pgmres' under 'bfbflag'

The 'pgmres' subroutine implements a separate GMRES solver and is used
as a preconditioner for the FGMRES linear solver. Since it is only a
preconditioner, it was decided to skip the halo updates after computing
the matrix-vector product (in 'matvec'), for efficiency.

This leads to non-reproducibility since the content of the non-updated
halos depends on the block / MPI distribution.

Add the required halo updates, but only perform them when we are
explicitly asking for bit-for-bit global sums, i.e. when 'bfbflag' is
set to something other than 'not'.

Adjust the interfaces of 'pgmres' and 'precondition' (from which
'pgmres' is called) to accept 'halo_info_mask', since it is needed for
masked updates.

Closes #518
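
A rough, runnable sketch of the gating only (all names are hypothetical stubs, not the actual 'pgmres'/'precondition' interfaces): the extra halo update after the matrix-vector product happens only when reproducible sums are requested, so the default fast path is unchanged.

    program pgmres_halo_gate
       implicit none
       character(len=16) :: bfbflag
       bfbflag = 'reprosum'                     ! anything other than 'not'
       call matvec_stub()                       ! matrix-vector product (stub)
       if (trim(bfbflag) /= 'not') then
          call halo_update_stub()               ! extra update for bit-for-bit runs
       endif
    contains
       subroutine matvec_stub()
          print *, 'computed A*x on local blocks'
       end subroutine matvec_stub
       subroutine halo_update_stub()
          print *, 'updated halo cells (masked update via a halo mask)'
       end subroutine halo_update_stub
    end program pgmres_halo_gate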

* ice_dyn_vp: use global_{norm,dot_product} for bit-for-bit log reproducibility

In the previous commits we ensured bit-for-bit reproducibility of the
outputs when using the VP solver.

Some global norms computed during the nonlinear iteration still use the
same non-reproducible pattern of summing over blocks locally before
performing the reduction. However, these norms are used only to monitor
the convergence in the log file, as well as to exit the iteration when
the required convergence level is reached ('nlres_norm'). Only
'nlres_norm' could (in theory) influence the output, but it is unlikely
that a difference due to floating point errors would influence the 'if
(nlres_norm < tol_nl)' condition used to exit the nonlinear iteration.

Change these remaining cases to also use 'global_norm', leading to
bit-for-bit log reproducibility.

* ice_dyn_vp: remove unused subroutine and cleanup interfaces

The previous commit removed the last caller of 'calc_L2norm_squared'.
Remove the subroutine.

Also, do not compute 'sum_squared' in 'residual_vec', since the
variable 'L2norm' which receives this value has been unused in
'anderson_solver' since the previous commit. Remove that variable, and
adjust the interface of 'residual_vec' accordingly.

* ice_global_reductions: remove 'global_allreduce_sum'

In a previous commit, we removed the sole caller of
'global_allreduce_sum' (in ice_dyn_vp::orthogonalize). We do not
anticipate that this function will be used elsewhere in the code, so
remove it from ice_global_reductions. Update the 'sumchk' unit test
accordingly.

* doc: mention VP solver is only reproducible using 'bfbflag'

The previous commits made sure that the model outputs as well as the log
file output are bit-for-bit reproducible when using the VP solver by
refactoring the code to use the existing 'global_sum' subroutine.

Add a note in the documentation mentioning that 'bfbflag' is required to
get bit-for-bit reproducible results under different decompositions /
MPI counts when using the VP solver.

Also, adjust the documentation stating that 'bfbflag=lsum8' is the same
as 'bfbflag=off', since this is not the case for the VP solver: in the
first case we use the scalar version of 'global_sum', in the second
case we use the array version.

* ice_dyn_vp: improve default parameters for VP solver

During QC testing of the previous commit, the 5-year QC test with the
updated VP solver failed twice with "bad departure points" after a few
years of simulation. Simply bumping the number of nonlinear iterations
(maxits_nonlin) from 4 to 5 makes these failures disappear and allows
the simulations to run to completion, suggesting the solution is not
converged enough with 4 iterations.

We also noticed that in these failing cases, the relative tolerance for
the linear solver (reltol_fgmres = 1E-2) is too small to be reached in
less than 50 iterations (maxits_fgmres), and this is the case at each
nonlinear iteration. Other papers mention a relative tolerance of 1E-1
for the linear solver, and using this value also allows both cases to
run to completion (even without changing maxits_nonlin).

Let's set the default tolerance for the linear solver to 1E-1, and let's
be conservative and bump the number of nonlinear iterations to 10. This
should give us a more converged solution and add robustness to the
default settings.
phil-blain authored Oct 20, 2022
1 parent 2435fa7 commit 16b78da
Showing 17 changed files with 240 additions and 373 deletions.
351 changes: 194 additions & 157 deletions cicecore/cicedynB/dynamics/ice_dyn_vp.F90

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions cicecore/cicedynB/general/ice_init.F90
@@ -419,7 +419,7 @@ subroutine input_data
deltaminEVP = 1e-11_dbl_kind ! minimum delta for viscosities (EVP, Hunke 2001)
deltaminVP = 2e-9_dbl_kind ! minimum delta for viscosities (VP, Hibler 1979)
capping_method = 'max' ! method for capping of viscosities (max=Hibler 1979,sum=Kreyscher2000)
maxits_nonlin = 4 ! max nb of iteration for nonlinear solver
maxits_nonlin = 10 ! max nb of iteration for nonlinear solver
precond = 'pgmres' ! preconditioner for fgmres: 'ident' (identity), 'diag' (diagonal),
! 'pgmres' (Jacobi-preconditioned GMRES)
dim_fgmres = 50 ! size of fgmres Krylov subspace
@@ -431,7 +431,7 @@ subroutine input_data
monitor_pgmres = .false. ! print pgmres residual norm
ortho_type = 'mgs' ! orthogonalization procedure 'cgs' or 'mgs'
reltol_nonlin = 1e-8_dbl_kind ! nonlinear stopping criterion: reltol_nonlin*res(k=0)
reltol_fgmres = 1e-2_dbl_kind ! fgmres stopping criterion: reltol_fgmres*res(k)
reltol_fgmres = 1e-1_dbl_kind ! fgmres stopping criterion: reltol_fgmres*res(k)
reltol_pgmres = 1e-6_dbl_kind ! pgmres stopping criterion: reltol_pgmres*res(k)
algo_nonlin = 'picard' ! nonlinear algorithm: 'picard' (Picard iteration), 'anderson' (Anderson acceleration)
fpfunc_andacc = 1 ! fixed point function for Anderson acceleration:
@@ -36,7 +36,6 @@ module ice_global_reductions
private

public :: global_sum, &
global_allreduce_sum, &
global_sum_prod, &
global_maxval, &
global_minval
@@ -56,12 +55,6 @@ module ice_global_reductions
global_sum_scalar_int
end interface

interface global_allreduce_sum
module procedure global_allreduce_sum_vector_dbl!, &
! module procedure global_allreduce_sum_vector_real, & ! not yet implemented
! module procedure global_allreduce_sum_vector_int ! not yet implemented
end interface

interface global_sum_prod
module procedure global_sum_prod_dbl, &
global_sum_prod_real, &
@@ -707,68 +700,6 @@ function global_sum_scalar_int(scalar, dist) &

end function global_sum_scalar_int

!***********************************************************************

function global_allreduce_sum_vector_dbl(vector, dist) &
result(globalSums)

! Computes the global sums of sets of scalars (elements of 'vector')
! distributed across a parallel machine.
!
! This is actually the specific interface for the generic global_allreduce_sum
! function corresponding to double precision vectors. The generic
! interface is identical but will handle real and integer vectors.

real (dbl_kind), dimension(:), intent(in) :: &
vector ! vector whose components are to be summed

type (distrb), intent(in) :: &
dist ! block distribution

real (dbl_kind), dimension(size(vector)) :: &
globalSums ! resulting array of global sums

!-----------------------------------------------------------------------
!
! local variables
!
!-----------------------------------------------------------------------

integer (int_kind) :: &
numProcs, &! number of processor participating
numBlocks, &! number of local blocks
communicator, &! communicator for this distribution
numElem ! number of elements in vector

real (dbl_kind), dimension(:,:), allocatable :: &
work ! temporary local array

character(len=*), parameter :: subname = '(global_allreduce_sum_vector_dbl)'

!-----------------------------------------------------------------------
!
! get communicator for MPI calls
!
!-----------------------------------------------------------------------

call ice_distributionGet(dist, &
numLocalBlocks = numBlocks, &
nprocs = numProcs, &
communicator = communicator)

numElem = size(vector)
allocate(work(1,numElem))
work(1,:) = vector
globalSums = c0

call compute_sums_dbl(work,globalSums,communicator,numProcs)

deallocate(work)

!-----------------------------------------------------------------------

end function global_allreduce_sum_vector_dbl

!***********************************************************************

function global_sum_prod_dbl (array1, array2, dist, field_loc, &
@@ -37,7 +37,6 @@ module ice_global_reductions
private

public :: global_sum, &
global_allreduce_sum, &
global_sum_prod, &
global_maxval, &
global_minval
@@ -57,12 +56,6 @@ module ice_global_reductions
global_sum_scalar_int
end interface

interface global_allreduce_sum
module procedure global_allreduce_sum_vector_dbl!, &
! module procedure global_allreduce_sum_vector_real, & ! not yet implemented
! module procedure global_allreduce_sum_vector_int ! not yet implemented
end interface

interface global_sum_prod
module procedure global_sum_prod_dbl, &
global_sum_prod_real, &
@@ -708,68 +701,6 @@ function global_sum_scalar_int(scalar, dist) &

end function global_sum_scalar_int

!***********************************************************************

function global_allreduce_sum_vector_dbl(vector, dist) &
result(globalSums)

! Computes the global sums of sets of scalars (elements of 'vector')
! distributed across a parallel machine.
!
! This is actually the specific interface for the generic global_allreduce_sum
! function corresponding to double precision vectors. The generic
! interface is identical but will handle real and integer vectors.

real (dbl_kind), dimension(:), intent(in) :: &
vector ! vector whose components are to be summed

type (distrb), intent(in) :: &
dist ! block distribution

real (dbl_kind), dimension(size(vector)) :: &
globalSums ! resulting array of global sums

!-----------------------------------------------------------------------
!
! local variables
!
!-----------------------------------------------------------------------

integer (int_kind) :: &
numProcs, &! number of processor participating
numBlocks, &! number of local blocks
communicator, &! communicator for this distribution
numElem ! number of elements in vector

real (dbl_kind), dimension(:,:), allocatable :: &
work ! temporary local array

character(len=*), parameter :: subname = '(global_allreduce_sum_vector_dbl)'

!-----------------------------------------------------------------------
!
! get communicator for MPI calls
!
!-----------------------------------------------------------------------

call ice_distributionGet(dist, &
numLocalBlocks = numBlocks, &
nprocs = numProcs, &
communicator = communicator)

numElem = size(vector)
allocate(work(1,numElem))
work(1,:) = vector
globalSums = c0

call compute_sums_dbl(work,globalSums,communicator,numProcs)

deallocate(work)

!-----------------------------------------------------------------------

end function global_allreduce_sum_vector_dbl

!***********************************************************************

function global_sum_prod_dbl (array1, array2, dist, field_loc, &
64 changes: 0 additions & 64 deletions cicecore/drivers/unittest/sumchk/sumchk.F90
@@ -58,9 +58,6 @@ program sumchk
integer(int_kind),parameter :: ntests3 = 3
character(len=8) :: errorflag3(ntests3)
character(len=32) :: stringflag3(ntests3)
integer(int_kind),parameter :: ntests4 = 1
character(len=8) :: errorflag4(ntests4)
character(len=32) :: stringflag4(ntests4)

integer(int_kind) :: npes, ierr, ntask

@@ -100,7 +97,6 @@ program sumchk
errorflag1 = passflag
errorflag2 = passflag
errorflag3 = passflag
errorflag4 = passflag
npes = get_num_procs()

if (my_task == master_task) then
@@ -600,63 +596,6 @@ program sumchk
endif
enddo

! ---------------------------
! Test Vector Reductions
! ---------------------------

if (my_task == master_task) write(6,*) ' '

n = 1 ; stringflag4(n) = 'dble sum vector'
allocate(vec8(3))
allocate(sum8(3))

minval = -5.
maxval = 8.

vec8(1) = 1.

! fill one gridcell with a min and max value
ntask = max(npes-1,1)-1
if (my_task == ntask) then
vec8(1) = minval
endif
ntask = min(npes,2)-1
if (my_task == ntask) then
vec8(1) = maxval
endif
vec8(2) = 2. * vec8(1)
vec8(3) = 3. * vec8(1)

! compute correct results
if (npes == 1) then
minval = maxval
corval = maxval
else
corval = (npes - 2) * 1.0 + minval + maxval
endif

do k = 1,ntests4
string = stringflag4(k)
sum8 = -888e12
if (k == 1) then
sum8 = global_allreduce_sum(vec8, distrb_info)
else
call abort_ice(subname//' illegal k vector',file=__FILE__,line=__LINE__)
endif

if (my_task == master_task) then
write(6,'(1x,a,3g16.8)') string, sum8(1),sum8(2),sum8(3)
endif

if (sum8(1) /= corval .or. sum8(2) /= 2.*corval .or. sum8(3) /= 3.*corval) then
errorflag4(k) = failflag
errorflag0 = failflag
if (my_task == master_task) then
write(6,*) '**** ERROR ', sum8(1),sum8(2),sum8(3),corval
endif
endif
enddo

! ---------------------------

if (my_task == master_task) then
@@ -670,9 +609,6 @@ program sumchk
do k = 1,ntests3
write(6,*) errorflag3(k),stringflag3(k)
enddo
do k = 1,ntests4
write(6,*) errorflag4(k),stringflag4(k)
enddo
write(6,*) ' '
write(6,*) 'SUMCHK COMPLETED SUCCESSFULLY'
if (errorflag0 == passflag) then
4 changes: 2 additions & 2 deletions configuration/scripts/ice_in
@@ -167,7 +167,7 @@
kridge = 1
ktransport = 1
ssh_stress = 'geostrophic'
maxits_nonlin = 4
maxits_nonlin = 10
precond = 'pgmres'
dim_fgmres = 50
dim_pgmres = 5
@@ -178,7 +178,7 @@
monitor_pgmres = .false.
ortho_type = 'mgs'
reltol_nonlin = 1e-8
reltol_fgmres = 1e-2
reltol_fgmres = 1e-1
reltol_pgmres = 1e-6
algo_nonlin = 'picard'
use_mean_vrel = .true.
7 changes: 7 additions & 0 deletions configuration/scripts/machines/env.ppp5_intel
@@ -18,6 +18,12 @@ source $ssmuse -d /fs/ssm/main/opt/intelcomp/inteloneapi-2022.1.2/intelcomp+mpi+
# module load -s icc mpi
setenv FOR_DUMP_CORE_FILE 1
setenv I_MPI_DEBUG_COREDUMP 1
# Reproducible collectives
if (${ICE_BASEGEN} != ${ICE_SPVAL} || ${ICE_BASECOM} != ${ICE_SPVAL}) then
setenv I_MPI_CBWR 1
endif
# Stop being buggy
setenv I_MPI_FABRICS ofi
# NetCDF
source $ssmuse -d main/opt/hdf5-netcdf4/serial/shared/inteloneapi-2022.1.2/01

@@ -32,6 +38,7 @@ setenv ICE_MACHINE_MAKE make
setenv ICE_MACHINE_WKDIR ~/data/ppp5/cice/runs/
setenv ICE_MACHINE_INPUTDATA /space/hall5/sitestore/eccc/cmd/e/sice500/
setenv ICE_MACHINE_BASELINE ~/data/ppp5/cice/baselines/
setenv ICE_MACHINE_MAXRUNLENGTH 6
setenv ICE_MACHINE_SUBMIT qsub
setenv ICE_MACHINE_TPNODE 80
setenv ICE_MACHINE_ACCT unused
3 changes: 2 additions & 1 deletion configuration/scripts/machines/env.ppp6_gnu
@@ -8,7 +8,7 @@ endif
if ("$inp" != "-nomodules") then

# OpenMPI
source /usr/mpi/gcc/openmpi-4.1.2a1/bin/mpivars.csh
setenv PATH "/home/phb001/.local_rhel-8-icelake-64_gcc/bin:$PATH"

# OpenMP
setenv OMP_STACKSIZE 64M
@@ -21,6 +21,7 @@ setenv ICE_MACHINE_MAKE make
setenv ICE_MACHINE_WKDIR ~/data/site6/cice/runs/
setenv ICE_MACHINE_INPUTDATA /space/hall6/sitestore/eccc/cmd/e/sice500/
setenv ICE_MACHINE_BASELINE ~/data/site6/cice/baselines/
setenv ICE_MACHINE_MAXRUNLENGTH 6
setenv ICE_MACHINE_SUBMIT qsub
setenv ICE_MACHINE_TPNODE 80
setenv ICE_MACHINE_ACCT unused
7 changes: 7 additions & 0 deletions configuration/scripts/machines/env.ppp6_gnu-impi
@@ -18,6 +18,12 @@ setenv I_MPI_F90 gfortran
setenv I_MPI_FC gfortran
setenv I_MPI_CC gcc
setenv I_MPI_CXX g++
# Reproducible collectives
if (${ICE_BASEGEN} != ${ICE_SPVAL} || ${ICE_BASECOM} != ${ICE_SPVAL}) then
setenv I_MPI_CBWR 1
endif
# Stop being buggy
setenv I_MPI_FABRICS ofi

# OpenMP
setenv OMP_STACKSIZE 64M
@@ -30,6 +36,7 @@ setenv ICE_MACHINE_MAKE make
setenv ICE_MACHINE_WKDIR ~/data/site6/cice/runs/
setenv ICE_MACHINE_INPUTDATA /space/hall6/sitestore/eccc/cmd/e/sice500/
setenv ICE_MACHINE_BASELINE ~/data/site6/cice/baselines/
setenv ICE_MACHINE_MAXRUNLENGTH 6
setenv ICE_MACHINE_SUBMIT qsub
setenv ICE_MACHINE_TPNODE 80
setenv ICE_MACHINE_ACCT unused
7 changes: 7 additions & 0 deletions configuration/scripts/machines/env.ppp6_intel
@@ -18,6 +18,12 @@ source $ssmuse -d /fs/ssm/main/opt/intelcomp/inteloneapi-2022.1.2/intelcomp+mpi+
# module load -s icc mpi
setenv FOR_DUMP_CORE_FILE 1
setenv I_MPI_DEBUG_COREDUMP 1
# Reproducible collectives
if (${ICE_BASEGEN} != ${ICE_SPVAL} || ${ICE_BASECOM} != ${ICE_SPVAL}) then
setenv I_MPI_CBWR 1
endif
# Stop being buggy
setenv I_MPI_FABRICS ofi
# NetCDF
source $ssmuse -d main/opt/hdf5-netcdf4/serial/shared/inteloneapi-2022.1.2/01

@@ -32,6 +38,7 @@ setenv ICE_MACHINE_MAKE make
setenv ICE_MACHINE_WKDIR ~/data/ppp6/cice/runs/
setenv ICE_MACHINE_INPUTDATA /space/hall6/sitestore/eccc/cmd/e/sice500/
setenv ICE_MACHINE_BASELINE ~/data/ppp6/cice/baselines/
setenv ICE_MACHINE_MAXRUNLENGTH 6
setenv ICE_MACHINE_SUBMIT qsub
setenv ICE_MACHINE_TPNODE 80
setenv ICE_MACHINE_ACCT unused