Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grid_imask created with SCRIPNC contains garbage values #1328

Open
DeniseWorthen opened this issue Dec 9, 2024 · 34 comments
Open

grid_imask created with SCRIPNC contains garbage values #1328

DeniseWorthen opened this issue Dec 9, 2024 · 34 comments
Labels
bug Something isn't working

Comments

@DeniseWorthen
Copy link
Contributor

DeniseWorthen commented Dec 9, 2024

Describe the bug
When generating the SCRIPNC file in w3gridmd, the grid_imask variable is defined with dimension size GRID1_SIZE. When allocated and filled, it is only valid for GRID1_DIMS(1). It should be allocated and filled to size GRID1_SIZE. This bug results in a grid_imask field which is valid only on the first row of the domain.

For the mesh cap, having non-valid values in the grid_imask propagates to the ESMF Unstructured mesh. For structured grids, the mesh cap will replace the mask obtained from the unstructured ESMF mesh if it does not currently contain a mask (ie, if the current values are ==1 everywhere). In this case however, the meshmin is neither 0 or 1 (it is in fact -2147483647 for the glo_025.SCRIP.nc). This means that destination and source masking in CMEPS is not working correctly since it relies on a mask value of either 0 or 1 and mask in the mesh sent to CMEPS has a single row (j=1) which can be used as a mask.

Note: The code segment below needs to be read knowing that for structured meshes, the mesh that WW3 sends to the mediator (ie, the mesh that the export fields are built on) has the land points at the end of the distribution. So, the entire mask is set 0, and then the points which are WW3 "sea" points (ie, the values 1:nseal_cpl) are set to 1.

if (.not. unstr_mesh) then
! obtain the mesh mask and find the minimum value across all PEs
call ESMF_MeshGet(EMesh, elementDistgrid=Distgrid, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
call ESMF_DistGridGet(Distgrid, localDe=0, elementCount=ncnt, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
allocate(meshmask(ncnt))
elemMaskArray = ESMF_ArrayCreate(Distgrid, farrayPtr=meshmask, rc=rc)
if (chkerr(rc,__LINE__,u_FILE_u)) return
call ESMF_MeshGet(Emesh, elemMaskArray=elemMaskArray, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
call ESMF_VMAllFullReduce(vm, sendData=meshmask, recvData=maskmin, count=ncnt, &
reduceflag=ESMF_REDUCE_MIN, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
if (maskmin == 1) then
! replace mesh mask with internal mask
meshmask(:) = 0
meshmask(1:nseal_cpl) = 1
call ESMF_MeshSet(mesh=EMesh, elementMask=meshmask, rc=rc)
if (chkerr(rc,__LINE__,u_FILE_u)) return
end if
)
This is not an issue with rectilinear SCRIP files created from NCO (https://ufs-weather-model.readthedocs.io/en/develop/InputsOutputs.html#mesh-generation).

To Reproduce

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@DeniseWorthen DeniseWorthen added the bug Something isn't working label Dec 9, 2024
@JessicaMeixner-NOAA
Copy link
Collaborator

To clarify - as long as we've created the structured meshes used with NUOPC to be via NCO as described here: https://ufs-weather-model.readthedocs.io/en/develop/InputsOutputs.html#mesh-generation we would not run into this issue?

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented Dec 9, 2024

Yes, that should work. The SCRIP file can be generated like so

 ncremap -g test.SCRIP.nc -G latlon=721,1440#snwe=-90.0,-90.0,-0.125,359.875#lat_typ=uni#lat_drc=s2n

EDIT: typo in above...should be -90:90, not -90:-90

@JessicaMeixner-NOAA
Copy link
Collaborator

Okay I believe that's how we have been generating things before so hopefully most things are okay as of now - but should definitely get this fixed!

@DeniseWorthen
Copy link
Contributor Author

In UFS, the meshes for HAFS were generated in this way, the only other rectilinear structured mesh is used in the atmwav_control_noaero_p8 test. I'm pretty sure I added that myself, and I would have generated the SCRIP/mesh using NCO.

@DeniseWorthen
Copy link
Contributor Author

Note also that generating via NCO will put the grid centers at the min/max values of +/- 89.875 and the corners will now be at +/- 90.

With the current mesh.glo_025.nc, the domain has (incorrectly, I believe) the centers at +/- 90.0 and the corners at +/- 90.125.

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented Dec 10, 2024

It turns out you cannot create an evenly divided rectilinear 0.25 resolution w/ 721 latitudes. If you try, the bottom left grid box corners are at -90.00000, -90.00000,-89.75035,-89.75035 and the center is at -89.87517.

Only by using 720 latitudes can you generate an exact 0.25 resolution, giving for the bottom left grid box corners -90.00000, -90.00000, -89.75000, -89.75000 and a center at -89.875.

The thing to remember when generating w/ NCO is that the snwe values are the values of the domain's bounding box---so the outer-most corners of the entire domain.

When specifying rectilinear in the inp file, the values appear to be the center lat and lon of the first (lower left) grid box.

@NickSzapiro-NOAA
Copy link
Contributor

@JessicaMeixner-NOAA Maybe GEFS (EP6) needs fix for glo_025 mesh? Running GEFS RT with:

diff --git a/model/src/wav_comp_nuopc.F90 b/model/src/wav_comp_nuopc.F90
index 4280b3b1..9382e54a 100644
--- a/model/src/wav_comp_nuopc.F90
+++ b/model/src/wav_comp_nuopc.F90
@@ -925,6 +925,9 @@ contains
       call ESMF_VMAllFullReduce(vm, sendData=meshmask, recvData=maskmin, count=ncnt, &
            reduceflag=ESMF_REDUCE_MIN, rc=rc)
       if (ChkErr(rc,__LINE__,u_FILE_u)) return
+      ! debug
+      maskmin = 1

changes baselines.

@JessicaMeixner-NOAA
Copy link
Collaborator

Tagging @sbanihash as she is the the waves GEFS POC

I'm not sure who (or how) the mesh.glo_025 mesh was created. I'm fairly sure I created the HAFS ones, but I don't remember for the GEFS. But if we need an update, we should do that!

@JessicaMeixner-NOAA
Copy link
Collaborator

Also @NickSzapiro-NOAA - to get the new mesh, I"m assuming this bug needs fixed asap?

@NickSzapiro-NOAA
Copy link
Contributor

NickSzapiro-NOAA commented Dec 12, 2024

Denise knows better but I think we need to replace glo_025 with a new one with different dimensions (and SCRIP made via nco commands)

@JessicaMeixner-NOAA
Copy link
Collaborator

Do you want @sbanihash and I to recreate a mesh file and test?

@NickSzapiro-NOAA
Copy link
Contributor

I think so. Right @DeniseWorthen?

@JessicaMeixner-NOAA
Copy link
Collaborator

JessicaMeixner-NOAA commented Dec 12, 2024

Job script: /scratch1/NCEPDEV/climate/Jessica.Meixner/MakeMeshesWW3CMEPS/makemeshww3_rocky8_025.job

#!/bin/sh --login

#SBATCH --nodes=1
#SBATCH -q batch
#SBATCH -t 08:00:00
#SBATCH -A marine-cpu
#SBATCH -J ww3hafsmesh
#SBATCH -o makemeshrocky8.out

  module purge
  module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8/install/modulefiles/Core
  module load stack-intel/2021.5.0
  module load stack-intel-oneapi-mpi/2021.5.1
  module load cmake/3.23.1
  module load libpng/1.6.37
  module load zlib/1.2.13
  module load jasper/2.0.32
  module load hdf5/1.14.0
  module load netcdf-c/4.9.2
  module load netcdf-fortran/4.6.1
  module load bacio/2.4.1
  module load g2/3.4.5
  module load w3emc/2.10.0
  module load esmf/8.5.0

  module load nco
 

echo "ncremap:"
ncremap -g glo_025.SCRIP.nc -G latlon=721,1440#snwe=-90.0,90.0,-0.125,359.875#lat_typ=uni#lat_drc=s2n


echo "esmf script:" 
srun -n 1 ESMF_Scrip2Unstruct glo_025.SCRIP.nc mesh.glo_025.nc 0

scrip inbetween file: /scratch1/NCEPDEV/climate/Jessica.Meixner/MakeMeshesWW3CMEPS/glo_025.SCRIP.nc

mesh file: /scratch1/NCEPDEV/climate/Jessica.Meixner/MakeMeshesWW3CMEPS/mesh.glo_025.nc

Haven't tested yet... but sharing now all the same

@JessicaMeixner-NOAA
Copy link
Collaborator

@DeniseWorthen - Okay i think i need to recreate the glo_900 grid that I have made recently... I don't have the scripts to check the HAFS, but why did i think you had to do latitudes: (first point)-dy/2:(last point)-dy/2? similar to how the longitudes are done?

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented Dec 12, 2024

The correct, 025 degree rectilinear grid should be created with only 720 latitudes via the NCO command.

It turns out you cannot create an evenly divided rectilinear 0.25 resolution w/ 721 latitudes. If you try, the bottom left grid box corners are at -90.00000, -90.00000,-89.75035,-89.75035 and the center is at -89.87517.

Only by using 720 latitudes can you generate an exact 0.25 resolution, giving for the bottom left grid box corners -90.00000, -90.00000, -89.75000, -89.75000 and a center at -89.875.

The thing to remember when generating w/ NCO is that the snwe values are the values of the domain's bounding box---so the outer-most corners of the entire domain.

When specifying rectilinear in the inp file, the values appear to be the center lat and lon of the first (lower left) grid box.

However, using 720 will change the dimensions of the mesh and then the question is how much disruption that causes for the case Nick is trying to replicate (ie, the CMEPS restarts will not work, since now the WW3 domain will be different.)

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented Dec 12, 2024

@JessicaMeixner-NOAA The HAFS meshes I believe were all created by me, since they are rectlinear. Let me try to dig up the SCRIP file (amazed if it still exists....)

@JessicaMeixner-NOAA
Copy link
Collaborator

@DeniseWorthen that makes sense why I can't find the script to create them.

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented Dec 12, 2024

One option for "fixing" the mesh but maintaining (I think) the current ICs/warmstarts etc is to use NCO on the existing mesh and simply replace all values of mask with 1. That doesn't require any code changes, but would change all answers (I think).

@DeniseWorthen
Copy link
Contributor Author

See /scratch1/NCEPDEV/nems/Denise.Worthen/WORK/meshes

@JessicaMeixner-NOAA
Copy link
Collaborator

Thanks @DeniseWorthen - I plan to look at this tomorrow morning with fresh eyes.

@JessicaMeixner-NOAA
Copy link
Collaborator

Okay - looking through things again today and one thing I want to do is make sure all of the in-use mesh files are okay, so I've created a list of all known mesh files And now attempting to look into if these are consistently made, which ones have issues, etc.

@JessicaMeixner-NOAA
Copy link
Collaborator

I've looked quite a bit at the other meshes - and I think I can confirm that the only mesh that was created and has an issue is the glo_025 mesh. @DeniseWorthen your comment of:

One option for "fixing" the mesh but maintaining (I think) the current ICs/warmstarts etc is to use NCO on the existing mesh and simply replace all values of mask with 1. That doesn't require any code changes, but would change all answers (I think).

Based on looking at the other grids, it looks like this would be the output of code changes? And so might be worth pursuing anyways? Especially since these answer changes would be the same either way? Or - are we saying that this is a poorly defined grid and we should perhaps go back to step 1 with this? (Which would have a list of challenges as well). I've looked at what needs to be done code change wise and can try to implement next week?

@DeniseWorthen
Copy link
Contributor Author

I've also looked at a few other the others in the spreadsheet and only the glo_025 is bad.

However, if I use an identical ww3_grid.inp.glo_100 and the "create moddefs" script, I produce a SCRIP and mesh file which a) has bad grid_imask value in the SCRIP and b) does not contain an elementArea in the mesh. So something is different about how these files are getting created but w/o the actual SCRIP file that was used for the one in the "fix" directory, I don't know what it is.

Three things:

  1. I think that just printing out the sizes in w3gridmd indicates that the grid1_imask isn't dimensioned correctly so I think a code fix needs to be made in any case. It would be good to get that confirmed.

  2. My suggestion about using NCO to "fix" the grid_imask in the current grid would resolve Nick's reproducibility issue even w/o 1). I don't know whether it matters to gefs that the grid would change and they might see answer changes from that. I'm not sure they would though, because the reproducibility problem Nick was seeing was arising over what should have been land on the wave grid...

  3. I personally would not have designed the grid this way, but it apparently has run w/o issue (?). The disruption of making a new one might just be too much. That's a call I can't make.

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented Dec 17, 2024

An elementMask=1 version of the mesh.glo_025.nc file is at /work2/noaa/stmp/dworthen/mesh.glo_025.nc. I generated it using

ncap2 -s 'tmp=0*elementMask+1;elementMask=tmp' mesh.glo_025.nc
ncks -O -x -v tmp mesh.glo_025.nc mesh.glo_025.nc

@NickSzapiro-NOAA Please try this one in your RT test.

@sbanihash
Copy link
Collaborator

sbanihash commented Dec 17, 2024 via email

@NickSzapiro-NOAA
Copy link
Contributor

With the updated wave element mask, cpld_nothr_gefs test passes on hercules (as well as control, restart, dcp as before).

Answers change across components vs. the current mesh file. I will look at maps of diffs but runs are available at:
/work/noaa/nems/nszapiro/tasks/GEFS_RT/elementMask/ufs-weather-model/tests where
run_dir/cpld_{control,dcp,nothr,restart}_gefs_intel use the updated mask
run_dir/cpld_wavmask_ep5_gefs_intel is like control but with the original mesh file as in g-w

@NickSzapiro-NOAA
Copy link
Contributor

NickSzapiro-NOAA commented Dec 19, 2024

Some quick ncviews of ncdiffs of new-old elementMask tests seem to suggest that this is worth fixing (with caveat that this is only one case). Please let me know if proper quality figures, more depth,... would be helpful.

Differences in ww3 mapsta are from land and North of 82N. Differences in FV3 fields after its first timestep are evident along these domain boundaries (for surface T,u,v,...). These differences grow over the forecast, including a coherent impact over the Black Sea.

ww3 mapsta:
dwavemask_mapsta_ww3 hi 2021-03-25-32400

FV3 near-surface T,u after first timestep:
dwavemask_tmpsfc_sfcf000

dwavemask_u10_sfcf000

MOM6 SST after 3 hours:
dwavemask_sst_ocn_2021_03_25_09

FV3 Tsfc after 12 hours:
dwavemask_tmpsfc_sfcf012

@JessicaMeixner-NOAA
Copy link
Collaborator

We see differences in surface roughness which is likely directly impacted by the mask issue and causing all the other changes:

Screenshot from 2024-12-19 13-24-42

I agree this looks like this needs to be fixed.

@DeniseWorthen
Copy link
Contributor Author

It's a mess---I think it is a combination of the WAV mask being bad + we're using a srcMask=1 on the ATM side.

Debugging the grid imprint issue with bilinear mapping, tells me we'll need to switch the scrMask=undefined for ATM (in other words, the ATM fields are valid everywhere when used as a source). See the figure in NOAA-EMC/CMEPS#129

@NickSzapiro-NOAA
Copy link
Contributor

NickSzapiro-NOAA commented Jan 10, 2025

@JessicaMeixner-NOAA @sbanihash cpld_control_gefs RT runs on hercules with 120 tasks (and 2 ESMF-managed threads) for WW3 but fails with less. The fails segfault in
EMesh = ESMF_MeshCreate(filename=trim(cvalue), fileformat=ESMF_FILEFORMAT_ESMFMESH, elementDistgrid=Distgrid,rc=rc)
and it's not clear WW3 should need so many tasks. Would it be possible to get mesh, mod_def, inp files for a modified glo_025 mesh without the extra-polar latitudes to re-test?

@JessicaMeixner-NOAA
Copy link
Collaborator

@NickSzapiro-NOAA do you mean by cutting off the whole grid at say 80 north instead of having it as part of the grid?

@NickSzapiro-NOAA
Copy link
Contributor

I'm not so familiar but is it possible to go -90 to 90 at the rectilinear corners (instead of the cell centers)? If not, maybe the same 82.5N cutoff

@JessicaMeixner-NOAA
Copy link
Collaborator

It's possible to do either for testing. Using that in GEFS is a different question. I can create those, but wont have time today and will be at AMS next week, so not sure when I'd get a chance to do this.

@NickSzapiro-NOAA
Copy link
Contributor

@JessicaMeixner-NOAA This is for testing to track down this sensitivity to # of WAV_tasks. Happy to test when available. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants