MOM5/CM2M (experiment CM2.1p1 run error ) #369

Open
Mubashardogar opened this issue Jul 18, 2022 · 25 comments

Comments

@Mubashardogar

Dear MOM5 Users,

I have successfully compiled MOM5 for model type CM2M on Hokkaido University's supercomputer using the ifort compiler environment (https://www.hucc.hokudai.ac.jp/en/supercomputer/sc-overview/). The environment file (environs.hu) and the mkmf.template.hu file used for compilation are attached. I used the compiler flag "-convert big_endian" to get rid of a run error that several other MOM users have reported in earlier email threads (i.e., FATAL from PE 0: read_fv_rst:: resolution inconsistent). However, I got the following error when I tried to run the test experiment CM2.1p1 with npes=45 (i.e., ocean_npes=30, atmos_npes=15) on two nodes of Hokkaido University's Grand Chariot supercomputer (each node has 40 cores). Please suggest how to overcome this run error.

FATAL from PE 34: ==>Error from coupler_types_mod (CT_spawn_1d_3d): Disordered k-dimension index bound list 1 0
Best regards,
Dogar

environs.hu.txt
ERROR_File.sh..o13.txt
mkmf.template.hu.txt

@russfiedler
Collaborator

Thanks for that. It looks like it's crashing during the initial testing steps. Could you recompile and run with traceback information so we can see where the error is being triggered?

-g -traceback should do the trick.

@Mubashardogar
Author

Mubashardogar commented Jul 18, 2022 via email

@russfiedler
Collaborator

russfiedler commented Jul 18, 2022

In the compilation template script. Make sure you clean out the old objects and binaries first. If you've done it correctly, the end of your output file should show routine names and line numbers rather than just addresses.

@Mubashardogar
Author

Mubashardogar commented Jul 18, 2022 via email

@russfiedler
Collaborator

Also in FFLAGS. LDFLAGS shouldn't need it but it won't hurt.
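
A quick way to confirm the flags made it into the template before recompiling is to scan it for them. The following is a minimal Python sketch, assuming the template assigns FFLAGS in makefile style (`FFLAGS = ...` or `FFLAGS += ...`). The sample template text and the helper name `missing_debug_flags` are hypothetical and are not taken from the attached mkmf.template.hu.

```python
import re

# Flags suggested in this thread for getting routine names and line numbers.
REQUIRED = ("-g", "-traceback")

def missing_debug_flags(template_text, variable="FFLAGS"):
    """Return the required flags that appear on no `variable` assignment line."""
    pattern = re.compile(r"^\s*" + re.escape(variable) + r"\s*[:+?]?=\s*(.*)$")
    seen = set()
    for line in template_text.splitlines():
        match = pattern.match(line)
        if match:
            seen.update(match.group(1).split())
    return [flag for flag in REQUIRED if flag not in seen]

# Hypothetical template fragment, for illustration only.
sample = """
FC = mpiifort
FFLAGS = -O2 -convert big_endian
FFLAGS += -g -traceback
LDFLAGS = -g
"""

print(missing_debug_flags(sample))             # flags missing from FFLAGS
print(missing_debug_flags(sample, "LDFLAGS"))  # flags missing from LDFLAGS
```

An empty list means both flags are present for that variable. The old objects and binaries still need to be cleaned out before rebuilding, as noted above.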

@Mubashardogar
Author

Mubashardogar commented Jul 18, 2022 via email

@russfiedler
Collaborator

Yes, that looks good. I'll have a look at your results tomorrow.

@Mubashardogar
Author

Mubashardogar commented Jul 18, 2022 via email

@russfiedler
Collaborator

@Mubashardogar unfortunately, there doesn't seem to be an attachment.

@Mubashardogar
Author

Dear Russ,

Please find attached the error file.
Best regards,
Dogar
CM2P1run_ERROR.sh.o1319576.txt

@russfiedler
Collaborator

That's great.
@aidanheerdegen It seems like an array with a zero length dimension isn't being handled correctly in the coupler for gas exchange. Something that might be a 2D array is being treated as 3D.

@Mubashardogar
Author

Dear Russ,

Did you manage to figure out the problem? Kindly advise how to fix this error. I look forward to your kind response.

Thank you and best regards
Dogar

@aidanheerdegen
Contributor

I am looking into it.

@aidanheerdegen
Contributor

I don't think I will have time to get to a resolution today, or this week.

In the meantime @Mubashardogar you could try checking out this commit fe8bdad

git checkout fe8bdad82

and compile and use that executable. The change in the code that is throwing the error was an update to FMS just after that commit. I doubt anything else that has changed since then is critical for you, considering you are running an old standard configuration.

@Mubashardogar
Author

Dear Aidan,

I understand that it will take time to fix. In the meantime, if I understood correctly, you want me to take an older version of MOM5 (from before this update to FMS) and compile the model again with it. Could you kindly give me a download link pointing directly to this older version, so that I do not make any mistakes while downloading the version you are referring to?

Best regards,
Dogar

@aidanheerdegen
Contributor

@russfiedler I reproduced the error on gadi.

So this check in the FMS update
https://github.com/mom-ocean/MOM5/blob/master/src/shared/coupler/coupler_types.F90#L1342-L1344

doesn't exist in the previous version:

if (var_out%num_bcs .ne. 0) then !{
call mpp_error(FATAL, trim(error_header) // ' Number of output fields is non-zero')
endif !}
var_out%num_bcs = var_in%num_bcs
!
! Return if no input fields
!
if (var_in%num_bcs .ne. 0) then !{

CT_spawn_1d_3d is called here:
https://github.com/mom-ocean/MOM5/blob/master/src/shared/coupler/coupler_types.F90#L1000

call CT_spawn_1d_3d(var_in, var_out,  (/ is, is, ie, ie /), (/ js, js, je, je /), (/1, kd/), suffix)

with kdim = (/1, kd/), which implies 1 > kd (the error message reports the bounds as 1 0, i.e. kd = 0).

That value of kd comes from the size of the 3rd dimension of the Ice%ice_mask

https://github.com/mom-ocean/MOM5/blob/master/src/coupler/flux_exchange.F90#L1066-L1067

    kd = size(Ice%ice_mask,3)
    call coupler_type_copy(ex_gas_fields_ice, Ice%ocean_fields, is, ie, js, je, kd,     &
         'ice_flux', Ice%axes, Time, suffix = '_ice')

Any ideas why Ice%ice_mask might have a zero-sized third dimension at this point?
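
To make the failure mode concrete: the error text reports the k bounds as 1 0, which is what (/1, kd/) becomes when the third dimension has zero extent. Below is a minimal Python sketch of that kind of ordered-bounds check. It is not the FMS code; `third_dim_size`, `check_k_bounds` and the nested-list stand-in for Ice%ice_mask are hypothetical.

```python
def third_dim_size(array_3d):
    """Extent of the third dimension of a nested list, like size(a, 3) in Fortran."""
    if not array_3d or not array_3d[0]:
        return 0
    return len(array_3d[0][0])

def check_k_bounds(kdim):
    """Raise if the (lower, upper) k bounds are out of order; else return the extent."""
    lower, upper = kdim
    if lower > upper:
        raise ValueError(f"Disordered k-dimension index bound list {lower} {upper}")
    return upper - lower + 1

# A 2 x 3 mask whose third dimension has zero extent.
ice_mask = [[[] for _ in range(3)] for _ in range(2)]
kd = third_dim_size(ice_mask)

try:
    check_k_bounds((1, kd))
except ValueError as err:
    print(err)
```

With a non-zero third dimension the same check passes, so the question reduces to why the mask is zero-sized when its size is queried.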

@russfiedler
Collaborator

@aidanheerdegen It looks like an optimisation/scope problem and horrible use of global variables. km isn't initialised until the call to set_ice_grid at line 505 in ice_grid.F90. This change to the value of km isn't seen by the compiler and I bet the allocations beginning at line 411 in ice_model_init are being done out of order.
@Mubashardogar First, try compiling without the OpenMP compiler flags. They may be causing the problem.
Otherwise add the line
km=num_part before the calls to set_ocean_grid
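
The suspected ordering problem can be illustrated with a toy model. This is a minimal Python sketch of the hypothesis only, assuming that an array is allocated from a shared km before km has been assigned. The `IceGrid` class, `allocate_mask` and the value 6 for num_part are hypothetical and do not reproduce the MOM5 ice code.

```python
class IceGrid:
    """Toy stand-in for module-level grid state; km starts unset (0)."""

    def __init__(self):
        self.km = 0

    def set_ice_grid(self, num_part):
        self.km = num_part

def allocate_mask(grid, ni, nj):
    """Allocate an ni x nj x km mask using whatever km holds right now."""
    return [[[False] * grid.km for _ in range(nj)] for _ in range(ni)]

def k_extent(mask):
    """Extent of the third dimension of the mask."""
    return len(mask[0][0])

num_part = 6  # hypothetical number of ice partitions

# Allocation happens before km is set: the third dimension has zero extent,
# and setting km afterwards does not resize the existing mask.
early = IceGrid()
mask_early = allocate_mask(early, 2, 3)
early.set_ice_grid(num_part)

# Workaround in the spirit of "km=num_part": assign km before it is used.
fixed = IceGrid()
fixed.km = num_part
mask_fixed = allocate_mask(fixed, 2, 3)

print(k_extent(mask_early), k_extent(mask_fixed))
```

The point of the sketch is only that a later assignment to km cannot repair an allocation that has already happened.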

@aidanheerdegen
Contributor

> I understand that it will take time to fix. In the meantime, if I understood correctly, you want me to take an older version of MOM5 (from before this update to FMS) and compile the model again with it. Could you kindly give me a download link pointing directly to this older version, so that I do not make any mistakes while downloading the version you are referring to?

If you cloned the MOM5 repo, running the command I gave above in your MOM5 code directory should be sufficient for your requirements.

@aidanheerdegen
Contributor

> @aidanheerdegen It looks like an optimisation/scope problem and horrible use of global variables. km isn't initialised until the call to set_ice_grid at line 505 in ice_grid.F90. This change to the value of km isn't seen by the compiler and I bet the allocations beginning at line 411 in ice_model_init are being done out of order.

Ahh, I see. These broadcasts take care of propagating the values from the ice model initialisation to the other ice PEs

call mpp_broadcast_domain(Ice%domain)
call mpp_broadcast_domain(Ocean%domain)

before the problematic call to flux_exchange_init

call flux_exchange_init ( Time, Atm, Land, Ice, Ocean, Ocean_state,&

but there is no synchronisation across domains, because those broadcasts only happen within each domain (ice and ocean).

Is that a bug? If the ocean PEs need information from the ice domain then it needs to broadcast that info to the ocean PEs.
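
A toy model of the concern: if a value is only broadcast to the PEs of one domain, PEs outside that list never see it. This is a minimal Python sketch, not the mpp API; the `broadcast` helper, the PE numbering and the value 6 are hypothetical.

```python
def broadcast(values, root, pelist):
    """Copy the root PE's value to every PE in pelist; other PEs are untouched."""
    for pe in pelist:
        values[pe] = values[root]

# Hypothetical layout: PEs 0-2 hold the ice domain, PEs 3-5 the ocean domain.
ice_pes = [0, 1, 2]
ocean_pes = [3, 4, 5]

km = {pe: 0 for pe in ice_pes + ocean_pes}
km[ice_pes[0]] = 6  # only the ice root knows the value after initialisation

# Broadcast restricted to the ice domain.
broadcast(km, root=ice_pes[0], pelist=ice_pes)
ocean_after_ice_only = [km[pe] for pe in ocean_pes]

# Broadcast that also covers the ocean PEs.
broadcast(km, root=ice_pes[0], pelist=ice_pes + ocean_pes)
ocean_after_global = [km[pe] for pe in ocean_pes]

print(ocean_after_ice_only, ocean_after_global)
```

In the first case the ocean PEs still hold the unset value, which would match a zero-sized dimension being seen on a PE outside the ice domain.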

@Mubashardogar
Author

Mubashardogar commented Jul 21, 2022

@russfiedler Dear Russ, I compiled the model again after removing all OpenMP compiler flags (i.e., "-qopenmp") from my mkmf.template file; however, I get the error again. Should I add km=num_part in the file "src/mom5/ocean_core/ocean_grids.F90" at line 241, after the subroutine "set_ocean_grid_size(Grid, grid_file, grid_name)"?

Best regards,
Dogar

@Mubashardogar
Author

@aidanheerdegen, I followed your steps and compiled the model again after running the command "git checkout fe8bdad", then ran the model again. This time the model reached the end and displayed the message "end_of_run", as shown in the attached file. However, several errors and warnings are listed in this output file. Moreover, no output *tar files (containing history and ASCII files, etc.) were produced. Is it because some input data files are missing? Did I miss some steps?
Best regards,
Dogar

CM2P1run.sh.o1323202.txt

@aidanheerdegen
Contributor

The model ran fine, but the runtime is very short (21s) for testing purposes. You will probably need to increase the run length before you get any diagnostic output, as that is generally written at daily, monthly and/or annual intervals.

The output files are netCDF, so you will have files ending in .nc.
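
As a rough illustration of why a short test run produces no diagnostics: an output is only written once a full output interval has elapsed. This is a minimal Python sketch of that arithmetic, not the diag_manager logic; `outputs_written` and the example run lengths are hypothetical.

```python
def outputs_written(run_length_s, output_interval_s):
    """Number of complete output intervals that fit in the run."""
    if output_interval_s <= 0:
        raise ValueError("output interval must be positive")
    return run_length_s // output_interval_s

HOUR = 3600
DAY = 24 * HOUR

print(outputs_written(6 * HOUR, DAY))  # run shorter than the daily interval
print(outputs_written(3 * DAY, DAY))   # run spanning three daily intervals
```

A run shorter than the smallest configured interval yields zero complete intervals, so there is nothing to write for that diagnostic.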

@Mubashardogar
Author

Mubashardogar commented Jul 22, 2022

@aidanheerdegen Dear Aidan,
Thank you so much. OK, I will increase the test run length. Should I do this by changing the date in input.nml (current_date = 1,1,1,0,0,0,), or does it have to be done in the ../bin/time_stamp.csh file? I couldn't find the end date, number of submissions, etc. Also, if I want to increase the number of processors/cores from npes=45 to npes=60, should I use the layout ocean_npes=30 and atmos_npes=30?

What do the warnings and potential error messages (e.g., diag_manager_end: total_ocean_evap NOT available) in the output file that I attached earlier mean (the messages are also copied below)? Moreover, where can I get the input data (e.g., aerosol data, especially volcanic aerosol forcing data)? I am interested in running realistic simulations for the period 1950-2021.

NOTE from PE 0: aerosol_mod: inconsistent nml settings -- not using aerosol timeseries but requesting interannual variation of aerosol amount for so4_anthro -- this aerosol will NOT exhibit interannual variation
WARNING from PE 15: diag_util_mod::opening_file: module/field_name (generic_cfc/sfc_flux_cfc_12) NOT registered
NOTE from PE 0: Potential error in diag_manager_end: drag_moist NOT available, check if output interval > runlength. Netcdf fill_values are written
WARNING from PE 15: diag_util_mod::opening_file: module/field_name (ocean_model/eta_nonsteric_global) NOT registered
NOTE from PE 15: Potential error in diag_manager_end: total_ocean_evap NOT available, check if output interval > runlength. Netcdf fill_values are written
/bin/ls: No match.

Best regards,
Dogar

@aidanheerdegen
Contributor

Issues in this repository are for code-related problems only. There is ample documentation on running and configuring the model here

https://mom-ocean.github.io

If you have problems after that the google group is probably the best option.

@Mubashardogar
Author

Mubashardogar commented Dec 8, 2022

Dear @russfiedler @aidanheerdegen,

I did an experiment using the MOM5/CM2.1 model that is a continuation of my earlier experiment. Just to remind you, I followed the above steps recommended by @aidanheerdegen and compiled the model after applying the command "git checkout fe8bdad". My model was running fine with control settings.

Now, I want to see the effect of volcanic aerosols. Therefore, I made the required changes in the namelist "&aerosolrad_package_nml" (please see attached namelist "input.nml.txt"). Also please look at the log file and error file. In the error file I got the following message:

FATAL from PE 12: shortwave_driver_mod: cannot calculate volcanic sw heating when volcanic sw aerosols are not activated

Where should I activate volcanic sw aerosols? I have one more question. In "&aerosolrad_package_nml", I activated the "sw" and "lw" volcanic aerosols as follows, but it seems the model is not calculating them. Please advise how to fix these issues.

&aerosolrad_package_nml
volcanic_dataset_entry = 1991, 1, 1, 0, 0, 0,
using_volcanic_lw_files = .true.,
lw_ext_filename = "extlw_data.nc"
lw_ext_root = "extlw"
lw_asy_filename = "asmlw_data.nc"
lw_asy_root = "asmlw "
lw_ssa_filename = "omglw_data.nc"
lw_ssa_root = "omglw"
using_volcanic_sw_files = .true.,
sw_ext_filename = "extsw_data.nc"
sw_ext_root = "extsw"
sw_ssa_filename = "omgsw_data.nc"
sw_ssa_root = "omgsw"
sw_asy_filename = "asmsw_data.nc"
sw_asy_root = "asmsw"
do_lwaerosol = .true.,
do_swaerosol = .true.,
aerosol_data_set = 'shettle_fenn',
optical_filename = "aerosol.optical.dat",
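
When checking a namelist like this against the expected file and variable names, whitespace inside quoted values is easy to overlook (the lw_asy_root value above ends with a space). Below is a minimal Python sketch that flags such values. Whether the model is sensitive to this is not established here, and `padded_string_values` is a hypothetical helper, not part of MOM5 or FMS.

```python
import re

# Matches lines of the form: name = "value" or name = 'value'
ASSIGNMENT = re.compile(r"""^\s*(\w+)\s*=\s*(["'])(.*?)\2""")

def padded_string_values(namelist_text):
    """Return (name, value) pairs whose quoted value has leading/trailing whitespace."""
    flagged = []
    for line in namelist_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(3) != match.group(3).strip():
            flagged.append((match.group(1), match.group(3)))
    return flagged

# Four entries copied from the namelist above.
sample = '''
lw_ext_root = "extlw"
lw_asy_filename = "asmlw_data.nc"
lw_asy_root = "asmlw "
sw_asy_root = "asmsw"
'''

print(padded_string_values(sample))
```

Only string-valued entries are inspected; logical and numeric entries such as do_swaerosol or volcanic_dataset_entry are skipped by the pattern.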

Best regards,
Dogar
CM2P1run_ERROR.sh.o1509435.txt
input.nml.txt
logfile.000000.out.txt
