MERRA2 aerosol options for UFS and the coupled model #200
New tests are needed for the changes: CCPP=NO, interval = '24:00:00', export FHMAX_GFS_00=384, on DELL, Orion, HERA, and a few more cases. Is cycled testing needed? Does "Waves" need to be on? |
@AnningCheng-NOAA For the Merra2 changes please run 2 tests on each platform. Test 1 - combine several settings into a single 2.5 cycle cycled test:
1. interval=24 (gfs_cyc=1); make sure one of the full cycles in the test is 00z (suggest starting with the 18z half cycle); feel free to set gfs_cyc=4 when running the setup scripts if there is no reason not to run the gfs for 06z, 12z, or 18z
2. FHMAX_GFS_00=384 (to make sure you don't hit the walltime)
3. DO_WAVE=YES
That should invoke the parts of the system I need to see tested for this. Test 2 - a 1.5 cycle cycled test on each platform with RUN_CCPP=NO to make sure adding support for Merra2 does not break the remaining support for IPD (I will be dropping IPD support in the coming months, but not just yet). Let me know if you run into any issues with either test that you need help with. Thanks! |
I was just told that it's known that waves don't work with CCPP on. Therefore test 1 can be done with DO_WAVE=NO and I am ok with DO_WAVE=NO in config.base when RUN_CCPP=YES. |
If the changes Anning made to the model are based on the latest develop branch, IPD has been removed from the code; he can only test with RUN_CCPP=YES.
The workflow supporting GFS.v16 and IPD should be tagged. It is probably time to remove all IPD-related definitions in the workflow and move to supporting only CCPP as soon as possible, since all development (physics, the coupled model, etc.) will be using CCPP.
Fanglin
|
I was hoping to get the final v16 changes from NCO into develop before removing the final IPD support but with the implementation delay I see I can't do that...so yes I guess it's time to remove all IPD related definitions in the workflow. I'll update my PR review comments for this. Thanks! |
@AnningCheng-NOAA Ok, since waves don't work with CCPP right now I've adjusted my test request: please run a 2.5 cycle cycled test on each platform with RUN_CCPP=YES, DO_WAVE=NO, and FHMAX_GFS_00=384 so we can make sure it is ok in cycled mode. There is no test #2 anymore. Thanks! Sorry for the confusion on my end! |
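[Editor's note] In config.base terms, the adjusted request boils down to a handful of switches. The sketch below is illustrative only; the exact variable names and defaults in this branch may differ.
# Hypothetical excerpt from $EXPDIR/config.base for the adjusted 2.5-cycle cycled test
export RUN_CCPP="YES"        # run the CCPP suite (waves currently do not work with CCPP)
export DO_WAVE="NO"          # waves off for this test
export FHMAX_GFS_00=384      # extend the 00z gfs forecast to 384 hours
# gfs_cyc=1 at experiment setup time keeps one full 00z gfs cycle in the test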
Hi, Kate:
Please let me know when you copy merra2 and aer_data to the main $FIX_DIR.
Thank you,
Anning
|
@AnningCheng-NOAA I've started pulling them into the WCOSS-Dell $FIX_DIR ($FIX_DIR/fix_aer and $FIX_DIR/fix_lut) from your WCOSS_Dell set. Then I will rsync them to the FIX_DIRs on the Crays, Hera, Jet, and Orion...and make a new HPSS tarball of $FIX_DIR. I see the fix_aer files are quite large so the copy/rsyncs will take a while. Will report back when done, thanks! |
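[Editor's note] For reference, the copy/sync described here might look roughly like the sketch below; destination paths and the HPSS archive name are placeholders, and the actual transfer route between machines may differ.
# Hypothetical sync of the two new fix sets to another platform's FIX_DIR
rsync -av $FIX_DIR/fix_aer $FIX_DIR/fix_lut /path/to/remote/FIX_DIR/
# Hypothetical fresh archive of the fix directory for HPSS
cd $FIX_DIR/.. && htar -cvf /hypothetical/hpss/path/global_fix.tar fix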
Hi, Kate:
I have just noticed that the MERRA2 data on DELL is dated back to 2017 and out of date. I have just updated the dataset; you can see that the production date is December 24th. I am afraid you will need to pull the data again. Sorry for any inconvenience.
Anning
|
@AnningCheng-NOAA I see your Mars set is now dated February 4th and the Venus set (what I pulled from) is December 24th. I should pull from your Mars set then? Please confirm, thanks! |
Kate, I have just copied MERRA2 from DELL to Mars, so the two sets should be consistent; the content is exactly the same even though the dates differ.
|
@AnningCheng-NOAA The new fix files are now in all $FIX_DIRs on WCOSS-Dell, WCOSS-Cray, Hera, and rzdm. I'm copying them to Orion and Jet this morning. Below is the listing of them on Hera under $FIX_DIR/fix_aer and $FIX_DIR/fix_lut. I'm also putting a fresh copy of $FIX_DIR on HPSS for our archival. You can now remove the paths to FIX_AER and FIX_LUT in config.base.emc.dyn, thanks.
-bash-4.2$ ll /scratch1/NCEPDEV/global/glopara/fix/
total 108
-rwxr-xr-x 1 glopara global 160 Oct 3 2019 0readme
drwxr-sr-x 2 glopara global 4096 Feb 4 17:05 fix_aer
drwxr-sr-x 5 glopara global 61440 Dec 2 18:18 fix_am
drwxr-sr-x 5 glopara global 4096 Jun 10 2019 fix_chem
drwxr-sr-x 10 glopara global 4096 Jul 28 2017 fix_fv3
drwxr-sr-x 10 glopara global 4096 Dec 31 2017 fix_fv3_gmted2010
drwxr-xr-x 6 glopara global 4096 Dec 13 2019 fix_gldas
drwxr-sr-x 2 glopara global 4096 Feb 4 15:37 fix_lut
drwxr-sr-x 2 glopara global 4096 Aug 31 14:11 fix_orog
drwxr-sr-x 2 glopara global 4096 Sep 13 2019 fix_sfc_climo
drwxr-sr-x 4 glopara global 4096 May 11 2018 fix_verif
drwxr-sr-x 2 glopara global 4096 Oct 26 14:59 fix_wave_gfs
-bash-4.2$ ll /scratch1/NCEPDEV/global/glopara/fix/fix_aer/
total 12693072
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:49 merra2.aerclim.2003-2014.m01.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:50 merra2.aerclim.2003-2014.m02.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:43 merra2.aerclim.2003-2014.m03.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:53 merra2.aerclim.2003-2014.m04.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:41 merra2.aerclim.2003-2014.m05.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:54 merra2.aerclim.2003-2014.m06.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:47 merra2.aerclim.2003-2014.m07.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:50 merra2.aerclim.2003-2014.m08.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:42 merra2.aerclim.2003-2014.m09.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:52 merra2.aerclim.2003-2014.m10.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:46 merra2.aerclim.2003-2014.m11.nc
-rwxr-xr-x 1 glopara global 1018901352 Feb 4 15:44 merra2.aerclim.2003-2014.m12.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:52 merra2C.aerclim.2003-2014.m01.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:52 merra2C.aerclim.2003-2014.m02.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:46 merra2C.aerclim.2003-2014.m03.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:45 merra2C.aerclim.2003-2014.m04.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:54 merra2C.aerclim.2003-2014.m05.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:54 merra2C.aerclim.2003-2014.m06.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:47 merra2C.aerclim.2003-2014.m07.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:52 merra2C.aerclim.2003-2014.m08.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:45 merra2C.aerclim.2003-2014.m09.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:46 merra2C.aerclim.2003-2014.m10.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:45 merra2C.aerclim.2003-2014.m11.nc
-rwxr-xr-x 1 glopara global 64218936 Feb 4 15:45 merra2C.aerclim.2003-2014.m12.nc
-bash-4.2$ ll /scratch1/NCEPDEV/global/glopara/fix/fix_lut/
total 73428
-rwxr-xr-x 1 glopara global 202000 Jun 24 2019 optics_BC.v1_3.dat
-rwxr-xr-x 1 glopara global 461637 Jun 24 2019 optics_DU.v15_3.dat
-rwxr-xr-x 1 glopara global 73711072 Jun 24 2019 optics_DU.v15_3.nc
-rwxr-xr-x 1 glopara global 202000 Jun 24 2019 optics_OC.v1_3.dat
-rwxr-xr-x 1 glopara global 502753 Jun 24 2019 optics_SS.v3_3.dat
-rwxr-xr-x 1 glopara global 101749 Jun 24 2019 optics_SU.v1_3.dat
|
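[Editor's note] Once the shared copies exist, the experiment configuration only needs to point the aerosol fix variables at the common fix directory. This is a hypothetical sketch; the actual variable handling in config.base.emc.dyn on the branch may differ.
# Hypothetical excerpt from config.base.emc.dyn after switching from personal paths
# to the shared fix directory (Hera value shown; other machines set FIX_DIR accordingly)
export FIX_DIR="/scratch1/NCEPDEV/global/glopara/fix"
export FIX_AER="$FIX_DIR/fix_aer"     # MERRA2 aerosol climatology (merra2*.nc)
export FIX_LUT="$FIX_DIR/fix_lut"     # aerosol optics lookup tables (optics_*.dat/.nc)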
Kate, thanks! I submitted a cycling test (C768L127 for gfs and C384L127 for the ensemble) on Orion yesterday. It has been waiting in the queue for a very long time. Could you take a look to see whether too many resources have been requested?
expdir: /work/noaa/global/acheng/para_gfs/mcyc1
rotdir: /work/noaa/stmp/acheng/ROTDIRS/mcyc1
|
Anning, I took a look and your jobs have appropriate resource requests. My most recent C768C384L127 test on Orion used more resources since I ran with waves on, so yours should be fine. It's possible the queues are very busy and/or the compute account allocation you're using is nearing full. I see you're using fv3-cpu, and it looks like it's close to its allocation (via the saccount_params command).
You could try another compute account if you have access to one (check via the saccount_params command), but if the queues are busy you'll keep waiting. |
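[Editor's note] A quick way to act on this advice is sketched below. saccount_params is the command named in the comment; the ACCOUNT variable and its location are assumptions and may differ in your configuration.
saccount_params                      # list your projects and how much of each allocation has been used
# If another project has headroom, point the experiment at it (hypothetical variable/location):
# in $EXPDIR/config.base
export ACCOUNT="some-other-project"  # placeholder name; use a project you actually belong to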
@AnningCheng-NOAA @yangfanglin FYI after discussing the upcoming commit plan for develop with the other global-workflow code managers we have decided we are going to hold this work and PR #254 for a bit (~2-3 weeks). Since this PR moves the ufs-weather-model version forward to one that supports hpc-stack we want to get the other hpc-stack changes into develop before this. Please complete current testing and keep your branch synced with develop changes. You may leave the PR open. Thanks! |
Hi, Kate:
I am running the cycling test at Hera. There is an error, "aircftbias_in not found", at /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/logs/2020020406/gdaseobs.log.
Exp dir: /scratch2/NCEPDEV/climate/Anning.Cheng/para_gfs/mcyc
Do you have any idea how to fix the problem? Thanks!
Anning
|
You need some IC files for the analysis, and the missing file is one of them. This is missing: /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00/atmos/gdas.t00z.abias_air Where did you get your ICs? You'll need to pull out the following files that came from the same or a companion tarball:
- gdas.t00z.abias
- gdas.t00z.abias_air
- gdas.t00z.abias_pc
- gdas.t00z.radstat
Point me to your IC source and I'll see where those four files are. Thanks! |
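[Editor's note] If the ICs came from an HPSS tarball, the four files can usually be pulled out with htar. The tarball path and member names below are placeholders; list the archive first to see how its members are actually named.
# List the archive to find the bias/radstat members (path is a placeholder)
htar -tvf /path/to/gdas.20200204_00.tar | grep -E 'abias|radstat'
# Extract just the four files the analysis needs (member names may carry a leading ./ or directory prefix)
htar -xvf /path/to/gdas.20200204_00.tar ./gdas.t00z.abias ./gdas.t00z.abias_air ./gdas.t00z.abias_pc ./gdas.t00z.radstat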
Hi, Kate:
My ICs are at /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/. I used UFS_UTILS to produce the cold-start ICs. Has the documentation for producing the ICs changed?
Anning
|
I have just found those files at /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00, but I do not know if they are in the right place.
|
Ah, I see you have the files there, they are just not in the atmos folder:
-bash-4.2$ ll /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00/
total 3286744
drwxr-xr-x 5 Anning.Cheng climate 20480 Feb 9 16:31 atmos
-rw-r--r-- 1 Anning.Cheng stmp 859665 Feb 9 07:24 gdas.t00z.abias
-rw-r--r-- 1 Anning.Cheng stmp 1082939 Feb 9 07:24 gdas.t00z.abias_air
-rw-r--r-- 1 Anning.Cheng stmp 859665 Feb 9 07:24 gdas.t00z.abias_int
-rw-r--r-- 1 Anning.Cheng stmp 917490 Feb 9 07:24 gdas.t00z.abias_pc
-rw-r--r-- 1 Anning.Cheng stmp 0 Feb 9 07:24 gdas.t00z.loginc.txt
-rwxr-x--- 1 Anning.Cheng stmp 3361832960 Feb 9 07:32 gdas.t00z.radstat
Move all of those files (abias, abias_air, abias_int, abias_pc, loginc.txt, radstat) down into that atmos folder. Then retry your failed jobs. |
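[Editor's note] In shell terms the fix and retry might look like the sketch below. The rocoto workflow/database file names and the task name are placeholders; use whatever your experiment setup actually generated.
cd /scratch1/NCEPDEV/stmp2/Anning.Cheng/ROTDIRS/mcyc/gdas.20200204/00
mv gdas.t00z.abias gdas.t00z.abias_air gdas.t00z.abias_int gdas.t00z.abias_pc gdas.t00z.loginc.txt gdas.t00z.radstat atmos/
# Rewind and reboot the failed task (xml/db/task names are illustrative)
rocotorewind -w mcyc.xml -d mcyc.db -c 202002040600 -t gdaseobs
rocotoboot   -w mcyc.xml -d mcyc.db -c 202002040600 -t gdaseobs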
Hi, Kate:
The cycling test of the workflow works well at Hera, but I encountered an MPI error from GSI at Orion: /work/noaa/stmp/acheng/ROTDIRS/mcyc/logs/2020020406/gdasanal.log
The exp dir: /work/noaa/global/acheng/para_gfs/mcyc.
Could you take a look and see if anything is missing?
Thank you!
Anning
|
@AnningCheng-NOAA The cause of the error isn't jumping out at me, we usually see that error in the forecast jobs and not the analysis. @CatherineThomas-NOAA @CoryMartin-NOAA would you mind taking a look at Anning's failed analysis job on Orion? See log below. He is testing the system after adding support for MERRA2. Thanks! /work/noaa/stmp/acheng/ROTDIRS/mcyc/logs/2020020406/gdasanal.log |
I took a look, not totally sure but it seems like there is a problem reading the netCDF surface forecast files. Is there anything different in the sfcfNNN.nc files in this run than in a standard version? Are you able to rerun the gdasfcst from the previous cycle and try it again? This looks like the error we were having before where the model would write out 'bad' netCDF files that were then unreadable by GSI. |
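[Editor's note] One low-effort way to check this hypothesis is to dump the header of a suspect surface forecast file; a truncated or corrupted netCDF/HDF5 file usually makes the dump fail outright. The path below is illustrative.
# Inspect the header of a suspect surface forecast file from the previous cycle (path is a placeholder)
ncdump -h /work/noaa/stmp/acheng/ROTDIRS/mcyc/gdas.20200204/00/atmos/gdas.t00z.sfcf006.nc | head -n 20
# A healthy file prints its dimensions and variables; a bad one typically aborts with an HDF error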
I was just getting ready to say the same thing. The values of tref in the sfcfNNN.nc files look reasonable at least. @KateFriedman-NOAA does Orion have similar netCDF problems as Hera? |
@CatherineThomas-NOAA Not as frequently as Hera, but yes. I looked back at my Orion runs since last May and found HDF errors in the efcs jobs of a CCPP run (last November) and in analysis jobs while I was testing port2orion last June. No HDF errors in any of the short cycled runs I've done since then. I'm starting to test the full system using hpc-stack so I'm keeping my eye out for these errors on both machines. |
Hi, Kate, there is no such error at Hera. I am rerunning the forecast to see if this error persists, as suggested by Cory and Catherine.
|
All, the error is still there after rerunning the forecasts. I made a run without MERRA2 and still get the same error: /work/noaa/stmp/acheng/ROTDIRS/mcyco/logs/2020020406/gdasanal.log.
The rundir is /work/noaa/global/acheng/para_gfs/mcyco.
Kate, how is your test going?
|
My run of the system (feature/hpc-stack) on Hera, with all components built with hpc-stack, was successful. I did not see any HDF5 errors...but I have only run 2.5 cycles so far. The GSI master doesn't yet support hpc-stack on other machines so I can't perform the same test on Orion yet. @CatherineThomas-NOAA @CoryMartin-NOAA Is there a GSI branch with stack support for Orion that I can try? Thanks! |
Please see NOAA-EMC/GSI issue #110 for the status of hpc-stack on non-Hera platforms. In addition to Hera, hpc-stack builds now exist for WCOSS_D, Orion, and Jet. This is beta development. Use at your own risk. No support is provided if you encounter problems.
|
Thanks @RussTreadon-NOAA ! I'll try that branch on Orion and WCOSS-Dell to test global-workflow feature/hpc-stack. |
Ran a stand-alone GSI script in 3dvar mode on Orion using Anning's files for the 2020020406 gdas case. A global_gsi.x built from NOAA-EMC/GSI master and the forked hpc-stack branch ran to completion. Given this, I submitted the stand-alone GSI script using NOAA-EMC/GSI master in 4denvar mode; the job is pending in the batch queue. Anning's run uses NOAA-EMC/GSI tag gfsda.v16.0.0. He could try checking out master in /work/noaa/global/acheng/gfsv16_ccpp/sorc/gsi.fd and rebuilding DA using /work/noaa/global/acheng/gfsv16_ccpp/sorc/build_gsi.sh.
|
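[Editor's note] The suggestion above amounts to roughly the following shell sketch. The git details and whether build_gsi.sh takes arguments are assumptions; adjust to match the actual checkout.
cd /work/noaa/global/acheng/gfsv16_ccpp/sorc/gsi.fd
git checkout master && git pull    # switch from the gfsda.v16.0.0 tag to master (assumes a clean working tree)
cd ..
./build_gsi.sh                     # rebuild the DA executables used by the workflow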
Note that the workflow loads one set of modules (via modulefiles/module_base.orion) when it executes gdasanal. In contrast, /work/noaa/global/acheng/gfsv16_ccpp/sorc/gsi.fd builds DA with its own modules, and the NOAA-EMC/GSI master also builds DA on Orion using a different set. Might the difference between the workflow build and run modules cause problems? |
FYI, a stand-alone GSI run script successfully ran the 2020020406 case on Orion using a global_gsi.x built from NOAA-EMC/GSI tag release/gfsda.v16.0.0. The run script loads modulefile.ProdGSI.orion found in gsi.fd/modulefiles. |
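[Editor's note] For anyone reproducing this, the environment setup in such a run script might look roughly like the sketch below; the module mechanics and launcher line are assumptions, not the actual script.
# Hypothetical environment setup before launching the stand-alone GSI
module purge
module use /path/to/gsi.fd/modulefiles     # directory named in the comment above
module load modulefile.ProdGSI.orion
srun ./global_gsi.x > stdout.anal 2>&1     # launcher and redirection are illustrative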
Russ, glad to know. I will rebuild gsi and give it a try. Where are your run dir and submit dir?
|
I used a stand-alone script, rungsi1534L127_debug.sh, not the rocoto
workflow. The script is in /work/noaa/da/Russ.Treadon/git/gsi/scripts.
File gsi1534.o1343532 in the same directory is the job log file. The job
ran in /work/noaa/stmp/rtreadon/tmp766/gfsda.v16.0.0.2020020406. I
submitted the script again using your global_gsi.x and your fix. This job
is waiting in the queue due to today's (2/23) maintenance. If this job
runs OK, the investigation shifts to the workflow side of things. I
wonder about the module mismatch. The modules the workflow loads to run
the gfsda.v16.0.0 global_gsi.x are not the same modules used to build
global_gsi.x. I guess I could mimic this mismatch in
rungsi1534L127_debug.sh and see if the global_gsi.x then fails.
|
Russ, I did rebuild global_gsi.x using netcdfp/4.7.4 (from gfsv16_ccpp/modulefiles/module_base.orion) and tried it yesterday, but it was not successful.
|
I don't think this is the correct approach. The tests I ran indicate that
the gfsda.v16.0.0 build is OK. The workflow appears to be the problem,
not gfsda.v16.0.0. The modules the workflow loads when it runs anal differ
from the modules used to build global_gsi.x. You should try a test in
which you build global_gsi.x with the gfsda.v16.0.0 modules as is and then
modify the workflow modules. This is only a test since changing the
workflow modules may break other apps.
|
The following test has been run on Orion.
- copy "/work/noaa/global/acheng/para_gfs/mcyco" to "/work/noaa/da/Russ.Treadon/para_gfs/mcyco"; update it to run under my PTMP using acheng's HOMEgfs
- populate "/work/noaa/stmp/rtreadon/ROTDIRS/mcyco" with files from "/work/noaa/stmp/acheng/ROTDIRS/mcyco"
- rocotorewind and rocotoboot the 2020020406 gdasanal; the job requested 125 nodes with a lengthy estimated queue wait time, so scancel it, reduce the analysis job to 50 nodes, and resubmit
The job successfully ran up to the specified 1 hour wall clock limit. The global_gsi.x was 2/3 of the way through the second outer loop when the system killed the job. No netcdf or hdf5 errors in the job log file. Anning's run used 125 nodes for gdasanal. I reverted back to this, regenerated the xml, and resubmitted the 2020020406 gdasanal. The job is waiting in the queue. |
Russ, and all: I merged the workflow with the latest version, recompiled the code, and submitted mcyc, but have not resubmitted mcyco. The workflow is now running well for both MERRA2 (mcyc) and OPAC (mcyco). I guess it is due to the merging. Thank you! Your comments are welcome.
|
My rerun of mcyco gdasanal for 2020020406 using 125 nodes ran overnight without any errors. The previous 50 node job was terminated after hitting the one hour wall clock limit. Based on minimization stats in the log file, it was reproducing the output from the 125 node job. This makes sense: GSI results do not vary with task count. The queue wait time for a 50 node job is less than for a 125 node job. You should examine the resource settings in your parallel. You might get better throughput if you reduce the node (task) count and appropriately increase the wall clock limit. Based on your comments, Anning, it seems the gdasanal problem was not DA but something in the workflow or compilation. Is this correct? |
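[Editor's note] This suggestion maps to the analysis resource settings in the workflow configs. The variable names below follow the config.resources convention referenced in this workflow, but they and the values are illustrative; treat the whole block as a sketch.
# Hypothetical excerpt from config.resources: trade analysis nodes for wall time
export npe_anal=400            # fewer MPI tasks -> fewer nodes -> shorter queue wait
export nth_anal=4              # threads per task (unchanged here)
export wtime_anal="02:00:00"   # raise the wall clock limit to compensate for the slower run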
Yes, that is correct.
|
Thanks for the confirmation. I'll stand down on this issue. |
PR #254 has been submitted and has closed this issue. Thank you @AnningCheng-NOAA for this addition and thank you @lgannoaa for testing/reviewing! Will send announcement to glopara listserv shortly. |
Add 0.5 degree by 0.625 degree, 72-level, ten-year MERRA2 aerosol climatological data as an option to replace the 5 degree by 5 degree OPAC aerosol data used to drive radiation and microphysics. Initial tests have been performed using the CCPP SCM. One-year C768L127 free forecast runs are being performed on DELL, HERA, and Orion.