bug: fix bounds fix_bound_violations = .true. seems to be required for ifort #709
Comments
no bounds fails with module intel/2024.0.2 (ifort (IFORT) 2021.11.1 20231117) without fp-model precise
no bounds fails with ifx intel-oneapi/2024.0.2 ifx (IFX) 2024.0.2 20231213 without fp-model precise same across core counts.
Helen,
I have a strong sense of deja-vu about this. Have we possibly identified
things before where fp-precise was required for various intel versions?
Is fix_bound_violations needed to get the cases with fp-precise to run
successfully?
Do the cases that do not duplicate across PE count duplicate when the same
PE count is run repeatedly?
Jeff
…On Tue, Aug 6, 2024 at 1:27 PM Helen Kershaw ***@***.***> wrote:
no bounds fails with module intel/2024.0.2 (ifort (IFORT) 2021.11.1
20231117) without fp-model precise
8.83596691025763 ;
8.26235748376639 ;
8.41808494868261 ;
ifx intel-oneapi/2024.0.2 ifx (IFX) 2024.0.2 20231213 without fp-model
precise same across core counts.
7.67172341333618 ;
7.67172341333618 ;
7.67172341333618 ;
7.67172341333618 ;
Yup, this is a recurrence of what I was seeing on my old laptop with ifort. It would be cooler if I'd recorded the compiler version on my now-dead laptop. fix_bound_violations does not seem to be needed with fp-model precise (I haven't got it to fail yet).
…ntered and gives a 'Failed to converge for quantile' when quantile==0, see issue #709. Q. Is this a floating point comparison error? (plus short circuiting the if statement)
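The floating point comparison question above can be illustrated with a minimal Python sketch. It is not the DART code: the helper name `is_zero_quantile` and the tolerance are hypothetical, chosen only to show how a value that is mathematically zero can fail an exact `== 0` test after rounding.

```python
import math

def is_zero_quantile(q, tol=1e-12):
    """Hypothetical helper: treat q as zero if it is within tol of 0.0.

    abs_tol is required here because a relative tolerance alone can
    never match a comparison against exactly 0.0.
    """
    return math.isclose(q, 0.0, abs_tol=tol)

# Mathematically zero, but rounding leaves a tiny nonzero residue.
q = 0.1 + 0.2 - 0.3

print(q == 0.0)             # exact comparison
print(is_zero_quantile(q))  # tolerance-based comparison
```

The exact test and the tolerance-based test disagree for this `q`, which is the kind of mismatch that could send a quantile of "zero" down the wrong branch. Whether that is what happens in the convergence check is still an open question in this thread.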
Note on B Gaubert's cam-chem(?) runs: these were done with fix_bound_violations = .true. rather than fix_bound_violations = .false. as originally thought, so the bounds are enforced by clamping rather than by the probit transform. (sd == 0, so you never transform into, or back out of, probit space.) /glade/derecho/scratch/hkershaw/DART/CAM-out-of-bounds/Rean_run is using the reanalysis runs #749
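As a rough picture of what "clamping" means in the note above, here is a minimal Python sketch. It is an illustration under assumptions, not the DART implementation: the function name `clamp_ensemble` and the example bounds of 0 and 1 are hypothetical.

```python
def clamp_ensemble(ens, lower=None, upper=None):
    """Return a copy of ens with out-of-bounds members moved onto the bound.

    lower/upper of None means that side is unbounded (an assumption
    made for this sketch).
    """
    clamped = []
    for x in ens:
        if lower is not None and x < lower:
            x = lower
        if upper is not None and x > upper:
            x = upper
        clamped.append(x)
    return clamped

# One member sits a single rounding step above an upper bound of 1.0,
# another is a tiny negative value below a lower bound of 0.0.
ens = [0.5, 1.0 + 2.0**-52, -1e-17]
print(clamp_ensemble(ens, lower=0.0, upper=1.0))
```

Clamping hides violations of any size, including those caused by rounding at the level of the last bit, which is why it makes the run complete without explaining where the out-of-bounds member came from.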
Note: I have not separated out the varying results across PE counts (QCEFF vs. no QCEFF vs. what would be expected).
Describe the bug
/glade/derecho/scratch/hkershaw/DART/Bugs/bgunn_qceff/DART/models/lorenz_96_tracer_advection/work
Following https://github.com/NCAR/DART/blob/l96_tracer_tests/models/lorenz_96_tracer_advection/work/TESTS/TEST_DRIVER.csh
reported by Ben Gunn: (thanks @Benjamin-Gunn !)
https://github.com/Benjamin-Gunn/DART/blob/l96_tracer_tests/models/lorenz_96_tracer_advection/work/TESTS/TEST_DRIVER.csh
qceff_table_filename = 'one_below_qceff_table.csv'
&filter_nml
inf_flavor = 5, 5,
&model_nml
model_size = 120,
forcing = 8.0,
delta_t = 0.05,
mean_velocity = 0.0,
pert_velocity_multiplier = 5.0,
diffusion_coef = 0.0,
e_folding = 0.25,
sink_rate = 0.1,
source_rate = 100.0,
point_tracer_source_rate = 5.0,
positive_tracer = .false.,
bound_above_is_one = .true.,
time_step_days = 0,
time_step_seconds = 3600,
/
What was the expected outcome?
I did not expect fix_bound_violations = .true. to be required so often.
What actually happened?
Failures for "Ensemble member greater than upper bound first check" at various pe counts.
You can set:
&probit_transform_nml
fix_bound_violations = .true.
/
However, you still get different answers across MPI task counts.
varying pe count:
7.95979093017264 ;
8.02126025256388 ;
8.55748257662756 ;
varying pe count with -fp-model precise:
8.62082489125036 ;
8.62082489125036 ;
8.62082489125036 ;
Not sure how much difference is OK with the varying pe count.
Note: I cannot reproduce the bounds violations with -fp-model precise.
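One generic way results can depend on the task count is that floating-point addition is not associative, so splitting a sum across tasks changes the rounding. The Python sketch below illustrates only that general effect. It is not a diagnosis of this bug, and `chunked_sum` and the sample values are hypothetical.

```python
def sequential_sum(values):
    """Add values left to right with plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_tasks):
    """Split values into contiguous chunks (one per hypothetical task),
    sum each chunk, then add the partial sums together."""
    size = -(-len(values) // n_tasks)  # ceiling division
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)

# Values chosen so that small terms are lost next to large ones.
values = [1e16, 1.0, -1e16, 1.0]
for n_tasks in (1, 2, 4):
    print(n_tasks, chunked_sum(values, n_tasks))
```

The same four numbers give different totals depending on how they are grouped. A compiler that is allowed to reorder floating-point operations can produce a similar effect within a single task, which is consistent with, though not proof of, the observation that the results agree once value-safe floating-point settings are used.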
Todo @HKershaw intel/2024.0.2, ifx, vs gfortran
Error Message
3 MPI tasks (also happens with 8, 7 (without post_inf), and 40 (without post_inf)):
Here is the code:
DART/assimilation_code/modules/assimilation/bnrh_distribution_mod.f90
Lines 292 to 300 in 75cf8dc
Which model(s) are you working with?
lorenz_96_tracer_advection
/glade/derecho/scratch/hkershaw/DART/Bugs/bgunn_qceff/DART/models/lorenz_96_tracer_advection/work
Version of DART
v11.5.1
Have you modified the DART code?
No
Build information
Please describe: