test_cb05 CVODE convergence fails when MPI=OFF and different rates #143
Comments
Hi @cguzman95, For the MPI=OFF tests are you still compiling with mpiifort? My guess is that there could be a bug in the test. A couple things to try/note:
|
Yes.
In EBI yes; in KPP I'm not sure. But CAMP and KPP should be independent. Could it be an error in KPP that stops the execution of CAMP? But it's strange that it only appears when MPI is ON... |
I'm not sure what you mean by KPP stopping the execution of CAMP. Is this in the test? It seems from the output that KPP just printed the convergence failure message and let the test continue, but it would be possible for KPP to just exit the whole test (although I don't think it does this). The fact that there are NaN rates in KPP seems like there must be some problem with the way the initial conditions are being passed to KPP. I would check the test code, particularly for blocks affected by |
Thanks. From your reasoning and the hints I provided, it seems to be a problem only in KPP. But I must add one more clue (the one that brought me here): when executing test_cb05_monarch (the same cb05 test with all the MONARCH input), the results of the EBI comparison with CAMP differ between MPI ON and MPI OFF. With MPI ON the test passes successfully: But with MPI=OFF: The convergence-failure message could well come from KPP, but the problem is that the test now fails with different results in CAMP just by disabling the MPI flag. I think it's related to photo_rates, because the test works fine if you set these rates to zero. |
ah, ok - yeah seems like it could be a problem. could you somehow output the photolysis rates during the solving? to compare with EBI and between the MPI=ON/OFF? I would also fix the KPP problem, so you can compare among the three, because EBI includes parameterizations that aren't in KPP or CAMP that can affect the results. |
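One way to do the comparison suggested above is to dump the photolysis rates from each run and diff them offline. The following is only a minimal sketch: the function name `compare_rates`, the tolerance, and the sample values are hypothetical and are not part of CAMP, KPP, or EBI. It flags NaN entries separately from ordinary mismatches, since NaN rates were what showed up in KPP.

```python
import math


def compare_rates(rates_a, rates_b, rel_tol=1e-9):
    """Return (index, a, b, reason) for every pair of rates that disagrees.

    Hypothetical helper: rates_a and rates_b are plain lists of floats,
    e.g. photolysis rates printed by an MPI=ON run and an MPI=OFF run.
    """
    if len(rates_a) != len(rates_b):
        raise ValueError("rate lists have different lengths")
    problems = []
    for i, (a, b) in enumerate(zip(rates_a, rates_b)):
        if math.isnan(a) or math.isnan(b):
            # NaN never compares equal, so report it explicitly.
            problems.append((i, a, b, "nan"))
        elif not math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0):
            problems.append((i, a, b, "mismatch"))
    return problems


# Made-up values, for illustration only (not taken from any real run).
mpi_on = [0.0, 0.01, 3.2e-5]
mpi_off = [0.0, 0.01, float("nan")]

for i, a, b, reason in compare_rates(mpi_on, mpi_off):
    print(f"rate {i}: MPI=ON {a!r}, MPI=OFF {b!r} ({reason})")
```

The same helper could be applied pairwise to the EBI, KPP, and CAMP dumps, keeping in mind that EBI includes parameterizations the other two lack, so small differences there are expected.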
it also could be that whatever the problem is with KPP and MPI=OFF is also a problem with getting conditions to EBI or CAMP, but that the problem is showing up as a difference in results rather than a solver failure |
Printing the This may sound strange, but the error does not happen on mn4, only on p9. CMake flags for mn4 are:
Changing the C and F flags to match the p9 configuration makes no difference. Maybe it is an error in the gcc compiler? Or an error that only shows up with gcc? |
Another finding, more related to the MONARCH bug but also to the photolysis rates: with MPI=ON and photo_rates=X, I just enabled the FAILURE_DETAIL flag, and in test_cb05 CVODE returns a convergence error: "mxstep steps taken before reaching tout.", even though the final results are quite similar to those from EBI. It seems that when the photolysis rates are not homogeneous (0 or 0.01), CVODE doesn't converge. Oriol and I are checking how the photolysis rates are set; maybe some values are wrong. |
Hi @mattldawson,
Let me give some context: this error can be easily seen on the branch chem_mod_testcb05_monarch. This branch adds different photo_rates to test_cb05 (extracted from a MONARCH experiment), and also has an extra test_cb05 file that runs test_cb05 with all the MONARCH input values (same photo_rates, temperature, pressure, time step and concentrations).
CMake flags:
Then, testing test_cb05 with:
It converges, with an expected difference with respect to EBI.
But using the same config and MPI=OFF:
I'm not sure whether it is an error in test_cb05 or in CAMP.
It also happens when running the file test_cb05_monarch (which has the complete MONARCH config). As an extra detail (maybe caused by another bug or by the same one, I'm not sure), with this config and MPI=ON the first time step takes a long time to converge (~3 seconds):