-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating to Nicil v2.1 #115
Conversation
nonideal test suite yields inconsistent results between my mac, my repo, and this repo. This issue is necessarily challenging to track down, so a work in progress. |
…ndition was causing test failures
The MPI failure appears to be due to the cluster and not the code. Note that the MPI tests pass on my mac. The issue outlined in my initial comment 2 was a result of the race condition that existed in the calculation of divcurlB. After further investigation, the error appeared both using and not using MPI. When we calculate divcurlB after we calculate density, then all tests consistently pass, with MPI and not-MPI providing the same result. The error caused by the race condition can be up to a few percent. At the moment, we divcurlB is always calculated after density, but I have left in the original code currently turned off with a switch so that we can use the fast calculation if required. divcurlB is required for non-ideal MHD and for diagnostics, so it might be worth using the new routine only when using non-ideal MHD. The attached graph shows the result of the wave-dampening test, where the symbols represent calculating divcurlB in densityiterate with the race condition (once); calculating divcurlB in densityiterate, but calling the subroutine twice in a row (twice); calculating divcurlB after densityiterate as in the current version pushed to the repo (mhd). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the update here. However, we need to fix the race condition in a more elegant way.
One other idea here is that it might be nice to include NICIL in phantom using the git submodules feature -- i.e. the subdirectory is just the NICIL git repo updated to a particular tag. This would save having to have copies of NICIL code in the phantom repo and would make it easier to stay in sync.
There's a lot going on in this pull request so would be good to separate some of the issues if possible...
src/main/dens.F90
Outdated
@@ -1608,7 +1590,7 @@ subroutine store_results(icall,cell,getdv,getdb,realviscosity,stressmax,xyzh,& | |||
! | |||
! store div B, curl B and related quantities | |||
! | |||
if (mhd .and. iamgasi) then | |||
if (mhd .and. iamgasi .and. mhd_racecondition) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, surely we can do without a flag for a bug...
src/lib/NICIL/src/nicil_real4.F90
Outdated
@@ -0,0 +1,228 @@ | |||
!----------------------------------------------------------------------! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like overkill, surely would be better to just send dummy variables to/from nicil and translate back to double/single precision in phantom rather than creating a whole dummy interface?
the point is not that we want to use single precision in phantom, but that precision in phantom can be changed (e.g. to real*16 in principle also)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but it inexplicably does not compile when I exclude the dummy interface. I get many
Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1) [-Wconversion]
errors before it just stops. Annoyingly, these errors persist for many subroutines in Phantom itself and the code still compiles. If I remove a few of the offending lines, then the compilation will just stop on earlier ones. These lines have nothing to do with the nicil-phantom interface, and there are no kind-declarations within nicil itself (although Nicil's Makefile does force it to be compiled in double precision.
Any advice on how to fix this more cleanly is appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty sure these warnings would just be masking another problem. However, happy to leave it for the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've tried to track it down, but with no luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem solved (had to force some parameters to be real*8); commit made and the dummy file has been deleted.
build/Makefile
Outdated
@@ -1781,7 +1785,7 @@ SOURCES= physcon.f90 ${CONFIG} ${SRCKERNEL} io.F90 units.f90 boundary.f90 \ | |||
mpi_dens.F90 mpi_force.F90 stack.F90 mpi_derivs.F90 kdtree.F90 linklist_kdtree.F90 ${SRCTURB} \ | |||
${SRCPHOTO} ${SRCINJECT} ${SRCKROME} memory.F90 ${SRCREADWRITE_DUMPS} \ | |||
quitdump.f90 ptmass.F90 \ | |||
readwrite_infile.F90 dens.F90 force.F90 utils_deriv.f90 deriv.F90 energies.F90 sort_particles.F90 \ | |||
readwrite_infile.F90 dens.F90 mhd_divcurlB.F90 force.F90 utils_deriv.f90 deriv.F90 energies.F90 sort_particles.F90 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is mhd_divcurlB.f90 ??
@@ -2785,7 +2783,7 @@ subroutine finish_cell_and_store_results(icall,cell,fxyzu,xyzh,vxyzu,poten,dt,dv | |||
endif | |||
#endif | |||
if (mhd_nonideal) then | |||
call nimhd_get_dudt(dudtnonideal,etaohmi,etaambii,rhoi,curlBi,Bxyzi(1:3)) | |||
call nicil_get_dudt_nimhd(dudtnonideal,etaohmi,etaambii,rhoi,curlBi,Bxyzi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason this belongs in the nicil library rather than being a phantom routine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just always has been a Nicil subroutine. Anything used to calculate non-ideal MHD terms in a method-independent way is included in nicil itself.
src/main/mhd_divcurlB.F90
Outdated
! this needs to be performed after the density is updated to prevent race | ||
! conditions. For simplicity, we have copied dens.F90, and stripped out | ||
! everything that is not related to the calculation of divcurlB. | ||
! To calculate divcurlB in dens (permitting the race condition), set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit yuck. The point is that we do density iterations, so that after a couple of iterations the density is already known inside compute_density, so the density is initially just a guess, but on the 2nd or 3rd iteration it is known to the accuracy of tolh. We should NOT need a completely separate routine to evaluate div B and curl B here...
src/main/mhd_divcurlB.F90
Outdated
|
||
end subroutine store_results | ||
!-------------------------------------------------------------------------- | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a LOT of repeated code in the above, I cannot merge this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reply covers all previous comments about the bug & new routine.
I remember that you, Terry and I discussed the calculation of divcurlB when we made the switch to evolving B/rho rather than B. At the time, we agreed that we could continue to include the calculation in dens since the error introduced would be small. The comment in dens implies that the error is due to hi not being converged, but the issue is that hj might not have been updated yet, so we are using densities at inconsistent times. Since this is a race condition, the results are not reproducible.
Indeed, the error seem to be less than a percent, and only appeared when I updated nicil since it was failing the testsuite, and I figured it would be better to delve into the problem rather than simply increase the tolerance on the relevant tests.
Due to the small error, and that divcurlB is used only for diagnostics, the revised cleaning timestep and non-ideal MHD, I left the original code in with a flag so we can use the fast code if we don't need accuracy. We might even want to set the flag such that the old version is used except for only non-ideal MHD.
I would be happy to find a more elegant solution, but at the moment, I don't know what that is. If we want the accuracy, then we need to calculate divcurlB after the density update is complete, and that is what the new subroutine does. Given the MPI commands, it was simpler to use dens as a template rather than starting from scratch; I understand this is yuck, but I don't see a way around it if we want to maintain accuracy. This new structure -- update density, update divcurlB, calculate forces -- is the same as what is implemented in sphNG.
I am open to suggestions on how to clean this up.
mpi failure is fixed, but issues pending from review are blocking merge at present |
Re git modules: Re 'A lot going on': |
You can create multiple branches on your fork and pull request each of them as required. Then any new commits will go only to the pull request for that branch. (In general, it is seen as good practice to always work on branches rather than master.) |
…ferent things each time
… timestep in hydro sims
Commit f016c85 failed due to internal git issues, not my code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one final change here and I think we're good to merge. Namely, I think fast_divcurlb should be true by default for ideal MHD, since this should have almost no effect. Happy if you want to make it false for non-ideal MHD where we use curl B directly in the induction equation.
src/lib/NICIL/src/nicil_real4.F90
Outdated
@@ -0,0 +1,228 @@ | |||
!----------------------------------------------------------------------! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty sure these warnings would just be masking another problem. However, happy to leave it for the moment
@@ -149,6 +149,14 @@ subroutine derivs(icall,npart,nactive,xyzh,vxyzu,fxyzu,fext,divcurlv,divcurlB,& | |||
if (icall==1) then | |||
call densityiterate(1,npart,nactive,xyzh,vxyzu,divcurlv,divcurlB,Bevol,& | |||
stressmax,fxyzu,fext,alphaind,gradh,rad,radprop,dvdx) | |||
if (.not. fast_divcurlB) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is definitely better than cutting-and-pasting code from the densityiterate routine, if you really really think we need the .not.fast_divcurlB option then this seems like the least-worst solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll modify it so it's only true for non-ideal MHD. Despite it being a very small error, I would prefer to leave this in for non-ideal MHD, since I will (hopefully) shortly be running some non-ideal sims in Phantom with Ian.
if (use_massfrac) then | ||
call write_inopt(massfrac_X, 'massfrac_X', 'Hydrogen mass fraction',iunit) | ||
call write_inopt(massfrac_Y, 'massfrac_Y', 'Helium mass fraction',iunit) | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, are the mass fractions still parameters for the thermal ionisation and if so how are they set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
massfrac have been removed, since it was typically calculated based upon abundances set within nicil. In the newer version, instead of proxy ions whose mass is calculated based upon abundances of a few species, we explicitly model the ionisiation of several chemical species (see appendix A of Wurster 2021).
src/main/config.F90
Outdated
#else | ||
logical, parameter :: mhd = .false. | ||
logical, parameter :: fast_divcurlB = .true. ! do not toggle; multiple calculations controlled by this flag, and this is faster when excluding MHD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest we NOT make fast_divcurlb false by default in MHD. The only thing this is used for in ideal MHD is the divergence cleaning where it does not matter if we get div b wrong by a few percent. Would be better to do this only if non-ideal MHD is used, which is a much more restricted use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
Requested changes have been made, and all tests pass. |
thanks for the update @jameswurster! |
This commit updates the non-ideal MHD library to Nicil v2.1 (see Appendix A of Wurster (2021).
Notes:
2a) This inexplicably fails the MPI pipeline test; it appears that there is a memory allocation issue upon execution, so I don't think it has anything to do with my changes. The code passes the MPI test suite when run locally on my Mac. Edit: when tested at pull request, the MPI test passed
2b) In test_nonidealmhd, we get different values for dt_courant for MPI=yes vs MPI=no. I am adding this to the open issue