NaN produced by Molecular Transfer near edge of box #467
Comments
Ok, so why are we getting infinite energy for one of the trials?
Yes, you will have to do this. But first, we need to figure out why we are even getting an infinite energy. That should not happen. It would be better to set the weight of a very bad configuration to zero and keep it in the sum.
One of the randomly generated trial coordinates is identical to a system coordinate, resulting in a distSq of 0. Then, when we calculate the LJ energy in inline double FFParticle::CalcEn(const double distSq, const uint index) const, rRat2 is undefined, which returns NaN. This could be prevented with a simple change of if (forcefield.rCutSq < distSq) to if (forcefield.rCutSq < distSq).

That the exact same coordinate, to double precision, is being generated at random makes me think the RNG isn't being advanced. Since this is a restarted GOMC simulation, but not a restart-from-checkpoint GOMC simulation, and an intseed is provided, GOMC reuses the same random numbers because both runs start from step 0. Basically, on GOMC run 1, step 12 successfully transferred a molecule into position xyz (which is a function of step number and seed). Then on GOMC run 2, it attempted to transfer a molecule into position xyz (again a function of step number and seed), which is identical to the position of the molecule already there. Hence distSq == 0, and we divide by distSq in the LJ particle inter calculation.

I recommend we do a find/replace of if (forcefield.rCutSq < distSq) with if (forcefield.rCutSq < distSq), and likewise of if (forcefield.rCutCoulombSq[b] < distSq) with if (forcefield.rCutCoulombSq[b] < distSq).

For IEEE floats, division of a finite nonzero float by 0 is well-defined and results in +infinity (if the value was greater than zero) or -infinity (if the value was less than zero). The result of 0.0/0.0 is NaN. If you use integers, the behaviour is undefined. GOMC uses the C++ exp() function, which doesn't mind INF values; it's only NaN values that cause GOMC to crash.
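For what it's worth, here is a minimal, self-contained sketch of one way to handle the degenerate distSq == 0 case: return an infinite pair energy so the trial's Boltzmann weight becomes exactly zero instead of NaN. The struct and names below (rCutSq, sigmaSq, epsilon) only echo the members quoted above; this is not the actual GOMC CalcEn.

#include <cmath>
#include <cstdio>
#include <limits>

// Stand-in for the force-field constants referenced above (not GOMC's class).
struct Forcefield {
  double rCutSq = 10.0 * 10.0;   // squared cutoff distance
  double sigmaSq = 3.15 * 3.15;  // squared LJ sigma
  double epsilon = 0.152;        // LJ well depth (arbitrary units)
};

// LJ pair energy with an explicit guard for an exact overlap: instead of
// letting 0/0 (or inf - inf) produce NaN, report +infinity, which exp()
// turns into a weight of exactly 0, so the trial can never be selected.
double CalcEn(const Forcefield &ff, double distSq) {
  if (ff.rCutSq < distSq)
    return 0.0;  // beyond the cutoff: no interaction
  if (distSq == 0.0)
    return std::numeric_limits<double>::infinity();  // overlap: reject the trial
  const double rRat2 = ff.sigmaSq / distSq;
  const double rRat6 = rRat2 * rRat2 * rRat2;
  return 4.0 * ff.epsilon * (rRat6 * rRat6 - rRat6);  // 4*eps*[(s/r)^12 - (s/r)^6]
}

int main() {
  Forcefield ff;
  const double beta = 1.0;
  // Overlapping trial: energy is +inf, weight is exp(-inf) = 0, never NaN.
  std::printf("overlap  E = %g, weight = %g\n", CalcEn(ff, 0.0),
              std::exp(-beta * CalcEn(ff, 0.0)));
  // Ordinary trial at r^2 = 9.
  std::printf("contact  E = %g, weight = %g\n", CalcEn(ff, 9.0),
              std::exp(-beta * CalcEn(ff, 9.0)));
}

With a guard of this kind the sum of trial weights stays finite even when a trial lands exactly on an existing atom.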
Alternatively, we could print out an error informing the user they are reusing random numbers and their simulation is going to be correlated with the previous GOMC run they are restarting from. The error message would include instructions to pass in "InitStep lastStepPreviousRunGeneratedARestartFile" or "Checkpoint true checkpoint.file" in the conf file.
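As a rough illustration only (the flag names here are placeholders, not GOMC's actual configuration variables), the proposed check could look something like this:

#include <cstdio>
#include <cstdlib>

// Hypothetical sketch of the proposed check: if this is a restart that neither
// uses a checkpoint nor supplies InitStep, the run would replay the random
// sequence of the run it restarts from, so refuse to continue.
void CheckRestartSeedReuse(bool isRestart, bool usesCheckpoint, bool initStepGiven) {
  if (isRestart && !usesCheckpoint && !initStepGiven) {
    std::fprintf(stderr,
                 "Error: this restart will reuse the random numbers of the run it\n"
                 "restarts from, so the two simulations will be correlated.\n"
                 "Add \"InitStep <last step of the run that wrote the restart file>\"\n"
                 "or \"Checkpoint true <checkpoint file>\" to the conf file.\n");
    std::exit(EXIT_FAILURE);
  }
}

int main() {
  CheckRestartSeedReuse(true, false, false);  // demonstrates the error path
}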
I think it's a bad idea to allow someone to restart the simulation with the same random sequence/seed that was used at step 0. I can't think of a good reason for wanting to do this, so it would be done only by mistake. I think it would be better to generate an error in this case or just handle it in the code without user intervention if possible. In any case, it would be a good idea to handle the degenerate case of distSq == 0.0. It seems, though, that any value that results in rejecting the move would be fine, so Infinity works.
I agree with @LSchwiebert. Restarting with the same random number sequence as a prior simulation is bad practice, unless the goal is to debug something. But in that case, one would be using the checkpoint functionality, and expecting that behavior. BTW, for production runs for publications, we should be starting each simulation with a unique random number seed (e.g. seeded by the system time, or by a well-defined, unique seed).
I think we should generate an error message and terminate if the user is restarting with the same random seed as described above. Alternatively, we could generate a warning if the user is running in expert mode, but we should be able to work around this for debugging purposes, such as using a checkpoint file or specifying the step as above, so that we get consistent random numbers for testing. We should include the information on how to fix the config file as mentioned above.
In other words, I'm fine with generating an error in all cases, even expert mode.
Describe the bug
The molecular transfer move, when calculating the weights of the CBMC trials, sums the weights of all trials. The problem is that one of the trials has a NaN inter energy and an infinite real-space energy, producing a NaN weight and causing GOMC to crash.
I was able to prevent this from happening by adding only the finite weights to the sum:
double stepWeight = 0;
// calculateStepWeight sums only the finite trial weights into stepWeight and
// returns how many of the nLJTrials weights were finite.
uint numFiniteLJTrials = calculateStepWeight(nLJTrials,
                                             stepWeight,
                                             ljWeights,
                                             data->ff.beta,
                                             inter,
                                             real);
// Pick the winning trial from the (now NaN-free) weights.
uint winner = prng.PickWeighted(ljWeights, nLJTrials, stepWeight);
newMol.UpdateOverlap(overlap[winner]);
// Normalize by the number of finite trials, i.e.
// normalizedStepWeight = sum / numberOfFiniteWeights.
newMol.MultWeight(stepWeight / numFiniteLJTrials);
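The issue doesn't show calculateStepWeight itself; a guess at what such a helper could look like, assuming it fills ljWeights, accumulates only the finite weights into stepWeight, and returns the count of finite trials, is:

#include <cmath>
#include <vector>

// Hypothetical helper matching the call above: compute each trial's Boltzmann
// factor from its inter + real energies, zero out any non-finite factor so it
// can never be picked, sum the finite ones into stepWeight, and return how
// many trials contributed a finite weight.
unsigned calculateStepWeight(unsigned nLJTrials, double &stepWeight,
                             std::vector<double> &ljWeights, double beta,
                             const std::vector<double> &inter,
                             const std::vector<double> &real) {
  unsigned numFinite = 0;
  stepWeight = 0.0;
  for (unsigned t = 0; t < nLJTrials; ++t) {
    const double w = std::exp(-beta * (inter[t] + real[t]));
    if (std::isfinite(w)) {
      ljWeights[t] = w;
      stepWeight += w;
      ++numFinite;
    } else {
      ljWeights[t] = 0.0;  // NaN weight: exclude the trial entirely
    }
  }
  return numFinite;
}

int main() {
  std::vector<double> inter = {1.0, std::nan(""), 2.5};  // second trial is the bad one
  std::vector<double> real  = {0.5, 0.0, 0.1};
  std::vector<double> ljWeights(3, 0.0);
  double stepWeight = 0.0;
  unsigned finite = calculateStepWeight(3, stepWeight, ljWeights, 1.0, inter, real);
  return finite == 2 ? 0 : 1;  // the NaN trial gets weight 0 and is not counted
}

Whether dividing the final weight by numFiniteLJTrials rather than nLJTrials preserves the CBMC acceptance rule is exactly the detailed-balance question raised below.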
I only did this for the DCSingle file. Unfortunately, these types of summations are peppered throughout the DC files.
Also, I am concerned that I may need to track the numberOfFiniteWeights to maintain detailed balance.
To Reproduce
Run the attached GCMC simulation of the water shell with TIP3 water.
bugFix_bad_CBMC_trial_Shouldnt_Kill_GOMC.zip
Expected behavior
If a trial produces an extremely poor energy, give it a weight of 0 instead of crashing GOMC.
Screenshots
Green sphere indicates position of growing molecule.
Input files
Build the dev version and run with gomc_production....conf
Please complete the following information:
Additional context
Add any other context about the problem here.