-
Notifications
You must be signed in to change notification settings - Fork 868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quasi-RRHO Thermochemistry Analysis Module #2028
Conversation
from pymatgen.io.gaussian import GaussianOutput | ||
from pymatgen.io.qchem.outputs import QCOutput | ||
|
||
test_dir = os.path.join(os.path.dirname(__file__), "..", "..", "..", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using module level test_dir, pls inherit from PymatgenTest for unit tests and then use self.test_dir instead. This will ensure correct behavior. Thanks.
@arepstein is this PR ready for review to be merged? |
I believe so. The only pending update in my mind is automated detection of the rotational symmetry number, but I think it's okay as an input parameter for now. |
pymatgen/analysis/quasirrho.py
Outdated
from pymatgen.io.qchem.outputs import QCOutput | ||
|
||
# Define useful constants | ||
kb = 1.380662E-23 # J/K |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these already defined in pymatgen.core.units?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like yes! I was not aware of pymatgen units -- will fix and commit the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the units I'm using are slightly different and don't lend themselves well to the subclasses in pymatgen.core.units. For example, converting amuangs^2 to kgm^2 for moments of inertia. Would you recommend still using Units subclasses when I can?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend supplementing the existing units if you don't have the ones you want.
To be specific, I'm talking about the units constants in pymatgen.core.units specifically, rather than suggesting you use the classes in that file.
The reason for this recommendation is simply that if we expand our unit functionality later (we've explored a few options here), it's a lot easier for us to spot where units are being used in pymatgen if unit constants are imported from the same place.
Tried to use more built-in units. Added functionality for checking if linear and adjusting rotational entropy accordingly. Added testing for linear molecule
Hey @arepstein , I get to know this PR in today's subgroup meeting. I am just wondering how the functionalities here comparing to those in the Goodvibes package https://github.com/bobbypaton/GoodVibes? |
Thanks for pointing this out, I didn't know about Goodvibes when I put in this PR. This PR should be identical to the Grimme Quasi-RRHO approximation for the entropy that's implemented in GoodVibes, but does not include any other methods that GoodVibes implements. One big difference I see in terms of implementation into pymatgen infrastructure is that this PR can calculate Quasi-RRHO entropeis for Q-Chem output files, Gaussian output files, or manual input parameters, which is useful for Atomate integration. It would be a good idea to check that this agrees with GoodVibes. |
for more information, see https://pre-commit.ci
Tried to make avg_mom_inertia an internal function
Moved get_avg_mom_inertia outside the class
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together @arepstein ! I used it recently and I think it's a very useful addition to the molecular DFT infrastructure in pymatgen. I made a few comments; if you can pull in the latest pymatgen and address these hopefully we can get @janosh or another maintainer to review soon; I know this one has been open for a while.
pymatgen/analysis/quasirrho.py
Outdated
# Define useful conversion factors | ||
kcal2hartree = 0.0015936 # kcal/mol to hartree/mol | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to define this conversion factor using scipy.constants
? If not, just make sure you carry enough decimal places to not lose precision due to the very different magnitudes of Ha and kcal.
pymatgen/analysis/quasirrho.py
Outdated
class QuasiRRHO: | ||
""" | ||
Class to calculate thermochemistry using Grimme's Quasi-RRHO approximation. | ||
All outputs are in atomic units. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be explicit about what atomic units are? For thickheaded engineers like myself, I have to go and google this :)
|
||
class QuasiRRHO: | ||
""" | ||
Class to calculate thermochemistry using Grimme's Quasi-RRHO approximation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May add the citation to the paper in this docstring so that it's visible in, e.g., Jupyter Notebook and documentation pages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pymatgen/analysis/quasirrho.py
Outdated
:param output: Requires input of a Gaussian output file, | ||
QChem output file, or dictionary of necessary inputs: | ||
{"mol": Molecule, "mult": spin multiplicity (int), | ||
"frequencies": list of vibrational frequencies [a.u.], | ||
elec_energy": electronic energy [a.u.]} | ||
:param sigma_r (int): Rotational symmetry number | ||
:param temp (float): Temperature [K] | ||
:param press (float): Pressure [Pa] | ||
:param conc (float): Solvent concentration [M] | ||
:param v0 (float): Cutoff frequency for Quasi-RRHO method [cm^1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, please convert to a google style docstring
Since all of the outputs are stored as class attributes, it would also be especially nice to have a Attributes
section that describes each output, with units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could define python @property
methods for each of the relevant outputs (e.g., g_corrected
, h_corrected`, which would let you give each one an explicit docstring. Each property would not do anything other than contain the docstring, e.g.
@property
def g_corrected(self):
'''
Corrected free energy in Ha
'''
return self.g_corrected
so the only benefit to this structure would be visibility of the outputs to users
See also some small edits I made when I used the code, in a PR against your fork: arepstein#1 |
I will also note for other reviewers that although the GoodVibes package contains similar functionality, it only accepts Gaussian output files, and hence is not useful with our high-throughput Q-Chem infrastructure. So this PR is valuable because it gives us those capabilities. |
@arepstein Is this still WIP? |
Hey @rkingsbury and @janosh, I've incorporated Ryan's edits and recommendations. While making these edits it became clear that a Class might not be the best choice for QuasiRRHO. As it stands now, it might be better as a simple function. Ideally, I think it could be nice to have a That being said, hopefully QuasiRRHO is still useful for now and can be a starting point for later changes if desired. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @arepstein and nice work on this. Should be a good addition to pymatgen.
pymatgen/analysis/quasirrho.py
Outdated
from pymatgen.io.qchem.outputs import QCOutput | ||
|
||
# Define useful constants | ||
kb = kb_ev * const.eV # Pymatgen kb [J/k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[J/k] -> [J/K]
kcal2hartree = 1000 * const.calorie / const.value("Hartree energy") / const.Avogadro | ||
|
||
|
||
def get_avg_mom_inertia(mol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required for this PR, but I wonder if this should be made a @Property of the Molecule
class? Any strong opinions one way or another?
|
||
class QuasiRRHO: | ||
""" | ||
Class to calculate thermochemistry using Grimme's Quasi-RRHO approximation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pymatgen/analysis/quasirrho.py
Outdated
ev *= R | ||
etot = (et + er + ev) * kcal2hartree / 1000 | ||
self.h_corrected = etot + R * self.temp * kcal2hartree / 1000 | ||
molarity_corr = R_ha * self.temp * np.log(R_volume * self.temp * self.conc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the molarity correction is RT log (RTC)
. Shouldn't it just be RT log(C)
? I may be ignorant, but please double check / confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this formulation implicitly assumes a standard state of 1 M. It might be good to state that in the docstring in case someone is using a calc with a different standard state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following Wheeler's script for the concentration correction. I believe it comes from a thermodynamic cycle for cluster continuum models that leads to an extra RT log(R_volume T)
. It's equations 7 and 8 in the Ho and Coote pKa paper.
Thanks for the point about the 1M assumption. I will add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for clarifying. So looking at that section of the paper:
I'm pretty sure the extra -RT log [R~T]
term is there to adjust the reference states (the paragraph above and the different units of R and R~ are the clues). The solvation energy assumes 1 mol/L in both gas and solution, so you need a correction for that difference. The numerical value of R~T is 24.43 atm / (mol/L), which is the same value one might use to make a standard state correction.
So I think it should be RT ln C
, and I'm glad we have the standard state in the docstring to hopefully call attention to the need to be careful interpreting the number. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your explanation of the source of the extra term makes sense to me, but I don't understand why we shouldn't include the standard state correction when providing a concentration-corrected free energy in solvent. If RTln(concentration) uses mol/L, shouldn't we also correct for the 1 atm standard state used in the DFT code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point. To me what makes it tricky is that you only need to do the standard state correction when you are writing a reaction energy between species with different reference states, but this module just computes a single correction for a single species. (in other words, the reason for the correction is to make sure all species use the same reference state). We don't necessarily know that users of this class are going to take this free energy and combine it with a gas phase energy.
The other small issue is that I think not every DFT code necessarily uses the same reference state. But I think both Q-Chem and Gaussian use 1 atm for gas phase calcs, and as long as we document what we're doing I'm not too worried about that.
Personally I think it will be more transparent and less confusing to users to just do RT ln C and let them adjust as needed if they have to change reference states. But again, as long as we document what's going on it could be OK to build it in.
If you want to build it in, I suggest writing it as ( RT log ( C) - RT log (1/24.55) )
and adding a comment line to make explicitly clear that you're dividing C by the molarity of an ideal gas at 1 atm.
Do others like @samblau or @espottesmith or @materialsproject/second-foundation have thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that we should be more specific in the documentation, and I'll put that in. Maybe with a reference to describe it in detail if I can find a really good one. I also think of the free energy in implicit solvent as already containing an assumed change in free energy from vacuum to solution, and would thus include the standard state correction when using implicit solvent. Is that correct?
I would also appreciate input from everyone you tagged!
I think another option would be to just entirely remove the concentration correction for now. It is somewhat just a legacy from converting a script that was useful to me into Pymatgen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that we should be more specific in the documentation, and I'll put that in. Maybe with a reference to describe it in detail if I can find a really good one. I also think of the free energy in implicit solvent as already containing an assumed change in free energy from vacuum to solution, and would thus include the standard state correction when using implicit solvent. Is that correct?
I just think of the free energy of an isolated molecule as "DFT energy (~ 0K enthalpy) plus T-dependent enthalpy and entropy".
My understanding is that implicit solvent models adjust the free energy for interactions between the solute and the medium but do not account for changes in standard state (vac -> solution). Unless we're talking about formation free energy or a reaction energy, the free energy of an isolated molecule begs the question "relative to what"? And you don't have to introduce a standard/reference state until you answer that question. I know some solvent models output a solvation (reaction) energy, but I believe that implicitly assumes the same reference state for both vacuum and solvent. There's nothing inherently wrong with that but it's often not what the end user needs for comparison to experiment.
I think another option would be to just entirely remove the concentration correction for now. It is somewhat just a legacy from converting a script that was useful to me into Pymatgen.
Yeah, the more I think about it, that might be the cleanest thing to do here. This module is for calculating QRRHO thermochemistry, which (unless I'm mistaken) does not specifically have anything to do with an implicit solvent model and does not make any assumptions about the concentration; it's just a way of dampening low-frequency contributions to rot and vib entropy. So in the same way that standard thermochemistry output in Gaussian/Q-Chem doesn't make any adjustments for concentration, maybe this shouldn't either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not entirely convinced, mainly because implicit solvent models are made to capture solvation free energies, but I definitely understand what you're saying. I'd love to have this conversation offline more too, given how fundamentally important it is to get these corrections correct, but I agree that that can be beyond the scope of this PR. I'll remove it for now and maybe if Pymatgen wants to start recommending free energy treatments we can implement more general thermochemistry!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to discuss offline!
fixed typo in units added duecredit decorator
pymatgen/analysis/quasirrho.py
Outdated
@@ -83,7 +84,7 @@ class QuasiRRHO: | |||
Attributes: | |||
temp (float): Temperature [K] | |||
press (float): Pressure [Pa] | |||
conc (float): Solvent concentration [M] | |||
conc (float): Solvent concentration. Assumes 1M unless specified [M] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add "Concentration correction is reference to a 1 M standard state."
Removed solvent concentration corrections from the code so the class only deals with QuasiRRHO corrections
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #2028 +/- ##
=========================================
Coverage ? 74.06%
=========================================
Files ? 230
Lines ? 69403
Branches ? 16161
=========================================
Hits ? 51403
Misses ? 14957
Partials ? 3043 ☔ View full report in Codecov by Sentry. |
I think this is ready to merge, right @arepstein ? @janosh can you remove the 'stale' label? |
Yes, I believe ready to merge! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's been a recent change of the pymatgen
test folder structure. The file test_files/molecules/co2.log
now needs to in tests/files/...
. Would also be good to gzip
this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @arepstein! 👍
Thanks @janosh and @rkingsbury for all the help, edits, and revisions! |
Summary
Added quasirrho.py to the analysis subpackage to calculate the Quasi-RRHO free energy from a Gaussian or QChem frequency calculation.
Additional dependencies introduced (if any)
*None
TODO (if any)
Before a pull request can be merged, the following items must be checked:
Run pycodestyle and flake8
on your local machine.
Run pydocstyle on your code.
to type check your code.
Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is
highly recommended that you use the pre-commit hook provided in the pymatgen
repository. Simply
cp pre-commit .git/hooks
and a check will be run prior toallowing commits.