Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Molecular Basis #175

Closed
wants to merge 14 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
177 changes: 167 additions & 10 deletions iodata/basis.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@

import attr
import numpy as np
from .overlap_cartpure import tfs

from .attrutils import validate_shape

Expand All @@ -37,11 +38,13 @@

def _alsolist(f):
"""Wrap a function to accepts also list as first argument and then return list."""

@wraps(f)
def wrapper(firsts, *args, **kwargs):
if isinstance(firsts, (Integral, str)):
return f(firsts, *args, **kwargs)
return [f(first, *args, **kwargs) for first in firsts]

return wrapper


Expand Down Expand Up @@ -88,7 +91,7 @@ def angmom_its(angmom: Union[int, List[int]]) -> Union[str, List[str]]:
@attr.s(auto_attribs=True, slots=True,
on_setattr=[attr.setters.validate, attr.setters.convert])
class Shell:
"""A shell in a molecular basis representing (generalized) contractions with the same exponents.
"""Contracted shell representing (generalized) contractions with the same exponents.

Attributes
----------
Expand All @@ -102,14 +105,19 @@ class Shell:
and 'p' for pure. Pure functions are only allowed for angmom>1.
The length equals the number of contractions: len(angmoms)=ncon.
exponents
The array containing the exponents of the primitives, with shape (nprim,).
The array containing the exponents of the primitives, with shape (nprim,),
where nprim is the number of primitive contracted shells.
coeffs
The array containing the coefficients of the normalized primitives in each contraction;
shape = (nprim, ncon).
These coefficients assume that the primitives are L2 (orbitals) or L1
(densities) normalized, but contractions are not necessarily normalized.
(This depends on the code which generated the contractions.)

Notes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot overlap between this section and the documentation in https://github.com/theochem/iodata/blob/master/doc/basis.rst
Could you refer to this page instead of adding this Notes section? It is impractical to maintain the same documentation in different places. A link can be created as follows:

Basis set conventions and terminology are documented in :ref:`basis_conventions`.

If something needs to be updated in basis.rst, feel free to improve it.

-----
Basis set conventions and terminology are documented in :ref:`basis_conventions`.

"""

icenter: int
Expand All @@ -119,7 +127,7 @@ class Shell:
coeffs: np.ndarray = attr.ib(validator=validate_shape(("exponents", 0), ("kinds", 0)))

@property
def nbasis(self) -> int: # noqa: D401
def nbasis(self) -> int: # noqa: D401
"""Number of basis functions (e.g. 3 for a P shell and 4 for an SP shell)."""
result = 0
for angmom, kind in zip(self.angmoms, self.kinds):
Expand All @@ -132,20 +140,23 @@ def nbasis(self) -> int: # noqa: D401
return result

@property
def nprim(self) -> int: # noqa: D401
"""Number of primitives, also known as the contraction length."""
def nprim(self) -> int: # noqa: D401
"""Number of primitives in the contracted shells, also known as the contraction length"""
return len(self.exponents)

@property
def ncon(self) -> int: # noqa: D401
"""Number of contractions. This is usually 1; e.g., it would be 2 for an SP shell."""
def ncon(self) -> int: # noqa: D401
"""Number of contractions with distinct angular momentum numbers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated to correctly reflect all possible scenarios. Some basis sets also use generalized contractions in which the same angular momentum is occurring several times, e.g. ANO basis sets or the ones used in CP2K.


This is usually 1; e.g., it would be 2 for an SP shell.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be helpful to refer to the term "segmented" in case this is 1.

"""
return len(self.angmoms)


@attr.s(auto_attribs=True, slots=True,
on_setattr=[attr.setters.validate, attr.setters.convert])
class MolecularBasis:
"""A complete molecular orbital or density basis set.
"""A complete molecular orbital or density basis set as a collection of contracted shells.

Attributes
----------
Expand Down Expand Up @@ -190,19 +201,32 @@ class MolecularBasis:
The normalization convention of primitives, which can be 'L2' (orbitals) or 'L1'
(densities) normalized.

Methods
-------
get_segmented
Return molecular basis object that is segmented.
get_decontracted
Return molecular basis object that is decontracted.
convert_kind
Return molecular basis object whose kinds are the same type.

Notes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. I'd suggest to refer to the basis set documentation page instead of adding notes here. (It might be useful to add a glossary to that document.)

-----
Basis set conventions and terminology are documented in :ref:`basis_conventions`.

"""

shells: List[Shell]
conventions: Dict[str, str]
primitive_normalization: str

@property
def nbasis(self) -> int: # noqa: D401
def nbasis(self) -> int: # noqa: D401
"""Number of basis functions."""
return sum(shell.nbasis for shell in self.shells)

def get_segmented(self):
"""Unroll generalized contractions."""
"""Convert Generalized Molecular Basis to Segmented Molecular Basis."""
shells = []
for shell in self.shells:
for angmom, kind, coeffs in zip(shell.angmoms, shell.kinds, shell.coeffs.T):
Expand All @@ -211,6 +235,139 @@ def get_segmented(self):
# pylint: disable=no-member
return attr.evolve(self, shells=shells)

def get_decontracted(self):
r"""Convert to Decontracted Molecular Basis from a Molecular Basis."""
shells = []
for shell in self.shells:
for i, (angmom, kind) in enumerate(zip(shell.angmoms, shell.kinds)):
for exponent, coeff in zip(shell.exponents, shell.coeffs[:, i]):
shells.append(
Shell(shell.icenter, [angmom], [kind], np.array([exponent]),
coeff.reshape(-1, 1))
)
# pylint: disable=no-member
return attr.evolve(self, shells=shells)

def convert_kind(self, to: str):
r"""
Convert all contractions from Pure (or Cartesian) kind to Cartesian (or Pure).

Parameters
----------
to
Specifying which one to convert to. It is either "c" for Cartesian or "p" for Pure.

Raises
------
ValueError :
If "to" is not recognized to be either "c" or "p".

Notes
-----
- Some of these conversion will cause the contraction coefficients to be
different across different shells. In such a scenario,
this function will average all coefficients so that the shell-type inside a contracted
shell will have the same coefficients.
Comment on lines +267 to +270
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this problem only show up for generalized contractions?


See Also
--------
convert_primitive_kind : Converts primitive shells without averaging.

"""
if to not in ("c", "p"):
raise ValueError("The to string was not recognized: %s" % to)

shells = []
for shell in self.shells:
kind_new = []
coeffs_new = []

for angmom, kind, coeffs in zip(shell.angmoms, shell.kinds, shell.coeffs.T):
# Need to convert the kind of shell. S-type doesn't need to change.
if kind != to and angmom >= 2:
if kind == "c":
b = np.repeat(coeffs[np.newaxis], tfs[angmom].shape[1], axis=0)
coeff = tfs[angmom].dot(b)
# Take the mean, since the coefficients vary between primitives.
coeff = np.mean(coeff, axis=0)
elif kind == "p":
b = np.repeat(coeffs[np.newaxis], tfs[angmom].shape[0], axis=0)
# Solve it using least-squared projection.
coeff = np.linalg.lstsq(tfs[angmom], b, rcond=None)[0]
# Take the mean, since the coefficients vary between primitives.
coeff = np.mean(coeff, axis=0)
kind_new.append(to)
coeffs_new.append(coeff)
Comment on lines +287 to +300
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should the coefficients be updated? The transformation matrices are normally used to transform integrals (or MO coefficients). I don't see how they appear in the contraction coefficients. (I may be missing something.)

# No need to convert.
else:
kind_new.append(to)
coeffs_new.append(coeffs)

shells.append(
Shell(
shell.icenter, shell.angmoms, kind_new, shell.exponents, np.array(coeffs_new).T
)
)
# pylint: disable=no-member
return attr.evolve(self, shells=shells)


def convert_primitive_kind(angmom: int, kind: str, coeffs: np.ndarray, to: str) -> np.ndarray:
r"""
Convert coefficients in Cartesian to Pure or vice-versa of a Primitive shell.

Parameters
----------
angmom
Integer representing the total angular momentum number.
kind
String representing "p" for Pure primitives and "c" for Cartesian primitives.
coeffs
Coefficients of the contraction of type `kind`. Shape is (N,) where
N is the number of primitives with angular momentum `angmom` and type `kind`.
For example, angmom=2 and kind="c" implies N=6,
The order should match the conventions.
to
Specifying which one to convert to. It is either "c" for Cartesian or "p" for Pure.

Returns
-------
new_coeffs
Coefficients in the new basis (either Cartesian or Pure).

Examples
--------
Suppose one has the following contraction with angular momentum 2 and kind is Cartesian with
integers (2, 0, 0) and (1, 1, 0):

.. math::
c_1 x^2 y^0 z^0 e^{-\alpha r^2} + c_2 x y z^0 e^{-\alpha r^2}

Then the coefficients of this in the right convention [xx, xy, xz, yy, yz, zz] is

>> coeff = [c_1, c_2, 0, 0, 0, 0]

The basis function conventions dictionary are documented in :ref:`basis_conventions and
in the MolecularBasis class documentation.

To convert this Cartesian contraction to Pure type, one would do

>> new_coeffs = convert_primitive_kind(1, "c", coeff, "p")

"""
if to not in ("p", "c"):
raise ValueError("The to string was not recognized: %s" % to)
if kind not in ("p", "c"):
raise ValueError("The kind string was not recognized: %s" % kind)
if to != kind:
if kind == "c":
coeff = tfs[angmom].dot(coeffs)
else:
# Solve it using least-squared projection.
coeff = np.linalg.lstsq(tfs[angmom], coeffs, rcond=None)[0]
return coeff
return coeffs


def convert_convention_shell(conv1: List[str], conv2: List[str], reverse=False) \
-> Tuple[np.ndarray, np.ndarray]:
Expand Down
65 changes: 64 additions & 1 deletion iodata/overlap.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,9 @@
from .basis import convert_conventions, iter_cart_alphabet, MolecularBasis
from .basis import HORTON2_CONVENTIONS as OVERLAP_CONVENTIONS

__all__ = ['OVERLAP_CONVENTIONS', 'compute_overlap', 'gob_cart_normalization']
__all__ = [
'OVERLAP_CONVENTIONS', 'compute_overlap', 'gob_cart_normalization', "convert_vector_basis"
]


# pylint: disable=too-many-nested-blocks,too-many-statements
Expand Down Expand Up @@ -223,3 +225,64 @@ def gob_cart_normalization(alpha: np.ndarray, n: np.ndarray) -> np.ndarray:
"""
return np.sqrt((4 * alpha) ** sum(n) * (2 * alpha / np.pi) ** 1.5
/ np.prod(factorial2(2 * n - 1)))


def convert_vector_basis(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is a general way of transforming MO coefficients from one basis to another, but it is overkill (and not always numerically stable) for the type of basis set modifications in this PR. Moreover, it assumes that the type of overlap matrix needed (between basis1 and basis2) can be computed, which is not the case (yet) with IOData alone.

coeffs1: np.ndarray,
basis2_overlap: np.ndarray,
basis21_overlap: np.ndarray
) -> np.ndarray:
r"""
Convert a vector from basis 1 of size M to another basis 2 of size N.

Parameters
----------
coeffs1:
Coefficients of the vector expanded in basis set 1. Shape is (M,).
basis2_overlap:
Symmetric matrix whose entries are the inner-product between basis vectors
inside basis set 2. Shape is (N, N).
basis21_overlap:
The overlap matrix between basis set 1 and basis set 2. Shape is (N, M).

Returns
-------
coeffs2 :
Coefficients of the vector expanded in basis set 2. Shape is (N,).

Raises
------
ValueError :
If shapes of the matrices don't match the requirements in the docs.
LinAlgError :
If least squared solution does not converge.

Notes
-----
- Basis vectors are defined to be linearly independent wrt to one another
and need not be orthogonal.

- `basis2_overlap` is the matrix with (i, j)th entries
:math:`\braket{\psi_i, \psi_j}` where :math:`\psi_i` are in basis set 2.

- `basis21_overlap` is the matrix whose (i, j)th entries are
:math:`\braket{\psi_i, \phi_j}` where :math:`\phi_j` are in basis set 1 and
:math:`\psi_i` are in basis set 2.

- If `basis2_overlap` is not full rank, then least-squared solution is solved instead. This
is effectively solving :math:`||b - Ax||`.

"""
if basis2_overlap.shape[0] != basis2_overlap.shape[1]:
raise ValueError("The `basis2_overlap` should be a square matrix.")
if np.any(np.abs(basis2_overlap.T - basis2_overlap) > 1e-10):
raise ValueError("The `basis2_overlap` should be a symmetric matrix.")

b = basis21_overlap.dot(coeffs1)
try:
# Try solving exact solution if `basis12_overlap` is full rank.
coeffs2 = np.linalg.solve(basis2_overlap, b)
except np.linalg.LinAlgError:
# Try least-squared solution.
coeffs2, _, _, _ = np.linalg.lstsq(basis2_overlap, b)
return coeffs2
Loading