Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store unit system in H5MD file #3751

Merged
merged 11 commits into from
Jun 12, 2020
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ repos:
always_run: false
files: '.*\.(py|pyx|pxd)'
exclude: '\.pylintrc|.*.\.py\.in'
args: ["--ignore=E266,W291,W293", "--in-place", "--aggressive"]
args: ["--ignore=E266,E701,W291,W293", "--in-place", "--aggressive"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E701 causes dataclasses members to be formatted on a new line:

-        force: str = dataclasses.field(
-            init=False, default='')
+        force:
+            str = dataclasses.field(
+                init=False, default='')

This change causes a syntax error:

1559   File "/builds/espressomd/espresso/build/src/python/espressomd/io/writer/h5md.py", line 36
1560     force:
1561          ^
1562 SyntaxError: invalid syntax

E701 cannot be disabled on a per-line basis. I tried using line continuation symbols \ to work around it, without success. I think we should permanently disable it, even if we remove dataclasses, because it might cause us trouble again in the future, and troubleshooting this error took me a while.


- id: cmake-format
name: cmake-format
Expand Down
36 changes: 24 additions & 12 deletions doc/sphinx/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,8 @@ details). Currently |es| supports some basic functions for writing simulation
data to H5MD files. The implementation is MPI-parallelized and is capable
of dealing with varying numbers of particles.

To write data in a hdf5-file according to the H5MD proposal (https://nongnu.org/h5md/), first an object of the class
To write data in a hdf5-file according to the H5MD proposal (https://nongnu.org/h5md/),
first an object of the class
:class:`espressomd.io.writer.h5md.H5md` has to be created and linked to the
respective hdf5-file. This may, for example, look like:

Expand Down Expand Up @@ -164,7 +165,10 @@ to the :class:`~espressomd.io.writer.h5md.H5md` class. Currently available:
- ``write_mass``: particle masses

- ``write_ordered``: if particles should be written ordered according to their
id (implies serial write).
id (implies serial write)

- ``unit_system``: optionally, physical units for time, mass, length and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other SI base units that are defined in the H5MD documentation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other SI units (current, temperature, moles, candela) do not seem to apply for espresso particles. The per-particle temperature is more of a trick for tailored Langevin dynamics. Same thing for derived units. Torque could have been relevant, but isn't defined.

electrical charge.

In simulations with varying numbers of particles (MC or reactions), the
size of the dataset will be adapted if the maximum number of particles
Expand All @@ -176,29 +180,35 @@ dataset for the ids to track which position/velocity/force/type/mass
entry belongs to which particle. To write data to the hdf5 file, simply
call the H5md object :meth:`~espressomd.io.writer.h5md.H5md.write` method without any arguments.

.. code:: python

h5.write()


After the last write call, you have to call the
:meth:`~espressomd.io.writer.h5md.H5md.close` method to remove
the backup file, close the datasets, etc.

H5MD files can be read and modified with the python module h5py (for
documentation see `h5py <https://docs.h5py.org/en/stable/>`_). For example,
all positions stored in the file called "h5mdfile.h5" can be read using:
all positions stored in the file called "sample.h5" can be read using:

.. code:: python

import h5py
h5file = h5py.File("h5mdfile.h5", 'r')
h5file = h5py.File("sample.h5", 'r')
positions = h5file['particles/atoms/position/value']

If the data was stored with `physical units
<https://nongnu.org/h5md/modules/units.html>`_, they can be accessed with:

.. code:: python

positions.attrs['unit']
forces = h5file['particles/atoms/force/value']
forces_unit = forces.attrs['unit']
sim_time = h5file['particles/atoms/id/time']
print('last frame: {:.3f} {}'.format(sim_time.value[-1], sim_time.attrs['unit'].decode('utf8')))

Furthermore, the files can be inspected with the GUI tool hdfview or visually with the
H5MD VMD plugin (see `H5MD plugin <https://github.com/h5md/VMD-h5mdplugin>`_).

For other examples, see :file:`/samples/h5md.py`
For an example involving physical units, see :file:`/samples/h5md.py`.


.. _Writing MPI-IO binary files:
Expand Down Expand Up @@ -351,10 +361,12 @@ the system's particles.
fp = open('trajectory.vcf', mode='w+t')
vtf.writevcf(system, fp, types='all')

.. _vtf_pid_map\: Going back and forth between |es| and VTF indexing:
.. _vtf_pid_map\: Going back and forth between ESPResSo and VTF indexing:

``vtf_pid_map``: Going back and forth between |es| and VTF indexing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:meth:`espressomd.io.writer.vtf.vtf_pid_map`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generates a dictionary which maps |es| particle ``id`` to VTF indices.
This is motivated by the fact that the list of |es| particle ``id`` is allowed to contain *holes* but VMD
requires increasing and continuous indexing. The |es| ``id`` can be used as *key* to obtain the VTF index as the *value*, for example:
Expand Down
9 changes: 6 additions & 3 deletions samples/h5md.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,11 +43,14 @@
if i > 0:
system.part[id].add_bond((fene, id - 1))

system.integrator.run(steps=0)
h5_units = h5md.UnitSystem(time='ps', mass='u', length='nm', charge='e')
h5_file = h5md.H5md(filename="sample.h5", write_pos=True, write_vel=True,
write_force=True, write_species=True, write_mass=False,
write_charge=True, write_ordered=True)
for i in range(1):
write_charge=True, write_ordered=True, unit_system=h5_units)

for i in range(2):
h5_file.write()
system.integrator.run(steps=10)

h5_file.flush()
h5_file.close()
34 changes: 22 additions & 12 deletions src/core/io/writer/h5md_core.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -149,18 +149,18 @@ void File::init_filestructure() {

dataset_descriptors = {
// path, dim, type
{"particles/atoms/box/edges", 1, type_double},
{"particles/atoms/mass/value", 2, type_double},
{"particles/atoms/charge/value", 2, type_double},
{"particles/atoms/id/value", 2, type_int},
{"particles/atoms/id/time", 1, type_double},
{"particles/atoms/id/step", 1, type_int},
{"particles/atoms/species/value", 2, type_int},
{"particles/atoms/position/value", 3, type_double},
{"particles/atoms/velocity/value", 3, type_double},
{"particles/atoms/force/value", 3, type_double},
{"particles/atoms/image/value", 3, type_int},
{"connectivity/atoms", 2, type_int},
{"particles/atoms/box/edges", 1, type_double, m_length_unit},
{"particles/atoms/mass/value", 2, type_double, m_mass_unit},
{"particles/atoms/charge/value", 2, type_double, m_charge_unit},
{"particles/atoms/id/value", 2, type_int, ""},
{"particles/atoms/id/time", 1, type_double, m_time_unit},
{"particles/atoms/id/step", 1, type_int, ""},
{"particles/atoms/species/value", 2, type_int, ""},
{"particles/atoms/position/value", 3, type_double, m_length_unit},
{"particles/atoms/velocity/value", 3, type_double, m_velocity_unit},
{"particles/atoms/force/value", 3, type_double, m_force_unit},
{"particles/atoms/image/value", 3, type_int, ""},
{"connectivity/atoms", 2, type_int, ""},
};
}

Expand Down Expand Up @@ -195,6 +195,10 @@ void File::create_datasets(bool only_load) {
H5Pset_create_intermediate_group(lcpl_id, 1);
datasets[path] = h5xx::dataset(m_h5md_file, path, descr.type, dataspace,
storage, lcpl_id, H5P_DEFAULT);
// write only units attribute when the value is non-zero.
if (descr.unit.length() > 0) {
h5xx::write_attribute(datasets[path], "unit", descr.unit);
}
}
}
if (!only_load)
Expand Down Expand Up @@ -259,6 +263,12 @@ void File::create_new_file(const std::string &filename) {
auto h5md_author_group = h5xx::group(h5md_group, "author");
h5xx::write_attribute(h5md_author_group, "name", "N/A");

if (is_unit_system_defined()) {
auto h5md_unit_module = h5xx::group(h5md_group, "modules/units");
std::vector<int> h5md_unit_module_version = {1, 0};
write_attribute(h5md_unit_module, "version", h5md_unit_module_version);
}

bool only_load = false;
create_datasets(only_load);

Expand Down
20 changes: 20 additions & 0 deletions src/core/io/writer/h5md_core.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,18 @@ class File {
// the dataset in the order of ids (possibly slower on output for many
// particles).
bool &write_ordered() { return m_write_ordered; };

// Unit system
bool is_unit_system_defined() {
return m_time_unit.length() > 0 && m_mass_unit.length() > 0 &&
m_length_unit.length() > 0;
}
std::string &time_unit() { return m_time_unit; };
std::string &mass_unit() { return m_mass_unit; };
std::string &length_unit() { return m_length_unit; };
std::string &force_unit() { return m_force_unit; };
std::string &velocity_unit() { return m_velocity_unit; };
std::string &charge_unit() { return m_charge_unit; };
/**
* @brief Method to force flush to h5md file.
*/
Expand Down Expand Up @@ -179,10 +191,18 @@ class File {
boost::filesystem::path m_absolute_script_path = "nullptr";
h5xx::file m_h5md_file;

std::string m_time_unit;
std::string m_length_unit;
std::string m_mass_unit;
std::string m_force_unit;
std::string m_velocity_unit;
std::string m_charge_unit;

struct DatasetDescriptor {
std::string path;
hsize_t dim;
h5xx::datatype type;
std::string unit;
};
std::vector<std::string> group_names;
std::vector<DatasetDescriptor> dataset_descriptors;
Expand Down
72 changes: 56 additions & 16 deletions src/python/espressomd/io/writer/h5md.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,53 +14,85 @@
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
"""Interface module for the H5md core implementation."""


import sys

from ...script_interface import PScriptInterface # pylint: disable=import
from ...code_info import features

if 'H5MD' not in features():
class H5md:
class UnitSystem:
def __init__(self, *args, **kwargs):
raise RuntimeError("UnitSystem not available.")

class H5md:
def __init__(self, *args, **kwargs):
raise RuntimeError("H5md not available.")
else:
class UnitSystem:
"""
Data class for writing H5MD trajectories with
`physical units <https://nongnu.org/h5md/modules/units.html>`.
There are four settable units: 'mass', 'length', 'time', 'charge'.
Units should be written as strings following the specifications defined
`here <https://nongnu.org/h5md/modules/units.html#unit-string>`,
e.g. ``UnitSystem(time='ps', mass='u', length='nm', charge='e')``.
"""

def __init__(self, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could still use type annotations if you wanted to...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to re-implement dataclasses :) providing a different type already throws a meaningful error message: RuntimeError: Provided argument of type int is not convertible to std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >

self.mass = ''
self.time = ''
self.length = ''
self.charge = ''
for key, value in kwargs.items():
assert hasattr(self, key), 'unknown dimension ' + key
setattr(self, key, value or '')

if self.length and self.mass and self.time:
self.force = '{} {} {}-2'.format(self.length,
self.mass, self.time)
else:
self.force = ''
if self.length and self.time:
self.velocity = '{} {}-1'.format(self.length, self.time)
else:
self.velocity = ''

class H5md:

"""H5md file object.

Used for accessing the H5MD core implementation via the
PScriptInterface.
Used for accessing the H5MD core implementation.

.. note::
Bonds will be written to the file automatically if they exist.

Parameters
----------
filename : :obj:`str`
Name of the trajectory file.
Name of the trajectory file.
write_pos : :obj:`bool`, optional
If positions should be written.
If positions should be written.
write_vel : :obj:`bool`, optional
If velocities should be written.
If velocities should be written.
write_force : :obj:`bool`, optional
If forces should be written.
If forces should be written.
write_species : :obj:`bool`, optional
If types (called 'species' in the H5MD specification) should be written.
If types (called 'species' in the H5MD specification) should be written.
write_mass : :obj:`bool`, optional
If masses should be written.
If masses should be written.
write_charge : :obj:`bool`, optional
If charges should be written.
If charges should be written.
write_ordered : :obj:`bool`, optional
If particle properties should be ordered according to
ids.
If particle properties should be ordered according to ids.
unit_system : :obj:`UnitSystem`, optional
Physical units for the data.

"""

def __init__(self, write_ordered=True, **kwargs):
def __init__(self, write_ordered=True, unit_system=None, **kwargs):
self.valid_params = ['filename', "write_ordered"]
if 'filename' not in kwargs:
raise ValueError("'filename' parameter missing.")
Expand All @@ -84,16 +116,24 @@ def __init__(self, write_ordered=True, **kwargs):
raise ValueError(
"Unknown parameter {} for H5MD writer.".format(i))

if unit_system is None:
unit_system = UnitSystem('', '', '', '')
self.h5md_instance = PScriptInterface(
"ScriptInterface::Writer::H5mdScript")
self.h5md_instance.set_params(filename=kwargs['filename'],
what=self.what_bin,
scriptname=sys.argv[0],
write_ordered=write_ordered)
write_ordered=write_ordered,
mass_unit=unit_system.mass,
length_unit=unit_system.length,
time_unit=unit_system.time,
force_unit=unit_system.force,
velocity_unit=unit_system.velocity,
charge_unit=unit_system.charge)
self.h5md_instance.call_method("init_file")

def get_params(self):
"""Get the parameters from the scriptinterface."""
"""Get the parameters from the script interface."""
return self.h5md_instance.get_params()

def write(self):
Expand Down
6 changes: 6 additions & 0 deletions src/script_interface/h5md/h5md.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,12 @@ class H5mdScript : public AutoParameters<H5mdScript> {
add_parameters({{"filename", m_h5md->filename()},
{"scriptname", m_h5md->scriptname()},
{"what", m_h5md->what()},
{"mass_unit", m_h5md->mass_unit()},
{"time_unit", m_h5md->time_unit()},
{"length_unit", m_h5md->length_unit()},
{"force_unit", m_h5md->force_unit()},
{"velocity_unit", m_h5md->velocity_unit()},
{"charge_unit", m_h5md->charge_unit()},
{"write_ordered", m_h5md->write_ordered()}});
};

Expand Down
Loading