Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrajectoryData.get_step_data throws error if trajectory does not contain cells array #4170

Closed
lorisercole opened this issue Jun 9, 2020 · 5 comments · Fixed by #5734
Closed
Labels
Milestone

Comments

@lorisercole
Copy link
Contributor

TrajectoryData.get_step_data throws an error if the trajectory does not contain an array cells.
This is due to a typo here:

cells = self.get_cells()
if cells is not None:
cell = cells[index, :, :]
return (self.get_stepids()[index], time, cell, self.symbols, self.get_positions()[index, :, :], vel)

cell will not be defined if cells is None.

Side note:
once the bug above if fixed, if a trajectory does not contain cells, the method TrajectoryData.get_step_structure will return a StructureData with the default cell matrix

[[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[0.0, 0.0, 1.0]]

Should the user be warned about this (perhaps not intended) behavior? or is it ok, since it is the default cell value of StructureData?

@sphuber
Copy link
Contributor

sphuber commented Jun 10, 2020

[index, :, :]

What the hell is this syntax even? That really threw me off, but just realize this is numpy notation for slicing. I wonder why they chose a different syntax over the normal python one. Maybe to distinguish it.

Anyway, I got side-tracked.

Should the user be warned about this (perhaps not intended) behavior? or is it ok, since it is the default cell value of StructureData?

I didn't know this was the actual behavior of the StructureData class and personally am not a fan of that behavior. But maybe this is done for the use case of molecules where you don't really care about the cell and don't want to have to enter a box everytime. Then again, I think that shows that a periodic crystal structure and an isolated molecule deserve their own data types. I don't think we can change the StructureData constructor now, but what you can do is emit a UserWarning in get_step_structure if the cell is None. @giovannipizzi what do you think?

@lorisercole
Copy link
Contributor Author

lorisercole commented Jun 10, 2020

Another question: the method get_structure essentially does the same job as get_step_structure, but calling the CalcFunction _get_aiida_structure_inline that calls get_step_structure:

def get_structure(self, store=False, **kwargs):
"""
Creates :py:class:`aiida.orm.nodes.data.structure.StructureData`.
.. versionadded:: 1.0
Renamed from _get_aiida_structure
:param converter: specify the converter. Default 'ase'.
:param store: If True, intermediate calculation gets stored in the
AiiDA database for record. Default False.
:return: :py:class:`aiida.orm.nodes.data.structure.StructureData` node.
"""
from aiida.orm.nodes.data.dict import Dict
from aiida.tools.data.array.trajectory import _get_aiida_structure_inline
param = Dict(dict=kwargs)
ret_dict = _get_aiida_structure_inline(trajectory=self, parameters=param, metadata={'store_provenance': store}) # pylint: disable=unexpected-keyword-arg
return ret_dict['structure']

Is there a reason to keep both methods when they could be merged to one?

(In fact, the documentation is also wrong: it does not use the parameter converter nor ASE)

@calcfunction
def _get_aiida_structure_inline(trajectory, parameters):
"""
Creates :py:class:`aiida.orm.nodes.data.structure.StructureData` using ASE.
.. note:: requires ASE module.
"""
kwargs = {}
if parameters is not None:
kwargs = parameters.get_dict()
if 'index' not in kwargs.keys() or kwargs['index'] is None:
raise ValueError('Step index is not supplied for TrajectoryData')
return {'structure': trajectory.get_step_structure(**kwargs)}

I can add a fix to the PR #4171, let me know. :)

@greschd
Copy link
Member

greschd commented Jun 10, 2020

[index, :, :]

What the hell is this syntax even? That really threw me off, but just realize this is numpy notation for slicing. I wonder why they chose a different syntax over the normal python one. Maybe to distinguish it.

Numpy indexing is fantastic (but sometimes also fantastically confusing), you should treat yourself to https://numpy.org/devdocs/user/basics.indexing.html some time. I'm pretty sure there are some things in there that wouldn't work if each axis had a separate __getitem__ or __setitem__ call.

@giovannipizzi
Copy link
Member

  • Thanks @lorisercole for the bug report!
  • Indeed, the cell = diag([1,1,1]) is for the case you want to treat molecules and don't care about the box. I know it's not very elegant but that's the best we could design at the time. Ok with me to add a warning.
  • Regarding get_structure vs get_step_structure - I guess this was to facilitate keeping track of the provenance by default when calling get_structure; but there are good reasons not to call it as a calcfunction, e.g. if you just need to do some in-memory post-processing. We cannot change this before 2.0 - but if we feel it's confusing we can deprecate one of the two.
  • @lorisercole feel free to make the PR to fix the docs if they are wrong!
  • [OT] I agree with @greschd that it's probably needed to have that type of slicing. E.g. I'm not sure array[1,:,2] could ever be efficient (or even work?) if written as array[1][:][2]. Anyway, this is for another discussion... ;-)

@lorisercole
Copy link
Contributor Author

  • Indeed, the cell = diag([1,1,1]) is for the case you want to treat molecules and don't care about the box. I know it's not very elegant but that's the best we could design at the time. Ok with me to add a warning.

Ok thanks. I'll add a warning then.
What about adding an optional parameter custom_cell to the method?

   def get_step_structure(self, index, custom_kinds=None, custom_cell=None):

If the user specifies a cell, it will be used/overridden to generate the StructureData; otherwise the cell of the trajectory will be used (if present).
If no cell is found/specified, the structure will have the default cell diag([1,1,1]) and we issue a warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants