Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[minor] Recursive to/from_dict #1602

Merged
merged 4 commits into from
Aug 19, 2024
Merged

[minor] Recursive to/from_dict #1602

merged 4 commits into from
Aug 19, 2024

Conversation

pmrv
Copy link
Contributor

@pmrv pmrv commented Aug 16, 2024

Adds a create_from_dict function that takes the output of HasDict.to_dict and turns it into a live object, similar to our earlier to_object() method on ProjectHDFio.
Adds an instantiate class method to HasDict to support this, this allows HasDictfromHDF to work and will be useful for dataclasses in the future. HasDict.to_dict now goes over the contents of what is returned from _to_dict and automatically converts any HasDict/HasHDF objects it finds. I haven't used this in downstream code yet to keep the change small, but in principle this will allow GenericJob/DataContainer to stop calling to_dict on their children explicitly and let the generic interface handle it.

The rest of the changes are renaming everything to _from_dict/_to_dict and normalizing the argument name to obj_dict.

This allows to store "abitrary" objects again in the input/outputs of TemplateJobs.

Separate PR for easier review. I will add extra unit tests, but it worked already in my local notebook tests.

Adds a create_from_dict function that takes the output of
HasDict.to_dict and turns it into a live object, similar to our earlier
to_object() method on ProjectHDFio.
Adds an instantiate class method to HasDict to support this, this allows
HasDictfromHDF to work and will be useful for dataclasses in the future.
HasDict.to_dict now goes over the contents of what is returned from
_to_dict and automatically converts any HasDict/HasHDF objects it finds.
I haven't used this in downstream code yet to keep the change small, but
in principle this will allow GenericJob/DataContainer to stop calling
to_dict on their children explicitly and let the generic interface
handle it.

The rest of the changes are renaming everything to _from_dict/_to_dict
and normalizing the argument name to obj_dict.
@pmrv pmrv added enhancement New feature or request integration Start the integration tests with pyiron_atomistics/contrib for this PR minor add functionality in a backward compatible manner labels Aug 16, 2024
@pmrv pmrv requested a review from jan-janssen August 16, 2024 15:21
@github-actions github-actions bot changed the title Recursive to/from_dict [minor] Recursive to/from_dict Aug 16, 2024
Comment on lines +1182 to +1183
if "executable" in obj_dict.keys() and obj_dict["executable"] is not None:
self._executable = obj_dict["executable"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised about this part, when is the dictionary of the executable converted to the executable object? Is this already happening when reading the executable from the HDF5 file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this new setup HasDict.from_dict converts nested objects back from their dictionary form before the implementations HasDict._from_dict is called. So by the time GenericJob._from_dict runs and looks for the executable Executable._from_dict already ran.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle the reverse also works, i.e. it's not always necessary anymore to call to_dict on ones children that are returned in the obj_dict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to document this at least in the HasDict class, so developers understand how to derive their own classes from HasDict and that they only have to implement the _to_dict() and _from_dict() interface. Finally, I am wondering if it makes sense to overload the __getstate__() and __setstate__() function in the HasDict class. Then we can slowly transition from calling to_dict() and from_dict() to using the __getstate__() and __setstate__() interface to reload the pyiron objects. This would simplify the interface for developers who just want to integrate a new code. As long as their classes can be pickled, then we can store their classes in our jobs. If they want to optimise the performance and benefit from the hierarchical nature of the HDF5 file, then they can overload the __getstate__() and __setstate__() method, for example by attaching a dataclass to their class which they use for data storage or by attaching a data container.

Comment on lines +71 to +78
def load(inner_dict):
if not isinstance(inner_dict, dict):
return inner_dict
if not all(
k in inner_dict for k in ("NAME", "TYPE", "OBJECT", "DICT_VERSION")
):
return {k: load(v) for k, v in inner_dict.items()}
return create_from_dict(inner_dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand that the conversion is handled here, still I am wondering if this is what we want. I thought the intention was to get towards using __getstate__() and __setstate__(). Now with the executable and the server object using datacalsses for data storage internally, I was hoping that we could use a simple dictionary representation there where the class is stored by storing the attached dataclass.

Comment on lines 115 to 121
for k, v in self._to_dict().items():
if isinstance(v, HasDict):
child_dict[k] = v.to_dict()
elif isinstance(v, HasHDF):
child_dict[k] = HasDictfromHDF.to_dict(v)
else:
data_dict[k] = v
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify this, this part currently does not work for the Atoms object in pyiron_atomistics, as it is neither derived from HasDict nor from HasHDF, correct? It only works when the Atoms object was stored as part of a DataContainer, correct?

@pmrv pmrv merged commit 013a21b into hasdict Aug 19, 2024
22 of 26 checks passed
@pmrv pmrv deleted the hasdictrec branch August 19, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request integration Start the integration tests with pyiron_atomistics/contrib for this PR minor add functionality in a backward compatible manner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants