Add two classes for HasHDF/HasDict compat #1364
@jan-janssen and I discussed something like this last week. The problem statement is essentially: we want everything as to_dict, but we don't want to convert every to_hdf at once. Some code also still expects to be able to call to_hdf, and again we don't want to change all of those at once. So this introduces two mixins that can translate implementers of HasHDF and HasDict between each other. If we use them, we can slowly fade out the to_hdf implementations by manually converting them to HasHDF over time. Once all objects are converted, we deprecate HasHDFfromDict.to_hdf to show us what we need to update in the surrounding code, and when that is done we remove both classes again. Tests will need to follow and it's not yet used by any objects, but I want to discuss this plan and the design first.
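As a rough illustration of the two-mixin idea, here is a minimal sketch. It is not the code from this PR: the `HasDict` and `HasHDF` classes below are simplified stand-ins, a plain `dict` plays the role of an HDF group, and `NewStyle` and `OldStyle` are hypothetical example classes. Only the mixin names `HasHDFfromDict` and `HasDictfromHDF` come from the discussion.

```python
from abc import ABC, abstractmethod


class HasDict(ABC):
    """Simplified stand-in for the dict-based interface (assumption)."""

    @abstractmethod
    def to_dict(self) -> dict: ...

    @abstractmethod
    def from_dict(self, obj_dict: dict) -> None: ...


class HasHDF(ABC):
    """Simplified stand-in for the HDF-based interface (assumption).

    `hdf` is any dict-like store in this sketch, not a real HDF5 handle.
    """

    @abstractmethod
    def to_hdf(self, hdf) -> None: ...

    @abstractmethod
    def from_hdf(self, hdf) -> None: ...


class HasHDFfromDict(HasHDF, HasDict):
    """Implementers write to_dict/from_dict and get to_hdf/from_hdf for free."""

    def to_hdf(self, hdf) -> None:
        for key, value in self.to_dict().items():
            hdf[key] = value

    def from_hdf(self, hdf) -> None:
        self.from_dict({key: hdf[key] for key in hdf})


class HasDictfromHDF(HasDict, HasHDF):
    """Implementers write to_hdf/from_hdf and get to_dict/from_dict for free."""

    def to_dict(self) -> dict:
        store = {}  # in-memory substitute for an HDF group
        self.to_hdf(store)
        return store

    def from_dict(self, obj_dict: dict) -> None:
        self.from_hdf(dict(obj_dict))


class NewStyle(HasHDFfromDict):
    """Hypothetical class that has already been converted to to_dict."""

    def __init__(self, value=0):
        self.value = value

    def to_dict(self) -> dict:
        return {"value": self.value}

    def from_dict(self, obj_dict: dict) -> None:
        self.value = obj_dict["value"]


class OldStyle(HasDictfromHDF):
    """Hypothetical class that still only implements to_hdf."""

    def __init__(self, value=0):
        self.value = value

    def to_hdf(self, hdf) -> None:
        hdf["value"] = self.value

    def from_hdf(self, hdf) -> None:
        self.value = hdf["value"]
```

With this layout, callers that still expect `to_hdf` keep working on `NewStyle`, and callers that already use `to_dict` work on `OldStyle`, which is what would let the conversion proceed one class at a time.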
pyiron_base/interfaces/has_dict.py
pass

def _type_to_dict(self):
def to_dict(self):
def to_dict(self):
@HasDict.to_dict
def to_dict(self):
What about this so people don't have to put _...?
Could we do a literature search on what is common in the community? Is there some reference to OOP best practices?
I poked around for this, but I found nothing conclusive. The best I got was this SO thread from eight years ago; they lay out the same choices we're already aware of (OP's default was to do it the way you have here, if that's anything), but the question is left unanswered as "well, that's opinion". Personally, I find the @abstractmethod flag on the private method of the same name the clearest...but I agree it's opinion.
I would like to stick to the current convention then. Since this is a private class used only internally, it should be fairly easy to change later if we decide differently.
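For reference, the convention kept here (a public `to_dict` that wraps an abstract, private `_to_dict`) can be sketched as follows. This is a simplified illustration rather than the actual `HasDict`: the `"NAME"` key used for the type info and the `Point` class are assumptions made up for the example.

```python
from abc import ABC, abstractmethod


class HasDict(ABC):
    """Simplified stand-in; the real class stores more type info."""

    def to_dict(self) -> dict:
        # The public method adds type info automatically, so implementers
        # never have to remember to do it themselves.
        type_info = {"NAME": type(self).__name__}  # hypothetical key
        return {**type_info, **self._to_dict()}

    @abstractmethod
    def _to_dict(self) -> dict:
        """Return only the object's own data; implemented by subclasses."""


class Point(HasDict):
    """Hypothetical implementer."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _to_dict(self) -> dict:
        return {"x": self.x, "y": self.y}
```

One consequence of marking `_to_dict` with `@abstractmethod` is that a subclass that forgets to implement it fails at instantiation time with a `TypeError`, instead of failing later when `to_dict` is first called.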
I've added a test in form of the @jan-janssen I'd prefer to merge this in before proceeding with your I've noticed as well when running the tests locally that there is one location where we create a job with
The only pull requests which use
The reason for the unit tests hanging is this line pyiron_base/tests/job/test_worker.py Line 26 in 0bdde9b
What I don't understand is why this is suddenly a problem as this PR doesn't touch worker code at all.
Seems this only breaks, once I touched
@jan-janssen I can't seem to fix this quickly today, so go ahead with the other PRs.
621fc12 to ad9a086
def to_dict(self):
    executable_dict = self._type_to_dict()
    executable_storage_dict = self.storage._type_to_dict()
    executable_storage_dict["READ_ONLY"] = self.storage._read_only
    executable_storage_dict.update(self.storage.to_builtin())
    executable_dict["executable"] = executable_storage_dict
    return executable_dict
def _to_dict(self):
    full_dict = self.storage.to_dict()
    data = full_dict.pop("data")
    return {"executable": {**full_dict, **data}}
In principle, I like the idea with the DummyHDFio. But for classes like the Executable, I am not even sure if they need a DataContainer. I think the goal is to get to pickle-able classes in the future, with the to_dict() and from_dict() methods being part of the transition. So I feel switching to _to_dict() here goes in the wrong direction.
The DummyHDFio is not used for the executable or the data container, so I'm not sure what you mean. I agree that in the future Executable could be a simple dataclass or so, but again here I just modified it so that it conforms to the new _to_dict.
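To make the new `Executable._to_dict` in the diff above easier to follow, here is a small standalone sketch of the same dictionary manipulation: the `data` sub-dictionary is popped and merged back into the top level. The function name `flatten_storage_dict` and the example keys and values are made up; only the `pop("data")` and `{**full_dict, **data}` steps mirror the diff.

```python
def flatten_storage_dict(storage_dict: dict) -> dict:
    """Lift the entries of the "data" sub-dict to the top level."""
    full_dict = dict(storage_dict)  # shallow copy, so the input is not mutated
    data = full_dict.pop("data")
    # Entries from `data` come last, so they win on key collisions.
    return {"executable": {**full_dict, **data}}


# Hypothetical storage contents, for illustration only.
example = {"READ_ONLY": False, "data": {"version": "1.0", "mpi": False}}
flat = flatten_storage_dict(example)
```

Here `flat` holds a single `"executable"` entry whose value contains `READ_ONLY`, `version` and `mpi` side by side, with no nested `data` level.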
def to_dict(self):
    server_dict = self._type_to_dict()
    server_dict.update(
        {
            "user": self._user,
            "host": self._host,
            "run_mode": self.run_mode.mode,
            "queue": self.queue,
            "qid": self._queue_id,
            "cores": self.cores,
            "threads": self.threads,
            "new_h5": self.new_hdf,
            "structure_id": self.structure_id,
            "run_time": self.run_time,
            "memory_limit": self.memory_limit,
            "accept_crash": self.accept_crash,
        }
    )
def _to_dict(self):
    # server_dict = self._type_to_dict()
    server_dict = {
        "user": self._user,
        "host": self._host,
        "run_mode": self.run_mode.mode,
        "queue": self.queue,
        "qid": self._queue_id,
        "cores": self.cores,
        "threads": self.threads,
        "new_h5": self.new_hdf,
        "structure_id": self.structure_id,
        "run_time": self.run_time,
        "memory_limit": self.memory_limit,
        "accept_crash": self.accept_crash,
    }
In analogy to the Executable class, does the Server class need the transition from to_dict() to _to_dict(), or can we modify the class in a way that __getstate__() returns the same dictionary as to_dict()? Again, my focus would be on simplifying the class hierarchy.
Both Server and Executable derived from HasDict before this PR. I'm ok with changing this in the future, but here I just did the minimal changes to keep existing subclasses working with the changes I did to HasDict (which is just adding the type info automatically to the results of to_dict).
I'll add this to the bucket list. I would like to replace both with data classes sooner rather than later; then we can potentially also reuse them in the functional approach.
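The direction discussed in this thread (a data class whose `__getstate__()` returns the same dictionary as `to_dict()`) might look roughly like the sketch below. `ServerSettings` is a hypothetical class, not a proposal for the actual `Server` replacement; its fields are a subset of the keys in the diff above, and the defaults are invented.

```python
import pickle
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ServerSettings:
    """Hypothetical data class; field defaults are assumptions."""

    user: str = "unknown"
    host: Optional[str] = None
    cores: int = 1
    threads: int = 1
    run_time: Optional[int] = None
    memory_limit: Optional[str] = None
    accept_crash: bool = False

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, obj_dict: dict) -> "ServerSettings":
        return cls(**obj_dict)

    # pickle uses these two hooks, so the pickled state is exactly to_dict().
    def __getstate__(self) -> dict:
        return self.to_dict()

    def __setstate__(self, state: dict) -> None:
        self.__dict__.update(state)


def pickle_roundtrip(obj):
    """Serialize and deserialize an object with the standard pickle module."""
    return pickle.loads(pickle.dumps(obj))
```

When the class is defined at module level, `pickle_roundtrip(ServerSettings(cores=4))` should compare equal to the original, because data classes generate `__eq__` from their fields by default.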
Looks good to me
for more information, see https://pre-commit.ci
The contrib tests fail because their dependencies cannot currently be installed concurrently with the ones from atomistics.
This is in a good state now, but I'm still trying to add recursive to_dict/from_dict (in a separate PR).
Adds a create_from_dict function that takes the output of HasDict.to_dict and turns it into a live object, similar to our earlier to_object() method on ProjectHDFio. Adds an instantiate class method to HasDict to support this; it allows HasDictfromHDF to work and will be useful for dataclasses in the future. HasDict.to_dict now goes over the contents of what is returned from _to_dict and automatically converts any HasDict/HasHDF objects it finds. I haven't used this in downstream code yet to keep the change small, but in principle this will allow GenericJob/DataContainer to stop calling to_dict on their children explicitly and let the generic interface handle it. The rest of the changes rename everything to _from_dict/_to_dict and normalize the argument name to obj_dict.
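A minimal sketch of how such a recursive round trip could fit together is shown below. It is an illustration under several assumptions, not the implementation in this PR: classes are looked up through a hypothetical name registry (`_REGISTRY`), the type info is reduced to a single made-up `"NAME"` key, and the helpers `_convert` and `_revive` as well as the `Leaf` and `Tree` classes are invented. Only `create_from_dict`, `instantiate`, `_to_dict`, `_from_dict` and `obj_dict` are names taken from the description.

```python
from abc import ABC, abstractmethod

_REGISTRY = {}  # hypothetical lookup table: class name -> class


class HasDict(ABC):
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        _REGISTRY[cls.__name__] = cls

    @classmethod
    def instantiate(cls, obj_dict):
        """Create an empty instance; override if __init__ needs arguments."""
        return cls()

    def to_dict(self):
        out = {"NAME": type(self).__name__}  # hypothetical type-info key
        for key, value in self._to_dict().items():
            out[key] = _convert(value)  # children are converted automatically
        return out

    def from_dict(self, obj_dict):
        payload = {k: _revive(v) for k, v in obj_dict.items() if k != "NAME"}
        self._from_dict(payload)

    @abstractmethod
    def _to_dict(self): ...

    @abstractmethod
    def _from_dict(self, obj_dict): ...


def _convert(value):
    """Turn HasDict children (also inside lists) into plain dicts."""
    if isinstance(value, HasDict):
        return value.to_dict()
    if isinstance(value, list):
        return [_convert(v) for v in value]
    return value


def _revive(value):
    """Inverse of _convert: rebuild objects from dicts carrying type info."""
    if isinstance(value, dict) and "NAME" in value:
        return create_from_dict(value)
    if isinstance(value, list):
        return [_revive(v) for v in value]
    return value


def create_from_dict(obj_dict):
    """Turn the output of to_dict back into a live object."""
    cls = _REGISTRY[obj_dict["NAME"]]
    obj = cls.instantiate(obj_dict)
    obj.from_dict(obj_dict)
    return obj


class Leaf(HasDict):
    def __init__(self, value=None):
        self.value = value

    def _to_dict(self):
        return {"value": self.value}

    def _from_dict(self, obj_dict):
        self.value = obj_dict["value"]


class Tree(HasDict):
    def __init__(self):
        self.children = []

    def _to_dict(self):
        return {"children": self.children}  # no explicit to_dict on children

    def _from_dict(self, obj_dict):
        self.children = obj_dict["children"]
```

Note that `Tree._to_dict` returns its children as live objects and relies on the generic `to_dict` to convert them, which is the kind of simplification the description has in mind for GenericJob and DataContainer.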
@pmrv Can this be merged? Or should we wait for a new
Both this and #1602 don't need the h5io fix; that's just something I found while trying to get the jobs to read their dicts from HDF recursively.
[minor] Recursive to/from_dict
PR #1364 saved the contents of the container in a data subgroup, but that breaks backwards compat and is not actually necessary for the TemplateJob, so revert it here.
* Revert changes to DataContainer storage format
PR #1364 saved the contents of the container in a data subgroup, but that breaks backwards compat and is not actually necessary for the TemplateJob, so revert it here.
* Handle list in to_dict
---------
Co-authored-by: Marvin Poul <poul@mpie.de>
and outline transition path in docstrings there.