Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add two classes for HasHDF/HasDict compat #1364

Merged
merged 27 commits into from
Aug 19, 2024
Merged

Add two classes for HasHDF/HasDict compat #1364

merged 27 commits into from
Aug 19, 2024

Conversation

pmrv
Copy link
Contributor

@pmrv pmrv commented Mar 2, 2024

and outline transition path in docstrings there.

@pmrv pmrv added the enhancement New feature or request label Mar 2, 2024
@pmrv
Copy link
Contributor Author

pmrv commented Mar 2, 2024

@jan-janssen and I discussed something like this last week. The problem statement is essentially: We want everything as to_dict, but don't want to convert all to_hdf at once. Some code also still expects to be able to call to_hdf and again we don't want to change those all at one as well. So this introduces two mixins that can translate implementers of HasHDF and HasDict between each other. If we use them we can then slowly fade out to_hdf implementation by manually converting them to HasHDF over time. Once all objects are converted, we deprecate HasHDFfromDict.to_hdf to show us what we need to update in the surrounding code, and when that is done we remove both classes again.

Tests will need to follow and it's not yet used by any objects, but I want to discuss this plan and the design first.

@jan-janssen jan-janssen marked this pull request as draft March 2, 2024 10:02
@pmrv
Copy link
Contributor Author

pmrv commented Mar 3, 2024

Error: The action 'Test' has timed out after 5 minutes.

pass

def _type_to_dict(self):
def to_dict(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def to_dict(self):
@HasDict.to_dict
def to_dict(self):

What about this so people don't have to put _...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do literature search what is common in the community? Some reference to OOP best practices?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I poked around for this, but I found nothing conclusive. The best I got was this SO thread from eight years ago; they lay out the same choices we're already aware of (OP's default was to do it the way you have here, if that's anything), but the question is left unanswered as "well, that's opinion". Personally, I find the @abstractmethod flag on the private method of the same name the clearest...but I agree it's opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to stick to the current convention then. Since this is a private class used internally anyway, it should be fairly easy to change in the future anyway, in case we decide differently.

@pmrv
Copy link
Contributor Author

pmrv commented Mar 19, 2024

I've added a test in form of the DataContainer implementation of HasDictFromHDF and adapted DummyHDFio to work with this.

@jan-janssen I'd prefer to merge this in before proceeding with your to_dict series of PRs. Changes needed there to use this would hopefully be minimal.

I've noticed as well when running the tests locally that there is one location where we create a job with run_mode=="manual" that hangs for a long time before the tests proceed. This may be the issue of the timeout problems we are seeing with the CI, but I couldn't locate the test case yet. I have a feeling this may be related to the files changes with the decompression that we did recently. I have also seen a bunch unrelated test failures locally.

@pmrv pmrv marked this pull request as ready for review March 19, 2024 16:11
@jan-janssen
Copy link
Member

@jan-janssen I'd prefer to merge this in before proceeding with your to_dict series of PRs. Changes needed there to use this would hopefully be minimal.

The only pull requests which use HasDict are #1380 and #1377 . As those are required for pyiron/pyiron_atomistics#1356 I would prefer to merge those first before we change the interface.

@pmrv
Copy link
Contributor Author

pmrv commented Mar 25, 2024

The reason for the unit tests hanging is this line

self.sub_project.wait_for_jobs()

@pmrv
Copy link
Contributor Author

pmrv commented Mar 25, 2024

The reason for the unit tests hanging is this line

self.sub_project.wait_for_jobs()

What I don't understand is why this is suddenly a problem as this PR doesn't touch worker code at all.

@pmrv
Copy link
Contributor Author

pmrv commented Mar 26, 2024

Seems this only breaks, once I touched GenericParameters.

@pmrv
Copy link
Contributor Author

pmrv commented Mar 26, 2024

@jan-janssen I can't seem to fix this quickly today, so go ahead with the other PRs.

@pmrv pmrv force-pushed the hasdict branch 2 times, most recently from 621fc12 to ad9a086 Compare July 29, 2024 18:11
@pmrv pmrv mentioned this pull request Jul 29, 2024
@pmrv pmrv added integration Start the integration tests with pyiron_atomistics/contrib for this PR format_black reformat the code using the black standard labels Aug 1, 2024
@pmrv pmrv marked this pull request as ready for review August 1, 2024 15:47
Comment on lines 206 to 209
def to_dict(self):
executable_dict = self._type_to_dict()
executable_storage_dict = self.storage._type_to_dict()
executable_storage_dict["READ_ONLY"] = self.storage._read_only
executable_storage_dict.update(self.storage.to_builtin())
executable_dict["executable"] = executable_storage_dict
return executable_dict
def _to_dict(self):
full_dict = self.storage.to_dict()
data = full_dict.pop("data")
return {"executable": {**full_dict, **data}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I like the idea with the DummyHDFio. But for classes like the Executable, I am not even sure if they need a DataContainer. I think the goal is to get to pickle-able classes in the future, with the to_dict() and from_dict() method being part of the transition. So I feel switching to _to_dict() here goes in the wrong direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DummyHDFio is not used for the executable or the data container, so I'm not sure what you mean. I agree that in the future Executable could be a simple dataclass or so, but again here I just modified it, so it conforms to the new _to_dict.

Comment on lines 605 to 620
def to_dict(self):
server_dict = self._type_to_dict()
server_dict.update(
{
"user": self._user,
"host": self._host,
"run_mode": self.run_mode.mode,
"queue": self.queue,
"qid": self._queue_id,
"cores": self.cores,
"threads": self.threads,
"new_h5": self.new_hdf,
"structure_id": self.structure_id,
"run_time": self.run_time,
"memory_limit": self.memory_limit,
"accept_crash": self.accept_crash,
}
)
def _to_dict(self):
# server_dict = self._type_to_dict()
server_dict = {
"user": self._user,
"host": self._host,
"run_mode": self.run_mode.mode,
"queue": self.queue,
"qid": self._queue_id,
"cores": self.cores,
"threads": self.threads,
"new_h5": self.new_hdf,
"structure_id": self.structure_id,
"run_time": self.run_time,
"memory_limit": self.memory_limit,
"accept_crash": self.accept_crash,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In analogy to the Executable class, does the Server class need the transition from to_dict() to _to_dict() or can we modify the class in a way that the __getstate__() returns the same dictionary as to_dict(). Again my focus would be on simplifying the class hierarchy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Server and Executable derived from HasDict before this PR. I'm ok with changing this in the future, but here I just did the minimal changes to keep existing sub classes working with the changes I did to HasDict (which is just adding the type info automatically to the results of to_dict).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add this to the bucket list. I would like to replace both with data classes sooner rather than later then we can potentially also reuse them in the functional approach.

Copy link
Member

@jan-janssen jan-janssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@jan-janssen
Copy link
Member

@pmrv As #1578 and #1580 are now merged - can you update this pull request, so we can include it in the next release? I think the three pull requests together streamline the to_dict() and from_dict() interface, so I would like to release them as part of pyiron_base=0.10.0.

@pmrv
Copy link
Contributor Author

pmrv commented Aug 16, 2024

contrib tests fail because its dependencies cannot currently be installed concurrently with the ones from atomistics.

@pmrv
Copy link
Contributor Author

pmrv commented Aug 16, 2024

This is in a good state now, but I'm still trying to add recursive to_dict/from_dict (in a separate PR).

Adds a create_from_dict function that takes the output of
HasDict.to_dict and turns it into a live object, similar to our earlier
to_object() method on ProjectHDFio.
Adds an instantiate class method to HasDict to support this, this allows
HasDictfromHDF to work and will be useful for dataclasses in the future.
HasDict.to_dict now goes over the contents of what is returned from
_to_dict and automatically converts any HasDict/HasHDF objects it finds.
I haven't used this in downstream code yet to keep the change small, but
in principle this will allow GenericJob/DataContainer to stop calling
to_dict on their children explicitly and let the generic interface
handle it.

The rest of the changes are renaming everything to _from_dict/_to_dict
and normalizing the argument name to obj_dict.
@jan-janssen
Copy link
Member

@pmrv Can this be merged? Or should we wait for ta new h5io_browser version to include h5io/h5io_browser#61 ?

@pmrv
Copy link
Contributor Author

pmrv commented Aug 17, 2024

Both this and #1602 dont need the h5io fix, thats just something I found while trying to get the jobs read their dicts from HDF recursively.

@pmrv pmrv merged commit 49e4ffd into main Aug 19, 2024
23 of 27 checks passed
@pmrv pmrv deleted the hasdict branch August 19, 2024 19:57
pmrv added a commit that referenced this pull request Aug 23, 2024
PR #1364 changed saved the contents of the container in a data subgroup,
but that breaks backwards compat and is not actually necessary for the
TemplateJob, so revert it here.
jan-janssen added a commit that referenced this pull request Aug 23, 2024
* Revert changes to DataContainer storage format

PR #1364 changed saved the contents of the container in a data subgroup,
but that breaks backwards compat and is not actually necessary for the
TemplateJob, so revert it here.

* Handle list in to_dict

---------

Co-authored-by: Marvin Poul <poul@mpie.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request format_black reformat the code using the black standard integration Start the integration tests with pyiron_atomistics/contrib for this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants