Engine: implement functionality to import completed CalcJobs
#5086
Thanks @sphuber! I'm not worried about the command line - I think each plugin can decide to create a custom CLI command that explicitly declares which kwargs are needed, and then calls

Finally: the name was proposed by Eric Hontz, who first designed it for Quantum ESPRESSO and contributed it (for QE) to AiiDA (AiiDA core at the time, then moved into aiida-quantumespresso), back in 2014-2015.
Note that I have updated the AEP to officially change the naming from "immigrator" to "importer". I am holding off on adapting this implementation until we have discussed and approved this decision, to avoid unnecessary work should it be rejected.
Codecov Report
@@             Coverage Diff             @@
##           develop    #5086      +/-   ##
===========================================
+ Coverage    80.84%   80.88%   +0.05%
===========================================
  Files          534      536       +2
  Lines        36974    37057      +83
===========================================
+ Hits         29889    29971      +82
- Misses        7085     7086       +1
When people start using AiiDA, they typically already have many calculation jobs that were completed without AiiDA, and they want to import these so that they can be included in the provenance graph along with the future calculations they will run through AiiDA. This concept was originally implemented for the `PwCalculation` in the `aiida-quantumespresso` plugin. That approach worked, but it required a separate `CalcJob` implementation for each existing `CalcJob` class that one might want to import.

Here we implement a generic mechanism directly in `aiida-core` that allows any `CalcJob` implementation to import already completed jobs. The calculation job is launched just like a normal one through AiiDA, except that one additional input is passed: a `RemoteData` instance under the name `remote_folder` that contains the output files of the completed calculation. The name is deliberately the same as that of the `RemoteData` that the engine normally creates during a calculation job run.

When the engine detects this input, instead of going through the normal sequence of transport tasks, it performs the presubmit and then goes straight to the "retrieve" step. Here the engine retrieves the files from the provided `RemoteData` as if they had just been produced during an actual run. In this way, the process is executed almost exactly like a normal run, except that the job itself is not actually executed.
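The control flow described in this commit message can be illustrated with a small, self-contained sketch. It does not use `aiida-core` at all: the step names and the `parser_name` key are simplified assumptions chosen for illustration, and the real engine schedules these steps as asynchronous transport tasks rather than as a plain list.

```python
from typing import Any, Dict, List

# Simplified, assumed step names; not the engine's actual internals.
NORMAL_STEPS = ["presubmit", "upload", "submit", "update", "retrieve"]
IMPORT_STEPS = ["presubmit", "retrieve"]


def plan_steps(inputs: Dict[str, Any]) -> List[str]:
    """Return the ordered steps a job with these inputs would go through."""
    # The presence of a ``remote_folder`` input signals an import job:
    # the output files already exist, so nothing is uploaded or submitted.
    steps = list(IMPORT_STEPS if "remote_folder" in inputs else NORMAL_STEPS)

    # Parsing only happens when a parser was specified in the inputs.
    # ``parser_name`` is a hypothetical key used for this sketch.
    if inputs.get("parser_name"):
        steps.append("parse")
    return steps


if __name__ == "__main__":
    print(plan_steps({"x": 1, "y": 2}))
    print(plan_steps({"x": 1, "y": 2, "remote_folder": "/path/to/completed/run"}))
```

The only difference between the two plans is that the import job skips everything between presubmit and retrieve, which is why the rest of the process looks almost identical to a normal run.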
The `CalcJobImporter` class is added, which defines a single abstract staticmethod `parse_remote_data`. The idea is that plugins can define an importer for a `CalcJob` implementation and implement this method. The method takes a `RemoteData` node that points to a path on the associated computer that contains the input and output files of a calculation that has been run outside of AiiDA, but by an executable that is normally run with this particular `CalcJob`. The `parse_remote_data` implementation should read the input files found in the remote data and parse their content into the input nodes that, when used to launch the calculation job, would result in similar input files.

These inputs, including the `RemoteData` as the `remote_folder` input, can then be used to run an instance of this particular `CalcJob`. The engine will recognize the `remote_folder` input, signalling an import job, and instead of running a normal job that creates the input files on the remote before submitting it to the scheduler, it passes straight to the retrieve step. This will retrieve the files from the `RemoteData` as if they had been created by the job itself. If a parser was defined in the inputs, the contents are parsed and the returned output nodes are attached.

The `CalcJobImporter` can be loaded through its entry point name using the `CalcJobImporterFactory`, just as all other entry point groups have their associated factory. As a shortcut, the `CalcJob` class provides the `get_importer` class method, which will attempt to load a `CalcJobImporter` class with the exact same entry point name. Alternatively, the caller can specify the desired entry point name should it not correspond to that of the `CalcJob` class. To test the functionality, a `CalcJobImporter` is implemented for the `ArithmeticAddCalculation` class.
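As a rough illustration of the interface described above, the following sketch defines an abstract base class with a single abstract static method and a toy importer for an addition job. It is not the code from this PR: `FakeRemoteData` is a stand-in for `RemoteData` that only wraps a local directory, the `input_filename` keyword is hypothetical, and the input file format (`echo $((x + y))` in a file named `aiida.in`) is an assumption made for the example.

```python
import re
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class FakeRemoteData:
    """Stand-in for ``RemoteData``: only wraps a local directory (hypothetical)."""
    path: Path


class CalcJobImporter(ABC):
    """Sketch of the described interface: a single abstract static method."""

    @staticmethod
    @abstractmethod
    def parse_remote_data(remote_data: FakeRemoteData, **kwargs: Any) -> Dict[str, Any]:
        """Parse the input files in ``remote_data`` into a dictionary of inputs."""


class AddImporter(CalcJobImporter):
    """Toy importer for a job whose input file is assumed to read ``echo $((x + y))``."""

    @staticmethod
    def parse_remote_data(remote_data: FakeRemoteData, **kwargs: Any) -> Dict[str, Any]:
        # ``input_filename`` is a hypothetical importer-specific keyword argument.
        filename = kwargs.get("input_filename", "aiida.in")
        content = (remote_data.path / filename).read_text()
        match = re.search(r"echo \$\(\((\d+) \+ (\d+)\)\)", content)
        if match is None:
            raise ValueError(f"could not parse the operands from {filename}")
        return {"x": int(match.group(1)), "y": int(match.group(2))}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Pretend this directory holds a calculation that was run outside of AiiDA.
        Path(tmp, "aiida.in").write_text("echo $((4 + 3))\n")
        remote = FakeRemoteData(Path(tmp))
        inputs = AddImporter.parse_remote_data(remote)
        inputs["remote_folder"] = remote  # this input marks the job as an import
        print(sorted(inputs))
```

In a real plugin the returned values would be AiiDA nodes rather than plain integers, and the files would be read through the transport of the computer that the `RemoteData` belongs to.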
Fixes #1892
Implementation of this AEP.
Example of usage:
This is as standardized as it can be, as the `parse_remote_data` for each importer implementation will probably need custom keyword arguments. For the Python API, we could provide a simple wrapping function.

However, due to the arbitrary keyword arguments that need to be supported for the importer, it will be tricky to turn this into a CLI command, unless we implement some plugin system that dynamically defines the options based on the spec of the importer, as we do for the transport CLI.
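One possible shape for such a wrapping function is sketched below. This is not the snippet from the original description: `import_calcjob`, its `launcher` argument, and the `Fake*` classes are hypothetical names, and the stand-ins exist only so that the example runs without an AiiDA profile. The sketch relies only on what is described in this PR, namely that the process class exposes `get_importer` and that the importer exposes `parse_remote_data`.

```python
from typing import Any, Callable, Dict


def import_calcjob(
    process_class: Any,
    remote_data: Any,
    launcher: Callable[..., Any],
    **importer_kwargs: Any,
) -> Any:
    """Hypothetical helper: parse the inputs from ``remote_data`` and launch an import job.

    Extra keyword arguments are forwarded to the importer unchanged, since each
    importer implementation may need its own custom keywords.
    """
    importer = process_class.get_importer()
    inputs: Dict[str, Any] = dict(importer.parse_remote_data(remote_data, **importer_kwargs))
    inputs["remote_folder"] = remote_data  # signals an import job to the engine
    return launcher(process_class, **inputs)


# Stand-ins below are for illustration only and are not part of aiida-core.
class FakeImporter:
    @staticmethod
    def parse_remote_data(remote_data: Any, **kwargs: Any) -> Dict[str, Any]:
        # A real importer would read files; this one returns fixed toy values
        # and echoes the keyword it received to show that it was forwarded.
        return {"x": 4, "y": 3, "parsed_from": kwargs.get("input_filename", "aiida.in")}


class FakeCalcJob:
    @classmethod
    def get_importer(cls) -> Any:
        return FakeImporter


def fake_launcher(process_class: Any, **inputs: Any) -> Dict[str, Any]:
    # A real launcher would run the process; this one simply returns the inputs.
    return inputs


if __name__ == "__main__":
    print(import_calcjob(FakeCalcJob, "/some/absolute/path", fake_launcher, input_filename="job.in"))
```

Passing the launcher in as an argument keeps the helper independent of how the process is started (run or submit), and forwarding `**importer_kwargs` untouched is what makes a generic CLI equivalent hard to define.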
I think it would be great if we can try to release this with v2.0, as it is a very useful feature that some users have been waiting a long time for. Of course, this would require that plugin developers have some time to implement the importer, so the sooner we finish this implementation, the sooner we can help them prepare their plugins.
P.S.: this should probably be addressed in the AEP (and I will do so), but we should maybe discuss the naming of "Immigrator" and "immigrating". I have merely kept this because the original concept was implemented for `PwCalculation` and was called the `PwImmigrant`. I am not sure what the reasoning behind this naming was since I wasn't there (@giovannipizzi maybe you know), but maybe "importer/importing" would be better. It sounds more neutral and technically it is also more correct. Immigrating, in my understanding at least, is more something moving as seen from the point of view of the place that it is leaving, not so much the place it is moving towards. That being said, I am not sure whether also calling this "importing" may cause confusion with the importing of archives. As said, I will probably add this to the discussion on the AEP, but thought I would mention it here as well.