Implement functionality to immigrate already completed CalcJobs
#3893
When people start using AiiDA they typically already have many calculation jobs completed without the use of AiiDA and they wish to import these somehow, such that they can be included in the provenance graph along with the future calculations they will run through AiiDA. This concept was originally implemented for the `PwCalculation` in the `aiida-quantumespresso` plugin and worked, but the approach required a separate `CalcJob` implementation for each existing `CalcJob` class that one might want to import.

Here we implement a generic mechanism directly in `aiida-core` that will allow any `CalcJob` implementation to "immigrate" already completed jobs. The calculation job is launched just as one would launch a normal one through AiiDA, except one additional input is passed: a `RemoteData` instance under the name `remote_folder` that contains the output files of the completed calculation. The naming is chosen on purpose to be the same as the `RemoteData` that is normally created by the engine during a normal calculation job run.

When the engine detects this input, instead of going through the normal sequence of transport tasks, it simply performs the presubmit and then goes straight to the "retrieve" step. Here the engine will retrieve the files from the provided `RemoteData` as if they had just been produced during an actual run. In this way, the process is executed almost exactly in the same way as a normal run, except the job itself is not actually executed.
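To make the control flow described above concrete, here is a toy model in plain Python. It is only a sketch and not the code in this PR: `plan_tasks` and `NORMAL_TASKS` are hypothetical names, the task names `upload`, `submit` and `update` are an assumption about what "the normal sequence of transport tasks" contains, and a plain path string stands in for a real `RemoteData` node.

```python
from typing import Any, Dict, List

# Assumed order of the transport tasks in a normal calculation job run.
NORMAL_TASKS = ["upload", "submit", "update", "retrieve"]


def plan_tasks(inputs: Dict[str, Any]) -> List[str]:
    """Return the steps a toy engine would perform for the given inputs.

    The presubmit always runs. If a ``remote_folder`` input is present, the
    job is treated as already completed and only the retrieve step follows.
    """
    steps = ["presubmit"]
    if "remote_folder" in inputs:
        # Immigration: skip upload/submit/update and retrieve directly.
        steps.append("retrieve")
    else:
        steps.extend(NORMAL_TASKS)
    return steps


# Hypothetical inputs; the values are placeholders, not real AiiDA nodes.
normal_inputs = {"code": "pw@localhost", "parameters": {"ecutwfc": 30}}
immigrant_inputs = {**normal_inputs, "remote_folder": "/scratch/completed_run"}

print(plan_tasks(normal_inputs))
print(plan_tasks(immigrant_inputs))
```

The only difference between the two launches is the extra `remote_folder` key, which mirrors the idea that an immigration job is started with the same inputs as a normal job plus one additional node.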
Remaining open design questions:
|
The nice part of this approach is that the manner of launching an immigration job is almost identical to running a normal job. One only has to provide one extra input. This approach also makes the functionality available for all To give you an idea of what the code would look like to immigrate a completed
|
Codecov Report
```diff
@@            Coverage Diff             @@
##           develop    #3893     +/-   ##
===========================================
+ Coverage    78.00%   78.02%   +0.02%
===========================================
  Files          457      456       -1
  Lines        33830    33846      +16
===========================================
+ Hits         26390    26410      +20
+ Misses        7440     7436       -4
```
Continue to review full report at Codecov.
|
Yep definitely +1 for this, looks generally good to me
I would say an attribute, as it's an important caveat of the calculation.
I think this would be necessary if for example you wanted to run further "properties" calculations that require input of the
But then isn't this the case anyway for the correspondence between the inputs and outputs? I think it's good to have a record of the code and resources, with the caveat that this is an immigrated calculation and so the information may not be correct, so I don't see this option as a necessity. |
Great to see this in A few comments:
Yes, this would be convenient. Having some easy way to know if it has been immigrated would be nice.
I do not think so, as the real (provenance-relevant) information might be lost at the time of immigration.
Same as above. We might not know what code was used.
One possible way is to require the input and then if we do not know what code was used we have to state this explicitly (or that this is the default).
This should maybe be considered more as a workchain. The question then is: is this something we should support? It sounds complicated to implement.
This would certainly be nice. Maybe as a future feature request?
I think there is no way for us to guarantee that whatever is supplied is in fact compliant with the real provenance anyway, so I would say this should be optional. A user should be able to construct it (even though this might not be advisable), but it should maybe not be the default.
Sounds convenient, but again, maybe a future feature request?
I think this should be placed as an output. Otherwise: Immigrating results can quickly pollute what is believed to be strict provenance control downstream. Having a clear tag would help. Should results that have evolved from an initial immigrated result/input also be tagged? |
Thanks for the comments @espenfl. Note that I wrote this PR before I wrote the official AEP, which is more detailed in its discussion of the various questions and in which I already give my preference concerning the open questions I had presented here. Especially:
For me the biggest open question is really:
|
@sphuber Good. In fact it was my intention to put comments in the AEP and not here. Will quickly read the AEP now and add some comments there if need be. |
Closing this; I will make a new one once the AEP is finished and accepted, and will adapt the code accordingly. |
Fixes #1892