Implement functionality to immigrate already completed CalcJobs #3893

Closed

Conversation

@sphuber (Contributor) commented on Apr 5, 2020

Fixes #1892

When people start using AiiDA they typically already have many
calculation jobs completed without the use of AiiDA and they wish to
import these somehow, such that they can be included in the provenance
graph along with the future calculations they will run through AiiDA.

This concept was originally implemented for the `PwCalculation` in the
`aiida-quantumespresso` plugin and worked, but the approach required a
separate `CalcJob` implementation for each existing `CalcJob` class that
one might want to import.

Here we implement a generic mechanism directly in `aiida-core` that will
allow any `CalcJob` implementation to "immigrate" already completed
jobs. The calculation job is launched just as one would launch a
normal one through AiiDA, except one additional input is passed: a
`RemoteData` instance under the name `remote_folder` that contains the
output files of the completed calculation. The naming is chosen on
purpose to be the same as the `RemoteData` that is normally created by
the engine during a normal calculation job run.

When the engine detects this input, instead of going through the normal
sequence of transport tasks, it simply performs the presubmit and then
goes straight to the "retrieve" step. Here the engine will retrieve the
files from the provided `RemoteData` as if they had just been produced
during an actual run. In this way, the process is executed almost
exactly in the same way as a normal run, except the job itself is not
actually executed.
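
To make the `remote_folder` input concrete, here is a minimal sketch of how such an optional port could be declared on a `CalcJob` subclass. The subclass name is purely illustrative; the PR adds the port generically in `aiida-core` rather than requiring plugins to declare it themselves.

```python
from aiida import orm
from aiida.engine import CalcJob


class ImmigrantAwareCalcJob(CalcJob):
    """Illustrative subclass only: shows where an optional ``remote_folder`` port fits."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        # A RemoteData pointing at the folder that already contains the outputs of the
        # completed job; when present, the engine skips upload/submit and goes to retrieve.
        spec.input('remote_folder', valid_type=orm.RemoteData, required=False)
```
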
@sphuber (Contributor, Author) commented on Apr 5, 2020

Remaining open design questions:

  • Set an attribute/extra to indicate the calculation job has been immigrated (a sketch of how such a marker could be set and queried follows after this list)
  • Should the `metadata.computer` be required? Most likely this should be paired with the absolute path of the folder containing the outputs anyway, so the most logical approach would be to reuse the `RemoteData` node type for this. In this way the `metadata.computer` is indirectly required.
  • Should the `code` input be required?
  • Should the `code` input have a different type, e.g. `ImmigratedCode`, which is a subclass of `Code`?
  • How to deal with `CalcJob` implementations that specify or require more than one `Code` as input?
  • Current solution only supports importing files from a folder on localhost. Should we add support for any other `Computer` as well? In this case, it would have to potentially go through the `TransportQueue` with the additional complexity that that requires.
  • Allow to not specify a code nor computer (and resources)? Currently they are required for all `CalcJobs`, but for immigrated jobs it does not make much sense. However, if we make them non-required for immigrations, the provenance of those processes will be "less" compared to normal calculation job runs. On the other hand, if we keep them required, users can simply pass any code that matches the required plugin on any computer. There can be no check that it corresponds to the "real" code that produced the results, so the information may actually be incorrect. Maybe it is better to have no information than incorrect information?
  • Should we define an official immigrant entry point group and `Immigrant` base class that can be registered there? A plugin package can implement this base class for a specific `CalcJob` implementation and through this class can provide additional custom functionality.
  • Should the `RemoteData` be attached as an input and/or output to the immigrated `CalcJobNode`?
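
On the first point, here is a minimal sketch of how an immigrated job could be marked and later found again, assuming a simple `imported` extra. The name, and the choice of an extra rather than an attribute, are hypothetical and not what the PR or AEP settles on.

```python
from aiida import load_profile, orm

load_profile()

# Hypothetical convention: set an extra on the CalcJobNode after the immigration run.
node = orm.load_node(1234)  # pk of the immigrated CalcJobNode, just an example
node.set_extra('imported', True)

# All immigrated calculation jobs can then be found through the QueryBuilder.
builder = orm.QueryBuilder()
builder.append(orm.CalcJobNode, filters={'extras.imported': True})
print(builder.count(), 'immigrated calculation jobs')
```
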

@sphuber (Contributor, Author) commented on Apr 5, 2020

The nice part of this approach is that the manner of launching an immigration job is almost identical to running a normal job: one only has to provide a single extra input. This approach also makes the functionality available for all `CalcJob` implementations without requiring new wrapper classes. The one step that will probably require some tooling from the various plugins is building up the inputs dictionary from existing input files. Essentially this is the reverse of what `prepare_for_submission` does. This functionality can of course be provided as easy-to-use utility functions by the plugin itself; @chrisjsewell has implemented one in `aiida-quantumespresso` for parsing the inputs of a `pw.x` calculation (a sketch of such a utility follows after the example below).

To give you an idea of what the code would look like to immigrate a completed `ArithmeticAddCalculation`, see the following snippet:

from aiida import orm
from aiida.engine import run
from aiida.orm import load_code, load_computer
from aiida.plugins import CalculationFactory

In [1]: remote_data = orm.RemoteData(remote_path='/home/sph/code/aiida/env/dev/aiida-core/arithmetic_add', computer=load_computer('localhost'))
        inputs = {
            'remote_folder': remote_data,
            'code': load_code('add@localhost'),
            'x': orm.Int(1),
            'y': orm.Int(2),
            'metadata': {
                'options': {
                    'resources': {
                        'num_machines': 1,
                     }
                 }
             }
        }

        ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
        results, node = run.get_node(ArithmeticAddCalculation, **inputs)

In [2]: node.is_finished_ok
Out[2]: True

In [3]: node.get_outgoing().all()
Out[3]: 
[LinkTriple(node=<Int: uuid: 9425463e-545f-49f0-acb8-c7c04ad1c0d7 (pk: 1123) value: 3>, link_type=<LinkType.CREATE: 'create'>, link_label='sum'),
 LinkTriple(node=<FolderData: uuid: 964aaff8-b3bd-4d29-b80c-8e92f0e17e69 (pk: 1122)>, link_type=<LinkType.CREATE: 'create'>, link_label='retrieved')]

In [4]: node.attributes
Out[4]: 
{'sealed': True,
 'version': {'core': '1.1.1', 'plugin': '1.1.1'},
 'withmpi': False,
 'resources': {'num_machines': 1},
 'append_text': '',
 'exit_status': 0,
 'parser_name': 'arithmetic.add',
 'prepend_text': '',
 'process_label': 'ArithmeticAddCalculation',
 'process_state': 'finished',
 'retrieve_list': ['aiida.out',
  '_scheduler-stdout.txt',
  '_scheduler-stderr.txt'],
 'input_filename': 'aiida.in',
 'remote_workdir': '/home/sph/code/aiida/env/dev/aiida-core/arithmetic_add',
 'output_filename': 'aiida.out',
 'scheduler_stderr': '_scheduler-stderr.txt',
 'scheduler_stdout': '_scheduler-stdout.txt',
 'mpirun_extra_params': [],
 'environment_variables': {},
 'import_sys_environment': True,
 'retrieve_temporary_list': [],
 'custom_scheduler_commands': ''}
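
As for the tooling mentioned above that reconstructs the inputs from existing files, a hypothetical helper for the `ArithmeticAddCalculation` could look roughly as follows. It assumes the existing input file contains a single line of the form `echo $((1 + 2))`; both the function name and that assumption are illustrative only.

```python
import re
from pathlib import Path

from aiida import orm


def inputs_from_existing_run(folder, input_filename='aiida.in'):
    """Hypothetical helper: recover the ``x`` and ``y`` inputs of a completed
    ArithmeticAddCalculation from its existing input file, i.e. the reverse of
    what ``prepare_for_submission`` writes.
    """
    content = (Path(folder) / input_filename).read_text()
    match = re.search(r'echo \$\(\((-?\d+)\s*\+\s*(-?\d+)\)\)', content)
    if match is None:
        raise ValueError(f'could not parse {input_filename}')
    return {'x': orm.Int(int(match.group(1))), 'y': orm.Int(int(match.group(2)))}
```

The returned nodes would simply be merged into the `inputs` dictionary of the snippet above, next to the `remote_folder` and the `code`.
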

@codecov bot commented on Apr 5, 2020

Codecov Report

Merging #3893 into develop will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #3893      +/-   ##
===========================================
+ Coverage    78.00%   78.02%   +0.02%     
===========================================
  Files          457      456       -1     
  Lines        33830    33846      +16     
===========================================
+ Hits         26390    26410      +20     
+ Misses        7440     7436       -4     
| Flag | Coverage Δ |
| --- | --- |
| #django | 70.08% <100.00%> (+0.02%) ⬆️ |
| #sqlalchemy | 70.89% <100.00%> (+0.01%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| aiida/engine/launch.py | 97.50% <100.00%> (ø) |
| aiida/engine/processes/calcjobs/calcjob.py | 83.45% <100.00%> (+1.18%) ⬆️ |
| aiida/engine/processes/calcjobs/__init__.py | |
| aiida/transports/plugins/local.py | 80.46% <0.00%> (+0.25%) ⬆️ |
| aiida/engine/daemon/execmanager.py | 61.27% <0.00%> (+1.12%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chrisjsewell (Member) commented on Apr 6, 2020

Yep, definitely +1 for this; it looks generally good to me.

> Set an attribute/extra to indicate the calculation job has been immigrated

I would say an attribute, as it's an important caveat of the calculation.

> Current solution only supports importing files from a folder on localhost. Should we add support for any other `Computer` as well?

I think this would be necessary if, for example, you wanted to run further "properties" calculations that require the `parent_folder` as input, e.g. you immigrate some `pw.x` calculations and then want to use them to run `projwfc.x` calculations on the same HPC.

> Allow to not specify a code nor computer ... On the other hand, if we keep them required, users can simply pass any code that matches the required plugin on any computer. There can be no check that it corresponds to the "real" code that produced the results.

But then isn't this the case anyway for the correspondence between the inputs and outputs? I think it's good to have a record of the code and resources, with the caveat that this is an immigrated calculation and so the information may not be correct. So I don't see this option as a necessity.

@espenfl (Contributor) commented on Apr 21, 2020

Great to see this in `aiida-core`. Thanks. We will move our immigrator in `aiida-vasp` to use this built-in one once it is merged.

A few comments:

* Set an attribute/extra to indicate the calculation job has been immigrated

Yes, this would be convenient. Having some easy way to know if it has been immigrated would be nice.

* Should the `metadata.computer` be required? Most likely this should be paired with the absolute path of the folder containing the outputs anyway, so the most logical approach would be to reuse the `RemoteData` node type for this. In this way the `metadata.computer` is indirectly required.

I do not think so, as the real (provenance-relevant) information might be lost by the time of immigration.

* Should the `code` input be required?

Same as above. We might not know what code was used.

* Should the `code` input have a different type, e.g. `ImmigratedCode` which is a subclass of `Code`.

One possible way is to require the input and then, if we do not know what code was used, state this explicitly (or make that the default).

* How to deal with `CalcJob` implementations that specify or require more than one `Code` as input

This is maybe better considered a workchain. And then the question is: is this something we should support at all? It sounds complicated to implement.

* Current solution only supports importing files from a folder on localhost. Should we add support for any other `Computer` as well? In this case, it would have to potentially go through the `TransportQueue` with the additional complexity that that requires.

This would certainly be nice. Maybe as a future feature request?

* Allow to not specify a code nor computer (and resources)? Currently they are required for all `CalcJobs`, but for immigrated jobs it does not make much sense. However, if we make them non-required for immigrations, the provenance of those processes will be "less" compared to normal calculation job runs. On the other hand, if we keep them required, users can simply pass any code that matches the required plugin on any computer. There can be no check that it corresponds to the "real" code that produced the results. So the information may actually be incorrect. Maybe it is better to have no information than incorrect information?

I think there is in any case no way for us to guarantee that whatever is supplied is in fact compliant with the real provenance, so given that, I would say this should be optional. A user should be able to construct it (even though this might not be advisable), but it should maybe not be the default.

* Should we define an official immigrant entry point group and `Immigrant` base class that can be registered there. A plugin package can implement this base class for a specific `CalcJob` implementation and through this class can provide additional custom functionality.

Sounds convenient, but again, maybe a future feature request?

* Should the `RemoteData` be attached as an input and/or output to the immigrated `CalcJobNode`

I think this should be placed as an output.

Otherwise: immigrated results can quickly pollute what is believed to be strictly controlled provenance downstream. Having a clear tag would help. Should results that have evolved from an initial immigrated result/input also be tagged?

@sphuber (Contributor, Author) commented on Apr 21, 2020

Thanks for the comments @espenfl. Note that I wrote this PR before I wrote the official AEP, which discusses the various questions in more detail, and there I already give my preference on the open questions presented here. Especially:

  • Mark immigrated calculations: absolutely, most likely through an attribute on the `CalcJobNode`
  • Support output folder on remote machine: yes, absolutely; this should already be in the first implementation
  • Input `RemoteData` as output: I am pretty sure this is what we should do; see the detailed reasoning in the AEP

For me the biggest open questions are really:

  • Validation of computer and code: still not sure if making them optional is a good idea, but of course we cannot check their validity. I just think they should be there and that we should instruct people how to define these after the fact to represent the actual codes as best as possible.
  • Additional utility infrastructure: this is something I would definitely not include in the beginning and would leave as an open feature request. If we find there is a lot of need for this from plugins and they have similar methods for preparing immigrant inputs, we can generalize this and put it in `aiida-core`, a bit like the process that led to the `BaseRestartWorkChain`, I would say.

@espenfl (Contributor) commented on Apr 21, 2020

@sphuber Good. In fact it was my intention to put comments in the AEP and not here. Will quickly read the AEP now and add some comments there if need be.

@sphuber (Contributor, Author) commented on Sep 16, 2020

Closing this; I will make a new PR once the AEP is finished and accepted, and will adapt the code accordingly.

@sphuber closed this on Sep 16, 2020