Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: use pydantic instead of __init__ constructor #10

Merged
merged 2 commits into from
Apr 22, 2023

Conversation

mbercx
Copy link
Member

@mbercx mbercx commented Apr 16, 2023

Fixes #9

Switch to using pydantic to annotate the constructor inputs and add validation for some inputs. This makes the code much cleaner and readable, e.g. for the BaseSubmissionController:

class BaseSubmissionController(BaseModel):
    """Controller to submit a maximum number of processes (workflows or calculations) at a given time.

    This is an abstract base class: you need to subclass it and define the abstract methods.
    """
    group_label: str
    """Label of the group to store the process nodes in."""
    max_concurrent: int
    """Maximum concurrent active processes."""

It's also easy and clear to write validation for the input arguments, e.g. for group_label:

def validate_group_exists(value: str) -> str:
    try:
        orm.Group.collection.get(label=value)
    except NotExistent as exc:
        raise ValueError(f'Group with label `{value}` does not exist.') from exc
    else:
        return value

[...]

    _validate_group_exists = validator('group_label', allow_reuse=True)(validate_group_exists)

(Note that this validation is a change from the previous get_or_create approach. I can revert that, but I think it's the safer approach since silently creating a new group every time an instance of the BaseSubmissionController is made with a non-existing group label seems worse than simply having the user pass an existing group.)

The FromGroupSubmissionController then becomes:

class FromGroupSubmissionController(BaseSubmissionController):  # pylint: disable=abstract-method
    """SubmissionController implementation getting data to submit from a parent group.

    This is (still) an abstract base class: you need to subclass it
    and define the abstract methods.
    """
    parent_group_label: str
    """Label of the parent group from which to construct the process inputs."""

    _validate_group_exists = validator('parent_group_label', allow_reuse=True)(validate_group_exists)

But with the pydantic approach the full signature is shown when using the constructor, it even shows the docstring added for each "field" as you hover over it:

Screenshot 2023-04-17 at 01 09 10

The validation is also quite clear, instead of returning a massive stack trace. E.g. for a little DummyController class I wrote:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-7-8c412ef0bd57> in <module>
----> 1 dcontrol = DummyController(
      2     group_label='non-existent',
      3     max_concurrent=20,
      4     sum_list=[(5, 2), (bool, 3)],
      5     code_label='add@localhost'

~/.virtualenvs/aiida-super/lib/python3.8/site-packages/pydantic/main.cpython-38-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for DummyController
group_label
  Group with label `non-existent` does not exist. (type=value_error)
DummyController code
class DummyController(BaseSubmissionController):
    """Dummy controller for testing purposes."""
    sum_list: List
    code_label: str

    def get_extra_unique_keys(self):
        """The `left_operand` and `right_operand` keys contains the values of the integers to sum."""
        return ('left_operand', 'right_operand')

    def get_all_extras_to_submit(self):
        """Get all the extras, i.e. the list of tuples the user wants to sum the contents of."""
        return self.sum_list

    def get_inputs_and_processclass_from_extras(self, extras_values):
        """Return inputs and process class for the submission of this specific process."""
        code = orm.load_code(self.code_label)
        inputs = {
            'code': code,
            'x': orm.Int(extras_values[0]),
            'y': orm.Int(extras_values[1])
        }
        return inputs, CalculationFactory(code.default_calc_job_plugin)

Other advantages are that it becomes very easy to share a specific controller instance, since pydantic models have a json() method and hence are easily serialisable and parsable from JSON.

One down side is that not all field types are supported, e.g. you cannot simply pass a Code type to the annotation. Custom fields can be implemented if this is necessary, but I think we can go pretty far with the supported types in pydantic.

EDIT: the latter is not an issue, apparently:

https://docs.pydantic.dev/usage/types/#arbitrary-types-allowed

@mbercx
Copy link
Member Author

mbercx commented Apr 16, 2023

Also you can avoid having to write a massive docstring and simply use: https://autodoc-pydantic.readthedocs.io/en/stable/users/examples.html#showcase

@mbercx
Copy link
Member Author

mbercx commented Apr 17, 2023

(Note that this validation is a change from the previous get_or_create approach. I can revert that, but I think it's the safer approach since silently creating a new group every time an instance of the BaseSubmissionController is made with a non-existing group label seems worse than simply having the user pass an existing group.)

Here the click validation caught the fact that I had a typo in my group label:

(aiida-super) aiida@prnmarvelsrv3:~/envs/aiida-super/submit/relax$ bash submit.sh 
PBE PseudoDojo relaxations

Usage: relax_controller.py [OPTIONS]
Try 'relax_controller.py --help' for help.

Error: Invalid value for '-w' / '--workchain-group': no Group found with LABEL<workchains/PBEsol/elph/Dq/k2>: No result was found

But I think it's clear that we should validate that the group exists instead of generating it silently on the fly.

@mbercx mbercx force-pushed the fix/use-pydantic branch 2 times, most recently from de5c04c to a2c940d Compare April 22, 2023 12:39
@mbercx mbercx force-pushed the fix/use-pydantic branch 2 times, most recently from ad07255 to 557628d Compare April 22, 2023 13:28
Breaking behaviour: In case no `Group` with `group_label` exists,
submission controllers created the specified group upon instantiation.
This means is the user simply had a typo in the label upon instatiation
of the controller, a new (useless) group was created. Here we adapt
this behaviour to check for the groups existence instead.

This is done by converting the controller classes into `pydantic`
`BaseModel`s and using a `validator`. Using `BaseModel` has a few added
benefits:

1. Much cleaner constructor specification, that is inherited by sub
   classes. This becomes especially appreciable for more complex submission
   controllers.
2. Automatic validation on all constructor inputs.
3. Adds type hinting to all constructor inputs, which makes the classes
   much more user friendly and facilitates development.
4. `pydantic` `BaseModel`s come with built-in JSON serialization
   support, which makes it easier to store and recreate submission
   controllers from JSON files. If we want to set up reproducable HTC
   infrastructure, this may come in handy.
@mbercx
Copy link
Member Author

mbercx commented Apr 22, 2023

Design discussed with and approved by @giovannipizzi during a meeting last week.

@mbercx mbercx merged commit a3b1d33 into aiidateam:main Apr 22, 2023
@mbercx mbercx deleted the fix/use-pydantic branch April 22, 2023 13:34
@mbercx
Copy link
Member Author

mbercx commented Apr 23, 2023

But I think it's clear that we should validate that the group exists instead of generating it silently on the fly.

To hammer home this point: I was using get_or_create in a convergence check setup (for which I'd need #15) to do this properly via a controller), and accidentally had a typo in my group name after adapting it to use f-strings. Submitted 1000 work chains before I saw what was going on. Probably the most imported thing you can validate for a submission controller because it relies on the extras of the work chain nodes in the submission group and hence there is a serious risk of submitting calculations twice.

Thank god AiiDA does make the cleanup easy. ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

♻️ REFACTOR: Use pydantic for type annotation and validation
1 participant