Feat: Rework input validation process #52

Merged
merged 7 commits into from
Jan 28, 2023

Conversation

adrien-berchet
Member

No description provided.

@codecov

codecov bot commented Oct 5, 2022

Codecov Report

Merging #52 (500c539) into main (c43fa25) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   97.36%   97.43%   +0.06%     
==========================================
  Files          33       38       +5     
  Lines        1900     1951      +51     
  Branches      281      288       +7     
==========================================
+ Hits         1850     1901      +51     
  Misses         34       34              
  Partials       16       16              
Flag Coverage Δ
pytest 97.43% <100.00%> (+0.06%) ⬆️


Impacted Files Coverage Δ
neurots/generate/algorithms/tmdgrower.py 99.25% <100.00%> (-0.03%) ⬇️
neurots/preprocess/__init__.py 100.00% <100.00%> (ø)
neurots/preprocess/exceptions.py 100.00% <100.00%> (ø)
neurots/preprocess/relevance_checkers.py 100.00% <100.00%> (ø)
neurots/preprocess/utils.py 100.00% <100.00%> (ø)
neurots/preprocess/validity_checkers.py 100.00% <100.00%> (ø)

        self.check_min_bar_length(params, input_data)

    @staticmethod
    def check_min_bar_length(params, distrs):
Contributor

I would move this to a validation module or somewhere else and import it here.

Member Author

I wanted to split the checks into two groups: those that check that the code will not break (executed only in the preprocess step and located in the preprocessing module), and those that check that the parameters should give a relevant result (executed in the preprocess step and also during the grower execution; they should only raise warnings). I think the second group should be located in the grower because these checks might need to know the start_point and the context given to the grower. For example, I guess a set of parameters can give relevant results most of the time except when the start_point is too close to the atlas boundary, so we need to know the start_point and context to check this. WDYT?

Contributor

Hmm, if the checks need more info than the params and distrs, you could pass the entire grower to each check function, so that every check always has all the info available.

Having the check attached to the grower, however, makes it unavailable to another grower that does not derive from this one, for example. And it would be strange to import this grower just to use its static checks.

Member Author

Yeah ok, it's probably better to get rid of the inheritance limitations indeed. I moved this.
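
The outcome of this thread, a check that lives in a shared module rather than as a static method on one grower, can be sketched roughly as follows. This is only an illustration under assumptions: the body of `check_min_bar_length`, the `"min_bar_length"` and `"step_size"` keys, and both grower classes are hypothetical and not taken from the PR.

```python
import warnings


def check_min_bar_length(params, distrs, start_point=None, context=None):
    """Warn (never raise) when the inputs are unlikely to give a relevant result.

    The rule below is a placeholder: the keys "step_size" and "min_bar_length"
    are hypothetical and only stand in for whatever the real check compares.
    """
    if distrs["min_bar_length"] < params["step_size"]:
        warnings.warn("min_bar_length is smaller than step_size", UserWarning)
        return False
    return True


class GrowerA:
    """Hypothetical grower that uses the shared check."""

    def __init__(self, params, distrs, start_point=None, context=None):
        check_min_bar_length(params, distrs, start_point, context)


class GrowerB:
    """Hypothetical unrelated grower: no inheritance from GrowerA is needed."""

    def __init__(self, params, distrs, start_point=None, context=None):
        check_min_bar_length(params, distrs, start_point, context)
```

Because the check is a plain function, either grower can call it without importing or deriving from the other, and the optional `start_point` and `context` arguments leave room for checks that need them.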

neurots/generate/tree.py (outdated, resolved)
neurots/generate/tree.py (outdated, resolved)
neurots/preprocess/__init__.py (outdated, resolved)
for grow_type in params["grow_types"]:
    growth_method = params[grow_type]["growth_method"]
    for preprocess_func in _preprocess_functions[growth_method]:
        preprocess_func(params[grow_type], distrs[grow_type])
Contributor

For better maintenance, I would suggest that all preprocessing functions should return modified copies of params and distrs. Or at least copy them once and apply all the modifications to the copies.

Member Author

You really like functional programming, don't you? 😜
I am not sure it is relevant to do it each time, because some functions are called each time a grower is instantiated, which will lead to many copies. I think we should copy it only once at the beginning of the grower (which is already done for params but not for distrs).

Contributor

Sure, that's fine too.

Collaborator

I agree with Adrien: if we don't need copies, we'd better not create them...
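
The compromise reached in this thread, one deep copy at the entry point followed by in-place preprocessing, might look roughly like this. The function names and the `"processed"` key are made up for the illustration; only the copy-once idea comes from the discussion.

```python
from copy import deepcopy


def _mark_processed(params, distrs):
    """Hypothetical preprocess step that mutates its arguments in place."""
    params["processed"] = True
    distrs["processed"] = True


def preprocess_once(params, distrs, steps=(_mark_processed,)):
    """Copy the inputs a single time, then let every step modify the copies."""
    params = deepcopy(params)
    distrs = deepcopy(distrs)
    for step in steps:
        step(params, distrs)
    return params, distrs


# Placeholder inputs; the caller's dictionaries are left untouched.
user_params = {"grow_types": ["basal_dendrite"]}
user_distrs = {"basal_dendrite": {}}
new_params, new_distrs = preprocess_once(user_params, user_distrs)
```

The cost is two deep copies per call regardless of how many steps run, instead of one pair of copies per step.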

@eleftherioszisis
Contributor

+1 for the design.

return inner


@register_preprocess("trunk")
Collaborator

I like this design, where input data are checked depending on the need. I'll get back to this once you integrate the rest of the comments.

@adrien-berchet
Member Author

adrien-berchet commented Oct 7, 2022

I reorganized the code so that all the check/preprocess functions are in the preprocess module. I created 3 kinds of functions:

  • validity checkers: they check that the input parameters and distributions given by the user are valid (if they are not, the code will probably break or have undefined behavior).
  • preprocesses:
    • they can assume that the inputs are already valid;
    • they can update/modify the parameters and distributions;
    • they are responsible for ensuring that the updated parameters and distributions are still valid. (I think this is what @arnaudon needs for the fit in 3d trunk angles #49)
  • relevancy checkers: they check that the current parameters and distributions will give 'relevant' results; they can take optional start_point and context arguments to refine the checks.

WDYT @eleftherioszisis @lidakanari ?
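
The three categories listed above differ mainly in how they react to a problem. A minimal sketch, assuming invented keys (`"step_size"`, `"max_depth"`, `"half_width"`) and an invented exception name, none of which come from the PR:

```python
import warnings


class ValidityError(ValueError):
    """Hypothetical exception raised when inputs cannot be used at all."""


def check_valid(params, distrs):
    """Validity checker: raise, since continuing would break the code."""
    if params.get("step_size", 0) <= 0:
        raise ValidityError("step_size must be positive")


def preprocess(params, distrs):
    """Preprocess: may assume valid inputs, updates them, keeps them valid."""
    params.setdefault("max_depth", 10)


def check_relevant(params, distrs, start_point=None, context=None):
    """Relevance checker: only warn; start_point and context are optional."""
    if start_point is not None and context is not None:
        if abs(start_point[0]) > context["half_width"]:
            warnings.warn("start_point lies outside the context bounds")
```

Raising in the first category stops invalid inputs early, while warning in the last one lets the same check run again during growth without interrupting it.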

Collaborator

lidakanari left a comment

I think it looks good. I'll let Eleftherios comment on the technical details.

@@ -58,24 +59,14 @@ def __init__(
        """TMD basic grower."""
        super().__init__(input_data, params, start_point, context)
        self.bif_method = bif_methods[params["branching_method"]]
        self.params = copy.deepcopy(params)
Collaborator

I was keeping this for safety reasons, could we keep it even though a copy might not be necessary?

Member Author

I removed it from here because it is already in the __init__ of AbstractAlgo, so calling super().__init__() already performs the deep copy.
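
The point made in this reply, that a subclass does not need its own deep copy when the base `__init__` already makes one, can be illustrated with a small stand-alone sketch. `BaseAlgo` and `ChildAlgo` are hypothetical stand-ins, not the project's classes, and the `"seen"` key and the parameter value are placeholders.

```python
import copy


class BaseAlgo:
    """Hypothetical stand-in for a base class that copies params once."""

    def __init__(self, input_data, params, start_point, context=None):
        self.input_data = input_data
        self.params = copy.deepcopy(params)
        self.start_point = start_point
        self.context = context


class ChildAlgo(BaseAlgo):
    """Subclass relying on the base-class copy instead of copying again."""

    def __init__(self, input_data, params, start_point, context=None):
        super().__init__(input_data, params, start_point, context)
        # self.params is already an independent copy at this point.
        self.params["seen"] = True


original = {"branching_method": "bio_oriented"}
algo = ChildAlgo({}, original, (0.0, 0.0, 0.0))
```

Modifying `algo.params` here leaves `original` untouched, which is the safety property the reviewer asked about.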


def preprocess_inputs(params, distrs):
    """Validate and preprocess all inputs."""
    params = deepcopy(params)
Collaborator

I see the deepcopy happens here, but what about the case where we do not call the preprocess tools? Is it certain that the inputs are not modified at all when there is no preprocessing?

@@ -145,16 +145,20 @@
    "properties": {
        "mean": {
            "description": "The mean of the distribution",
            "minimum": 0,
Collaborator

Good point!

@arnaudon
Contributor

arnaudon commented Nov 3, 2022

LGTM, feel free to merge and update mine if it needs to be adapted to this

neurots/preprocess/utils.py (outdated, resolved)
neurots/preprocess/utils.py (outdated, resolved)
neurots/preprocess/validity_checkers.py (outdated, resolved)
@eleftherioszisis
Contributor

I have a few minor comments, but overall it's legitimate. One last thing, can we please use "relevance" instead of "relevancy"? I know they are synonyms, but my brain is triggered because "relevance" is the most used word. :D

adrien-berchet changed the title from "Rework input validation process" to "Feat: Rework input validation process" on Nov 3, 2022
adrien-berchet marked this pull request as ready for review on November 3, 2022, 11:51
@adrien-berchet
Member Author

> I have a few minor comments, but overall it's legitimate. One last thing, can we please use "relevance" instead of "relevancy"? I know they are synonyms, but my brain is triggered because "relevance" is the most used word. :D

I changed it to "relevance". I didn't know it was more widely used than "relevancy"; I guess I'm an old-school guy :)

@eleftherioszisis
Contributor

I am good with the changes. @lidakanari I leave it to you to approve.

arnaudon force-pushed the validation_preprocess branch from 64bbe81 to 500c539 on January 27, 2023, 16:03
@arnaudon
Contributor

I just rebased on main. I guess if CI passes we can merge?

arnaudon merged commit f1d6a97 into main on Jan 28, 2023
arnaudon deleted the validation_preprocess branch on January 28, 2023, 10:56
4 participants