-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Rework input validation process #52
Conversation
7dd592e
to
08ac803
Compare
Codecov Report
@@ Coverage Diff @@
## main #52 +/- ##
==========================================
+ Coverage 97.36% 97.43% +0.06%
==========================================
Files 33 38 +5
Lines 1900 1951 +51
Branches 281 288 +7
==========================================
+ Hits 1850 1901 +51
Misses 34 34
Partials 16 16
Flags with carried forward coverage won't be shown. Click here to find out more.
|
self.check_min_bar_length(params, input_data) | ||
|
||
@staticmethod | ||
def check_min_bar_length(params, distrs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this to a validation module or somewhere else and import it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to split the checks into 2 parts: the ones that check that the code will not break (these are only executed in the preprocess step and are located in the preprocessing module) and the ones that check that the parameters should give a relevant result (these are also executed in the preprocess step but also during the grower execution and they should only raise warnings). I think the second category should be located in the grower because they might need to know the start_point
and the context
given to the grower. For example, I guess a set of parameters can give relevant results most of the time except when the start_point
is too close to the atlas boundary, so we need to know the start_point
and context
to check this. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if the checks need more info than the parames and distrs, you could have the entire grower passing to each check function to ensure that for all check cases you will always have all the info available.
Having the check attached to the grower however makes it unavailable to use it from another grower for example that may not derive from this one. And it would be strange to import this grower just to use its check statics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ok, it's probably better to get rid of the inheritance limitations indeed. I moved this.
neurots/preprocess/__init__.py
Outdated
for grow_type in params["grow_types"]: | ||
growth_method = params[grow_type]["growth_method"] | ||
for preprocess_func in _preprocess_functions[growth_method]: | ||
preprocess_func(params[grow_type], distrs[grow_type]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better maintenance, I would suggest that all preprocessing functions should return modified copies of params and distrs. Or at least copy them once and apply all the modifications to the copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really like functional programming, don't you? 😜
I am not sure it is relevant to do it each time, because some functions are called each time a grower is instantiated, which will lead to many copies. I think we should copy it only once at the beginning of the grower (which is already done for params but not for distrs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Adrien, if we don't need copies we better not create them...
+1 for the design. |
neurots/preprocess/__init__.py
Outdated
return inner | ||
|
||
|
||
@register_preprocess("trunk") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this design, where input data are checked depending on the need. I'll get back to this once you integrate the rest of the comments.
neurots/preprocess/__init__.py
Outdated
for grow_type in params["grow_types"]: | ||
growth_method = params[grow_type]["growth_method"] | ||
for preprocess_func in _preprocess_functions[growth_method]: | ||
preprocess_func(params[grow_type], distrs[grow_type]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Adrien, if we don't need copies we better not create them...
I reorganized the code so that all the check/preprocess functions are in the preprocess module. I created 3 kind of functions:
WDYT @eleftherioszisis @lidakanari ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good imo. I let Eleftherios comment for the technical details.
@@ -58,24 +59,14 @@ def __init__( | |||
"""TMD basic grower.""" | |||
super().__init__(input_data, params, start_point, context) | |||
self.bif_method = bif_methods[params["branching_method"]] | |||
self.params = copy.deepcopy(params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was keeping this for safety reasons, could we keep it even though a copy might not be necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it from here because it is already in the __init__
of AbstractAlgo
, so calling super().__init__()
already performs the deep copy.
|
||
def preprocess_inputs(params, distrs): | ||
"""Validate and preprocess all inputs.""" | ||
params = deepcopy(params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the deepcopy happens here, but what about the case where we do not call the preprocess tools? Is it certain that the inputs are not modified at all when there is no preprocessing?
@@ -145,16 +145,20 @@ | |||
"properties": { | |||
"mean": { | |||
"description": "The mean of the distribution", | |||
"minimum": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
LGTM, feel free to merge and update mine if it needs to be adapted to this |
I have a few minor comments, but overall it's legitimate. One last thing, can we please use "relevance" instead of "relevancy"? I know they are synonyms, but my brain is triggered because "relevance" is the most used word. :D |
0f56f41
to
64bbe81
Compare
I changed for |
I am good with the changes. @lidakanari I leave it to you to approve. |
64bbe81
to
500c539
Compare
I just rebased on main, I guess if CI pass we can merge? |
No description provided.