Feat: Rework input validation process #52

adrien-berchet · 2022-10-04T15:31:09Z

No description provided.

codecov · 2022-10-05T06:58:01Z

Codecov Report

Merging #52 (500c539) into main (c43fa25) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   97.36%   97.43%   +0.06%     
==========================================
  Files          33       38       +5     
  Lines        1900     1951      +51     
  Branches      281      288       +7     
==========================================
+ Hits         1850     1901      +51     
  Misses         34       34              
  Partials       16       16

Flag	Coverage Δ
pytest	`97.43% <100.00%> (+0.06%)`	⬆️

Impacted Files	Coverage Δ
neurots/generate/algorithms/tmdgrower.py	`99.25% <100.00%> (-0.03%)`	⬇️
neurots/preprocess/__init__.py	`100.00% <100.00%> (ø)`
neurots/preprocess/exceptions.py	`100.00% <100.00%> (ø)`
neurots/preprocess/relevance_checkers.py	`100.00% <100.00%> (ø)`
neurots/preprocess/utils.py	`100.00% <100.00%> (ø)`
neurots/preprocess/validity_checkers.py	`100.00% <100.00%> (ø)`

eleftherioszisis · 2022-10-06T07:27:27Z

neurots/generate/algorithms/tmdgrower.py

+            self.check_min_bar_length(params, input_data)
+
+    @staticmethod
+    def check_min_bar_length(params, distrs):


I would move this to a validation module or somewhere else and import it here.

I wanted to split the checks into 2 parts: the ones that check that the code will not break (these are only executed in the preprocess step and are located in the preprocessing module) and the ones that check that the parameters should give a relevant result (these are also executed in the preprocess step but also during the grower execution and they should only raise warnings). I think the second category should be located in the grower because they might need to know the start_point and the context given to the grower. For example, I guess a set of parameters can give relevant results most of the time except when the start_point is too close to the atlas boundary, so we need to know the start_point and context to check this. WDYT?

Hmm, if the checks need more info than the params and distrs, you could pass the entire grower to each check function, so that every check always has all the info available.

However, attaching the check to the grower makes it unusable from another grower that does not derive from this one. And it would be strange to import this grower just to use its static checks.

Yeah ok, it's probably better to get rid of the inheritance limitations indeed. I moved this.
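
For illustration, the outcome of this thread could look roughly like the sketch below: a module-level function instead of a grower static method, taking `params` and `distrs` plus optional `start_point` and `context`, and only emitting a warning. The dictionary keys (`step_size`, `min_bar_length`) and the rule being checked are assumptions made for this example, not the actual NeuroTS schema or logic.

```python
import warnings


def check_min_bar_length(params, distrs, start_point=None, context=None):
    """Warn (never raise) if the inputs are unlikely to give a relevant result.

    Hypothetical rule, for illustration only: the step size should be smaller
    than the shortest bar. The keys used here are assumptions.
    """
    # start_point and context are accepted so that a check can be refined
    # with them later; this simple example does not use them.
    step_size = params["step_size"]
    min_bar_length = distrs["min_bar_length"]
    if step_size >= min_bar_length:
        warnings.warn(
            f"step_size ({step_size}) is not smaller than the shortest bar "
            f"({min_bar_length}); the result may not be relevant."
        )
        return False
    return True
```

Because the function lives outside any grower class, any grower can import and call it, whether or not it derives from the TMD grower.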

neurots/generate/tree.py

neurots/preprocess/__init__.py

eleftherioszisis · 2022-10-06T07:39:15Z

neurots/preprocess/__init__.py

+    for grow_type in params["grow_types"]:
+        growth_method = params[grow_type]["growth_method"]
+        for preprocess_func in _preprocess_functions[growth_method]:
+            preprocess_func(params[grow_type], distrs[grow_type])


For better maintenance, I would suggest that all preprocessing functions should return modified copies of params and distrs. Or at least copy them once and apply all the modifications to the copies.

You really like functional programming, don't you? 😜
I am not sure it is relevant to do it each time, because some functions are called each time a grower is instantiated, which will lead to many copies. I think we should copy it only once at the beginning of the grower (which is already done for params but not for distrs).

Sure, that's fine too.

I agree with Adrien, if we don't need copies we better not create them...
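
A minimal sketch of the "copy once, then modify the copies" option agreed on above. The two preprocess steps and the keys they touch are hypothetical; only the copying pattern is the point.

```python
from copy import deepcopy


def set_default_radius(params, distrs):
    """Hypothetical preprocess step: fill in a missing key in place."""
    params.setdefault("radius", 1.0)


def clamp_step(params, distrs):
    """Hypothetical preprocess step: enforce an upper bound in place."""
    params["step_size"] = min(params["step_size"], distrs["max_step"])


def preprocess_once(params, distrs, steps):
    """Copy the inputs a single time, then let every step mutate the copies."""
    params = deepcopy(params)
    distrs = deepcopy(distrs)
    for step in steps:
        step(params, distrs)
    return params, distrs


user_params = {"step_size": 5.0}
user_distrs = {"max_step": 2.0}
new_params, new_distrs = preprocess_once(
    user_params, user_distrs, [set_default_radius, clamp_step]
)
```

The caller's dictionaries are left untouched, and the cost is two deep copies in total rather than two per preprocess function.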

tests/test_preprocess.py

eleftherioszisis · 2022-10-06T07:45:28Z

+1 for the design.

lidakanari · 2022-10-06T10:47:15Z

neurots/preprocess/__init__.py

+    return inner
+
+
+@register_preprocess("trunk")


I like this design, where input data are checked depending on the need. I'll get back to this once you integrate the rest of the comments.

lidakanari · 2022-10-06T10:47:53Z

neurots/preprocess/__init__.py

+    for grow_type in params["grow_types"]:
+        growth_method = params[grow_type]["growth_method"]
+        for preprocess_func in _preprocess_functions[growth_method]:
+            preprocess_func(params[grow_type], distrs[grow_type])


I agree with Adrien, if we don't need copies we better not create them...

adrien-berchet · 2022-10-07T08:28:11Z

I reorganized the code so that all the check/preprocess functions are in the preprocess module. I created 3 kinds of functions:

- validity checkers: they check that the input parameters and distributions given by the user are valid (if they are not, the code will probably break or have undefined behavior).
- preprocesses:
  - they can assume that the inputs are already valid;
  - they can update/modify the parameters and distributions;
  - they are responsible for ensuring that the updated parameters and distributions are still valid. (I think this is what @arnaudon needs for the fit in 3d trunk angles #49.)
- relevancy checkers: they check that the current parameters and distributions will give 'relevant' results; they can take optional start_point and context arguments to refine the checks.
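
The three kinds of functions can be sketched as a small pipeline. Everything here is a hypothetical stand-in (the exception name, the keys and the rules are assumptions); it only shows the intended split: validity checkers raise, preprocesses modify, relevance checkers warn.

```python
import warnings


class InvalidInputs(ValueError):
    """Hypothetical exception raised by validity checkers."""


def check_valid(params, distrs):
    """Validity checker: raise if the inputs would break the code."""
    if params["step_size"] <= 0:
        raise InvalidInputs("step_size must be positive")


def add_defaults(params, distrs):
    """Preprocess: may modify valid inputs, and must leave them valid."""
    params.setdefault("max_steps", 100)


def check_relevant(params, distrs, start_point=None, context=None):
    """Relevance checker: only warn if the result is likely irrelevant."""
    if params["step_size"] > distrs["typical_length"]:
        warnings.warn("step_size is larger than the typical length")


def prepare(params, distrs):
    """Run the three stages in order, modifying the given dicts in place."""
    check_valid(params, distrs)
    add_defaults(params, distrs)
    check_relevant(params, distrs)
    return params, distrs
```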

WDYT @eleftherioszisis @lidakanari ?

lidakanari

I think it looks good. I'll let Eleftherios comment on the technical details.

lidakanari · 2022-10-31T07:20:13Z

neurots/generate/algorithms/tmdgrower.py

@@ -58,24 +59,14 @@ def __init__(
        """TMD basic grower."""
        super().__init__(input_data, params, start_point, context)
        self.bif_method = bif_methods[params["branching_method"]]
-        self.params = copy.deepcopy(params)


I was keeping this for safety reasons, could we keep it even though a copy might not be necessary?

I removed it from here because it is already in the __init__ of AbstractAlgo, so calling super().__init__() already performs the deep copy.
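
A reduced sketch of why the second copy is redundant, assuming the base class stores a deep copy of `params` as described. `AbstractAlgo` is named in the reply above; the subclass name `Grower` and the simplified signatures are placeholders for this example.

```python
import copy


class AbstractAlgo:
    """Stand-in for the base class: it owns the defensive copy of params."""

    def __init__(self, input_data, params, start_point=None, context=None):
        self.input_data = input_data
        self.params = copy.deepcopy(params)
        self.start_point = start_point
        self.context = context


class Grower(AbstractAlgo):
    """Hypothetical subclass that relies on the copy made by the parent."""

    def __init__(self, input_data, params, start_point=None, context=None):
        super().__init__(input_data, params, start_point, context)
        # No second deepcopy here: self.params already belongs to this instance.


user_params = {"branching_method": "bio_oriented", "nested": {"value": 1}}
grower = Grower({}, user_params)
grower.params["nested"]["value"] = 2  # does not affect user_params
```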

lidakanari · 2022-10-31T07:21:46Z

neurots/preprocess/utils.py

+
+def preprocess_inputs(params, distrs):
+    """Validate and preprocess all inputs."""
+    params = deepcopy(params)


I see the deepcopy happens here, but what about the case where we do not call the preprocess tools? Is it certain that the inputs are not modified at all when there is no preprocessing?

lidakanari · 2022-10-31T07:23:29Z

neurots/schemas/parameters.json

@@ -145,16 +145,20 @@
                            "properties": {
                                "mean": {
                                    "description": "The mean of the distribution",
+                                    "minimum": 0,


Good point!

arnaudon · 2022-11-03T09:47:00Z

LGTM, feel free to merge and update mine if it needs to be adapted to this

neurots/preprocess/utils.py

neurots/preprocess/validity_checkers.py

eleftherioszisis · 2022-11-03T10:56:48Z

I have a few minor comments, but overall it's legitimate. One last thing, can we please use "relevance" instead of "relevancy"? I know they are synonyms, but my brain is triggered because "relevance" is the most used word. :D

adrien-berchet · 2022-11-03T11:52:47Z

> I have a few minor comments, but overall it's legitimate. One last thing, can we please use "relevance" instead of "relevancy"? I know they are synonyms, but my brain is triggered because "relevance" is the most used word. :D

I changed it to relevance. I didn't know it was more widely used than relevancy; I guess I'm an old-school guy :)

eleftherioszisis · 2022-11-03T12:09:36Z

I am good with the changes. @lidakanari I leave it to you to approve.

arnaudon · 2023-01-27T16:04:41Z

I just rebased on main. I guess if CI passes we can merge?

adrien-berchet marked this pull request as draft October 4, 2022 15:31

adrien-berchet force-pushed the validation_preprocess branch from 7dd592e to 08ac803 Compare October 4, 2022 15:32

adrien-berchet requested review from lidakanari, eleftherioszisis and arnaudon October 4, 2022 15:33

eleftherioszisis reviewed Oct 6, 2022

lidakanari reviewed Oct 6, 2022

lidakanari reviewed Oct 31, 2022

eleftherioszisis reviewed Nov 3, 2022

adrien-berchet changed the title ~~Rework input validation process~~ Feat: Rework input validation process Nov 3, 2022

adrien-berchet force-pushed the validation_preprocess branch from 0f56f41 to 64bbe81 Compare November 3, 2022 11:47

adrien-berchet marked this pull request as ready for review November 3, 2022 11:51

adrien-berchet added 7 commits January 27, 2023 17:03

- 2914b8b Rework input validation process
- e86cb53 New proposal
- c425022 Review comments
- b7602f6 Lint
- 8149405 Reorganize checkers
- 2e00183 Improve tests and schemas
- 500c539 Some renaming and new tests

arnaudon force-pushed the validation_preprocess branch from 64bbe81 to 500c539 Compare January 27, 2023 16:03

arnaudon approved these changes Jan 27, 2023

arnaudon merged commit f1d6a97 into main Jan 28, 2023

arnaudon deleted the validation_preprocess branch January 28, 2023 10:56
