Cellpose normalization #738

fstur · 2024-05-06T16:35:50Z

Checklist before merging

[ x] I added an appropriate entry to CHANGELOG.md

The cellpose_segmentation task currently accepts a custom normalization option. The problem is, that if two channels are used for the cellpose segmentation, both channels are normalized with the same CellposeCustomNormalizer, but one might want to normalize them with different parameters.
I added an optional normalize2 argument to cellpose_segmentation, and updated segement_ROI accordingly.
No tests are implemented yet.

…ation

jluethi · 2024-06-05T17:49:06Z

Thanks a lot for the PR @fstur ! Very useful functionality to have indeed.

As discussed today, @lorenzocerrone and I refactored the Cellpose input models to handle the normalization aspects better: Normalization can now be specified when selecting the channel. This needed a bit of reworking to make it work with the Pydantic models afterwards.
I also used this opportunity to move a bit of code out of the task logic into the new CellposeChannel1InputModel.

Things that remain:

Add specific tests for the normalization to ensure it works as expected => check coverage & add new unit tests for normalization functions (goal is not to test cellpose behavior itself)
Refactor other cellpose inputs into an Advanced Cellpose Parameters model to keep the task reasonably compact
Update Changelog with API changes to Cellpose task
Test locally on some real data with different normalization options

Unrelated (cc @tcompa ):
I'm suggesting we disabled flake8 E203 checks, because this block has a flake8 vs. black conflict about whether there should be spaces around the colon:

x[channels[0] - 1 : channels[0]] = normalized_img(
                x[channels[0] - 1 : channels[0]],
                lower_p=normalize.lower_percentile,
                upper_p=normalize.upper_percentile,
                lower_bound=normalize.lower_bound,
                upper_bound=normalize.upper_bound,
            )

It's also what made the checks above fail before.

I also added an exception for flake8 W503, which conflicts with black in how to format the cellpose tests.
(unrelated: do we want to consider moving to ruff linting?)

tcompa · 2024-06-07T06:44:25Z

Unrelated (cc @tcompa ): I'm suggesting we disabled flake8 E203 [...]

Agreed. This handling of : also annoys me, on top of often requiring # noqa comments.

(unrelated: do we want to consider moving to ruff linting?)

Sure, nothing against it. It's not like pre-commit takes very long (time poetry run pre-commit run --all-files takes ~1.7 seconds on my machine, which includes reordering imports, black, flake8 and bandit), but of course a speedup wouldn't hurt.
Once we have it in this repo, let's also make sure we adopt it in fractal-server and fractal-tasks-core - to keep them consistent.

…nalytics-platform/fractal-tasks-core into cellpose_normalization

jluethi · 2024-06-07T11:59:27Z

Hey @fstur
I finished my work on this PR. It now has the channel & normalization input combined, plus the whole normalization logic moved to its own, tests helper function (see _normalize_cellpose_channels).
I also modified the advanced model inputs into a CellposeModelParams object that can be passed to the task, so the logic of the task input changes a bit both on the Python side as well as the Fractal web side.

Can you check whether this version now works for you? I'd be happy with merging this into fractal tasks core & then providing the normalization approach for both channels.

@tcompa Can you review this PR?

One of the slightly weird things I'm doing is making channel2 optional in the task signature to avoid it being expanded in Fractal web. Because the whole block is already quite big.
Fractal web always initializes an object for it though, because the Normalizer has default values => I'll open a separate issue to discuss this further.

@lorenzocerrone If you also want to have a look, open for further feedback :) But not a requirement that you review the PR.

…ield better

tcompa · 2024-06-10T09:30:23Z

One of the slightly weird things I'm doing is making channel2 optional in the task signature to avoid it being expanded in Fractal web

This comment is not relevant any more as of b63547f, right?

jluethi · 2024-06-10T09:32:10Z

This comment is not relevant any more as of b63547f, right?

Right! Now I made the normalize part of channel have a default. Would be great if you could review if the use of Field with default_factory mode is a good approach for this

fractal_tasks_core/tasks/cellpose_utils.py

tcompa · 2024-06-10T12:04:17Z

Re: channel2 type hint and JSON Schema

I could not find a better way for making channel2 fully optional (as in "it is set to null in fractal-web UI unless at lease one of its attribute was set explicitly by the user"). Thus I'd proceed as in the PR: the is_set() method is quite clear in its meaning, and it doesn't introduce a lot of complexity.

Higher-level: I agree about

Fractal web always initializes an object for it though, because the Normalizer has default values

and this is what blocks us towards a clean version where channel2 is fully optional. Note that this issue is at least partly a consequence of fractal-web logic: when working with a CLI client we wouldn't have issues in setting a None default for an Optional function argument.

I'll open a separate issue to discuss this further.

I think it's worth continuing this discussion (in a different issue, indeed), because it sounds like a possibly frequent pattern. Note that the outcome in principle could also look like "refactor the CellposeCustomNormalizer model so that it has no default value", since this would likely solve the whole issue.

[continues below, concerning default_factory]

tcompa · 2024-06-10T12:24:07Z

Re: default_factory

Using arg: Field(default_factory=Model) rather than arg: Model = Model() has a difference, but I think it is not relevant for this case. Note that the latter is the one we are currently using in main for

fractal-tasks-core/fractal_tasks_core/tasks/cellpose_segmentation.py

Line 214 in 00cbfdf

normalize: CellposeCustomNormalizer = CellposeCustomNormalizer(),

The main difference is that the produced JSON Schemas are different. In the example below, I reproduce a minimal example with two function arguments (one always using default_factory, and the other using the same Model() option):

Python

from typing import Optional, Literal
from pydantic import BaseModel, Field


class Normalizer(BaseModel):
    """
    Description of Normalizer

    Attributes:
        type: Description of type
        time: Description of time
    """
    type: Literal["option_a", "option_b"] = "option_a"

class MyModelDefaultFactory(BaseModel):
    """
    Description of ChannelModel

    Attributes:
        normalizer: Description of normalizer.
        time: Description of time.
        attr: Description of attr1.
    """
    normalizer: Normalizer = Field(default_factory=Normalizer)
    time: float = Field(default_factory=time.perf_counter)
    attr: Optional[str] = None

class MyModelObjectInit(BaseModel):
    """
    Description of ChannelModel

    Attributes:
        normalizer: Description of normalizer.
        time: Description of time.
        attr: Description of attr1.
    """
    normalizer: Normalizer = Normalizer()
    time: float = time.perf_counter()
    attr: Optional[str] = None


@validate_arguments
def function(
        arg_default_factory: MyModelDefaultFactory = Field(default_factory=MyModelDefaultFactory),
        arg2: MyModelObjectInit = MyModelObjectInit(),
        ):
    """
    Description of function

    Args:
        arg_default_factory: Description of arg_default_factory
        arg2: Description of arg2
    """
    pass

which leads to

{
  "title": "Function",
  "type": "object",
  "properties": {
    "arg_default_factory": {
      "$ref": "#/definitions/MyModelDefaultFactory",
      "title": "Arg_Default_Factory",
      "description": "Description of arg_default_factory"
    },
    "arg2": {
      "title": "Arg2",
      "default": {
        "normalizer": {
          "type": "option_a"
        },
        "time": 24457.131491715,
        "attr": null
      },
      "allOf": [
        {
          "$ref": "#/definitions/MyModelObjectInit"
        }
      ],
      "description": "Description of arg2"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Normalizer": {
      "title": "Normalizer",
      "description": "Description of Normalizer",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "default": "option_a",
          "enum": [
            "option_a",
            "option_b"
          ],
          "type": "string"
        }
      }
    },
    "MyModelDefaultFactory": {
      "title": "MyModelDefaultFactory",
      "description": "Description of ChannelModel",
      "type": "object",
      "properties": {
        "normalizer": {
          "$ref": "#/definitions/Normalizer",
          "title": "Normalizer"
        },
        "time": {
          "title": "Time",
          "type": "number"
        },
        "attr": {
          "title": "Attr",
          "type": "string"
        }
      }
    },
    "MyModelObjectInit": {
      "title": "MyModelObjectInit",
      "description": "Description of ChannelModel",
      "type": "object",
      "properties": {
        "normalizer": {
          "title": "Normalizer",
          "default": {
            "type": "option_a"
          },
          "allOf": [
            {
              "$ref": "#/definitions/Normalizer"
            }
          ]
        },
        "time": {
          "title": "Time",
          "default": 24457.131491715,
          "type": "number"
        },
        "attr": {
          "title": "Attr",
          "type": "string"
        }
      }
    }
  }
}

The main difference here is the presence of

 "default": {
        "normalizer": {
          "type": "option_a"
        },
        "time": 24457.131491715,
        "attr": null
      },

for arg2, while it is missing when we use default_factory.
Part of the explanation is that default_factory should not be filled upon schema generation (see e.g. pydantic/pydantic#1520), and the time attribute that I include above helps to show this behavior (it is present when it's called explicitly, but not when using a default_factory=time.perf_counter).

Does this matter for `fractal-web`?

In the fractal-web UI, this difference appears as irrelevant. If I don't modify any form field, current data are identical for the two cases (apart from the time attribute, but that is expected):

Does this matter for `fractal-server`?

For top-level arguments (that is, this does not concern nested attribute of complex arguments), the presence/absence of a default in the JSON Schema does matter in fractal-server. If a default value is present, then it is used as a basis during any insert-worfklowtask or patch-workflowtask operation.
Considering the example above:

If we always go through fractal-web, there will always be a client-provided value (and therefore the default is not relevant).
If we use a different client, then the usage based on default_factory will have no default in the arguments JSON file, and the default will be generated at runtime via the default_factory.

I cannot find any issue with either option, right now, but I wouldn't be surprised if we eventually identify some edge case of one or the other.

[continues with a TL;DR]

tcompa · 2024-06-10T12:30:44Z

TL;DR re: default_factory

The advantage of using default_factory is that we are not using a mutable function argument (which is a bad practice, but irrelevant for the standard command-line usage of fractal tasks)
The advantage of using arg: MyModel = MyModel() is that the default field in the JSON Schema is set, making it more transparent.
The two cases appear in the same in fractal-web, for my dummy example and also for the current PR.
I cannot think of an actual problematic edge case related to using one or the other.

If I were to choose, I would replace

channel2: CellposeChannel2InputModel = Field(default_factory=CellposeChannel2InputModel)

with

channel2: CellposeChannel2InputModel = CellposeChannel2InputModel()

which creates a more transparent JSON Schema that includes

-            "$ref": "#/definitions/CellposeChannel2InputModel",
             "title": "Channel2",
+            "default": {
+              "wavelength_id": null,
+              "label": null,
+              "normalize": {
+                "type": "default",
+                "lower_percentile": null,
+                "upper_percentile": null,
+                "lower_bound": null,
+                "upper_bound": null
+              }
+            },
+            "allOf": [
+              {
+                "$ref": "#/definitions/CellposeChannel2InputModel"
+              }
+            ],

but this is just personal preference and I have no specific reasons for it.

tcompa · 2024-06-10T12:31:54Z

This closes my feedback on the arguments/schema-related part of the PR.

I can still review the rest of the PR, if it's useful, but that will be a bit later.

jluethi · 2024-06-17T16:18:09Z

Thanks for the review & the reasoning on trade-offs @tcompa . We'll stay with this version for the time being and will evaluate later whether we'll have a strong opinion one way or the other for Fractal tasks.

jluethi · 2024-06-17T16:18:51Z

Thanks @fstur for starting this work, I'm now merging this PR so we have the updated version of the Cellpose task with easier input handling & normalize options for both channels available in main :)

fstur and others added 7 commits May 2, 2024 17:03

added separate normalization for channel2

a64edeb

Merge branch 'fractal-analytics-platform:main' into cellpose_normaliz…

dc40fbb

…ation

Update __FRACTAL_MANIFEST__.json

f430d19

Update CHANGELOG.md

e36c4da

Change input models for Cellpose task to include normalization options

c98f570

Update Cellpose tests

8d114af

Update docstrings & manifest

5238667

tcompa and others added 13 commits June 7, 2024 11:02

Merge branch 'main' into cellpose_normalization

7ec82b8

Merge branch 'main' into cellpose_normalization

c6c6748

Update Fractal task input to use advanced_cellpose_model_parameters

335733a

Improve CellposeModelParams

5b6b8b5

Update cellpose tests for CellposeModelParams

a48e027

Update tests, rename cellpose_transforms to cellpose_utils

a229d37

Update changelog

5757e1f

Merge branch 'cellpose_normalization' of https://github.com/fractal-a…

070b2cd

…nalytics-platform/fractal-tasks-core into cellpose_normalization

Merge local branch into remote branch

dd6b5bc

Update docstring

6b4ba17

Cleanup changelog

151acd3

Add unit test for _normalize_cellpose_channels function

cc26a49

Merge branch 'main' into cellpose_normalization

342e5a2

jluethi requested a review from tcompa June 7, 2024 11:53

jluethi added 3 commits June 7, 2024 16:03

Tweak CellposeChannel1/2InputModel to handle expansion of normalize f…

1706c6c

…ield better

Minor cleanup

790b613

Remove Optional from channel2 input

b63547f

jluethi mentioned this pull request Jun 7, 2024

Allow developers to influence schema building in finer detail #755

Open

Fix Cellpose task signature

82004ee

tcompa reviewed Jun 10, 2024

View reviewed changes

fractal_tasks_core/tasks/cellpose_utils.py Show resolved Hide resolved

tcompa mentioned this pull request Jun 12, 2024

[auto] Update dependencies in poetry.lock #761

Closed

jluethi merged commit 371bbc0 into fractal-analytics-platform:main Jun 17, 2024
16 checks passed

lorenzocerrone added a commit that referenced this pull request Jun 18, 2024

fix missing v1 refactor from incoming #738

5c152a8

This was referenced Jul 24, 2024

Nested default parameters not shown in workflow page fractal-analytics-platform/fractal-web#537

Closed

Sandbox vs webclient behavior for nested arguments fractal-analytics-platform/fractal-web#538

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cellpose normalization #738

Cellpose normalization #738

fstur commented May 6, 2024

jluethi commented Jun 5, 2024 •

edited

Loading

tcompa commented Jun 7, 2024

jluethi commented Jun 7, 2024

tcompa commented Jun 10, 2024

jluethi commented Jun 10, 2024

tcompa commented Jun 10, 2024

tcompa commented Jun 10, 2024

tcompa commented Jun 10, 2024

tcompa commented Jun 10, 2024

jluethi commented Jun 17, 2024

jluethi commented Jun 17, 2024

Cellpose normalization #738

Cellpose normalization #738

Conversation

fstur commented May 6, 2024

Checklist before merging

jluethi commented Jun 5, 2024 • edited Loading

tcompa commented Jun 7, 2024

jluethi commented Jun 7, 2024

tcompa commented Jun 10, 2024

jluethi commented Jun 10, 2024

tcompa commented Jun 10, 2024

tcompa commented Jun 10, 2024

Does this matter for fractal-web?

Does this matter for fractal-server?

tcompa commented Jun 10, 2024

tcompa commented Jun 10, 2024

jluethi commented Jun 17, 2024

jluethi commented Jun 17, 2024

jluethi commented Jun 5, 2024 •

edited

Loading

Does this matter for `fractal-web`?

Does this matter for `fractal-server`?