Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cellpose normalization #738

Merged
merged 24 commits into from
Jun 17, 2024

Conversation

fstur
Copy link
Contributor

@fstur fstur commented May 6, 2024

Checklist before merging

  • [ x] I added an appropriate entry to CHANGELOG.md

The cellpose_segmentation task currently accepts a custom normalization option. The problem is, that if two channels are used for the cellpose segmentation, both channels are normalized with the same CellposeCustomNormalizer, but one might want to normalize them with different parameters.
I added an optional normalize2 argument to cellpose_segmentation, and updated segement_ROI accordingly.
No tests are implemented yet.

@jluethi
Copy link
Collaborator

jluethi commented Jun 5, 2024

Thanks a lot for the PR @fstur ! Very useful functionality to have indeed.

As discussed today, @lorenzocerrone and I refactored the Cellpose input models to handle the normalization aspects better: Normalization can now be specified when selecting the channel. This needed a bit of reworking to make it work with the Pydantic models afterwards.
I also used this opportunity to move a bit of code out of the task logic into the new CellposeChannel1InputModel.

Things that remain:

  • Add specific tests for the normalization to ensure it works as expected => check coverage & add new unit tests for normalization functions (goal is not to test cellpose behavior itself)
  • Refactor other cellpose inputs into an Advanced Cellpose Parameters model to keep the task reasonably compact
  • Update Changelog with API changes to Cellpose task
  • Test locally on some real data with different normalization options

Unrelated (cc @tcompa ):
I'm suggesting we disabled flake8 E203 checks, because this block has a flake8 vs. black conflict about whether there should be spaces around the colon:

x[channels[0] - 1 : channels[0]] = normalized_img(
                x[channels[0] - 1 : channels[0]],
                lower_p=normalize.lower_percentile,
                upper_p=normalize.upper_percentile,
                lower_bound=normalize.lower_bound,
                upper_bound=normalize.upper_bound,
            )

It's also what made the checks above fail before.

I also added an exception for flake8 W503, which conflicts with black in how to format the cellpose tests.
(unrelated: do we want to consider moving to ruff linting?)

@tcompa
Copy link
Collaborator

tcompa commented Jun 7, 2024

Unrelated (cc @tcompa ): I'm suggesting we disabled flake8 E203 [...]

Agreed. This handling of : also annoys me, on top of often requiring # noqa comments.

(unrelated: do we want to consider moving to ruff linting?)

Sure, nothing against it. It's not like pre-commit takes very long (time poetry run pre-commit run --all-files takes ~1.7 seconds on my machine, which includes reordering imports, black, flake8 and bandit), but of course a speedup wouldn't hurt.
Once we have it in this repo, let's also make sure we adopt it in fractal-server and fractal-tasks-core - to keep them consistent.

@jluethi jluethi requested a review from tcompa June 7, 2024 11:53
@jluethi
Copy link
Collaborator

jluethi commented Jun 7, 2024

Hey @fstur
I finished my work on this PR. It now has the channel & normalization input combined, plus the whole normalization logic moved to its own, tests helper function (see _normalize_cellpose_channels).
I also modified the advanced model inputs into a CellposeModelParams object that can be passed to the task, so the logic of the task input changes a bit both on the Python side as well as the Fractal web side.

Can you check whether this version now works for you? I'd be happy with merging this into fractal tasks core & then providing the normalization approach for both channels.


@tcompa Can you review this PR?

One of the slightly weird things I'm doing is making channel2 optional in the task signature to avoid it being expanded in Fractal web. Because the whole block is already quite big.
Fractal web always initializes an object for it though, because the Normalizer has default values => I'll open a separate issue to discuss this further.


@lorenzocerrone If you also want to have a look, open for further feedback :) But not a requirement that you review the PR.

@tcompa
Copy link
Collaborator

tcompa commented Jun 10, 2024

One of the slightly weird things I'm doing is making channel2 optional in the task signature to avoid it being expanded in Fractal web

This comment is not relevant any more as of b63547f, right?

@jluethi
Copy link
Collaborator

jluethi commented Jun 10, 2024

This comment is not relevant any more as of b63547f, right?

Right! Now I made the normalize part of channel have a default. Would be great if you could review if the use of Field with default_factory mode is a good approach for this

@tcompa
Copy link
Collaborator

tcompa commented Jun 10, 2024

Re: channel2 type hint and JSON Schema

I could not find a better way for making channel2 fully optional (as in "it is set to null in fractal-web UI unless at lease one of its attribute was set explicitly by the user"). Thus I'd proceed as in the PR: the is_set() method is quite clear in its meaning, and it doesn't introduce a lot of complexity.

Higher-level: I agree about

Fractal web always initializes an object for it though, because the Normalizer has default values

and this is what blocks us towards a clean version where channel2 is fully optional. Note that this issue is at least partly a consequence of fractal-web logic: when working with a CLI client we wouldn't have issues in setting a None default for an Optional function argument.


I'll open a separate issue to discuss this further.

I think it's worth continuing this discussion (in a different issue, indeed), because it sounds like a possibly frequent pattern. Note that the outcome in principle could also look like "refactor the CellposeCustomNormalizer model so that it has no default value", since this would likely solve the whole issue.

[continues below, concerning default_factory]

@tcompa
Copy link
Collaborator

tcompa commented Jun 10, 2024

Re: default_factory

Using arg: Field(default_factory=Model) rather than arg: Model = Model() has a difference, but I think it is not relevant for this case. Note that the latter is the one we are currently using in main for

normalize: CellposeCustomNormalizer = CellposeCustomNormalizer(),

The main difference is that the produced JSON Schemas are different. In the example below, I reproduce a minimal example with two function arguments (one always using default_factory, and the other using the same Model() option):

Python

from typing import Optional, Literal
from pydantic import BaseModel, Field


class Normalizer(BaseModel):
    """
    Description of Normalizer

    Attributes:
        type: Description of type
        time: Description of time
    """
    type: Literal["option_a", "option_b"] = "option_a"

class MyModelDefaultFactory(BaseModel):
    """
    Description of ChannelModel

    Attributes:
        normalizer: Description of normalizer.
        time: Description of time.
        attr: Description of attr1.
    """
    normalizer: Normalizer = Field(default_factory=Normalizer)
    time: float = Field(default_factory=time.perf_counter)
    attr: Optional[str] = None

class MyModelObjectInit(BaseModel):
    """
    Description of ChannelModel

    Attributes:
        normalizer: Description of normalizer.
        time: Description of time.
        attr: Description of attr1.
    """
    normalizer: Normalizer = Normalizer()
    time: float = time.perf_counter()
    attr: Optional[str] = None


@validate_arguments
def function(
        arg_default_factory: MyModelDefaultFactory = Field(default_factory=MyModelDefaultFactory),
        arg2: MyModelObjectInit = MyModelObjectInit(),
        ):
    """
    Description of function

    Args:
        arg_default_factory: Description of arg_default_factory
        arg2: Description of arg2
    """
    pass

which leads to

{
  "title": "Function",
  "type": "object",
  "properties": {
    "arg_default_factory": {
      "$ref": "#/definitions/MyModelDefaultFactory",
      "title": "Arg_Default_Factory",
      "description": "Description of arg_default_factory"
    },
    "arg2": {
      "title": "Arg2",
      "default": {
        "normalizer": {
          "type": "option_a"
        },
        "time": 24457.131491715,
        "attr": null
      },
      "allOf": [
        {
          "$ref": "#/definitions/MyModelObjectInit"
        }
      ],
      "description": "Description of arg2"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Normalizer": {
      "title": "Normalizer",
      "description": "Description of Normalizer",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "default": "option_a",
          "enum": [
            "option_a",
            "option_b"
          ],
          "type": "string"
        }
      }
    },
    "MyModelDefaultFactory": {
      "title": "MyModelDefaultFactory",
      "description": "Description of ChannelModel",
      "type": "object",
      "properties": {
        "normalizer": {
          "$ref": "#/definitions/Normalizer",
          "title": "Normalizer"
        },
        "time": {
          "title": "Time",
          "type": "number"
        },
        "attr": {
          "title": "Attr",
          "type": "string"
        }
      }
    },
    "MyModelObjectInit": {
      "title": "MyModelObjectInit",
      "description": "Description of ChannelModel",
      "type": "object",
      "properties": {
        "normalizer": {
          "title": "Normalizer",
          "default": {
            "type": "option_a"
          },
          "allOf": [
            {
              "$ref": "#/definitions/Normalizer"
            }
          ]
        },
        "time": {
          "title": "Time",
          "default": 24457.131491715,
          "type": "number"
        },
        "attr": {
          "title": "Attr",
          "type": "string"
        }
      }
    }
  }
}

The main difference here is the presence of

 "default": {
        "normalizer": {
          "type": "option_a"
        },
        "time": 24457.131491715,
        "attr": null
      },

for arg2, while it is missing when we use default_factory.
Part of the explanation is that default_factory should not be filled upon schema generation (see e.g. pydantic/pydantic#1520), and the time attribute that I include above helps to show this behavior (it is present when it's called explicitly, but not when using a default_factory=time.perf_counter).

Does this matter for fractal-web?

In the fractal-web UI, this difference appears as irrelevant. If I don't modify any form field, current data are identical for the two cases (apart from the time attribute, but that is expected):
image

Does this matter for fractal-server?

For top-level arguments (that is, this does not concern nested attribute of complex arguments), the presence/absence of a default in the JSON Schema does matter in fractal-server. If a default value is present, then it is used as a basis during any insert-worfklowtask or patch-workflowtask operation.
Considering the example above:

  1. If we always go through fractal-web, there will always be a client-provided value (and therefore the default is not relevant).
  2. If we use a different client, then the usage based on default_factory will have no default in the arguments JSON file, and the default will be generated at runtime via the default_factory.

I cannot find any issue with either option, right now, but I wouldn't be surprised if we eventually identify some edge case of one or the other.

[continues with a TL;DR]

@tcompa
Copy link
Collaborator

tcompa commented Jun 10, 2024

TL;DR re: default_factory

  1. The advantage of using default_factory is that we are not using a mutable function argument (which is a bad practice, but irrelevant for the standard command-line usage of fractal tasks)
  2. The advantage of using arg: MyModel = MyModel() is that the default field in the JSON Schema is set, making it more transparent.
  3. The two cases appear in the same in fractal-web, for my dummy example and also for the current PR.
  4. I cannot think of an actual problematic edge case related to using one or the other.

If I were to choose, I would replace

channel2: CellposeChannel2InputModel = Field(default_factory=CellposeChannel2InputModel)

with

channel2: CellposeChannel2InputModel = CellposeChannel2InputModel()

which creates a more transparent JSON Schema that includes

-            "$ref": "#/definitions/CellposeChannel2InputModel",
             "title": "Channel2",
+            "default": {
+              "wavelength_id": null,
+              "label": null,
+              "normalize": {
+                "type": "default",
+                "lower_percentile": null,
+                "upper_percentile": null,
+                "lower_bound": null,
+                "upper_bound": null
+              }
+            },
+            "allOf": [
+              {
+                "$ref": "#/definitions/CellposeChannel2InputModel"
+              }
+            ],

but this is just personal preference and I have no specific reasons for it.

@tcompa
Copy link
Collaborator

tcompa commented Jun 10, 2024

This closes my feedback on the arguments/schema-related part of the PR.

I can still review the rest of the PR, if it's useful, but that will be a bit later.

@jluethi
Copy link
Collaborator

jluethi commented Jun 17, 2024

Thanks for the review & the reasoning on trade-offs @tcompa . We'll stay with this version for the time being and will evaluate later whether we'll have a strong opinion one way or the other for Fractal tasks.

@jluethi
Copy link
Collaborator

jluethi commented Jun 17, 2024

Thanks @fstur for starting this work, I'm now merging this PR so we have the updated version of the Cellpose task with easier input handling & normalize options for both channels available in main :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants