Use JSON schemas for WorkflowTask arguments #82

Closed
Tracked by #154
tcompa opened this issue Mar 29, 2023 · 10 comments

Comments

@tcompa
Collaborator

tcompa commented Mar 29, 2023

An example of a WorkflowTask.args JSON schema could be:

  • Argument "arg1" of type "string"
  • Argument "arg2" of type "list of strings"
  • Argument "arg3" of type "list[Channel]"

Where Channel is an object of string->boolean key/value pairs, with keys "A", "B" and "C".

Each argument comes with required=True/False


(even more complex: what about WorkflowTask.task.default_args?)
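As a rough illustration, the three arguments above could be written as a JSON Schema along these lines (shown here as a plain Python dict). Which arguments are required, and the use of a `definitions` block for `Channel`, are assumptions made for this sketch; the description above does not fix them.

```python
import json

# Hypothetical schema for the "Channel" type: boolean values under keys A, B, C.
channel_schema = {
    "title": "Channel",
    "type": "object",
    "properties": {key: {"type": "boolean"} for key in ("A", "B", "C")},
    "additionalProperties": False,
}

# Hypothetical schema for WorkflowTask.args with the three example arguments.
# Marking only "arg1" as required is an assumption, not part of the proposal.
args_schema = {
    "title": "TaskArguments",
    "type": "object",
    "properties": {
        "arg1": {"type": "string"},
        "arg2": {"type": "array", "items": {"type": "string"}},
        "arg3": {"type": "array", "items": {"$ref": "#/definitions/Channel"}},
    },
    "required": ["arg1"],
    "definitions": {"Channel": channel_schema},
}

print(json.dumps(args_schema, indent=2))
```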

@tcompa tcompa added this to the v0.x milestone Mar 29, 2023
@rkpasia
Contributor

rkpasia commented Apr 4, 2023

An example of a schema could be the following, where for a given workflow task we have a property which is a list of objects.

Each object represents the definition of a workflow task argument. Arguments could be nested.

{
  "workflow_task_name": "workflow_task_name",
  "workflow_task_id": 1,
  "workflow_task_schema": [
    {
      "argument_name": "task_argument_name",
      "argument_type": "object",
      "argument_description": "description",
      "default_value": "default value, with respect to argument type",
      "is_required": true,
      "inner_argument": {
        "argument_name": "inner_argument_name",
        "argument_type": "name of type",
        "argument_description": null,
        "default_value": null,
        "is_required": true,
        "inner_argument": null
      }
    }
  ]
}

@tcompa tcompa modified the milestones: v0.4, v0.5 Apr 5, 2023
@tcompa
Collaborator Author

tcompa commented Apr 19, 2023

Here (below) is a realistic example of what comes out of the pydantic schema method (note: the method name changes when moving to pydantic 2). This is the raw output, which we could make available in fractal-server rather quickly.

If needed, I suggest that the logic for converting it into a different kind of schema (like the one in #82 (comment)) be implemented as part of the web client. This avoids relying on custom definitions (i.e. different from pydantic's) in multiple places (tasks repository and web-client repository).

Example 1

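Task arguments:

from pydantic import BaseModel
from pydantic import Extra
from pydantic import Field
from typing import Optional
import json


class TaskArguments(BaseModel, extra=Extra.forbid):
    x: int = Field(description="This is the description of argument x")
    y: Optional[str]


print(json.dumps(TaskArguments.schema(), indent=2))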

JSON schema:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "x": {
      "title": "X",
      "description": "This is the description of argument x",
      "type": "integer"
    },
    "y": {
      "title": "Y",
      "type": "string"
    }
  },
  "required": [
    "x"
  ],
  "additionalProperties": false
}

Example 2

Task arguments:

from pydantic import BaseModel
from pydantic import Extra
from typing import Any
from typing import Dict
from typing import Optional
from typing import Sequence
import json


class TaskArguments(BaseModel, extra=Extra.forbid):
    input_paths: Sequence[str]
    output_path: str
    metadata: Dict[str, Any]
    image_extension: str
    image_glob_patterns: Optional[list[str]]
    allowed_channels: Sequence[Dict[str, Any]]
    num_levels: Optional[int]
    coarsening_xy: Optional[int]
    metadata_table: Optional[str]


print(json.dumps(TaskArguments.schema(), indent=2))

JSON schema:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "input_paths": {
      "title": "Input Paths",
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "output_path": {
      "title": "Output Path",
      "type": "string"
    },
    "metadata": {
      "title": "Metadata",
      "type": "object"
    },
    "image_extension": {
      "title": "Image Extension",
      "type": "string"
    },
    "image_glob_patterns": {
      "title": "Image Glob Patterns",
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "allowed_channels": {
      "title": "Allowed Channels",
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "num_levels": {
      "title": "Num Levels",
      "type": "integer"
    },
    "coarsening_xy": {
      "title": "Coarsening Xy",
      "type": "integer"
    },
    "metadata_table": {
      "title": "Metadata Table",
      "type": "string"
    }
  },
  "required": [
    "input_paths",
    "output_path",
    "metadata",
    "image_extension",
    "allowed_channels"
  ],
  "additionalProperties": false
}
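The conversion mentioned above, from the raw pydantic output to a flatter per-argument list like the one proposed earlier in this thread, could look roughly like the following sketch. It is written in Python only for illustration (the web client would need its own implementation), the function name `to_flat_arguments` is hypothetical, and nested arguments (`inner_argument`) are left out.

```python
from typing import Any, Dict, List


def to_flat_arguments(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a pydantic-style JSON Schema into a flat list of arguments.

    Only top-level properties are handled; nesting is ignored in this sketch.
    """
    required = set(schema.get("required", []))
    arguments = []
    for name, prop in schema.get("properties", {}).items():
        arguments.append(
            {
                "argument_name": name,
                "argument_type": prop.get("type"),
                "argument_description": prop.get("description"),
                "default_value": prop.get("default"),
                "is_required": name in required,
            }
        )
    return arguments


# Schema copied from Example 1 above.
example_schema = {
    "title": "TaskArguments",
    "type": "object",
    "properties": {
        "x": {
            "title": "X",
            "description": "This is the description of argument x",
            "type": "integer",
        },
        "y": {"title": "Y", "type": "string"},
    },
    "required": ["x"],
    "additionalProperties": False,
}

for argument in to_flat_arguments(example_schema):
    print(argument)
```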

@tcompa
Collaborator Author

tcompa commented Apr 19, 2023

Another example:

from pydantic import BaseModel
from pydantic import Extra
from typing import Optional
import json


class Channel(BaseModel):
    x: int
    y: Optional[int]


class TaskArguments(BaseModel, extra=Extra.forbid):
    channels: list[Channel]


print(json.dumps(TaskArguments.schema(), indent=2))

JSON schema:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "channels": {
      "title": "Channels",
      "type": "array",
      "items": {
        "$ref": "#/definitions/Channel"
      }
    }
  },
  "required": [
    "channels"
  ],
  "additionalProperties": false,
  "definitions": {
    "Channel": {
      "title": "Channel",
      "type": "object",
      "properties": {
        "x": {
          "title": "X",
          "type": "integer"
        },
        "y": {
          "title": "Y",
          "type": "integer"
        }
      },
      "required": [
        "x"
      ]
    }
  }
}
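A consumer of this schema has to follow the `$ref` pointer to find out what a `Channel` looks like. A minimal sketch of that lookup is shown below; the helper name `resolve_ref` is hypothetical, and it only handles local references under `#/definitions/`, which is the layout shown in the output above (other layouts would need a different prefix).

```python
from typing import Any, Dict

DEFINITIONS_PREFIX = "#/definitions/"


def resolve_ref(schema: Dict[str, Any], node: Dict[str, Any]) -> Dict[str, Any]:
    """Return the definition that a local "$ref" points to, or the node itself."""
    ref = node.get("$ref")
    if ref is None:
        return node
    if not ref.startswith(DEFINITIONS_PREFIX):
        raise ValueError(f"Unsupported $ref: {ref}")
    return schema["definitions"][ref[len(DEFINITIONS_PREFIX):]]


# Trimmed-down version of the schema printed above.
schema = {
    "properties": {
        "channels": {"type": "array", "items": {"$ref": "#/definitions/Channel"}}
    },
    "definitions": {
        "Channel": {
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
            "required": ["x"],
        }
    },
}

item_schema = resolve_ref(schema, schema["properties"]["channels"]["items"])
print(sorted(item_schema["properties"]))
```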

@jluethi jluethi modified the milestones: v0.5, v0.4 Apr 19, 2023
@tcompa tcompa changed the title from "[exploration] Test visualization of an args JSON schema" to "Use JSON schemas for WorkflowTask arguments" Apr 20, 2023
@tcompa
Collaborator Author

tcompa commented Apr 20, 2023

Here are some first thoughts, with @rkpasia. A lot more will come up during implementation, and also we will open more specific issues.

  1. If a task has no args schema (that is, it has a null value), then the current behavior is expected (arguments are added/edited one by one). This "freestyle-form" code path could (for the moment) diverge in style and features from the one based on schemas, and we can decide later whether to merge them into a single place.
  2. If a schema comes with N arguments, they should always all be visible in the WorkflowTask page.
  3. If an argument is optional (which is known from the schema), the user can "uncheck" it (e.g. via a tick on its right). Note that there are multiple options for how this could work (see also "Deleting Workflow Task properties after adding them", fractal-server#629); let's discuss it somewhere else. High-level: optional arguments are not required, obviously.
  4. A user cannot add an argument, when a JSON schema is present. The rationale for this is that the schema is considered as the ground truth for which arguments are present, and any other argument would simply be ignored by the task.
  5. To be clarified: what does the user see when they create a WorkflowTask? For sure they should see the N arguments, but what are the defaults?
  6. When available, the "description" field of an argument is visible (either via an "info" button, or overlay message, or something else).
  7. The first step towards this feature should concern scalar arguments only (that is, no objects and no arrays). This is because objects and arrays are slightly more complex to handle (we should be able to add/remove elements, since their number is not fixed).

Something a bit more general:

  1. What is the format of the argument schema? Right now we will go with whatever comes out of pydantic 1, which is "compliant with the specifications: JSON Schema Core, JSON Schema Validation and OpenAPI." Longer term, we may have to specify a subset of one of those broader standard definitions that is supported for fractal-web.
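To make the points above a bit more concrete, here is a rough sketch of how a schema could be turned into a list of form fields: every property is listed, required/optional comes from the `required` list, the description is carried along when present, and only scalar types are marked as editable for now. The function name `form_fields`, the field keys and the example description are all hypothetical, and the sketch is in Python only for illustration.

```python
from typing import Any, Dict, List

# JSON Schema types treated as scalars in this sketch.
SCALAR_TYPES = {"integer", "number", "boolean", "string"}


def form_fields(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List one form field per schema property, in schema order."""
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        fields.append(
            {
                "name": name,
                "required": name in required,
                # Could be shown behind an "info" button when present.
                "description": prop.get("description"),
                # Arrays and objects are listed but not editable yet.
                "editable": prop.get("type") in SCALAR_TYPES,
            }
        )
    return fields


# Small hand-written schema; the description text is made up for this example.
schema = {
    "type": "object",
    "properties": {
        "output_path": {"type": "string"},
        "num_levels": {"type": "integer", "description": "Number of levels"},
        "input_paths": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["output_path", "input_paths"],
}

for field in form_fields(schema):
    print(field)
```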

@tcompa
Collaborator Author

tcompa commented Apr 20, 2023

Here is the first example we should address (note: priority is obviously for scalars). Possible additional complexity would come from defining custom types, as in #82 (comment).

Note: Field also accepts several other arguments, see https://docs.pydantic.dev/usage/schema/#field-customization. I'd rather use as few as possible, but let's check whether some are relevant for us.


Pydantic:

from pydantic import BaseModel
from pydantic import Extra
from pydantic import Field
from typing import Optional
import json


class TaskArguments(BaseModel, extra=Extra.forbid):

    i1: int
    i2: int = 1 
    i3: int = Field(description="Description of i3")
    i4: int = Field(examples=["i4=8"])
    i5: Optional[int] = None

    f1: float
    f2: float = 0.5

    b1: bool
    b2: bool = Field(
        description="Description of b2",
        default=True,
        title="b2 argument",
    )
    b3: Optional[bool]

    a1: list
    a2: list[int]
    a3: list[list[int]] = Field(default=[[1, 2], [3, 4]])

    o1: dict
    o2: dict[str, int]
    o3: dict[str, list[int]]


print(json.dumps((TaskArguments.schema()), indent=2))

JSON:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "i1": {
      "title": "I1",
      "type": "integer"
    },
    "i2": {
      "title": "I2",
      "default": 1,
      "type": "integer"
    },
    "i3": {
      "title": "I3",
      "description": "Description of i3",
      "type": "integer"
    },
    "i4": {
      "title": "I4",
      "examples": [
        "i4=8"
      ],
      "type": "integer"
    },
    "i5": {
      "title": "I5",
      "type": "integer"
    },
    "f1": {
      "title": "F1",
      "type": "number"
    },
    "f2": {
      "title": "F2",
      "default": 0.5,
      "type": "number"
    },
    "b1": {
      "title": "B1",
      "type": "boolean"
    },
    "b2": {
      "title": "b2 argument",
      "description": "Description of b2",
      "default": true,
      "type": "boolean"
    },
    "b3": {
      "title": "B3",
      "type": "boolean"
    },
    "a1": {
      "title": "A1",
      "type": "array",
      "items": {}
    },
    "a2": {
      "title": "A2",
      "type": "array",
      "items": {
        "type": "integer"
      }
    },
    "a3": {
      "title": "A3",
      "default": [
        [
          1,
          2
        ],
        [
          3,
          4
        ]
      ],
      "type": "array",
      "items": {
        "type": "array",
        "items": {
          "type": "integer"
        }
      }
    },
    "o1": {
      "title": "O1",
      "type": "object"
    },
    "o2": {
      "title": "O2",
      "type": "object",
      "additionalProperties": {
        "type": "integer"
      }
    },
    "o3": {
      "title": "O3",
      "type": "object",
      "additionalProperties": {
        "type": "array",
        "items": {
          "type": "integer"
        }
      }
    }
  },
  "required": [
    "i1",
    "i3",
    "i4",
    "f1",
    "b1",
    "a1",
    "a2",
    "o1",
    "o2",
    "o3"
  ],
  "additionalProperties": false
}

@tcompa
Collaborator Author

tcompa commented Apr 20, 2023

> A user cannot add an argument, when a JSON schema is present. The rationale for this is that the schema is considered as the ground truth for which arguments are present, and any other argument would simply be ignored by the task.

The only use case where this rule would not be appropriate is (as far as I can tell) the following:

I am developing a new task, where I do not use pydantic for argument validation (*). I still want to have an argument schema, so I write it from scratch (since I don't have a pydantic model to export). I include arguments A, B and C, but later on during development I want to include another argument D. In principle I would have to modify the schema and call the task-edit endpoint (go to the task page, find my own task, click "edit", send the new schema), but maybe I'd rather just add a new argument from within the WorkflowTask editor, without changing the schema.

To be honest, this seems enough of an edge case that we should not support it, at least for the moment.

(*)
Note that if I already use pydantic for argument validation, then it makes no sense to add an argument which is not part of the schema, because it will be ignored during validation.
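For reference, the "no extra arguments" rule could be checked with something as simple as the sketch below, which lists the argument names that are not declared in the schema. The helper name `unknown_arguments` is hypothetical, and the A/B/C/D names just mirror the scenario described above.

```python
from typing import Any, Dict, List


def unknown_arguments(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return the names in args that are not declared in the schema."""
    known = set(schema.get("properties", {}))
    return sorted(name for name in args if name not in known)


# Hand-written schema with arguments A, B and C, as in the scenario above.
schema = {
    "type": "object",
    "properties": {
        "A": {"type": "string"},
        "B": {"type": "integer"},
        "C": {"type": "boolean"},
    },
    "required": ["A"],
}

# "D" is not part of the schema, so the editor would refuse to add it.
print(unknown_arguments(schema, {"A": "value", "D": 1}))
```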

@jluethi
Collaborator

jluethi commented Apr 20, 2023

> 1. If a task has no args schema (that is, it has a null value), then the current behavior is expected (arguments are to be added/edited one by one)

Agreed.

> 2. If a schema comes with N arguments, they should always all be visible in the WorkflowTask page.

We can start that way. Would make sense to focus on the required arguments and eventually have a way to define what are "advanced" arguments. For example, the cellpose task has many potential arguments that users will very rarely change, but they always need to set the level.

> 3. If an argument is optional (which is known, from the schema), the user can "uncheck" it (e.g. via a tick on its right).

I'd prefer if optional arguments just have some default and None is a valid default for them. Adding a second complexity level of whether or not each argument is active sounds cumbersome.

> 4. A user cannot add an argument, when a JSON schema is present.

I had some reservations about this, but your arguments are convincing. At least for the top level.
For the level below, e.g. adding elements to a list, that needs to be allowed (though potentially restricted in what can be added). For example, a user needs to be able to add channels in the Create OME-Zarr task. => see point 7

> 5. To be clarified: what does the user see when they create a WorkflowTask? For sure they should see the N arguments, but what are the defaults?

I think we should move the defaults also into the pydantic schemas, and optional parameters should have a default. If no default is provided, it defaults to None => empty box. None then needs to be a valid value for optional arguments, but not for required arguments. And all defaults are shown.

> 6. When available, the "description" field of an argument is visible (either via an "info" button, or overlay message, or something else).

Agreed. Not sure about the best user-interface for this. It could be a description that is always shown. But an info button is a good start.

Additional complexities to worry about later:
a) We have multiple ways to specify channels, either via wavelength_ID or label. But the user can only use 1, not both. Eventually, would be cool to have that present in the interface. But doesn't need to be in v0.4.
b) Hiding advanced arguments (and having something like advanced arguments)
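The default handling suggested in the reply to point 5 could be sketched as follows: every argument starts from its schema default, or from None (an empty box) when there is none, and None is then only a problem for required arguments. The helper names `initial_values` and `missing_required` are hypothetical, and this is only one possible reading of that suggestion.

```python
from typing import Any, Dict, List


def initial_values(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Start each argument from its schema default, or None if it has none."""
    return {
        name: prop.get("default")
        for name, prop in schema.get("properties", {}).items()
    }


def missing_required(schema: Dict[str, Any], values: Dict[str, Any]) -> List[str]:
    """Return the required arguments that are still None (empty box)."""
    return [
        name for name in schema.get("required", []) if values.get(name) is None
    ]


# A few scalar properties in the style of the examples earlier in this thread.
schema = {
    "type": "object",
    "properties": {
        "i1": {"title": "I1", "type": "integer"},
        "i2": {"title": "I2", "default": 1, "type": "integer"},
        "i5": {"title": "I5", "type": "integer"},
        "b2": {"title": "b2 argument", "default": True, "type": "boolean"},
    },
    "required": ["i1"],
}

values = initial_values(schema)
print(values)
print(missing_required(schema, values))
```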

@jluethi
Collaborator

jluethi commented Apr 20, 2023

> I am developing a new task, where I do not use pydantic for argument validation (*). I still want to have an argument schema, so I write it from scratch (since I don't have a pydantic model to export). I include arguments A, B and C, but later on during development I want to include another argument D. In principle I would have to modify the schema and call the task-edit endpoint (go to the task page, find my own task, click "edit", send the new schema), but maybe I'd rather just add a new argument from within the WorkflowTask editor, without changing the schema.
>
> To be honest, this seems enough of an edge case that we should not support it, at least for the moment.

Many changes to task registration do sound cumbersome though, so I can imagine we'd eventually want to support this. But clearly not a priority as long as there are the 2 ways of having defined parameters by a schema or being able to add them fresh.

When we tackle 7 (adding lists, object content), it automatically means allowing addition of some arguments in the schema example (though limited), so that seems to be the point for generalization of these 2 approaches to me

@tcompa
Collaborator Author

tcompa commented May 24, 2023

Reference about pydantic v1 vs v2:

@tcompa
Collaborator Author

tcompa commented Jun 26, 2023

This is obviously complete, at least in its first version. Closing.

@tcompa tcompa closed this as completed Jun 26, 2023
3 participants