Use JSON schemas for WorkflowTask arguments #82

tcompa · 2023-03-29T11:55:55Z

An example of a WorkflowTask.args JSON schema could be:

Argument "arg1" of type "string"
Argument "arg2" of type "list of strings"
Argument "arg3" of type "list[Channel]"

where Channel is an object of string->boolean key/value pairs, with keys "A", "B" and "C".

Each argument comes with required=True/False

(even more complex: what about WorkflowTask.task.default_args?)
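As a rough illustration, the three arguments listed above could be written as a JSON Schema, built here as a plain Python dict. The layout (a "definitions" block for Channel, and the assumption that only arg1 is required) is hypothetical and not something agreed on in this thread.

```python
import json

# Hypothetical Channel object: string->boolean pairs with keys "A", "B", "C".
channel_schema = {
    "type": "object",
    "properties": {key: {"type": "boolean"} for key in ("A", "B", "C")},
    "additionalProperties": False,
}

# Hypothetical schema for WorkflowTask.args; the "required" list is an
# assumption made only for this example.
args_schema = {
    "type": "object",
    "properties": {
        "arg1": {"type": "string"},
        "arg2": {"type": "array", "items": {"type": "string"}},
        "arg3": {"type": "array", "items": {"$ref": "#/definitions/Channel"}},
    },
    "required": ["arg1"],
    "definitions": {"Channel": channel_schema},
}

print(json.dumps(args_schema, indent=2))
```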


rkpasia · 2023-04-04T13:48:43Z

An example of a schema could be the following, where for a given workflow task we have a property which is a list of objects.

Each object represents the definition of a workflow task argument. Arguments could be nested.

{
  "workflow_task_name": "workflow_task_name",
  "workflow_task_id": 1,
  "workflow_task_schema": [
    {
      "argument_name": "task_argument_name",
      "argument_type": "object",
      "argument_description": "description",
      "default_value": "default value, with respect to argument type",
      "is_required": true,
      "inner_argument": {
        "argument_name": "inner_argument_name",
        "argument_type": "name of type",
        "argument_description": null,
        "default_value": null,
        "is_required": true,
        "inner_argument": null
      }
    }
  ]
}

tcompa · 2023-04-19T10:19:37Z

Below is a realistic example of what comes out of the pydantic schema method (note: the method name changes when moving to pydantic 2). This is the raw output, which we could make available in fractal-server rather quickly.

If needed, I suggest that the logic for converting it to a different kind of schema (like the one in #82 (comment)) is implemented as part of the web client, to avoid relying on custom definitions (i.e. different from pydantic) in multiple places (tasks repository and web-client repository).

Example 1

Task arguments:

import json
from typing import Optional
from pydantic import BaseModel, Extra, Field


class TaskArguments(BaseModel, extra=Extra.forbid):

    x: int = Field(description="This is the description of argument x")
    y: Optional[str]


print(json.dumps(TaskArguments.schema(), indent=2))

JSON schema:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "x": {
      "title": "X",
      "description": "This is the description of argument x",
      "type": "integer"
    },
    "y": {
      "title": "Y",
      "type": "string"
    }
  },
  "required": [
    "x"
  ],
  "additionalProperties": false
}

Example 2

Task arguments:

import json
from typing import Any, Dict, Optional, Sequence
from pydantic import BaseModel, Extra


class TaskArguments(BaseModel, extra=Extra.forbid):
    input_paths: Sequence[str]
    output_path: str
    metadata: Dict[str, Any]
    image_extension: str
    image_glob_patterns: Optional[list[str]]
    allowed_channels: Sequence[Dict[str, Any]]
    num_levels: Optional[int]
    coarsening_xy: Optional[int]
    metadata_table: Optional[str]


print(json.dumps(TaskArguments.schema(), indent=2))

JSON schema:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "input_paths": {
      "title": "Input Paths",
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "output_path": {
      "title": "Output Path",
      "type": "string"
    },
    "metadata": {
      "title": "Metadata",
      "type": "object"
    },
    "image_extension": {
      "title": "Image Extension",
      "type": "string"
    },
    "image_glob_patterns": {
      "title": "Image Glob Patterns",
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "allowed_channels": {
      "title": "Allowed Channels",
      "type": "array",
      "items": {
        "type": "object"
      }
    },
    "num_levels": {
      "title": "Num Levels",
      "type": "integer"
    },
    "coarsening_xy": {
      "title": "Coarsening Xy",
      "type": "integer"
    },
    "metadata_table": {
      "title": "Metadata Table",
      "type": "string"
    }
  },
  "required": [
    "input_paths",
    "output_path",
    "metadata",
    "image_extension",
    "allowed_channels"
  ],
  "additionalProperties": false
}
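The conversion mentioned in this comment (from the raw pydantic output to a list-of-arguments format) could look roughly like the sketch below. It is written in Python only for consistency with the other snippets, whereas the suggestion here is that such logic would live in the web client. The function name schema_to_argument_list and the output keys are hypothetical; the keys loosely follow the list-of-arguments proposal earlier in the thread, and only top-level properties are handled.

```python
from typing import Any


def schema_to_argument_list(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the top-level properties of a JSON Schema into a list of
    argument descriptors (hypothetical format, no nested arguments)."""
    required = set(schema.get("required", []))
    arguments = []
    for name, prop in schema.get("properties", {}).items():
        arguments.append(
            {
                "argument_name": name,
                "argument_type": prop.get("type"),
                "argument_description": prop.get("description"),
                "default_value": prop.get("default"),
                "is_required": name in required,
            }
        )
    return arguments


# Schema copied from Example 1 in this comment.
example_schema = {
    "title": "TaskArguments",
    "type": "object",
    "properties": {
        "x": {
            "title": "X",
            "description": "This is the description of argument x",
            "type": "integer",
        },
        "y": {"title": "Y", "type": "string"},
    },
    "required": ["x"],
    "additionalProperties": False,
}

for argument in schema_to_argument_list(example_schema):
    print(argument)
```

Keeping the pydantic output as the stored format and deriving any other view from it means there is a single definition to maintain.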

tcompa · 2023-04-19T10:23:21Z

Another example:

from pydantic import BaseModel
from pydantic import Extra
from typing import Optional
import json


class Channel(BaseModel):
    x: int
    y: Optional[int]


class TaskArguments(BaseModel, extra=Extra.forbid):
    channels: list[Channel]


print(json.dumps((TaskArguments.schema()), indent=2))

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "channels": {
      "title": "Channels",
      "type": "array",
      "items": {
        "$ref": "#/definitions/Channel"
      }
    }
  },
  "required": [
    "channels"
  ],
  "additionalProperties": false,
  "definitions": {
    "Channel": {
      "title": "Channel",
      "type": "object",
      "properties": {
        "x": {
          "title": "X",
          "type": "integer"
        },
        "y": {
          "title": "Y",
          "type": "integer"
        }
      },
      "required": [
        "x"
      ]
    }
  }
}
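In this output the custom Channel type is not inlined: the "items" entry points to "#/definitions/Channel". A client that renders the schema has to follow that reference. The sketch below shows one way to inline such local references; the helper names resolve_refs and inline_definitions are hypothetical, only "#/definitions/<Name>" references are handled, and recursive definitions are assumed not to occur.

```python
from typing import Any

REF_PREFIX = "#/definitions/"


def resolve_refs(node: Any, definitions: dict[str, Any]) -> Any:
    """Return a copy of `node` where local "$ref" entries are replaced by
    the referenced sub-schema. Assumes definitions are not recursive."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(REF_PREFIX):
            return resolve_refs(definitions[ref[len(REF_PREFIX):]], definitions)
        return {key: resolve_refs(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, definitions) for item in node]
    return node


def inline_definitions(schema: dict[str, Any]) -> dict[str, Any]:
    """Inline all local references and drop the top-level "definitions"."""
    definitions = schema.get("definitions", {})
    top_level = {key: value for key, value in schema.items() if key != "definitions"}
    return resolve_refs(top_level, definitions)


# Abbreviated version of the schema printed above.
schema = {
    "title": "TaskArguments",
    "type": "object",
    "properties": {
        "channels": {
            "title": "Channels",
            "type": "array",
            "items": {"$ref": "#/definitions/Channel"},
        }
    },
    "required": ["channels"],
    "definitions": {
        "Channel": {
            "title": "Channel",
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
            "required": ["x"],
        }
    },
}

inlined = inline_definitions(schema)
print(inlined["properties"]["channels"]["items"]["title"])
```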

tcompa · 2023-04-20T10:42:25Z

Here are some first thoughts, with @rkpasia. A lot more will come up during implementation, and also we will open more specific issues.

1. If a task has no args schema (that is, its value is null), then the current behavior is expected (arguments are to be added/edited one by one). This "freestyle-form" code path could (for the moment) diverge in style and features from the one based on schemas, and we can decide later on whether to merge them into a single place.
2. If a schema comes with N arguments, they should always all be visible in the WorkflowTask page.
3. If an argument is optional (which is known from the schema), the user can "uncheck" it (e.g. via a tick on its right). Note that there are multiple options for how this could work (see also "Deleting Workflow Task properties after adding them", fractal-server#629); let's discuss it somewhere else. High-level: optional arguments are not required, obviously.
4. A user cannot add an argument when a JSON schema is present. The rationale for this is that the schema is considered the ground truth for which arguments are present, and any other argument would simply be ignored by the task.
5. To be clarified: what does the user see when they create a WorkflowTask? For sure they should see the N arguments, but what are the defaults?
6. When available, the "description" field of an argument is visible (either via an "info" button, or overlay message, or something else).
7. The first step towards this feature should concern scalar arguments (that is, no objects and no arrays). This is because objects and arrays are slightly more complex to handle (we should be able to add/remove elements, since their number is not defined).
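To make the "scalars first" idea concrete, here is a minimal sketch of how a client could separate scalar arguments from arrays and objects using only the "type" field of each property. The function name split_by_kind is hypothetical, and treating anything without a scalar "type" (including "$ref" entries) as non-scalar is an assumption.

```python
from typing import Any

# JSON Schema types considered scalar here (that is, no objects, no arrays).
SCALAR_TYPES = {"integer", "number", "boolean", "string"}


def split_by_kind(schema: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (scalar_names, other_names) for the top-level properties."""
    scalars: list[str] = []
    others: list[str] = []
    for name, prop in schema.get("properties", {}).items():
        if prop.get("type") in SCALAR_TYPES:
            scalars.append(name)
        else:
            others.append(name)
    return scalars, others


# Small hypothetical schema, used only to exercise the function.
example_schema = {
    "type": "object",
    "properties": {
        "i1": {"type": "integer"},
        "b1": {"type": "boolean"},
        "a2": {"type": "array", "items": {"type": "integer"}},
        "o1": {"type": "object"},
    },
    "required": ["i1", "b1", "a2", "o1"],
}

scalar_names, other_names = split_by_kind(example_schema)
print("scalar:", scalar_names)
print("array/object:", other_names)
```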

Something a bit more general:

What is the format of the argument schema? Right now we will go with whatever comes out of pydantic 1, which is "compliant with the specifications: JSON Schema Core, JSON Schema Validation and OpenAPI." Longer term, we may have to specify a subset of one of those broader standard definitions that is supported for fractal-web.

tcompa · 2023-04-20T10:55:57Z

Here is the first example we should address (note: priority is obviously for scalars). Possible additional complexity would come from defining custom types, as in #82 (comment).

Note: Field also accepts several other arguments, see https://docs.pydantic.dev/usage/schema/#field-customization. I'd rather use as few as possible, but let's check whether some are relevant for us.

Pydantic:

from pydantic import BaseModel
from pydantic import Extra
from pydantic import Field
from typing import Optional
import json


class TaskArguments(BaseModel, extra=Extra.forbid):

    i1: int
    i2: int = 1
    i3: int = Field(description="Description of i3")
    i4: int = Field(examples=["i4=8"])
    i5: Optional[int] = None

    f1: float
    f2: float = 0.5

    b1: bool
    b2: bool = Field(
        description="Description of b2",
        default=True,
        title="b2 argument",
    )
    b3: Optional[bool]

    a1: list
    a2: list[int]
    a3: list[list[int]] = Field(default=[[1, 2], [3, 4]])

    o1: dict
    o2: dict[str, int]
    o3: dict[str, list[int]]


print(json.dumps((TaskArguments.schema()), indent=2))

JSON:

{
  "title": "TaskArguments",
  "type": "object",
  "properties": {
    "i1": {
      "title": "I1",
      "type": "integer"
    },
    "i2": {
      "title": "I2",
      "default": 1,
      "type": "integer"
    },
    "i3": {
      "title": "I3",
      "description": "Description of i3",
      "type": "integer"
    },
    "i4": {
      "title": "I4",
      "examples": [
        "i4=8"
      ],
      "type": "integer"
    },
    "i5": {
      "title": "I5",
      "type": "integer"
    },
    "f1": {
      "title": "F1",
      "type": "number"
    },
    "f2": {
      "title": "F2",
      "default": 0.5,
      "type": "number"
    },
    "b1": {
      "title": "B1",
      "type": "boolean"
    },
    "b2": {
      "title": "b2 argument",
      "description": "Description of b2",
      "default": true,
      "type": "boolean"
    },
    "b3": {
      "title": "B3",
      "type": "boolean"
    },
    "a1": {
      "title": "A1",
      "type": "array",
      "items": {}
    },
    "a2": {
      "title": "A2",
      "type": "array",
      "items": {
        "type": "integer"
      }
    },
    "a3": {
      "title": "A3",
      "default": [
        [
          1,
          2
        ],
        [
          3,
          4
        ]
      ],
      "type": "array",
      "items": {
        "type": "array",
        "items": {
          "type": "integer"
        }
      }
    },
    "o1": {
      "title": "O1",
      "type": "object"
    },
    "o2": {
      "title": "O2",
      "type": "object",
      "additionalProperties": {
        "type": "integer"
      }
    },
    "o3": {
      "title": "O3",
      "type": "object",
      "additionalProperties": {
        "type": "array",
        "items": {
          "type": "integer"
        }
      }
    }
  },
  "required": [
    "i1",
    "i3",
    "i4",
    "f1",
    "b1",
    "a1",
    "a2",
    "o1",
    "o2",
    "o3"
  ],
  "additionalProperties": false
}

tcompa · 2023-04-20T11:00:53Z

A user cannot add an argument, when a JSON schema is present. The rationale for this is that the schema is considered as the ground truth for which arguments are present, and any other argument would simply be ignored by the task.

The only use case where this rule would not be appropriate is (as far as I can tell) the following:

I am developing a new task, where I do not use pydantic for argument validation (*). I still want to have an argument schema, so I write it from scratch (since I don't have a pydantic model to export). I include arguments A, B and C, but later on during development I want to include another argument D. In principle I would have to modify the schema and call the task-edit endpoint (go to the task page, find my own task, click "edit", send the new schema), but maybe I'd rather just add a new argument from within the WorkflowTask editor, without changing the schema.

To be honest, this seems enough of an edge case that we should not support it, at least for the moment.

(*)
Note that if I already use pydantic for argument validation, then it makes no sense to add an argument which is not part of the schema, because it will be ignored during validation.
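The "schema as ground truth" rule can be checked on the client side with very little code. The sketch below lists argument names that are not declared in the schema; the function name unknown_arguments is hypothetical, and only top-level properties are considered.

```python
from typing import Any


def unknown_arguments(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return the names in `args` that the schema does not declare."""
    allowed = set(schema.get("properties", {}))
    return sorted(name for name in args if name not in allowed)


# Hypothetical hand-written schema with arguments A, B and C.
schema = {
    "type": "object",
    "properties": {
        "A": {"type": "integer"},
        "B": {"type": "string"},
        "C": {"type": "boolean"},
    },
    "required": ["A"],
}

# "D" is not part of the schema, so it is reported.
print(unknown_arguments(schema, {"A": 1, "D": "extra"}))
```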

jluethi · 2023-04-20T12:59:54Z

If a task has no args schema (that is, it has null value), then the current behavior is expected (arguments are to be added/edited one by one)

Agreed.

If a schema comes with N arguments, they should always all be visible in the WorkflowTask page.

We can start that way. Would make sense to focus on the required arguments and eventually have a way to define what are "advanced" arguments. For example, the cellpose task has many potential arguments that users will very rarely change, but they always need to set the level.

If an argument is optional (which is known, from the schema), the user can "uncheck" it (e.g. via a tick on its right).

I'd prefer if optional arguments just have some default and None is a valid default for them. Adding a second complexity level of whether or not each argument is active sounds cumbersome.

A user cannot add an argument, when a JSON schema is present.

I had some reservations about this, but your arguments are convincing. At least for the top level.
For the level below, e.g. adding elements to a list, that needs to be allowed (though potentially restricted in what can be added). For example, a user needs to be able to add channels in the Create OME-Zarr task. => see point 7

To be clarified: what does the user see when they create a WorkflowTask? For sure they should see the N arguments, but what are the defaults?

I think we should also move the defaults into the pydantic schemas, and optional parameters should have a default. If no default is provided, it defaults to None => empty box. None then needs to be a valid value for optional arguments, but not for required arguments. And all defaults are shown.
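A minimal sketch of this default handling, assuming the schema layout produced by pydantic 1 as shown earlier in the thread: arguments with a "default" get that value, optional arguments without one get None, and required arguments without a default are left for the user to fill in. The function name initial_args is hypothetical.

```python
import copy
from typing import Any


def initial_args(schema: dict[str, Any]) -> dict[str, Any]:
    """Build the initial argument values for a new WorkflowTask."""
    required = set(schema.get("required", []))
    args: dict[str, Any] = {}
    for name, prop in schema.get("properties", {}).items():
        if "default" in prop:
            # Copy so that editing a list default does not alter the schema.
            args[name] = copy.deepcopy(prop["default"])
        elif name not in required:
            # Optional argument without a default: None (an empty box).
            args[name] = None
        # Required arguments without a default are intentionally omitted.
    return args


# Subset of the schema with i1, i2, i5 and b2 from the example above.
example_schema = {
    "type": "object",
    "properties": {
        "i1": {"title": "I1", "type": "integer"},
        "i2": {"title": "I2", "default": 1, "type": "integer"},
        "i5": {"title": "I5", "type": "integer"},
        "b2": {"title": "b2 argument", "default": True, "type": "boolean"},
    },
    "required": ["i1"],
}

print(initial_args(example_schema))
```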

When available, the "description" field of an argument is visible (either via an "info" button, or overlay message, or something else).

Agreed. Not sure about the best user-interface for this. It could be a description that is always shown. But an info button is a good start.

Additional complexities to worry about later:
a) We have multiple ways to specify channels, either via wavelength_ID or label. But the user can only use 1, not both. Eventually, would be cool to have that present in the interface. But doesn't need to be in v0.4.
b) Hiding advanced arguments (and having something like advanced arguments)

jluethi · 2023-04-20T13:03:36Z

I am developing a new task, where I do not use pydantic for argument validation (*). I still want to have an argument schema, so I write it from scratch (since I don't have a pydantic model to export). I include arguments A, B and C, but later on during development I want to include another argument D. In principle I would have to modify the schema and call the task-edit endpoint (go to the task page, find my own task, click "edit", send the new schema), but maybe I'd rather just add a new argument from within the WorkflowTask editor, without changing the schema.

To be honest, this seems enough of an edge case that we should not support it, at least for the moment.

Many changes to task registration do sound cumbersome though, so I can imagine we'd eventually want to support this. But clearly not a priority as long as there are the 2 ways of having defined parameters by a schema or being able to add them fresh.

When we tackle 7 (adding lists, object content), it automatically means allowing addition of some arguments in the schema example (though limited), so that seems to be the point for generalization of these 2 approaches to me

tcompa · 2023-05-24T06:48:24Z

Reference about pydantic v1 vs v2:

JSON Schema version and pydantic v2 fractal-tasks-core#375

tcompa · 2023-06-26T10:27:58Z

This is obviously complete, at least in its first version. Closing.

tcompa added this to the v0.x milestone Mar 29, 2023

tcompa modified the milestones: v0.4, v0.5 Apr 5, 2023

jluethi modified the milestones: v0.5, v0.4 Apr 19, 2023

jluethi added the 0.4 priority label Apr 19, 2023

This was referenced Apr 20, 2023

Export arguments JSON schemas somewhere fractal-analytics-platform/fractal-tasks-core#353

Closed

Use task-arguments JSON schemas #79

Closed

tcompa changed the title ~~[exploration] Test visualization of an args JSON schema~~ Use JSON schemas for WorkflowTask arguments Apr 20, 2023

This was referenced Apr 21, 2023

Default WorkflowTask arguments #127

Closed

Adding new arguments while editing a JSON-schema-based WorkflowTask #128

Closed

Populate WorkflowTask.args based on defaults in WorkflowTask.task.args_schema fractal-analytics-platform/fractal-server#639

Closed

tcompa removed the 0.6 priority label May 17, 2023

rkpasia mentioned this issue May 25, 2023

Definition of a Svelte component structure to handle a JSON Schema #154

Closed

7 tasks

tcompa closed this as completed Jun 26, 2023
