docs: document schema validation order
Signed-off-by: Matthias Pichler <m.pichler@warrify.com>
matthias-pichler committed Aug 21, 2024
1 parent feb9ca9 commit 46f6dfb
Showing 2 changed files with 75 additions and 35 deletions.
32 changes: 18 additions & 14 deletions dsl-reference.md
@@ -1494,17 +1494,17 @@ Represents the definition of the parameters that control the randomness or varia

### Input

Documents the structure - and optionally configures the filtering of - workflow/task input data.
Documents the structure - and optionally configures the transformation of - workflow/task input data.

It's crucial for authors to document the schema of input data whenever feasible. This documentation empowers consuming applications to provide contextual auto-suggestions when handling runtime expressions.

When set, runtimes must validate input data against the defined schema, unless defined otherwise.
When set, runtimes must validate raw input data against the defined schema before applying transformations, unless defined otherwise.

#### Properties

| Property | Type | Required | Description |
|----------|:----:|:--------:|-------------|
| schema | [`schema`](#schema) | `no` | The [`schema`](#schema) used to describe and validate input data.<br>*Even though the schema is not required, it is strongly encouraged to document it, whenever feasible.* |
| schema | [`schema`](#schema) | `no` | The [`schema`](#schema) used to describe and validate raw input data.<br>*Even though the schema is not required, it is strongly encouraged to document it, whenever feasible.* |
| from | `string`<br>`object` | `no` | A [runtime expression](dsl.md#runtime-expressions), if any, used to filter and/or mutate the workflow/task input. |

#### Examples
@@ -1515,9 +1515,16 @@ schema:
  document:
    type: object
    properties:
      petId:
        type: string
    required: [ petId ]
      order:
        type: object
        required: [ pet ]
        properties:
          pet:
            type: object
            required: [ id ]
            properties:
              id:
                type: string
from: .order.pet
```
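
For instance, a raw input shaped like the following hypothetical order would first be validated against the schema above and then reduced by `from: .order.pet`:

```yaml
# Hypothetical raw input; passes the schema above.
order:
  pet:
    id: a1b2c3
# After applying `from: .order.pet`, the transformed input is:
#   { "id": "a1b2c3" }
```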

@@ -1527,7 +1534,7 @@ Documents the structure - and optionally configures the transformations of - wor

It's crucial for authors to document the schema of output data whenever feasible. This documentation empowers consuming applications to provide contextual auto-suggestions when handling runtime expressions.

When set, runtimes must validate output data against the defined schema, unless defined otherwise.
When set, runtimes must validate output data against the defined schema after applying transformations, unless defined otherwise.

#### Properties

@@ -1550,16 +1557,13 @@ output:
      required: [ petId ]
  as:
    petId: '${ .pet.id }'
export:
  as:
    '.petList += [ $task.output ]'
```
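
For instance, a raw task output shaped like the following hypothetical pet record would be reshaped by `as` and then validated against the schema above:

```yaml
# Hypothetical raw task output.
pet:
  id: a1b2c3
  name: Fido
# After applying `as`, the transformed output is:
#   { "petId": "a1b2c3" }
# which satisfies `required: [ petId ]`.
```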

### Export

Certain tasks need to set the workflow context to save the task output for later usage. Users set the content of the context through a runtime expression. The result of the expression is the new value of the context. The expression is evaluated against the existing context.
Certain tasks need to set the workflow context to save the task output for later usage. Users set the content of the context through a runtime expression. The result of the expression is the new value of the context. The expression is evaluated against the transformed task output.

Optionally, the context might have an associated schema.
Optionally, the context might have an associated schema, against which the result of the expression is validated.

#### Properties

@@ -1573,13 +1577,13 @@ Optionally, the context might have an associated schema.
Merge the task output into the current context.

```yaml
as: '.+$output'
as: '$context+.'
```

Replace the context with the task output.

```yaml
as: $output
as: '.'
```
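
A fuller sketch, assuming a hypothetical `petList` key in the context, that combines an export expression with a schema the resulting context must satisfy:

```yaml
export:
  # '.' is the transformed task output; '$context' is the current context.
  as: '$context + { petList: (($context.petList // []) + [ . ]) }'
  schema:
    document:
      type: object
      required: [ petList ]
```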

### Schema
78 changes: 57 additions & 21 deletions dsl.md
@@ -162,98 +162,134 @@ Once the task has been executed, different things can happen:

### Data Flow

In Serverless Workflow DSL, data flow management is crucial to ensure that the right data is passed between tasks and to the workflow itself.

Here's how data flows through a workflow based on various transformation stages:

1. **Transform Workflow Input**
1. **Validate Workflow Input**
Before the workflow starts, the input data provided to the workflow can be validated against the `input.schema` property to ensure it conforms to the expected structure.
The execution only proceeds if the input is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

2. **Transform Workflow Input**
Before the workflow starts, the input data provided to the workflow can be transformed to ensure only relevant data in the expected format is passed into the workflow context. This can be done using the top-level `input.from` expression. It is evaluated against the raw workflow input and defaults to the identity expression, which leaves the input unchanged. This step allows the workflow to start with a clean and focused dataset, reducing potential overhead and complexity in subsequent tasks. The result of this expression is set as the initial value of the `$context` runtime expression argument and is passed to the first task.

*Example: If the workflow receives a JSON object as input, a transformation can be applied to remove unnecessary fields and retain only those that are required for the workflow's execution.*

2. **Transform First Task Input**
The input data for the first task can be transformed to match the specific requirements of that task. This ensures that the first task receives only the data required to perform its operations. This can be done using the task's `input.from` expression. It evaluates the transformed workflow input and defaults to the identity expression, which leaves the input unchanged. The result of this expression will be set as the `$input` runtime expression argument and be passed to the task. This transformed input will be evaluated against any runtime expressions used within the task definition.
After workflow input validation and transformation, the transformed input is passed as the raw input to the first task.

3. **Validate Task Input**
Before a task executes, its raw input can be validated against the `input.schema` property to ensure it conforms to the expected structure.
The execution only proceeds if the input is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

*Example: If the first task is a function call that only needs a subset of the workflow input, a transformation can be applied to provide only those fields needed for the function to execute.*
4. **Transform Task Input**
The input data for the task can be transformed to match the specific requirements of that task. This ensures that the task receives only the data required to perform its operations. This can be done using the task's `input.from` expression. It is evaluated against the raw task input (i.e., the transformed workflow input for the first task, or the transformed output of the previous task) and defaults to the identity expression, which leaves the input unchanged. The result of this expression is set as the `$input` runtime expression argument and is passed to the task. Any runtime expressions used within the task definition are evaluated against this transformed input.

3. **Transform First Task Output**
After completing the first task, its output can be transformed before passing it to the next task or storing it in the workflow context. Transformations are applied using the `output.as` runtime expression. It evaluates the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result will be input for the next task. To update the context, one uses the `export.as` runtime expression. It evaluates the raw output and defaults to the expression that returns the existing context. The result of this runtime expression replaces the workflow's current context and the content of the `$context` runtime expression argument. This helps manage the data flow and keep the context clean by removing any unnecessary data produced by the task.
*Example: If the task is a function call that only needs a subset of the workflow input, a transformation can be applied to provide only those fields needed for the function to execute.*

*Example: If the first task returns a large dataset, a transformation can be applied to retain only the relevant results needed for subsequent tasks.*
5. **Transform Task Output**
After completing the task, its output can be transformed before passing it to the next task or storing it in the workflow context. Transformations are applied using the `output.as` runtime expression. It is evaluated against the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result becomes the input for the next task.

4. **Transform Last Task Input**
Before the last task in the workflow executes, its input data can be transformed to ensure it receives only the necessary information. This can be done using the task's `input.from` expression. It evaluates the transformed workflow input and defaults to the identity expression, which leaves the input unchanged. The result of this expression will be set as the `$input` runtime expression argument and be passed to the task. This transformed input will be evaluated against any runtime expressions used within the task definition. This step is crucial for ensuring the final task has all the required data to complete the workflow successfully.
*Example: If the task returns a large dataset, a transformation can be applied to retain only the relevant results needed for subsequent tasks.*

*Example: If the last task involves generating a report, the input transformation can ensure that only the data required for the report generation is passed to the task.*
6. **Validate Task Output**
After `output.as` is evaluated, the transformed task output is validated against the `output.schema` property to ensure it conforms to the expected structure. The execution only proceeds if the output is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

5. **Transform Last Task Output**
After the last task completes, its output can be transformed before it is considered the workflow output. Transformations are applied using the `output.as` runtime expression. It evaluates the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result will be passed to the workflow `output.as` runtime expression. This ensures that the workflow produces a clean and relevant output, free from any extraneous data that might have been generated during the task execution.
7. **Update Workflow Context**
To update the context, one uses the `export.as` runtime expression. It is evaluated against the transformed task output and defaults to an expression that returns the existing context. The result of this runtime expression replaces the workflow's current context and the content of the `$context` runtime expression argument. This helps manage the data flow and keep the context clean by removing any unnecessary data produced by the task.

*Example: If the last task outputs various statistics, a transformation can be applied to retain only the key metrics that are relevant to the stakeholders.*
8. **Validate Exported Context**
After the context is updated, the exported context is validated against the `export.schema` property to ensure it conforms to the expected structure. The execution only proceeds if the exported context is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

6. **Transform Workflow Output**
Finally, the overall workflow output can be transformed before it is returned to the caller or stored. Transformations are applied using the `output.as` runtime expression. It evaluates the last task's output and defaults to the identity expression, which leaves the output unchanged. This step ensures that the final output of the workflow is concise and relevant, containing only the necessary information that needs to be communicated or recorded.
9. **Continue Workflow**
After the context is updated, the workflow continues to the next task in the sequence. The transformed output of the previous task is passed as the raw input to the next task, and the data flow cycle repeats.
If no more tasks are defined, the last task's transformed output is passed to the workflow output transformation step.

10. **Transform Workflow Output**
Finally, the overall workflow output can be transformed before it is returned to the caller or stored. Transformations are applied using the `output.as` runtime expression. It is evaluated against the last task's transformed output and defaults to the identity expression, which leaves the output unchanged. This step ensures that the final output of the workflow is concise and relevant, containing only the necessary information that needs to be communicated or recorded.

*Example: If the workflow's final output is a summary report, a transformation can ensure that the report contains only the most important summaries and conclusions, excluding any intermediate data.*

11. **Validate Workflow Output**
After `output.as` is evaluated, the transformed workflow output is validated against the `output.schema` property to ensure it conforms to the expected structure. The execution only proceeds if the output is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

By applying transformations at these strategic points, Serverless Workflow DSL ensures that data flows through the workflow in a controlled and efficient manner, maintaining clarity and relevance at each execution stage. This approach helps manage complex workflows and ensures that each task operates with the precise data required, leading to more predictable and reliable workflow outcomes.
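
To make this ordering concrete, the following is a minimal sketch of a single-task workflow annotated with the stages above; the endpoint, field names, and the `lastPetId` context key are hypothetical:

```yaml
document:
  dsl: '1.0.0'
  namespace: examples
  name: data-flow-sketch
  version: '0.1.0'
input:
  schema:                # 1. validate raw workflow input
    document:
      type: object
      required: [ order ]
  from: .order           # 2. transform workflow input
do:
  - getPet:
      call: http
      with:
        method: get
        endpoint: https://petstore.example.com/pets/1   # hypothetical endpoint
      input:
        schema:          # 3. validate raw task input
          document:
            type: object
            required: [ pet ]
        from: .pet       # 4. transform task input
      output:
        as: '{ petId: .id }'   # 5. transform raw task output (assumes the response body has an `id` field)
        schema:          # 6. validate transformed task output
          document:
            type: object
            required: [ petId ]
      export:
        as: '$context + { lastPetId: .petId }'   # 7. update workflow context
        schema:          # 8. validate exported context
          document:
            type: object
            required: [ lastPetId ]
# 9. with no further tasks, the transformed task output flows to the workflow output
output:
  as: .petId             # 10. transform workflow output
  schema:                # 11. validate transformed workflow output
    document:
      type: string
```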

Visually, this can be represented as follows:

```mermaid
flowchart TD
    subgraph Legend
        legend_data{{Data}}
        legend_schema[\Schema/]
        legend_transformation[Transformation]
        legend_arg([Runtime Argument])
    end

    initial_context_arg([<code>$context</code>])
    context_arg([<code>$context</code>])
    input_arg([<code>$input</code>])
    output_arg([<code>$output</code>])

    workflow_raw_input{{Raw Workflow Input}}
    workflow_input_schema[\Workflow: <code>input.schema</code>/]
    workflow_input_from[Workflow: <code>input.from</code>]
    workflow_transformed_input{{Transformed Workflow Input}}
    task_raw_input{{Raw Task Input}}
    task_input_schema[\Task: <code>input.schema</code>/]
    task_input_from[Task: <code>input.from</code>]
    task_transformed_input{{Transformed Task Input}}
    task_definition[Task definition]
    task_raw_output{{Raw Task output}}
    task_output_as[Task: <code>output.as</code>]
    task_transformed_output{{Transformed Task output}}
    task_output_schema[\Task: <code>output.schema</code>/]
    task_export_as[Task: <code>export.as</code>]
    task_export_schema[\Task: <code>export.schema</code>/]
    new_context{{New execution context}}
    workflow_raw_output{{Raw Workflow Output}}
    workflow_output_as[Workflow: <code>output.as</code>]
    workflow_transformed_output{{Transformed Workflow Output}}
    workflow_output_schema[\Workflow: <code>output.schema</code>/]

    workflow_raw_input --> workflow_input_from
    workflow_raw_input -- Validated by --> workflow_input_schema
    workflow_input_schema -- Passed to --> workflow_input_from
    workflow_input_from -- Produces --> workflow_transformed_input
    workflow_transformed_input -- Set as --> initial_context_arg
    workflow_transformed_input -- Passed to --> task_raw_input

    subgraph Task
        task_raw_input -- Passed to --> task_input_from
        task_raw_input -- Validated by --> task_input_schema
        task_input_schema -- Passed to --> task_input_from
        task_input_from -- Produces --> task_transformed_input
        task_transformed_input -- Set as --> input_arg
        task_transformed_input -- Passed to --> task_definition
        task_definition -- Execution produces --> task_raw_output
        task_raw_output -- Passed to --> task_output_as
        task_output_as -- Produces --> task_transformed_output
        task_output_as -- Set as --> output_arg
        task_transformed_output -- Passed to --> task_export_as
        task_transformed_output -- Set as --> output_arg
        task_transformed_output -- Validated by --> task_output_schema
        task_output_schema -- Passed to --> task_export_as
        task_export_as -- Produces --> new_context
        new_context -- Validated by --> task_export_schema
    end

    task_transformed_output -- Passed as raw input to --> next_task
    subgraph next_task [Next Task]
    end
    task_export_as -- Result set as --> context_arg
    new_context -- Set as --> context_arg
    next_task -- Transformed output becomes --> workflow_raw_output
    workflow_raw_output -- Passed to --> workflow_output_as
    workflow_output_as -- Produces --> workflow_transformed_output
    workflow_transformed_output -- Validated by --> workflow_output_schema
```

### Runtime Expressions
