docs: document schema validation order
Signed-off-by: Matthias Pichler <m.pichler@warrify.com>
matthias-pichler committed Aug 21, 2024
1 parent feb9ca9 commit 46f6dfb
Showing 2 changed files with 75 additions and 35 deletions.
32 changes: 18 additions & 14 deletions dsl-reference.md
@@ -1494,17 +1494,17 @@ Represents the definition of the parameters that control the randomness or varia

### Input

Documents the structure - and optionally configures the filtering of - workflow/task input data.
Documents the structure - and optionally configures the transformation of - workflow/task input data.

It's crucial for authors to document the schema of input data whenever feasible. This documentation empowers consuming applications to provide contextual auto-suggestions when handling runtime expressions.

When set, runtimes must validate input data against the defined schema, unless defined otherwise.
When set, runtimes must validate raw input data against the defined schema before applying transformations, unless defined otherwise.

#### Properties

| Property | Type | Required | Description |
|----------|:----:|:--------:|-------------|
| schema | [`schema`](#schema) | `no` | The [`schema`](#schema) used to describe and validate input data.<br>*Even though the schema is not required, it is strongly encouraged to document it, whenever feasible.* |
| schema | [`schema`](#schema) | `no` | The [`schema`](#schema) used to describe and validate raw input data.<br>*Even though the schema is not required, it is strongly encouraged to document it, whenever feasible.* |
| from | `string`<br>`object` | `no` | A [runtime expression](dsl.md#runtime-expressions), if any, used to filter and/or mutate the workflow/task input. |

#### Examples
@@ -1515,9 +1515,16 @@ schema:
  document:
    type: object
    properties:
      petId:
        type: string
    required: [ petId ]
      order:
        type: object
        required: [ pet ]
        properties:
          pet:
            type: object
            required: [ id ]
            properties:
              id:
                type: string
from: .order.pet
```
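
For instance, a raw input shaped like the following hypothetical order would first be validated against the schema above and then reduced by `from: .order.pet`:

```yaml
# Hypothetical raw input; passes the schema above.
order:
  pet:
    id: a1b2c3
# After applying `from: .order.pet`, the transformed input is:
#   { "id": "a1b2c3" }
```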

@@ -1527,7 +1534,7 @@ Documents the structure - and optionally configures the transformations of - wor

It's crucial for authors to document the schema of output data whenever feasible. This documentation empowers consuming applications to provide contextual auto-suggestions when handling runtime expressions.

When set, runtimes must validate output data against the defined schema, unless defined otherwise.
When set, runtimes must validate output data against the defined schema after applying transformations, unless defined otherwise.

#### Properties

@@ -1550,16 +1557,13 @@ output:
      required: [ petId ]
  as:
    petId: '${ .pet.id }'
export:
  as:
    '.petList += [ $task.output ]'
```
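
For instance, a raw task output shaped like the following hypothetical pet record would be reshaped by `as` and then validated against the schema above:

```yaml
# Hypothetical raw task output.
pet:
  id: a1b2c3
  name: Fido
# After applying `as`, the transformed output is:
#   { "petId": "a1b2c3" }
# which satisfies `required: [ petId ]`.
```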

### Export

Certain tasks need to set the workflow context to save the task output for later usage. Users set the content of the context through a runtime expression. The result of the expression is the new value of the context. The expression is evaluated against the existing context.
Certain tasks need to set the workflow context to save the task output for later usage. Users set the content of the context through a runtime expression. The result of the expression is the new value of the context. The expression is evaluated against the transformed task output.

Optionally, the context might have an associated schema.
Optionally, the context might have an associated schema, against which the result of the expression is validated.

#### Properties

@@ -1573,13 +1577,13 @@ Optionally, the context might have an associated schema.
Merge the task output into the current context.

```yaml
as: '.+$output'
as: '$context+.'
```

Replace the context with the task output.

```yaml
as: $output
as: '.'
```
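
A fuller sketch, assuming a hypothetical `petList` key in the context, that combines an export expression with a schema the resulting context must satisfy:

```yaml
export:
  # '.' is the transformed task output; '$context' is the current context.
  as: '$context + { petList: (($context.petList // []) + [ . ]) }'
  schema:
    document:
      type: object
      required: [ petList ]
```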

### Schema
78 changes: 57 additions & 21 deletions dsl.md
@@ -162,98 +162,134 @@ Once the task has been executed, different things can happen:

### Data Flow

In Serverless Workflow DSL, data flow management is crucial to ensure that the right data is passed between tasks and to the workflow itself.

Here's how data flows through a workflow based on various transformation stages:

1. **Transform Workflow Input**
1. **Validate Workflow Input**
Before the workflow starts, the input data provided to the workflow can be validated against the `input.schema` property to ensure it conforms to the expected structure.
The execution only proceeds if the input is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

2. **Transform Workflow Input**
Before the workflow starts, the input data provided to the workflow can be transformed to ensure only relevant data in the expected format is passed into the workflow context. This can be done using the top-level `input.from` expression. It is evaluated against the raw workflow input and defaults to the identity expression, which leaves the input unchanged. This step allows the workflow to start with a clean and focused dataset, reducing potential overhead and complexity in subsequent tasks. The result of this expression is set as the initial value of the `$context` runtime expression argument and is passed to the first task.

*Example: If the workflow receives a JSON object as input, a transformation can be applied to remove unnecessary fields and retain only those that are required for the workflow's execution.*

2. **Transform First Task Input**
The input data for the first task can be transformed to match the specific requirements of that task. This ensures that the first task receives only the data required to perform its operations. This can be done using the task's `input.from` expression. It evaluates the transformed workflow input and defaults to the identity expression, which leaves the input unchanged. The result of this expression will be set as the `$input` runtime expression argument and be passed to the task. This transformed input will be evaluated against any runtime expressions used within the task definition.
After workflow input validation and transformation, the transformed input is passed as the raw input to the first task.

3. **Validate Task Input**
Before a task executes, its raw input can be validated against the `input.schema` property to ensure it conforms to the expected structure.
The execution only proceeds if the input is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

*Example: If the first task is a function call that only needs a subset of the workflow input, a transformation can be applied to provide only those fields needed for the function to execute.*
4. **Transform Task Input**
The input data for the task can be transformed to match the specific requirements of that task. This ensures that the task receives only the data required to perform its operations. This can be done using the task's `input.from` expression. It is evaluated against the raw task input (i.e., the transformed workflow input for the first task, or the transformed output of the previous task) and defaults to the identity expression, which leaves the input unchanged. The result of this expression is set as the `$input` runtime expression argument and is passed to the task. Any runtime expressions used within the task definition are evaluated against this transformed input.

3. **Transform First Task Output**
After completing the first task, its output can be transformed before passing it to the next task or storing it in the workflow context. Transformations are applied using the `output.as` runtime expression. It evaluates the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result will be input for the next task. To update the context, one uses the `export.as` runtime expression. It evaluates the raw output and defaults to the expression that returns the existing context. The result of this runtime expression replaces the workflow's current context and the content of the `$context` runtime expression argument. This helps manage the data flow and keep the context clean by removing any unnecessary data produced by the task.
*Example: If the task is a function call that only needs a subset of the workflow input, a transformation can be applied to provide only those fields needed for the function to execute.*

*Example: If the first task returns a large dataset, a transformation can be applied to retain only the relevant results needed for subsequent tasks.*
5. **Transform Task Output**
After completing the task, its output can be transformed before passing it to the next task or storing it in the workflow context. Transformations are applied using the `output.as` runtime expression. It is evaluated against the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result becomes the input for the next task.

4. **Transform Last Task Input**
Before the last task in the workflow executes, its input data can be transformed to ensure it receives only the necessary information. This can be done using the task's `input.from` expression. It evaluates the transformed workflow input and defaults to the identity expression, which leaves the input unchanged. The result of this expression will be set as the `$input` runtime expression argument and be passed to the task. This transformed input will be evaluated against any runtime expressions used within the task definition. This step is crucial for ensuring the final task has all the required data to complete the workflow successfully.
*Example: If the task returns a large dataset, a transformation can be applied to retain only the relevant results needed for subsequent tasks.*

*Example: If the last task involves generating a report, the input transformation can ensure that only the data required for the report generation is passed to the task.*
6. **Validate Task Output**
After `output.as` is evaluated, the transformed task output is validated against the `output.schema` property to ensure it conforms to the expected structure. The execution only proceeds if the output is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

5. **Transform Last Task Output**
After the last task completes, its output can be transformed before it is considered the workflow output. Transformations are applied using the `output.as` runtime expression. It evaluates the raw task output and defaults to the identity expression, which leaves the output unchanged. Its result will be passed to the workflow `output.as` runtime expression. This ensures that the workflow produces a clean and relevant output, free from any extraneous data that might have been generated during the task execution.
7. **Update Workflow Context**
To update the context, one uses the `export.as` runtime expression. It is evaluated against the transformed task output and defaults to an expression that returns the existing context. The result of this runtime expression replaces the workflow's current context and the content of the `$context` runtime expression argument. This helps manage the data flow and keep the context clean by removing any unnecessary data produced by the task.

*Example: If the last task outputs various statistics, a transformation can be applied to retain only the key metrics that are relevant to the stakeholders.*
8. **Validate Exported Context**
After the context is updated, the exported context is validated against the `export.schema` property to ensure it conforms to the expected structure. The execution only proceeds if the exported context is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

6. **Transform Workflow Output**
Finally, the overall workflow output can be transformed before it is returned to the caller or stored. Transformations are applied using the `output.as` runtime expression. It evaluates the last task's output and defaults to the identity expression, which leaves the output unchanged. This step ensures that the final output of the workflow is concise and relevant, containing only the necessary information that needs to be communicated or recorded.
9. **Continue Workflow**
After the context is updated, the workflow continues to the next task in the sequence. The transformed output of the previous task is passed as the raw input to the next task, and the data flow cycle repeats.
If no more tasks are defined, the last task's transformed output is passed to the workflow output transformation step.

10. **Transform Workflow Output**
Finally, the overall workflow output can be transformed before it is returned to the caller or stored. Transformations are applied using the `output.as` runtime expression. It is evaluated against the last task's transformed output and defaults to the identity expression, which leaves the output unchanged. This step ensures that the final output of the workflow is concise and relevant, containing only the necessary information that needs to be communicated or recorded.

*Example: If the workflow's final output is a summary report, a transformation can ensure that the report contains only the most important summaries and conclusions, excluding any intermediate data.*

11. **Validate Workflow Output**
After `output.as` is evaluated, the transformed workflow output is validated against the `output.schema` property to ensure it conforms to the expected structure. The execution only proceeds if the output is valid; otherwise, it faults with a [ValidationError (https://serverlessworkflow.io/spec/1.0.0/errors/validation)](dsl-reference.md#error).

By applying transformations at these strategic points, Serverless Workflow DSL ensures that data flows through the workflow in a controlled and efficient manner, maintaining clarity and relevance at each execution stage. This approach helps manage complex workflows and ensures that each task operates with the precise data required, leading to more predictable and reliable workflow outcomes.
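
To make this ordering concrete, the following is a minimal sketch of a single-task workflow annotated with the stages above; the endpoint, field names, and the `lastPetId` context key are hypothetical:

```yaml
document:
  dsl: '1.0.0'
  namespace: examples
  name: data-flow-sketch
  version: '0.1.0'
input:
  schema:                # 1. validate raw workflow input
    document:
      type: object
      required: [ order ]
  from: .order           # 2. transform workflow input
do:
  - getPet:
      call: http
      with:
        method: get
        endpoint: https://petstore.example.com/pets/1   # hypothetical endpoint
      input:
        schema:          # 3. validate raw task input
          document:
            type: object
            required: [ pet ]
        from: .pet       # 4. transform task input
      output:
        as: '{ petId: .id }'   # 5. transform raw task output (assumes the response body has an `id` field)
        schema:          # 6. validate transformed task output
          document:
            type: object
            required: [ petId ]
      export:
        as: '$context + { lastPetId: .petId }'   # 7. update workflow context
        schema:          # 8. validate exported context
          document:
            type: object
            required: [ lastPetId ]
# 9. with no further tasks, the transformed task output flows to the workflow output
output:
  as: .petId             # 10. transform workflow output
  schema:                # 11. validate transformed workflow output
    document:
      type: string
```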

Visually, this can be represented as follows:

```mermaid
flowchart TD
    subgraph Legend
        legend_data{{Data}}
        legend_schema[\Schema/]
        legend_transformation[Transformation]
        legend_arg([Runtime Argument])
    end

    initial_context_arg([<code>$context</code>])
    context_arg([<code>$context</code>])
    input_arg([<code>$input</code>])
    output_arg([<code>$output</code>])

    workflow_raw_input{{Raw Workflow Input}}
    workflow_input_schema[\Workflow: <code>input.schema</code>/]
    workflow_input_from[Workflow: <code>input.from</code>]
    workflow_transformed_input{{Transformed Workflow Input}}
    task_raw_input{{Raw Task Input}}
    task_input_schema[\Task: <code>input.schema</code>/]
    task_input_from[Task: <code>input.from</code>]
    task_transformed_input{{Transformed Task Input}}
    task_definition[Task definition]
    task_raw_output{{Raw Task output}}
    task_output_as[Task: <code>output.as</code>]
    task_transformed_output{{Transformed Task output}}
    task_output_schema[\Task: <code>output.schema</code>/]
    task_export_as[Task: <code>export.as</code>]
    task_export_schema[\Task: <code>export.schema</code>/]
    new_context{{New execution context}}
    workflow_raw_output{{Raw Workflow Output}}
    workflow_output_as[Workflow: <code>output.as</code>]
    workflow_transformed_output{{Transformed Workflow Output}}
    workflow_output_schema[\Workflow: <code>output.schema</code>/]

    workflow_raw_input --> workflow_input_from
    workflow_raw_input -- Validated by --> workflow_input_schema
    workflow_input_schema -- Passed to --> workflow_input_from
    workflow_input_from -- Produces --> workflow_transformed_input
    workflow_transformed_input -- Set as --> initial_context_arg
    workflow_transformed_input -- Passed to --> task_raw_input

    subgraph Task
        task_raw_input -- Passed to --> task_input_from
        task_raw_input -- Validated by --> task_input_schema
        task_input_schema -- Passed to --> task_input_from
        task_input_from -- Produces --> task_transformed_input
        task_transformed_input -- Set as --> input_arg
        task_transformed_input -- Passed to --> task_definition
        task_definition -- Execution produces --> task_raw_output
        task_raw_output -- Passed to --> task_output_as
        task_output_as -- Produces --> task_transformed_output
        task_output_as -- Set as --> output_arg
        task_transformed_output -- Passed to --> task_export_as
        task_transformed_output -- Set as --> output_arg
        task_transformed_output -- Validated by --> task_output_schema
        task_output_schema -- Passed to --> task_export_as
        task_export_as -- Produces --> new_context
        new_context -- Validated by --> task_export_schema
    end

    task_transformed_output -- Passed as raw input to --> next_task
    subgraph next_task [Next Task]
    end
    task_export_as -- Result set as --> context_arg
    new_context -- Set as --> context_arg
    next_task -- Transformed output becomes --> workflow_raw_output
    workflow_raw_output -- Passed to --> workflow_output_as
    workflow_output_as -- Produces --> workflow_transformed_output
    workflow_transformed_output -- Validated by --> workflow_output_schema
```

### Runtime Expressions
