In order to run kedro pipelines together with dvc experiment tracking, we need to maintain a correspondence between kedro pipelines and the dvc pipelines defined in `dvc.yaml` files. See discussion.

For each named pipeline in `kedro.framework.project.pipelines` (which reads pipelines from `src/<project_package>/pipeline_registry.py`), we create a dvc file according to the discussion.

NOTE: this does not include stage parameters yet.

This also does not test environments yet, until the last case. All inputs and outputs should be in the `base` data catalog.
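For illustration, here is a minimal sketch of the kind of `dvc.yaml` we might generate for a one-node pipeline with one data input and one data output. The stage-naming convention and the `cmd` wrapping `kedro run` are assumptions for the sketch; the exact form is part of the discussion.

```yaml
# Hypothetical output for a pipeline "split" with a single node "split_node".
stages:
  split-split_node:            # assumed naming convention: <pipeline>-<node>
    cmd: kedro run --pipeline=split --node=split_node  # assumed command
    deps:
      - data/01_raw/input.csv  # filepath resolved from the data catalog entry
    outs:
      - data/02_intermediate/train.csv
```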
Cases (given a fixed data catalog):

- by pipeline type:
  - empty pipeline -- warning and no output
  - pipeline with one node and:
    - inputs:
      - no inputs
      - one data input
      - two data inputs
      - Error: data input of an unsupported type (e.g. memory)
      - Error: input not declared
    - outputs:
      - no outputs
      - one data output
      - one plot output
      - one metric output
      - one each of data, metric, and plot (see the `dvc.yaml` sketch after this list)
      - Error: output of an unsupported type (memory)
      - Error: output not declared
  - pipeline with 2 nodes: start, intermediate, and final data
  - 2 pipelines, each with one node
  - modular pipelines: one pipeline containing another, which contains a node
  - modular pipelines: one overall pipeline using a sub-pipeline twice in different namespaces (also sketched below)
- by `dvc.yaml` status:
  - `dvc.yaml` does not exist: create it
  - `dvc.yaml` exists and corresponds: do nothing
  - Error: `dvc.yaml` exists and is different
  - with `--force`: same as above, except if `dvc.yaml` exists and is different, overwrite it
- by data catalog environment:
  - all catalog items in the `base` environment
  - some catalog items in `base`, others in `test`
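To make the output and namespace cases concrete, here is a hedged sketch of the generated stages (stage names, paths, and the `cmd` are illustrative assumptions, not settled behavior): data outputs would map to dvc `outs`, metric outputs to `metrics`, and plot outputs to `plots`; a sub-pipeline reused under two namespaces would yield one stage per namespaced node.

```yaml
stages:
  # one node with one data, one metric, and one plot output
  train-train_node:
    cmd: kedro run --pipeline=train --node=train_node  # assumed command
    deps:
      - data/02_intermediate/train.csv
    outs:
      - data/06_models/model.pkl          # data output
    metrics:
      - data/08_reporting/metrics.json:   # metric output
          cache: false
    plots:
      - data/08_reporting/history.csv:    # plot output
          cache: false

  # the same sub-pipeline used twice: kedro prefixes node and dataset
  # names with the namespace, so each use becomes its own stage
  main-alpha.clean_node:
    cmd: kedro run --pipeline=main --node=alpha.clean_node
    deps:
      - data/01_raw/alpha_input.csv
    outs:
      - data/02_intermediate/alpha_clean.csv
  main-beta.clean_node:
    cmd: kedro run --pipeline=main --node=beta.clean_node
    deps:
      - data/01_raw/beta_input.csv
    outs:
      - data/02_intermediate/beta_clean.csv
```

For the environment cases, generation would resolve the same dataset names against whichever catalog environment is active, e.g. entries in `conf/base/catalog.yml` versus overrides in `conf/test/catalog.yml`.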
Scenarios:

- `kedro dvc pipelines update`
- on hook `before_pipeline_run`: same as `kedro dvc data update` without `--force`, for the CURRENT pipeline
Issues:
- `kedro dvc pipelines update` (no dvc.yaml) case: empty pipeline #43
- `kedro dvc pipelines update` (no dvc.yaml) case: one node, no inputs, outputs #44
- `kedro dvc pipelines update` (no dvc.yaml) case: one data input #45
- `kedro dvc pipelines update` (no dvc.yaml) case: one output (all three types) #46
- `kedro dvc pipelines update` (no dvc.yaml) case: other input cases #47
- `kedro dvc pipelines update` (no dvc.yaml) case: other output cases #48
- `kedro dvc pipelines update` (no dvc.yaml) case: pipeline with 2 nodes #49
- `kedro dvc pipelines update` (no dvc.yaml) case: 2 pipelines, each one node #50
- `kedro dvc pipelines update` (no dvc.yaml) case: nested pipelines #51
- `kedro dvc pipelines update` (no dvc.yaml) case: nested pipeline: use sub-pipeline twice in different namespaces #52
- `kedro dvc pipelines update` dvc.yaml exists, corresponds (all pipeline cases) #53
- `kedro dvc pipelines update` error: dvc.yaml exists, doesn't correspond (all pipeline cases) #55
- `kedro dvc pipelines update` with `--force` dvc.yaml exists, doesn't correspond (all pipeline cases) #56
- `before_pipeline_run` -- all cases #57