Time for my random question and comment spree on this one... 😀 Questions
Maybe useful comments
Representation
Kedro nodes represent explicit steps in an experiment. They are defined by wrappers around Python functions.
Kedro pipelines represent experiments. They are also specified in Python. In a pipeline, each node has inputs and outputs. These input and output names correspond to entries in the data catalog. The names of inputs and outputs are also edge labels in the execution DAG for the pipeline: a node with an output of a given name produces the input for any other node with an input labelled by that name.
Kedro pipelines can load other pipelines as modules.
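To make the "names are edge labels" point concrete, here is a minimal, library-free sketch. It does not use Kedro's API: the `Step` class, the `dag_edges` helper and all dataset names are hypothetical and only illustrate how matching output and input names induce the DAG edges.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """Stand-in for a node: a name plus input and output dataset names."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def dag_edges(steps: List[Step]) -> List[Tuple[str, str, str]]:
    """Return (producer, consumer, dataset_name) edges induced by shared names."""
    # Map each dataset name to the step that produces it.
    producers: Dict[str, str] = {}
    for step in steps:
        for out in step.outputs:
            producers[out] = step.name

    # An edge exists wherever a step consumes a name that another step produces.
    edges = []
    for step in steps:
        for inp in step.inputs:
            if inp in producers:
                edges.append((producers[inp], step.name, inp))
    return edges


steps = [
    Step("preprocess", ("raw_data",), ("clean_data",)),
    Step("train", ("clean_data",), ("model",)),
    Step("evaluate", ("model", "clean_data"), ("metrics",)),
]
edges = dag_edges(steps)
for producer, consumer, label in edges:
    print(f"{producer} --[{label}]--> {consumer}")
```

Here `raw_data` has no producer, so it is a pure pipeline input that would have to come from the data catalog.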
DVC stages are represented by entries in dvc.yaml pipeline definitions. Entries define inputs and outputs via filename references. The task to execute is defined as a shell command. They also name parameters relevant to the step, and non-data artifacts (metrics and plots) produced by the task.
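As a rough illustration of the shape of such an entry, the sketch below builds one stage as a Python dict using the standard dvc.yaml stage keys (`cmd`, `deps`, `outs`, `params`, `metrics`, `plots`). The stage name, the command, every file name and the parameter key are placeholders invented for the example.

```python
import json

# Hypothetical stage: all values below are placeholders for illustration.
stage = {
    "cmd": "python train.py",                    # shell command that runs the task
    "deps": ["data/features.csv", "train.py"],   # input files
    "outs": ["models/model.pkl"],                # output files
    "params": ["train.learning_rate"],           # parameters relevant to the step
    "metrics": ["metrics.json"],                 # non-data artifact: metrics
    "plots": ["plots/loss.csv"],                 # non-data artifact: plots
}

dvc_yaml = {"stages": {"train": stage}}

# JSON is a subset of YAML 1.2, so this text can be read by a YAML parser.
text = json.dumps(dvc_yaml, indent=2)
print(text)
```

In a real project this would normally be written in block-style YAML by hand or by a YAML library; JSON is used here only to keep the sketch dependency-free.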
Implementation
Using the "before-pipeline" hook, we can write/update dvc.yaml. We can also have a utility that does this without running an experiment. By default we can keep dvc.yaml in config/kedro_dvc, named after the pipeline.
Nodes refer to data catalog entries, which we translate to .dvc files.
Kedro-dvc maintains a correspondence between registered pipelines and dvc.yaml files. Each of these files is placed in <CONF_ROOT>/dvc/pipelines/<pipeline_name>/, one directory per pipeline.
dvc.yaml contains a vars section, which we need to use to map parameters (see discussion), and a stages
section, which describes the equivalent of kedro nodes. For each stage, we maintain the following fields:
- the command: kedro run --pipeline <pipeline name> --node <node name>
- the path from dvc.yaml to project root
- the .dvc files corresponding to node data inputs
- the .dvc files corresponding to node data outputs
Names
The node name is the qualified name: <namespace>.<short_name> if a namespace is provided, and just <short_name> if not. If no name is provided, the name of the wrapped function is used. These names might not be unique. Running the experiment via kedro will avoid problems with non-unique naming; it should also provide potential performance benefits.
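The stage fields and the naming rule above could be sketched roughly as follows. This is not kedro-dvc's actual code: the helper names, the default `conf_root="conf"`, the `dvc_files` mapping from dataset names to .dvc paths, and the choice of the DVC keys `cmd`, `wdir`, `deps` and `outs` for the four fields are all assumptions made for illustration.

```python
import posixpath
from typing import Dict, List, Optional


def qualified_name(short_name: Optional[str], func_name: str,
                   namespace: Optional[str] = None) -> str:
    """<namespace>.<short_name> if namespaced; fall back to the function name."""
    base = short_name or func_name
    return f"{namespace}.{base}" if namespace else base


def stage_dir(conf_root: str, pipeline_name: str) -> str:
    """Directory holding the pipeline's dvc.yaml, per the layout described above."""
    return posixpath.join(conf_root, "dvc", "pipelines", pipeline_name)


def build_stage(pipeline_name: str, node_name: str,
                inputs: List[str], outputs: List[str],
                dvc_files: Dict[str, str], conf_root: str = "conf") -> dict:
    """Translate one node into a stage entry (hypothetical field mapping)."""
    return {
        "cmd": f"kedro run --pipeline {pipeline_name} --node {node_name}",
        # Path from the directory containing dvc.yaml back to the project root.
        "wdir": posixpath.relpath(".", start=stage_dir(conf_root, pipeline_name)),
        "deps": [dvc_files[name] for name in inputs],
        "outs": [dvc_files[name] for name in outputs],
    }


# Hypothetical catalog-entry-to-.dvc-file mapping.
dvc_files = {
    "clean_data": "data/clean_data.csv.dvc",
    "model": "data/model.pkl.dvc",
}
node_name = qualified_name("train_model", "train", namespace="modelling")
stage = build_stage("training", node_name, ["clean_data"], ["model"], dvc_files)
print(node_name)
print(stage)
```

Because qualified names might not be unique, a real implementation would need some way to disambiguate stage keys; this sketch does not attempt that.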
Meta
For each input and output, currently just: