Stage 1: set up correspondence between kedro and dvc
Map kedro node inputs and outputs to dvc tracked artifacts
Kedro-dvc will be implemented as a Kedro plugin.
We will try to interact with dvc via dvc.repo.Repo, which is not a public API but appears to be relatively straightforward to use.
Kedro has a data catalog for initial data, which supports versioned datasets. DVC uses annotations in .dvc files to mark and store hashes.
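To make the idea of a hash annotation concrete, here is a minimal sketch that computes an MD5 digest for a local file and builds a dictionary shaped roughly like the `outs` entry of a .dvc file. The helper names `file_md5` and `dvc_style_entry` are hypothetical, and in practice DVC would write the annotation itself; depending on the DVC version, its own hash for a text file may also differ from a plain MD5 of the bytes.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dvc_style_entry(path: Path) -> dict:
    """Build a dict resembling the `outs` section of a .dvc file.

    Illustrative only: the exact set of fields DVC writes is an assumption here.
    """
    return {
        "outs": [
            {
                "md5": file_md5(path),
                "size": path.stat().st_size,
                "path": path.name,
            }
        ]
    }
```

Calling `dvc_style_entry(Path("data/01_raw/iris.csv"))` on an existing file would give a structure that could be compared with what DVC records for the same file.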
Kedro supports hooks, which plugins can register. On pipeline run (or possibly via a separate utility), kedro-dvc can use the "before pipeline run" hook to create or update dvc annotations and to supply a version number.
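A rough sketch of what such a hook could look like follows. The class name `DvcSyncHooks`, its attributes, and the timestamp-based version string are assumptions for illustration; only `hook_impl` and the `before_pipeline_run` hook signature come from kedro. The hook here merely records which dataset names the pipeline touches, where a real plugin would create or update the dvc annotations.

```python
from datetime import datetime, timezone

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # kedro is not installed: use a no-op marker so the sketch still runs.
    def hook_impl(func):
        return func


class DvcSyncHooks:
    """Hypothetical hook class; not part of kedro or dvc."""

    def __init__(self):
        self.version = None
        self.tracked = []

    @hook_impl
    def before_pipeline_run(self, run_params, pipeline, catalog):
        # Assumption: a UTC timestamp serves as the version identifier.
        self.version = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")
        # Collect every dataset name the pipeline reads or writes.
        # A real plugin would create or update dvc annotations for these here.
        self.tracked = sorted(pipeline.all_inputs() | pipeline.all_outputs())
```

A plugin could expose an instance of such a class through kedro's hook registration so that it runs without changes to the project's own code.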
At first, we will only support data stored on the local file system. However, both kedro and dvc support remote data: kedro via fsspec, dvc via its own mechanism. We could add support for this in the future. We could also support asynchronous writes of data passed in memory in kedro pipelines.
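Restricting the first version to local data means the plugin needs some way to tell local catalog entries from remote ones. One possible check, sketched below with the standard library only, looks at the protocol prefix of a dataset's filepath. The function name `is_local_path` is hypothetical, and treating a single-letter scheme as a Windows drive letter is an assumption.

```python
from urllib.parse import urlsplit


def is_local_path(filepath: str) -> bool:
    """Return True if the filepath has no remote protocol prefix."""
    scheme = urlsplit(filepath).scheme
    # A one-letter "scheme" such as "c" is assumed to be a Windows drive letter.
    if len(scheme) == 1:
        return True
    return scheme in ("", "file")
```

Entries such as `s3://bucket/data.csv` would be skipped by this check, while `data/01_raw/iris.csv` and `file:///tmp/iris.csv` would be treated as local.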
DVC keeps track of different types of artifacts: data, parameters, metrics, plots. We should be able to distinguish between these using the type of the items in the data catalog. However, we may require additional annotations.
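As a sketch of how the catalog type could drive this distinction, the function below maps a catalog entry's name and its `type` string to one of the four DVC artifact kinds. The function name `classify` and the keyword rules are assumptions; the example type strings follow kedro's dataset naming, and a real mapping would probably need the additional annotations mentioned above.

```python
def classify(name: str, dataset_type: str) -> str:
    """Guess the DVC artifact kind for a kedro catalog entry (heuristic sketch)."""
    # Kedro exposes parameters under "parameters" and "params:<key>" names.
    if name == "parameters" or name.startswith("params:"):
        return "params"
    lowered = dataset_type.lower()
    if "metrics" in lowered:
        return "metrics"
    # Matches e.g. "plotly.PlotlyDataSet" and "matplotlib.MatplotlibWriter".
    if "plot" in lowered:
        return "plots"
    # Everything else is treated as ordinary data.
    return "data"
```

For example, `classify("scores", "tracking.MetricsDataSet")` falls into the metrics group, while an entry of type `pandas.CSVDataSet` is treated as data.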
Kedro does not yet track experiments as first-class objects, though it has beta support for some non-data artifacts (plots and metrics). In DVC, experiment tracking is part of the core mission.
dvc.yaml
stagedvc.yaml
Useful links: