Representation

For Kedro, an experiment is a "run of a pipeline". Kedro defines pipeline runners and allows the definition of some artifacts. However, it does not represent experiments (runs of pipelines) as first-class objects, and versioned data, which can be the "roots" of different experiments, is represented "spatially", in directories named with timestamps. Extending this paradigm to full experiments would mean storing them all side by side.
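To make the "spatial" representation concrete, here is a minimal stdlib-only sketch that mimics (but does not call) Kedro's versioned-dataset layout. It assumes the `<filepath>/<version>/<filename>` convention with UTC timestamp versions at millisecond precision; `make_version`, `versioned_path` and `latest_version` are hypothetical helpers written for illustration.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Assumed to resemble Kedro's version strings: UTC, dots instead of colons.
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def make_version(moment: datetime) -> str:
    """Render a timezone-aware datetime as a version string with millisecond precision."""
    stamp = moment.astimezone(timezone.utc).strftime(VERSION_FORMAT)
    return stamp[:-4] + stamp[-1:]  # drop the last three microsecond digits, keep "Z"


def versioned_path(filepath: str, version: str) -> PurePosixPath:
    """Each version gets its own directory: <filepath>/<version>/<filename>."""
    path = PurePosixPath(filepath)
    return path / version / path.name


def latest_version(versions: list[str]) -> str:
    """Zero-padded timestamps sort lexicographically in chronological order."""
    return max(versions)


v1 = make_version(datetime(2024, 1, 2, 3, 4, 5, 678901, tzinfo=timezone.utc))
v2 = make_version(datetime(2024, 1, 3, 9, 0, 0, tzinfo=timezone.utc))
print(versioned_path("data/model.pkl", v1))
print(latest_version([v1, v2]))
```

Every run adds a sibling directory; nothing records which run produced which version, which is the gap the rest of this post addresses.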
In DVC, experiment management is a core feature. With a nod to Borges, experiment runs are represented primarily using the (forking) "time" dimension of the underlying Git repository. However, different (or hybrid) patterns are possible when persisting experiments. By default, when a pipeline is run, stages whose command and dependencies match a previously recorded run are not re-executed; their outputs are restored from the run cache instead.
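The run cache can be pictured as a content-addressed table keyed by everything that determines a stage's result. The sketch below is a simplified, hypothetical model, not DVC's storage format: it hashes a stage's command, dependency contents and parameters, and on a key hit returns the stored outputs instead of executing the stage.

```python
import hashlib
import json
from typing import Callable

Files = dict[str, bytes]  # path -> contents


class RunCache:
    """Hypothetical in-memory run cache keyed by (command, dependency hashes, params)."""

    def __init__(self) -> None:
        self._entries: dict[str, Files] = {}
        self.executions = 0  # how many times a stage was actually executed

    @staticmethod
    def key(cmd: str, deps: Files, params: dict) -> str:
        payload = {
            "cmd": cmd,
            # hashing dependency contents makes the key change whenever a dependency changes
            "deps": {path: hashlib.md5(data).hexdigest() for path, data in deps.items()},
            "params": params,
        }
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def run(self, cmd: str, deps: Files, params: dict,
            execute: Callable[[Files, dict], Files]) -> Files:
        k = self.key(cmd, deps, params)
        if k in self._entries:
            return dict(self._entries[k])  # skip execution and restore the cached outputs
        outputs = execute(deps, params)
        self.executions += 1
        self._entries[k] = dict(outputs)
        return outputs


def repeat(deps: Files, params: dict) -> Files:
    return {"out.txt": deps["in.txt"] * params["times"]}


cache = RunCache()
first = cache.run("python repeat.py", {"in.txt": b"ab"}, {"times": 2}, repeat)
again = cache.run("python repeat.py", {"in.txt": b"ab"}, {"times": 2}, repeat)  # hit
other = cache.run("python repeat.py", {"in.txt": b"ab"}, {"times": 3}, repeat)  # miss
print(cache.executions)  # 2
```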
DVC supports branching in experiments by means of checkpoints. Although DVC is primarily language-neutral, it offers a Python utility that wraps stage code written in Python to generate node checkpoints automatically.
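As a rough illustration of the checkpoint idea, the toy loop below signals a checkpoint every `every` steps through an injected callback. In DVC 2.x, Python stage code could call `dvc.api.make_checkpoint()` for this purpose; here the callback is a parameter so the sketch runs without DVC, and `train_with_checkpoints` is a hypothetical name.

```python
from typing import Callable


def train_with_checkpoints(steps: int, every: int, checkpoint: Callable[[], None]) -> list[int]:
    """Toy training loop that signals a checkpoint every `every` steps.

    `checkpoint` stands in for a real checkpoint signal; it is injected so
    this sketch runs without a DVC repository.
    """
    state = 0
    checkpointed_at = []
    for step in range(1, steps + 1):
        state += step  # placeholder for one unit of real work
        if step % every == 0:
            # a real stage would write `state` to its outputs before signalling
            checkpoint()
            checkpointed_at.append(step)
    return checkpointed_at


signals = []
print(train_with_checkpoints(10, 3, lambda: signals.append("checkpoint")))  # [3, 6, 9]
```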
Implementation
We rely on DVC to represent experiments. To relate experiments to pipelines, we use the [experiment naming mechanism][d-exp-name] to tie particular experiments to particular pipeline runs.
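One way to tie an experiment to a specific pipeline run is to derive the experiment name deterministically from the pipeline name and a run identifier. The naming convention and helpers below are hypothetical; the only DVC surface assumed is the `-n`/`--name` option of `dvc exp run`, and the command is built but not executed.

```python
import re
import shlex

# Anything outside this set becomes "-", keeping the name usable as a Git ref component.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def experiment_name(pipeline: str, run_id: str) -> str:
    """Hypothetical convention: '<pipeline>-<run_id>', sanitized for use as a ref name."""
    name = _UNSAFE.sub("-", f"{pipeline}-{run_id}")
    name = re.sub(r"\.{2,}", ".", name)  # Git forbids ".." inside ref names
    return name.strip("-.")  # drop leading/trailing dashes and dots


def exp_run_command(pipeline: str, run_id: str) -> list[str]:
    """Build (without executing) a `dvc exp run` call that names the experiment."""
    return ["dvc", "exp", "run", "-n", experiment_name(pipeline, run_id)]


print(shlex.join(exp_run_command("data science", "2024-01-02T03.04.05.678Z")))
# dvc exp run -n data-science-2024-01-02T03.04.05.678Z
```

Because the name is a pure function of the pipeline and run, the same run can later be looked up among the experiments without a separate index.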
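DVC has two methods of running experiments. The basic one, `dvc repro`, does the following:

1. Determine which steps' dependencies have changed (using `dvc.lock`).
2. Run these steps.
3. Commit the resulting outputs to the DVC cache and update `dvc.lock`.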
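The `dvc repro` behaviour described here (detect changed dependencies via `dvc.lock`, run the affected steps, record the result) can be modelled in a few lines. This is a deliberately simplified sketch, not DVC's implementation: the lock is a plain dict mapping each stage to the MD5 hashes of its dependencies, stages are assumed to be listed in dependency order, and `Stage`/`repro` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

Workspace = dict[str, bytes]  # path -> file contents
Lock = dict[str, dict[str, str]]  # stage name -> {dependency path: md5}


@dataclass
class Stage:
    name: str
    deps: list[str]
    outs: list[str]
    run: Callable[[Workspace], None]  # reads deps from, and writes outs to, the workspace


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def repro(stages: list[Stage], ws: Workspace, lock: Lock) -> list[str]:
    """Run stale stages (assumed to be in dependency order); return the names that ran."""
    ran = []
    for stage in stages:
        current = {dep: md5(ws[dep]) for dep in stage.deps}
        # (1) stale if dependency hashes differ from the lock or an output is missing
        stale = lock.get(stage.name) != current or any(out not in ws for out in stage.outs)
        if not stale:
            continue
        stage.run(ws)  # (2) run the stage
        lock[stage.name] = current  # (3) record it (a real dvc.lock also stores output hashes)
        ran.append(stage.name)
    return ran


def prepare(ws: Workspace) -> None:
    ws["clean.txt"] = ws["raw.txt"].strip()


def train_stage(ws: Workspace) -> None:
    ws["model.txt"] = b"model(" + ws["clean.txt"] + b")"


stages = [
    Stage("prepare", ["raw.txt"], ["clean.txt"], prepare),
    Stage("train", ["clean.txt"], ["model.txt"], train_stage),
]
ws, lock = {"raw.txt": b" data "}, {}
print(repro(stages, ws, lock))  # ['prepare', 'train']
print(repro(stages, ws, lock))  # []
ws["raw.txt"] = b"data"  # the raw input changes, but the cleaned output does not
print(repro(stages, ws, lock))  # ['prepare']
```

The last call shows why change detection is done per stage, in order: `prepare` re-runs, but its output is byte-identical, so `train` is skipped.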
The second method, `dvc exp run`, wraps `dvc repro` in the context of a particular experiment:

1. Create a tag for the experiment (referencing the previous experiment, if any).
2. Associate the tag with a stash of the current datasets.
3. Proceed with `dvc repro`.
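The `dvc exp run` wrapping can likewise be sketched as a thin layer around any repro function. This is a hypothetical in-memory model, not how DVC stores experiments (DVC uses Git refs and its cache): each experiment records a name, a link to the previous experiment, and a copy of the workspace standing in for the dataset stash, then runs the supplied repro callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Files = dict[str, bytes]


@dataclass
class Experiment:
    name: str
    parent: Optional[str]  # experiment this one was started from, if any
    snapshot: Files  # stand-in for the stash of current datasets
    results: Files = field(default_factory=dict)


class ExperimentLog:
    """Hypothetical model of `dvc exp run`: name the run, snapshot the data, then repro."""

    def __init__(self) -> None:
        self.experiments: dict[str, Experiment] = {}
        self.head: Optional[str] = None

    def exp_run(self, name: str, workspace: Files,
                repro: Callable[[Files], Files]) -> Experiment:
        if name in self.experiments:
            raise ValueError(f"experiment {name!r} already exists")
        # 1. create a named entry referencing the previous experiment, if any
        # 2. associate it with a snapshot (copy) of the current datasets
        exp = Experiment(name=name, parent=self.head, snapshot=dict(workspace))
        # 3. proceed with the ordinary reproduction, on a copy of the snapshot
        exp.results = repro(dict(exp.snapshot))
        self.experiments[name] = exp
        self.head = name
        return exp

    def lineage(self, name: str) -> list[str]:
        """Walk parent links back to the first experiment."""
        chain = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self.experiments[current].parent
        return chain


def count_bytes(ws: Files) -> Files:
    return {"metric.txt": str(len(ws["data.txt"])).encode()}


log = ExperimentLog()
log.exp_run("exp-a", {"data.txt": b"abc"}, count_bytes)
b = log.exp_run("exp-b", {"data.txt": b"abcdef"}, count_bytes)
print(b.results, log.lineage("exp-b"))
```

Snapshotting before the run keeps each experiment's inputs fixed even if the workspace is edited afterwards, which is what makes experiments comparable later.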
Kedro-DVC provides a way to use these DVC facilities while replacing step 2 ("run these steps") with the Kedro runner, and using the `before_node_run` hook to skip steps that DVC has determined need not be run.
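The skipping logic can be simulated without Kedro. In Kedro itself, `before_node_run` is a hook specification implemented with `@hook_impl` that receives the node, catalog and inputs; the sketch below does not model how a real plugin would stop a node from executing. It only shows the decision, assuming Kedro node names map 1:1 to DVC stage names and a status dict that lists only changed stages (the shape shown is illustrative). `Node`, `DvcSkipHook` and `run_pipeline` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    inputs: list[str]
    outputs: list[str]
    func: Callable[[], None]


class DvcSkipHook:
    """Hypothetical hook deciding, before each node, whether it needs to run."""

    def __init__(self, status: dict[str, list]) -> None:
        self.stale = set(status)  # stages reported as changed
        self.refreshed: set[str] = set()  # datasets rewritten earlier in this run

    def before_node_run(self, node: Node) -> bool:
        # A node also becomes stale when an upstream node in this run rewrote one of
        # its inputs, which a status snapshot taken before the run may not reflect.
        should_run = node.name in self.stale or any(i in self.refreshed for i in node.inputs)
        if should_run:
            self.refreshed.update(node.outputs)
        return should_run


def run_pipeline(nodes: list[Node], hook: DvcSkipHook) -> list[str]:
    """Toy sequential runner standing in for a Kedro runner; nodes are in topological order."""
    executed = []
    for node in nodes:
        if hook.before_node_run(node):
            node.func()
            executed.append(node.name)
    return executed


def noop() -> None:
    pass


nodes = [
    Node("prepare", ["raw.csv"], ["clean.csv"], noop),
    Node("train", ["clean.csv"], ["model.pkl"], noop),
    Node("report", ["notes.md"], ["report.html"], noop),
]
status = {"prepare": [{"changed deps": {"raw.csv": "modified"}}]}  # illustrative shape
print(run_pipeline(nodes, DvcSkipHook(status)))  # ['prepare', 'train']
```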
The plan is to use DVC's `Repo.status()` for repro step (1) and to call `Repo.commit()` for repro step (3). The "run exp" steps can be accomplished with DVC's `Repo.experiments.new()`.
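Put together, a session might look like the sketch below. To keep it runnable without a DVC repository, the repo is injected behind a small `Protocol` exposing only `status()` and `commit(target)`, named after the `Repo` methods in the plan. Whether the real methods accept exactly these arguments, and that `status()` returns a dict keyed by stage name, are assumptions to check against the DVC version in use. `Repo.experiments.new()` is not modelled. `RepoLike`, `FakeRepo` and `kedro_dvc_session` are hypothetical.

```python
from typing import Callable, Protocol


class RepoLike(Protocol):
    """The two Repo methods from the plan; argument and return shapes are assumptions."""

    def status(self) -> dict[str, list]: ...

    def commit(self, target: str) -> None: ...


def kedro_dvc_session(repo: RepoLike, run_stages: Callable[[set[str]], list[str]]) -> list[str]:
    stale = set(repo.status())  # repro step (1): which stages changed?
    ran = run_stages(stale)  # step (2): delegated to the Kedro runner
    for stage in ran:
        repo.commit(stage)  # step (3): record the new outputs
    return ran


class FakeRepo:
    """Hypothetical stand-in so the sketch runs without a DVC repository."""

    def __init__(self, changed: list[str]) -> None:
        self.changed = changed
        self.committed: list[str] = []

    def status(self) -> dict[str, list]:
        return {name: [] for name in self.changed}

    def commit(self, target: str) -> None:
        self.committed.append(target)


repo = FakeRepo(["prepare", "train"])
print(kedro_dvc_session(repo, sorted), repo.committed)
```

Injecting the repo keeps the Kedro side testable and confines any DVC API differences to one adapter.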