Commit
🔥 Remove deprecated kedro_mlflow.framework.context and cancel deprecation of kedro_mlflow.io.metrics.MlflowMetricsDataSet for upcoming 0.8.0 release
Galileo-Galilei committed Nov 7, 2021
1 parent 5080ec3 commit 257642f
Showing 11 changed files with 53 additions and 785 deletions.
16 changes: 12 additions & 4 deletions CHANGELOG.md

## [Unreleased]

### Added

- :memo: Format code blocks in documentation with ``blacken-docs``
- :construction_worker: Enforce the use of ``black`` and ``isort`` in the CI so that developers follow consistent style guidelines

### Changed

- :sparkles: :boom: The ``KedroPipelineModel`` custom mlflow model now accepts any kedro ``Pipeline`` as input (provided it has a single DataFrame input and a single output, because this is an mlflow limitation) instead of only ``PipelineML`` objects. This simplifies the API for users who want to customise the model logging ([#171](https://github.com/Galileo-Galilei/kedro-mlflow/issues/171)). The ``KedroPipelineModel.__init__`` argument ``pipeline_ml`` is renamed ``pipeline`` to reflect this change.
- :wastebasket: ``kedro_mlflow.io.metrics.MlflowMetricsDataSet`` is no longer deprecated, because there is currently no alternative for logging many metrics at the same time.

### Fixed

- :bug: The ``KedroMlflowConfig.setup()`` method now sets the experiment globally to ensure all runs are launched under the experiment specified in the configuration, even in interactive mode ([#256](https://github.com/Galileo-Galilei/kedro-mlflow/issues/256)).

### Removed

- :fire: :boom: ``KedroMlflowConfig`` and ``get_mlflow_config``, deprecated since ``0.7.3``, are now removed from ``kedro_mlflow.framework.context``. They must now be imported from ``kedro_mlflow.config``.

## [0.7.6] - 2021-10-08

72 changes: 41 additions & 31 deletions docs/source/04_experimentation_tracking/05_version_metrics.md
MLflow defines a metric as "a (key, value) pair, where the value is numeric". Each metric can be updated throughout the course of the run.
`kedro-mlflow` introduces 3 ``AbstractDataSet`` to manage metrics:
- ``MlflowMetricDataSet`` which can log a float as a metric
- ``MlflowMetricHistoryDataSet`` which can log the evolution over time of a given metric, e.g. a list or a dict of floats.
- ``MlflowMetricsDataSet``, a wrapper around a dictionary of metrics which is returned by a node; it logs each metric in MLflow.

### Saving a single float as a metric with ``MlflowMetricDataSet``

The ``MlflowMetricDataSet`` is an ``AbstractDataSet`` which enables saving or loading a float as an mlflow metric:
```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricDataSet

metric_ds = MlflowMetricDataSet(key="my_metric")
with mlflow.start_run():
    metric_ds.save(
        0.3
    )  # create a "my_metric=0.3" value in the "metric" field in mlflow UI
```

```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricDataSet

metric_ds = MlflowMetricDataSet(key="my_metric", run_id="123456789")
with mlflow.start_run():
    metric_ds.save(
        0.3
    )  # create a "my_metric=0.3" value in the "metric" field of the run 123456789
```

It is also possible to pass ``load_args`` and ``save_args`` to control which step should be logged (in case you have logged several steps for the same metric). ``save_args`` accepts a ``mode`` key which can be set to ``overwrite`` (mlflow default) or ``append``. In append mode, if no step is specified, saving the metric will "bump" the last existing step to create a linear history. **This is very useful if you have a monitoring pipeline which calculates a metric frequently to check the performance of a deployed model.**

```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricDataSet

metric_ds = MlflowMetricDataSet(
    key="my_metric", load_args={"step": 1}, save_args={"mode": "append"}
)

with mlflow.start_run():
    metric_ds.save(0)  # step 0 stored for "my_metric"
    metric_ds.save(0.1)  # step 1 stored for "my_metric"
    metric_ds.save(0.2)  # step 2 stored for "my_metric"

    my_metric = metric_ds.load()  # value=0.1 (step number 1)
```

Since it is an ``AbstractDataSet``, it can be used with the YAML API in your ``catalog.yml``, e.g.:
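A minimal sketch of such an entry (the dataset name and ``key`` value are illustrative; the ``type`` path follows the import used above):

```yaml
my_model_metric:
  type: kedro_mlflow.io.metrics.MlflowMetricDataSet
  key: my_metric  # name under which the metric appears in the mlflow UI
```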
### Saving the evolution of a metric with ``MlflowMetricHistoryDataSet``

The ``MlflowMetricHistoryDataSet`` enables logging either:

- a list of values as a metric:
```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataSet

metric_history_ds = MlflowMetricHistoryDataSet(
    key="my_metric", save_args={"mode": "list"}
)

with mlflow.start_run():
    metric_history_ds.save([0.1, 0.2, 0.3])  # will be logged with incremental steps
```
- a dict of {step: value} as a metric:
```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataSet

metric_history_ds = MlflowMetricHistoryDataSet(
    key="my_metric", save_args={"mode": "dict"}
)

with mlflow.start_run():
    metric_history_ds.save(
        {0: 0.1, 1: 0.2, 2: 0.3}
    )  # will be logged with incremental steps
```

- a list of dicts ``[{log_metric_arg: value}]`` as a metric, e.g.:

```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataSet

metric_history_ds = MlflowMetricHistoryDataSet(
    key="my_metric", save_args={"mode": "history"}
)

with mlflow.start_run():
    metric_history_ds.save(
        [
            {"step": 0, "value": 0.1, "timestamp": 1345545},
            {"step": 1, "value": 0.2, "timestamp": 1345546},
            {"step": 2, "value": 0.3, "timestamp": 1345547},
        ]
    )
```

You can combine the different modes for save and load, e.g.:

```python
import mlflow

from kedro_mlflow.io.metrics import MlflowMetricHistoryDataSet

metric_history_ds = MlflowMetricHistoryDataSet(
    key="my_metric", save_args={"mode": "dict"}, load_args={"mode": "list"}
)

with mlflow.start_run():
    metric_history_ds.save(
        {0: 0.1, 1: 0.2, 2: 0.3}
    )  # will be logged with incremental steps
    metric_history_ds.load()  # returns [0.1, 0.2, 0.3]
```

As usual, since it is an ``AbstractDataSet``, it can be used with the YAML API in your ``catalog.yml``, and in this case, the ``key`` argument is optional:
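A sketch of such an entry (the dataset name and ``save_args`` values are illustrative):

```yaml
my_model_metric:
  type: kedro_mlflow.io.metrics.MlflowMetricHistoryDataSet
  save_args:
    mode: list  # one of "list", "dict", "history"
```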
### Saving several metrics with their entire history with ``MlflowMetricsDataSet``
Since it is an ``AbstractDataSet``, it can be used with the YAML API. You can define it in your ``catalog.yml`` as:
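A minimal sketch, assuming the dataset only needs the ``type`` key (the name ``my_model_metrics`` matches the node output used later on this page):

```yaml
my_model_metrics:
  type: kedro_mlflow.io.metrics.MlflowMetricsDataSet
```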
Let's assume that you have a node which doesn't have any inputs and returns a dictionary of metrics to log:
```python
from typing import Dict, List, Union


def metrics_node() -> Dict[str, Union[float, List[float]]]:
    return {
        "metric1": {"value": 1.1, "step": 1},
        "metric2": [{"value": 1.1, "step": 1}, {"value": 1.2, "step": 2}],
    }
```

As any entry in the catalog, the metrics dataset must be defined in a Kedro pipeline:
```python
from kedro.pipeline import Pipeline, node


def create_pipeline() -> Pipeline:
    return Pipeline(
        node(
            func=metrics_node,
            inputs=None,
            outputs="my_model_metrics",
            name="log_metrics",
        )
    )
```
4 changes: 0 additions & 4 deletions kedro_mlflow/framework/context/__init__.py

This file was deleted.
