Refactor codes and tests to match kedro==0.19 conventions #467

Merged · 4 commits · Oct 25, 2023 (changes from all commits)
19 changes: 12 additions & 7 deletions CHANGELOG.md
@@ -2,19 +2,24 @@

## [Unreleased]

### Added

- :sparkles: Add support for python 3.11 ([#450, rxm7706](https://github.com/Galileo-Galilei/kedro-mlflow/pull/450))

### Changed

+- :boom: :sparkles: Change default ``copy_mode`` to ``"assign"`` in ``KedroPipelineModel`` because this is the most efficient setup (and usually the desired one) when serving a Kedro ``Pipeline`` as a Mlflow model. This is different from Kedro's default which is to deepcopy the dataset ([#463, ShubhamZoro](https://github.com/Galileo-Galilei/kedro-mlflow/pull/463)).
+- :boom: :recycle: ``MlflowArtifactDataset.__init__`` method ``data_set`` argument is renamed ``dataset`` to match new Kedro conventions ([#391](https://github.com/Galileo-Galilei/kedro-mlflow/pull/391)).
- :boom: :recycle: Rename the following ``DataSets`` with the ``Dataset`` suffix (without capitalized ``S``) to match new kedro conventions from ``kedro>=0.19`` and onwards ([#439, ShubhamZoro](https://github.com/Galileo-Galilei/kedro-mlflow/pull/439)):
- ``MlflowArtifactDataSet``->``MlflowArtifactDataset``
- ``MlflowAbstractModelDataSet``->``MlflowAbstractModelDataset``
- ``MlflowModelRegistryDataSet``->``MlflowModelRegistryDataset``
- ``MlflowMetricDataSet``->``MlflowMetricDataset``
- ``MlflowMetricHistoryDataSet``->``MlflowMetricHistoryDataset``
- ``MlflowMetricsDataSet``->``MlflowMetricsDataset``
+- :boom: :recycle: Rename the following ``DataSets`` to make their use more explicit, and use the ``Dataset`` suffix ([#465, ShubhamZoro](https://github.com/Galileo-Galilei/kedro-mlflow/pull/465)):
+- ``MlflowModelLoggerDataSet``->``MlflowModelTrackingDataset``
+- ``MlflowModelSaverDataSet``->``MlflowModelLocalFileSystemDataset``

-- :boom: :sparkles: Change default ``copy_mode`` to ``"assign"`` in ``KedroPipelineModel`` because this is the most efficient setup (and usually the desired one) when serving a Kedro ``Pipeline`` as a Mlflow model. This is different from Kedro's default which is to deepcopy the dataset ([#463, ShubhamZoro](https://github.com/Galileo-Galilei/kedro-mlflow/pull/463)).
-- :boom: :recycle: Rename the following ``DataSets`` to make their use more explicit, and use the ``Dataset`` suffix:
-- ``MlflowModelLoggerDataSet``->``MlflowModelTrackingDataset`` ([#391](https://github.com/Galileo-Galilei/kedro-mlflow/pull/391))
-- ``MlflowModelSaverDataSet``->``MlflowModelLocalFileSystemDataset`` ([#391](https://github.com/Galileo-Galilei/kedro-mlflow/pull/391))
- ``MlflowMetricsDataSet``->``MlflowMetricsHistoryDataset`` ([#440](https://github.com/Galileo-Galilei/kedro-mlflow/pull/440))

## [0.11.10] - 2023-10-03

@@ -334,7 +339,7 @@
- `get_mlflow_config` now works in interactive mode if `load_context` is called with a path different from the working directory ([#30](https://github.com/Galileo-Galilei/kedro-mlflow/issues/30))
- ``kedro_mlflow`` now works fine with ``kedro jupyter notebook`` independently of the working directory ([#64](https://github.com/Galileo-Galilei/kedro-mlflow/issues/64))
- You can use global variables in `mlflow.yml` which is now properly parsed if you use a `TemplatedConfigLoader` ([#72](https://github.com/Galileo-Galilei/kedro-mlflow/issues/72))
-- :bug: `MlflowMetricsDataset` now saves in the specified `run_id` instead of the current one when the prefix is not specified ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62))
+- :bug: `MlflowMetricsHistoryDataset` now saves in the specified `run_id` instead of the current one when the prefix is not specified ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62))
- :memo: Other bug fixes and documentation improvements ([#6](https://github.com/Galileo-Galilei/kedro-mlflow/issues/6), [#99](https://github.com/Galileo-Galilei/kedro-mlflow/issues/99))

### Changed
2 changes: 1 addition & 1 deletion README.md
@@ -54,7 +54,7 @@ The documentation contains:

Some frequently asked questions on more advanced features:

-- You want to log additional metrics to the run? -> [Try ``MlflowMetricsDataset``](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/05_version_metrics.html) !
+- You want to log additional metrics to the run? -> [Try ``MlflowMetricsHistoryDataset``](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/05_version_metrics.html) !
- You want to log nice dataviz of your pipeline that you register with ``MatplotlibWriter``? -> [Try ``MlflowArtifactDataset`` to log any local files (.png, .pkl, .csv...) *automagically*](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/03_version_datasets.html)!
- You want to easily create an API to share your awesome model with anyone? -> [See if ``pipeline_ml_factory`` can fit your needs](https://github.com/Galileo-Galilei/kedro-mlflow/issues/16)
- You want to do something that is not straightforward with the current implementation? Open an issue, and let's see what happens!
2 changes: 1 addition & 1 deletion docs/source/01_introduction/02_motivation.md
@@ -41,7 +41,7 @@ Above implementations have the advantage of being very straightforward and *mlfl
| Logging parameters | ``mlflow.yml`` | ``MlflowHook`` |
| Logging artifacts | ``catalog.yml`` | ``MlflowArtifactDataset`` |
| Logging models | ``catalog.yml`` | `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset` |
-| Logging metrics | ``catalog.yml`` | ``MlflowMetricsDataset`` |
+| Logging metrics | ``catalog.yml`` | ``MlflowMetricsHistoryDataset`` |
| Logging Pipeline as model | ``hooks.py`` | ``KedroPipelineModel`` and ``pipeline_ml_factory`` |

`kedro-mlflow` does not currently provide an interface to set tags outside a Kedro ``Pipeline``. Some of the above decisions are subject to debate and design trade-offs (for instance, metrics are often updated in a loop during each epoch / training iteration, and it does not always make sense to register the metric between computation steps, e.g. as an I/O operation after a node run).
2 changes: 1 addition & 1 deletion docs/source/02_installation/03_migration_guide.md
@@ -101,7 +101,7 @@ Replace the following entries:
| old | new |
| :-------------------------------------- | :------------------------------------------------ |
| `kedro_mlflow.io.MlflowArtifactDataset` | `kedro_mlflow.io.artifacts.MlflowArtifactDataset` |
-| `kedro_mlflow.io.MlflowMetricsDataset` | `kedro_mlflow.io.metrics.MlflowMetricsDataset` |
+| `kedro_mlflow.io.MlflowMetricsHistoryDataset` | `kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset` |

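In practice this migration is usually a pure import-path update. A minimal sketch (hypothetical project code; the ``training`` prefix is illustrative):

```python
# old import path (kedro-mlflow before the module split):
# from kedro_mlflow.io import MlflowMetricsDataset
# new import path:
from kedro_mlflow.io.metrics import MlflowMetricsHistoryDataset

# behaves as before, only the module path and class name changed
metrics_ds = MlflowMetricsHistoryDataset(prefix="training")
```
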
### Hooks

2 changes: 1 addition & 1 deletion docs/source/03_getting_started/02_first_steps.md
@@ -141,7 +141,7 @@ example_iris_data:

```yaml
example_model:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pickle.PickleDataset
        filepath: data/06_models/trained_model.pkl
```
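
For interactive exploration, a hedged Python equivalent of the ``example_model`` entry above (session and run handling are left to the caller; paths mirror the yaml):

```python
from kedro_datasets.pickle import PickleDataset
from kedro_mlflow.io.artifacts import MlflowArtifactDataset

example_model = MlflowArtifactDataset(
    dataset={"type": PickleDataset, "filepath": "data/06_models/trained_model.pkl"}
)
# example_model.save(trained_model) writes the pickle locally,
# then logs it as an artifact of the active mlflow run.
```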
16 changes: 8 additions & 8 deletions docs/source/04_experimentation_tracking/03_version_datasets.md
@@ -29,7 +29,7 @@ You can change it to:
```yaml
my_dataset_to_version:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/LOCAL/destination/file.csv # must be a local file, wherever you want to log the data in the end
```
@@ -40,19 +40,19 @@ and this dataset will be automatically versioned in each pipeline execution.

### Can I pass extra parameters to the ``MlflowArtifactDataset`` for finer control?

-The ``MlflowArtifactDataset`` takes a ``data_set`` argument which is a python dictionary passed to the ``__init__`` method of the dataset declared in ``type``. It means that you can pass any argument accepted by the underlying dataset in this dictionary. If you want to pass ``load_args`` and ``save_args`` in the previous example, add them in the ``data_set`` argument:
+The ``MlflowArtifactDataset`` takes a ``dataset`` argument which is a python dictionary passed to the ``__init__`` method of the dataset declared in ``type``. It means that you can pass any argument accepted by the underlying dataset in this dictionary. If you want to pass ``load_args`` and ``save_args`` in the previous example, add them in the ``dataset`` argument:

```yaml
my_dataset_to_version:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/local/destination/file.csv
        load_args:
            sep: ;
        save_args:
            sep: ;
-        # ... any other valid arguments for data_set
+        # ... any other valid arguments for dataset
```

### Can I use the ``MlflowArtifactDataset`` in interactive mode?
@@ -64,7 +64,7 @@

```python
from kedro_mlflow.io.artifacts import MlflowArtifactDataset
from kedro_datasets.pandas import CSVDataset

csv_dataset = MlflowArtifactDataset(
-    data_set={
+    dataset={
"type": CSVDataset, # either a string "pandas.CSVDataset" or the class
"filepath": r"/path/to/a/local/destination/file.csv",
}
)
```

@@ -90,7 +90,7 @@ The ``MlflowArtifactDataset`` has an extra attribute ``run_id`` which specifies
```yaml
my_dataset_to_version:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/local/destination/file.csv # must be a local filepath, no matter what is your actual mlflow storage (S3 or other)
    run_id: 13245678910111213 # a valid mlflow run to log in. If None, default to active run
```

@@ -105,7 +105,7 @@ You may want to reuse the artifact of a previous run in another one,
```yaml
my_dataset_to_reload:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/local/destination/file.csv # must be a local filepath, no matter what is your actual mlflow storage (S3 or other)
    run_id: 13245678910111213 # a valid mlflow run with the existing artifact. It must be named "file.csv"
```

@@ -120,7 +120,7 @@ With the below example, the artifact will be logged in mlflow within a `reporting` f
```yaml
my_dataset_to_version:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/local/destination/file.csv
    artifact_path: reporting # relative path where the remote artifact must be stored. if None, saved in root folder.
```
@@ -59,7 +59,7 @@ If you want to save your model both locally and remotely within the same run, yo
```yaml
sklearn_model:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset
        flavor: mlflow.sklearn
        filepath: data/06_models/sklearn_model
```
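
A hedged Python equivalent of the entry above, assuming a fitted scikit-learn model and an active run (the data and model are illustrative):

```python
import mlflow
from sklearn.linear_model import LogisticRegression
from kedro_mlflow.io.artifacts import MlflowArtifactDataset

sklearn_model = MlflowArtifactDataset(
    dataset={
        "type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
        "flavor": "mlflow.sklearn",
        "filepath": "data/06_models/sklearn_model",
    }
)
with mlflow.start_run():
    # saves the model locally under data/06_models/, then logs it as an artifact
    sklearn_model.save(LogisticRegression().fit([[0.0], [1.0]], [0, 1]))
```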
14 changes: 7 additions & 7 deletions docs/source/04_experimentation_tracking/05_version_metrics.md
@@ -9,7 +9,7 @@ MLflow defines a metric as "a (key, value) pair, where the value is numeric". Ea
`kedro-mlflow` introduces 3 ``AbstractDataset`` implementations to manage metrics (a minimal interactive sketch follows this list):
- ``MlflowMetricDataset`` which can log a float as a metric
- ``MlflowMetricHistoryDataset`` which can log the evolution over time of a given metric, e.g. a list or a dict of floats.
-- ``MlflowMetricsDataset``. It is a wrapper around a dictionary with metrics which is returned by node and log metrics in MLflow.
+- ``MlflowMetricsHistoryDataset``. It is a wrapper around a dictionary of metrics returned by a node, and it logs them in MLflow.
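
A minimal interactive sketch of the simplest case, ``MlflowMetricDataset`` (the run handling and the ``accuracy`` key are illustrative, not prescribed by the API):

```python
import mlflow
from kedro_mlflow.io.metrics import MlflowMetricDataset

metric_ds = MlflowMetricDataset(key="accuracy")
with mlflow.start_run():
    metric_ds.save(0.93)  # logs the float 0.93 under the metric key "accuracy"
```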

### Saving a single float as a metric with ``MlflowMetricDataset``

@@ -148,13 +148,13 @@

```yaml
my_model_metric:
    # ... keys elided in the diff ...
    save_args:
        mode: ... # OPTIONAL: "list" by default, one of {"list", "dict", "history"}
```
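
A matching interactive sketch for the history case, relying only on the default ``"list"`` mode (run handling and values are illustrative):

```python
import mlflow
from kedro_mlflow.io.metrics import MlflowMetricHistoryDataset

history_ds = MlflowMetricHistoryDataset(key="loss")
with mlflow.start_run():
    history_ds.save([0.9, 0.7, 0.6])  # one logged value per step of the history
```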

-### Saving several metrics with their entire history with ``MlflowMetricsDataset``
+### Saving several metrics with their entire history with ``MlflowMetricsHistoryDataset``

Since it is an ``AbstractDataset``, it can be used with the YAML API. You can define it in your ``catalog.yml`` as:

```yaml
my_model_metrics:
-    type: kedro_mlflow.io.metrics.MlflowMetricsDataset
+    type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
```

You can provide a prefix key, which is useful in situations like when you have multiple nodes producing metrics with the same names which you want to distinguish. If you are using the ``MlflowHook``, it will handle that automatically for you by using the dataset name in the catalog as the prefix. In the example above, the prefix would be ``my_model_metrics``.
@@ -163,7 +163,7 @@ Let's look at an example with a custom prefix:

```yaml
my_model_metrics:
-    type: kedro_mlflow.io.metrics.MlflowMetricsDataset
+    type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
    prefix: foo
```

@@ -179,7 +179,7 @@ def metrics_node() -> Dict[str, Union[float, List[float]]]:
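
The body of ``metrics_node`` is elided in the diff; here is a self-contained sketch of such a node (metric names and values are illustrative):

```python
from typing import Dict, List, Union


def metrics_node() -> Dict[str, Union[float, List[float]]]:
    # a float is logged as a single value; a list is logged as a history
    return {
        "metric1": 0.1,
        "metric2": [0.1, 0.2, 0.3],
    }
```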

-As you can see above, ``kedro_mlflow.io.metrics.MlflowMetricsDataset`` can take metrics as:
+As you can see above, ``kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset`` can take metrics as:

- ``Dict[str, key]``
- ``List[Dict[str, key]]``
@@ -188,7 +188,7 @@ To store metrics, we need to define a metrics dataset in the Kedro catalog:

```yaml
my_model_metrics:
-    type: kedro_mlflow.io.metrics.MlflowMetricsDataset
+    type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
```

Within a kedro run, the ``MlflowHook`` will automatically prefix the metrics datasets with their name in the catalog. In our example, the metrics will be stored in Mlflow with the following keys: ``my_model_metrics.metric1``, ``my_model_metrics.metric2``.
@@ -197,7 +197,7 @@ It is also possible to provide a prefix manually:

```yaml
my_model_metrics:
-    type: kedro_mlflow.io.metrics.MlflowMetricsDataset
+    type: kedro_mlflow.io.metrics.MlflowMetricsHistoryDataset
    prefix: foo
```
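
With this entry, the metrics of the example node are logged in mlflow as ``foo.metric1`` and ``foo.metric2``, the manual prefix replacing the catalog name.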

8 changes: 4 additions & 4 deletions docs/source/07_python_objects/01_DataSets.md
@@ -7,7 +7,7 @@
```yaml
my_dataset_to_version:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/local/destination/file.csv
```
@@ -17,14 +17,14 @@ or with additional parameters:
```yaml
my_dataset_to_version:
    type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
-    data_set:
+    dataset:
        type: pandas.CSVDataset # or any valid kedro DataSet
        filepath: /path/to/a/local/destination/file.csv
        load_args:
            sep: ;
        save_args:
            sep: ;
-        # ... any other valid arguments for data_set
+        # ... any other valid arguments for dataset
    run_id: 13245678910111213 # a valid mlflow run to log in. If None, default to active run
    artifact_path: reporting # relative path where the artifact must be stored. if None, saved in root folder.
```
@@ -36,7 +36,7 @@

```python
import pandas as pd  # needed for the save() call below

from kedro_mlflow.io.artifacts import MlflowArtifactDataset
from kedro_datasets.pandas import CSVDataset

csv_dataset = MlflowArtifactDataset(
data_set={"type": CSVDataset, "filepath": r"/path/to/a/local/destination/file.csv"}
dataset={"type": CSVDataset, "filepath": r"/path/to/a/local/destination/file.csv"}
)
csv_dataset.save(data=pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
```
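
On ``save``, the csv is first written to the local ``filepath``, then logged as an artifact of the active mlflow run.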
23 changes: 13 additions & 10 deletions kedro_mlflow/framework/hooks/mlflow_hook.py
@@ -27,7 +27,7 @@
from kedro_mlflow.io.metrics import (
    MlflowMetricDataset,
    MlflowMetricHistoryDataset,
-    MlflowMetricsDataset,
+    MlflowMetricsHistoryDataset,
)
from kedro_mlflow.mlflow import KedroPipelineModel
from kedro_mlflow.pipeline.pipeline_ml import PipelineML
@@ -113,7 +113,7 @@ def after_context_created(
            experiment_name = context._package_name
            if experiment_name is None:
                # context._package_name may be None if the session is created interactively
-                metadata = _get_project_metadata(context._project_path)
+                metadata = _get_project_metadata(context.project_path)
                experiment_name = metadata.package_name
            mlflow_config.tracking.experiment.name = experiment_name

Expand All @@ -140,40 +140,43 @@ def after_catalog_created(
):
# we use this hooks to modif "MlflowmetricsDataset" to ensure consistency
# of the metric name with the catalog name
for name, dataset in catalog._data_sets.items():
if isinstance(dataset, MlflowMetricsDataset) and dataset._prefix is None:
for name, dataset in catalog._datasets.items():
if (
isinstance(dataset, MlflowMetricsHistoryDataset)
and dataset._prefix is None
):
if dataset._run_id is not None:
catalog._data_sets[name] = MlflowMetricsDataset(
catalog._datasets[name] = MlflowMetricsHistoryDataset(
run_id=dataset._run_id, prefix=name
)
else:
catalog._data_sets[name] = MlflowMetricsDataset(prefix=name)
catalog._datasets[name] = MlflowMetricsHistoryDataset(prefix=name)

if isinstance(dataset, MlflowMetricDataset) and dataset.key is None:
if dataset._run_id is not None:
catalog._data_sets[name] = MlflowMetricDataset(
catalog._datasets[name] = MlflowMetricDataset(
run_id=dataset._run_id,
key=name,
load_args=dataset._load_args,
save_args=dataset._save_args,
)
else:
catalog._data_sets[name] = MlflowMetricDataset(
catalog._datasets[name] = MlflowMetricDataset(
key=name,
load_args=dataset._load_args,
save_args=dataset._save_args,
)

if isinstance(dataset, MlflowMetricHistoryDataset) and dataset.key is None:
if dataset._run_id is not None:
catalog._data_sets[name] = MlflowMetricHistoryDataset(
catalog._datasets[name] = MlflowMetricHistoryDataset(
run_id=dataset._run_id,
key=name,
load_args=dataset._load_args,
save_args=dataset._save_args,
)
else:
catalog._data_sets[name] = MlflowMetricHistoryDataset(
catalog._datasets[name] = MlflowMetricHistoryDataset(
key=name,
load_args=dataset._load_args,
save_args=dataset._save_args,
Expand Down
12 changes: 6 additions & 6 deletions kedro_mlflow/io/artifacts/mlflow_artifact_dataset.py
@@ -15,20 +15,20 @@ class MlflowArtifactDataset(AbstractVersionedDataset):

    def __new__(
        cls,
-        data_set: Union[str, Dict],
+        dataset: Union[str, Dict],
        run_id: str = None,
        artifact_path: str = None,
        credentials: Dict[str, Any] = None,
    ):
-        data_set, data_set_args = parse_dataset_definition(config=data_set)
+        dataset, dataset_args = parse_dataset_definition(config=dataset)

        # fake inheritance : this mlflow class should be a mother class which wraps
        # all dataset (i.e. it should replace AbstractVersionedDataset)
        # instead and since we can't modify the core package,
-        # we create a subclass which inherits dynamically from the data_set class
-        class MlflowArtifactDatasetChildren(data_set):
+        # we create a subclass which inherits dynamically from the dataset class
+        class MlflowArtifactDatasetChildren(dataset):
            def __init__(self, run_id, artifact_path):
-                super().__init__(**data_set_args)
+                super().__init__(**dataset_args)
                self.run_id = run_id
                self.artifact_path = artifact_path
                self._logging_activated = True
@@ -134,7 +134,7 @@ def _load(self) -> Any:  # pragma: no cover
                return super()._load()

        # rename the class
-        parent_name = data_set.__name__
+        parent_name = dataset.__name__
        MlflowArtifactDatasetChildren.__name__ = f"Mlflow{parent_name}"
        MlflowArtifactDatasetChildren.__qualname__ = (
            f"{parent_name}.Mlflow{parent_name}"
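
For readers unfamiliar with the trick used in ``__new__`` above, here is a standalone sketch of the dynamic subclassing pattern (the names and the ``save`` hook are illustrative, not the library's actual implementation):

```python
def make_mlflow_wrapper(base: type) -> type:
    """Build a subclass of a dataset class chosen at runtime."""

    class Wrapper(base):
        def save(self, data):
            super().save(data)  # delegate to the wrapped dataset first
            # ... then perform the extra mlflow logging here ...

    # rename the class so reprs and logs show e.g. "MlflowCSVDataset"
    Wrapper.__name__ = f"Mlflow{base.__name__}"
    Wrapper.__qualname__ = f"{base.__name__}.Mlflow{base.__name__}"
    return Wrapper
```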