FIX #62 - Support kedro 0.16.5 (and consequently fix #96)
takikadiri authored and Galileo-Galilei committed Oct 25, 2020
1 parent e8f9f09 commit 18bdb0a
Showing 19 changed files with 192 additions and 348 deletions.
1 change: 0 additions & 1 deletion .flake8
@@ -3,5 +3,4 @@ ignore = E203, E266, E501, W503
max-line-length = 88
max-complexity = 18
select = B,C,E,F,W,T4,B9
exclude = kedro_mlflow/template/project/run.py
per-file-ignores = **/__init__.py:F401
2 changes: 1 addition & 1 deletion .isort.cfg
@@ -7,4 +7,4 @@ line_length=88
ensure_newline_before_comments=True
sections=FUTURE,STDLIB,THIRDPARTY,FIRSTPARTY,LOCALFOLDER
known_first_party=kedro_mlflow
known_third_party=black,click,cookiecutter,flake8,isort,jinja2,kedro,mlflow,pandas,pytest,pytest_lazyfixture,setuptools,yaml
known_third_party=anyconfig,click,cookiecutter,jinja2,kedro,mlflow,packaging,pandas,pytest,pytest_lazyfixture,setuptools,yaml
9 changes: 7 additions & 2 deletions CHANGELOG.md
@@ -4,7 +4,8 @@

### Added

-
- `kedro-mlflow` now supports kedro 0.16.5 (#62)
- `kedro-mlflow` hooks can now be declared in `.kedro.yml` or `pyproject.toml` by adding `kedro_mlflow.framework.hooks.mlflow_pipeline_hook` and `kedro_mlflow.framework.hooks.mlflow_node_hook` into the hooks entry. _Only for kedro>=0.16.5_

### Fixed

@@ -16,7 +17,11 @@

### Changed

- `MlflowNodeHook` have now a before_pipeline_run hook which stores the ProjectContext and enable to retrieve configuration.
- `MlflowNodeHook` now has a `before_pipeline_run` hook which stores the `ProjectContext` and makes it possible to retrieve the configuration.

### Removed

- The `kedro mlflow init` command no longer declares hooks in `run.py`. You must now [register your hooks manually](docs/source/03_tutorial/02_setup.md#declaring-kedro-mlflow-hooks) in ``run.py`` (kedro > 0.16.0), ``.kedro.yml`` (kedro >= 0.16.5) or ``pyproject.toml`` (kedro >= 0.16.5).

## [0.3.0] - 2020-10-11

1 change: 0 additions & 1 deletion docs/source/02_hello_world_example/02_first_steps.md
@@ -7,7 +7,6 @@ kedro mlflow init
You have the following message:
```console
'conf/base/mlflow.yml' successfully updated.
'run.py' successfully updated
```

The ``conf/base`` folder is updated:
91 changes: 42 additions & 49 deletions docs/source/03_tutorial/02_setup.md
@@ -25,15 +25,13 @@ This plugins must be used in an existing kedro project. If you do not have a ked

For this tutorial, if you do not have a real-world project, I strongly suggest that you include the proposed example so that you can demo this plugin out of the box.

## Update the template of your kedro project
In order to use the ``kedro-mlflow`` plugin, you need to perform 2 actions:
1. Create an ``mlflow.yml`` file for [configuring mlflow in a dedicated file](../05_python_objects/05_Configuration.md).
2. Update the ``src/PYTHON_PACKAGE/run.py`` to add the [necessary hooks](../05_python_objects/02_Hooks.md) to the project context. The ``MlflowPipelineHook`` manages the configuration and registers the PipelineML, while the ``MlflowNodeHook`` autologs the parameters.
## Activate `kedro-mlflow` in your kedro project
In order to use the ``kedro-mlflow`` plugin, you need to set up its configuration and declare its hooks. These two actions are detailed in the following sections.

## Automatic template update (recommended)
### Default situation
The first and recommended possibility to setup this context is to use a [dedicated command line](../05_python_objects/04_CLI.md) offered by the plugin.
Position yourself at the root of your project (i.e. the folder with the ``.kedro.yml`` file)
### Setting up the kedro-mlflow configuration file
``kedro-mlflow`` is [configured](../05_python_objects/05_Configuration.md) through an ``mlflow.yml`` file. The recommended way to initialize the `mlflow.yml` is by using [the kedro-mlflow CLI](../05_python_objects/04_CLI.md).

Set the working directory to the root of your kedro project (i.e. the folder with the ``.kedro.yml`` file):

```console
$ cd path/to/your/project
@@ -44,48 +42,21 @@ Run the init command :
```console
$ kedro mlflow init
```

*Note : If the warning ``"You have not updated your template yet. This is mandatory to use 'kedro-mlflow' plugin. Please run the following command before you can access to other commands : '$ kedro mlflow init'`` is raised, this is a bug to be corrected and you can safely ignore it.*
If you have never modified your ``run.py`` manually, it should run smoothly and you should get the following message:
You should see the following message:
```console
'conf/base/mlflow.yml' successfully updated.
'run.py' successfully updated
```

### Special case: what happens if you have a custom ``run.py`` ?

You may have modified the ``run.py`` manually since the creation of the project. This may happen in the following situations:
- you have added ``hooks`` (of another plugin for instance)
- you have modified the ``ConfigLoader``, for instance to use a ``TemplatedConfigLoader`` to make your configuration dynamic and link the files with one another
- you have modified the ``get_pipelines`` functions to implement specific logic
-...
These are advanced features of ``Kedro``, and if you have made such modifications they were very likely deliberate; however, some other plugins may have modified this file without any warning.

Whatever the reason is, if your ``run.py`` was modified since the project creation, the [previous process](#default-situation) will return the following warning message:
```console
You have modified your 'run.py' since project creation.
In order to use kedro-mlflow, you must either:
- set up your run.py with the following instructions :
INSERT_DOC_URL
- call the following command:
$ kedro mlflow init --force
```
In this situation, the ``mlflow.yml`` is still created, but the ``run.py`` is left unchanged to avoid messing up with your own changes. You can still erase your ``run.py`` and replace it with the one of the plugin with below command.

```console
kedro mlflow init --force
```
**USE AT YOUR OWN RISK: This will permanently erase all the modifications you made to your own ``run.py``, with no possible recovery.** Consequently, this is not the recommended way to set up the project if you have a custom ``run.py``. The best way to continue the setup is to [set up the hooks manually](#manual-update).
### Declaring kedro-mlflow hooks

## Manual update
``kedro_mlflow`` hook implementations must be registered with Kedro. There are three ways of registering [hooks](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html?highlight=hooks).

The ``MlflowPipelineHook`` and ``MlflowNodeHook`` hooks need to be registered in the ``run.py`` file. The kedro documentation explains in detail [how to register a hook](https://kedro.readthedocs.io/en/latest/04_user_guide/15_hooks.html#registering-your-hook-implementations-with-kedro).
#### - Declaring hooks through code, in ``ProjectContext``

Your run.py should look like the following code snippet :
By declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in the ``ProjectContext`` class of ``src/package_name/run.py``:

```python
from kedro_mlflow.framework.hooks import MlflowNodeHook, MlflowPipelineHook
from <python_package>.pipeline import create_pipelines
from kedro_mlflow.framework.hooks import mlflow_pipeline_hook, mlflow_node_hook

class ProjectContext(KedroContext):
"""Users can override the remaining methods from the parent class here,
@@ -95,13 +66,35 @@ class ProjectContext(KedroContext):
project_name = "<project-name>"
project_version = "0.16.X" # must be >=0.16.0
hooks = (
MlflowNodeHook(flatten_dict_params=False),
MlflowPipelineHook(model_name="<python_package>",
conda_env="src/requirements.txt")
) # <-- the new lines to add
mlflow_pipeline_hook,
mlflow_node_hook
)
```
#### - Declaring hooks through static configuration in `.kedro.yml` or `pyproject.toml` **[Only for kedro >= 0.16.5]**

By declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``.kedro.yml`` :

```
context_path: km_example.run.ProjectContext
project_name: "km_example"
project_version: "0.16.5"
package_name: "km_example"
hooks:
- km_example.hooks.project_hooks
- kedro_mlflow.framework.hooks.mlflow_pipeline_hook
- kedro_mlflow.framework.hooks.mlflow_node_hook
```

Or by declaring `mlflow_pipeline_hook` and `mlflow_node_hook` in ``pyproject.toml`` :

```
# <your_project>/pyproject.toml
[tool.kedro]
hooks=["kedro_mlflow.framework.hooks.mlflow_pipeline_hook",
"kedro_mlflow.framework.hooks.mlflow_node_hook"]
```
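
Each entry in the `hooks` lists above is a dotted path to an importable object, which is why the plugin exposes ready-made module-level instances (`mlflow_pipeline_hook`, `mlflow_node_hook`) rather than only the hook classes. The sketch below illustrates the general idea of turning such a string into a Python object. It is only an assumption about the mechanism, not kedro's actual loader, and `resolve_dotted_path` is a hypothetical helper name introduced for this example.

```python
import importlib
from typing import Any


def resolve_dotted_path(dotted_path: str) -> Any:
    """Import the module part of ``dotted_path`` and return the named attribute.

    Hypothetical helper for illustration only; kedro's own loading code may differ.
    """
    # Split "package.module.attribute" into "package.module" and "attribute".
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"'{dotted_path}' is not of the form 'module.attribute'")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrated on a standard-library name so the snippet runs without kedro installed.
join = resolve_dotted_path("os.path.join")
print(join("conf", "base", "mlflow.yml"))
```

With `kedro-mlflow` installed, calling the same helper on `"kedro_mlflow.framework.hooks.mlflow_pipeline_hook"` should return the module-level `MlflowPipelineHook()` instance. A static configuration file can only name an object; it cannot call a constructor with arguments.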

#### - Declaring hooks through auto-discovery **[Coming soon]**


Pay attention to the following elements:
- if you have other hooks (custom, from other plugins...), you can just add them to the hooks tuple
- you **must register both hooks** for the plugin to work
- the hooks are highly parametrizable, you can find a [detailed description of their parameters here](../05_python_objects/02_Hooks.md).
**Note that you must register both hooks for the plugin to work**
Binary file modified docs/source/imgs/initialized_project.png
80 changes: 12 additions & 68 deletions kedro_mlflow/framework/cli/cli.py
@@ -1,17 +1,20 @@
import os
import subprocess
from pathlib import Path

import click
from kedro import __file__ as KEDRO_PATH
from kedro import __version__ as kedro_version
from kedro.framework.context import load_context
from packaging import version

from kedro_mlflow.framework.cli.cli_utils import (
render_jinja_template,
write_jinja_template,
)
from kedro_mlflow.framework.cli.cli_utils import write_jinja_template
from kedro_mlflow.framework.context import get_mlflow_config
from kedro_mlflow.utils import _already_updated, _get_project_globals, _is_kedro_project
from kedro_mlflow.utils import _already_updated, _is_kedro_project

try:
from kedro.framework.context import get_static_project_data
except ImportError: # pragma: no cover
from kedro_mlflow.utils import _get_project_globals as get_static_project_data # pragma: no cover


TEMPLATE_FOLDER_PATH = Path(__file__).parent.parent.parent / "template" / "project"

@@ -88,7 +91,7 @@ def init(force, silent):

# get constants
project_path = Path().cwd()
project_globals = _get_project_globals()
project_globals = get_static_project_data(project_path)
context = load_context(project_path)
conf_root = context.CONF_ROOT

@@ -99,73 +102,14 @@ def init(force, silent):
src=TEMPLATE_FOLDER_PATH / mlflow_yml,
is_cookiecutter=False,
dst=project_path / conf_root / "base" / mlflow_yml,
python_package=project_globals["python_package"],
python_package=project_globals["package_name"],
)
if not silent:
click.secho(
click.style(
f"'{conf_root}/base/mlflow.yml' successfully updated.", fg="green"
)
)
# make a check whether the project run.py is strictly identical to the template
# if yes, replace the script by the template silently
# if no, raise a warning and send a message to INSERT_DOC_URL
flag_erase_runpy = force
runpy_project_path = (
project_path
/ "src"
/ (Path(project_globals["context_path"]).parent.as_posix() + ".py")
)
if not force:
kedro_path = Path(KEDRO_PATH).parent
runpy_template_path = (
kedro_path
/ "templates"
/ "project"
/ "{{ cookiecutter.repo_name }}"
/ "src"
/ "{{ cookiecutter.python_package }}"
/ "run.py"
)
kedro_runpy_template = render_jinja_template(
src=runpy_template_path,
is_cookiecutter=True,
python_package=project_globals["python_package"],
project_name=project_globals["project_name"],
kedro_version=project_globals["kedro_version"],
)

with open(runpy_project_path, mode="r") as file_handler:
kedro_runpy_project = file_handler.read()

# beware : black formatting could change slightly this test which is very strict
if kedro_runpy_project == kedro_runpy_template:
flag_erase_runpy = True

if flag_erase_runpy:
os.remove(runpy_project_path)
write_jinja_template(
src=TEMPLATE_FOLDER_PATH / "run.py",
dst=runpy_project_path,
is_cookiecutter=True,
python_package=project_globals["python_package"],
project_name=project_globals["project_name"],
kedro_version=project_globals["kedro_version"],
)
if not silent:
click.secho(click.style("'run.py' successfully updated", fg="green"))
else:
click.secho(
click.style(
"You have modified your 'run.py' since project creation.\n"
+ "In order to use kedro-mlflow, you must either:\n"
+ " - set up your run.py with the following instructions :\n"
+ "INSERT_DOC_URL\n"
+ " - call the following command:\n"
+ "$ kedro mlflow init --force",
fg="yellow",
)
)


@mlflow_commands.command()
4 changes: 2 additions & 2 deletions kedro_mlflow/framework/hooks/__init__.py
@@ -1,2 +1,2 @@
from .node_hook import MlflowNodeHook
from .pipeline_hook import MlflowPipelineHook
from .node_hook import MlflowNodeHook, mlflow_node_hook
from .pipeline_hook import MlflowPipelineHook, mlflow_pipeline_hook
3 changes: 3 additions & 0 deletions kedro_mlflow/framework/hooks/node_hook.py
@@ -90,6 +90,9 @@ def before_node_run(
mlflow.log_params(params_inputs)


mlflow_node_hook = MlflowNodeHook()


def flatten_dict(d, recursive: bool = True, sep="."):
def expand(key, value):
if isinstance(value, dict):
3 changes: 3 additions & 0 deletions kedro_mlflow/framework/hooks/pipeline_hook.py
@@ -185,6 +185,9 @@ def on_pipeline_error(
mlflow.end_run()


mlflow_pipeline_hook = MlflowPipelineHook()


def _generate_kedro_command(
tags, node_names, from_nodes, to_nodes, from_inputs, load_versions, pipeline_name
):
69 changes: 0 additions & 69 deletions kedro_mlflow/template/project/run.py

This file was deleted.
