PARTIAL #6 - Add Markdown links checker to avoid broken links in documentation
Galileo-Galilei committed Feb 6, 2021
1 parent 1f8e715 commit ef3b0e2
Showing 12 changed files with 37 additions and 23 deletions.
15 changes: 11 additions & 4 deletions .github/workflows/test.yml
@@ -5,12 +5,12 @@ name: test

 on:
   push:
-    branches: [ develop, master ]
+    branches: [develop, master]
   pull_request:
-    branches: [ develop, master ]
+    branches: [develop, master]

 jobs:
-  build:
+  lint_and_test:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
@@ -34,7 +34,7 @@ jobs:
         # stop the build if there are Python syntax errors or undefined names
         flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude kedro_mlflow/template/project/run.py
         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-        flake8 . --count --max-complexity=10 --max-line-length=127 --statistics --exclude kedro_mlflow/template/project/run.py
+        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude kedro_mlflow/template/project/run.py
     - name: Test with pytest and generate coverage report
       run: |
         pytest --cov=./ --cov-report=xml
@@ -45,3 +45,10 @@ jobs:
         file: ./coverage.xml
         env_vars: OS,PYTHON
         fail_ci_if_error: true
+  markdown-link-check:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2
+    - uses: gaurav-nelson/github-action-markdown-link-check@v1
+      with:
+        config-file: 'mlc_config.json'
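The new `markdown-link-check` job hands the work to the `gaurav-nelson/github-action-markdown-link-check` action. The Python sketch below only illustrates the general idea of such a checker, not that action's code. It assumes inline `[text](target)` links, checks `http(s)` targets through a caller-supplied `is_url_alive` function (a stand-in for real HTTP requests), skips same-page `#anchors`, and resolves every other target as a file path relative to the Markdown file's folder. `extract_links` and `find_broken_links` are hypothetical names.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List

# Matches inline Markdown links such as [text](target) and captures the target.
# Reference-style links, autolinks and raw HTML anchors are ignored here.
INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def extract_links(markdown: str) -> List[str]:
    """Return every inline link target found in a Markdown string."""
    return INLINE_LINK.findall(markdown)


def find_broken_links(
    md_file: Path,
    targets: Iterable[str],
    is_url_alive: Callable[[str], bool],
) -> List[str]:
    """Return the targets that do not resolve.

    ``is_url_alive`` is a caller-supplied stand-in for a real HTTP check,
    so this sketch runs without network access.
    """
    broken = []
    for target in targets:
        if target.startswith(("http://", "https://")):
            if not is_url_alive(target):
                broken.append(target)
        elif target.startswith("#"):
            continue  # same-page anchors are not verified in this sketch
        else:
            # Drop any "#fragment", then resolve relative to the page's folder.
            relative_path = target.split("#", 1)[0]
            if not (md_file.parent / relative_path).exists():
                broken.append(target)
    return broken
```

Resolving against the page's own folder means a target such as `../07_python_objects/01_DataSets.md` can only be validated once the page's location is known; injecting `is_url_alive` keeps the sketch testable offline.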
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -41,7 +41,7 @@
 - `pipeline_ml_factory` now accepts that `inference` pipeline `inputs` may be in `training` pipeline `inputs` ([#71](https://github.com/Galileo-Galilei/kedro-mlflow/issues/71))
 - `pipeline_ml_factory` now infer automatically the schema of the input dataset to validate data automatically at inference time. The output schema can be declared manually in `model_signature` argument ([#70](https://github.com/Galileo-Galilei/kedro-mlflow/issues/70))
 - Add two DataSets for model logging and saving: `MlflowModelLoggerDataSet` and `MlflowModelSaverDataSet` ([#12](https://github.com/Galileo-Galilei/kedro-mlflow/issues/12))
-- `MlflowPipelineHook` and `MlflowNodeHook` are now [auto-registered](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html#registering-your-hook-implementations-with-kedro) if you use `kedro>=0.16.4` ([#29](https://github.com/Galileo-Galilei/kedro-mlflow/issues/29))
+- `MlflowPipelineHook` and `MlflowNodeHook` are now [auto-registered](https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html#registering-your-hook-implementations-with-kedro) if you use `kedro>=0.16.4` ([#29](https://github.com/Galileo-Galilei/kedro-mlflow/issues/29))

 ### Fixed

@@ -61,7 +61,7 @@

 ### Removed

-- `kedro mlflow init` command is no longer declaring hooks in `run.py`. You must now [register your hooks manually](docs/source/03_tutorial/02_setup.md#declaring-kedro-mlflow-hooks) in the `run.py` if you use `kedro>=0.16.0, <0.16.3` ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62)).
+- `kedro mlflow init` command is no longer declaring hooks in `run.py`. You must now [register your hooks manually](https://kedro-mlflow.readthedocs.io/en/stable/source/02_installation/02_setup.html#declaring-kedro-mlflow-hooks) in the `run.py` if you use `kedro>=0.16.0, <0.16.3` ([#62](https://github.com/Galileo-Galilei/kedro-mlflow/issues/62)).
 - Remove `pipeline_ml` function which was deprecated in 0.3.0. It is now replaced by `pipeline_ml_factory` ([#105](https://github.com/Galileo-Galilei/kedro-mlflow/issues/105))
 - Remove `MlflowDataSet` dataset which was deprecated in 0.3.0. It is now replaced by `MlflowArtifactDataSet` ([#105](https://github.com/Galileo-Galilei/kedro-mlflow/issues/105))

14 changes: 7 additions & 7 deletions README.md
@@ -18,7 +18,7 @@

 # What is kedro-mlflow?

-``kedro-mlflow`` is a [kedro-plugin](https://kedro.readthedocs.io/en/stable/04_user_guide/10_developing_plugins.html) for lightweight and portable integration of [mlflow](https://mlflow.org/docs/latest/index.html) capabilities inside [kedro](https://kedro.readthedocs.io/en/stable/index.html) projects. It enforces [``Kedro`` principles](https://kedro.readthedocs.io/en/stable/12_faq/01_faq.html?highlight=principles#what-is-the-philosophy-behind-kedro) to make mlflow usage as production ready as possible. Its core functionalities are :
+``kedro-mlflow`` is a [kedro-plugin](https://kedro.readthedocs.io/en/stable/07_extend_kedro/04_plugins.html) for lightweight and portable integration of [mlflow](https://mlflow.org/docs/latest/index.html) capabilities inside [kedro](https://kedro.readthedocs.io/en/stable/index.html) projects. It enforces [``Kedro`` principles](https://kedro.readthedocs.io/en/stable/12_faq/01_faq.html?highlight=principles#what-is-the-philosophy-behind-kedro) to make mlflow usage as production ready as possible. Its core functionalities are :

 - **versioning**: `kedro-mlflow` intends to enhance reproducibility for machine learning experimentation. With `kedro-mlflow` installed, you can effortlessly register your parameters or your datasets with minimal configuration in a kedro run. Later, you will be able to browse your runs in the mlflow UI, and retrieve the runs you want. This is directly linked to [Mlflow Tracking](https://www.mlflow.org/docs/latest/tracking.html).
 - **model packaging**: ``kedro-mlflow`` intends to be be an agnostic machine learning framework for people who want to write portable, production ready machine learning pipelines. It offers a convenient API to convert a Kedro pipeline to a ``model`` in the mlflow sense. Consequently, you can *API-fy* or serve your Kedro pipeline with one line of code, or share a model with without worrying of the preprocessing to be made for further use. This is directly linked to [Mlflow Models](https://www.mlflow.org/docs/latest/models.html).
@@ -39,19 +39,19 @@ If you want to use the ``develop`` version of the package which is the most up t
 pip install --upgrade git+https://github.com/Galileo-Galilei/kedro-mlflow.git@develop
 ```

-I strongly recommend to use ``conda`` (a package manager) to create an environment and to read [``kedro`` installation guide](https://kedro.readthedocs.io/en/stable/02_getting_started/01_prerequisites.html).
+I strongly recommend to use ``conda`` (a package manager) to create an environment and to read [``kedro`` installation guide](https://kedro.readthedocs.io/en/latest/02_get_started/02_install.html).

 # Getting started

 The documentation contains:

-- [A "hello world" example](https://kedro-mlflow.readthedocs.io/en/latest/source/02_hello_world_example/index.html) which demonstrates how you to **setup your project**, **version parameters** and **datasets**, and browse your runs in the UI.
-- A more [detailed tutorial](https://kedro-mlflow.readthedocs.io/en/latest/source/03_tutorial/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)
+- [A "hello world" example](https://kedro-mlflow.readthedocs.io/en/stable/source/03_getting_started/index.html) which demonstrates how you to **setup your project**, **version parameters** and **datasets**, and browse your runs in the UI.
+- A more [detailed tutorial](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/index.html) to show more advanced features (mlflow configuration through the plugin, package and serve a kedro ``Pipeline``...)

 Some frequently asked questions on more advanced features:

-- You want to log additional metrics to the run? -> [Try ``MlflowMetricsDataSet``](https://kedro-mlflow.readthedocs.io/en/latest/source/03_tutorial/07_version_metrics.html) !
-- You want to log nice dataviz of your pipeline that you register with ``MatplotlibWriter``? -> [Try ``MlflowArtifactDataSet`` to log any local files (.png, .pkl, .csv...) *automagically*](https://kedro-mlflow.readthedocs.io/en/latest/source/02_hello_world_example/02_first_steps.html#artifacts)!
+- You want to log additional metrics to the run? -> [Try ``MlflowMetricsDataSet``](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/05_version_metrics.html) !
+- You want to log nice dataviz of your pipeline that you register with ``MatplotlibWriter``? -> [Try ``MlflowArtifactDataSet`` to log any local files (.png, .pkl, .csv...) *automagically*](https://kedro-mlflow.readthedocs.io/en/stable/source/04_experimentation_tracking/03_version_datasets.html)!
 - You want to create easily an API to share your awesome model to anyone? -> [See if ``pipeline_ml_factory`` can fit your needs](https://github.com/Galileo-Galilei/kedro-mlflow/issues/16)
 - You want to do something that is not straigthforward with current implementation? Open an issue, and let's see what happens!

@@ -65,7 +65,7 @@ This package is still in active development. We use [SemVer](https://semver.org/

 The user must be aware that we will not reach `1.0.0` milestone before Kedro does (mlflow has already reached `1.0.0`).

-If you want to see how to migrate from one version of `kedro-mlflow` to another, see the [migration guide](docs/source/03_tutorial/00_migration_guide.md).
+If you want to see how to migrate from one version of `kedro-mlflow` to another, see the [migration guide](https://kedro-mlflow.readthedocs.io/en/stable/source/02_installation/03_migration_guide.html).

 # Can I contribute?

2 changes: 1 addition & 1 deletion docs/source/01_introduction/01_introduction.md
@@ -47,7 +47,7 @@ We discuss hereafter how the two libraries compete on the different functionalit
 ### Versioning: Kedro 1 - 1 Mlflow

-The ``Kedro`` [``Journal`` aims at reproducibility](https://kedro.readthedocs.io/en/stable/04_user_guide/13_journal.html), but is not focused on machine learning. The `Journal` keeps track of two elements:
+The ``Kedro`` [``Journal`` aims at reproducibility](https://kedro.readthedocs.io/en/latest/kedro.versioning.Journal.html), but is not focused on machine learning. The `Journal` keeps track of two elements:

 - the CLI arguments, including *on the fly* parameters. This makes the command used to run the pipeline fully reproducible.
 - the ``AbstractVersionedDataSet`` for which versioning is activated. It consists in copying the data whom ``versioned`` argument is ``True`` when the ``save`` method of the ``AbstractVersionedDataSet`` is called.
2 changes: 1 addition & 1 deletion docs/source/02_installation/01_installation.md
@@ -6,7 +6,7 @@

 I strongly recommend to use ``conda`` (a package manager) to create an environment in order to avoid version conflicts between packages.

-I also recommend to read [Kedro installation guide](https://kedro.readthedocs.io/en/stable/02_getting_started/01_prerequisites.html) to set up your Kedro project.
+I also recommend to read [Kedro installation guide](https://kedro.readthedocs.io/en/latest/02_get_started/02_install.html) to set up your Kedro project.

 ```console
 conda create -n <your-environment-name> python=<3.[6-8].X>
6 changes: 3 additions & 3 deletions docs/source/02_installation/02_setup.md
@@ -4,7 +4,7 @@ This section assume that [you have installed `kedro-mlflow` in your virtual envi

 ## Create a kedro project

-This plugin must be used in an existing kedro project. If you do not have a kedro project yet, you can create it with ``kedro new`` command. [See the kedro docs for a tutorial](https://kedro.readthedocs.io/en/latest/02_getting_started/03_new_project.html).
+This plugin must be used in an existing kedro project. If you do not have a kedro project yet, you can create it with ``kedro new`` command. [See the kedro docs for a tutorial](https://kedro.readthedocs.io/en/latest/02_get_started/04_new_project.html).

 If you do not have a real-world project, you can use a kedro example and [follow the "Getting started" example](../03_getting_started/01_example_project.md) to make a demo of this plugin out of the box.

@@ -36,13 +36,13 @@ you should see the following message:

 ### Declaring kedro-mlflow hooks

-``kedro_mlflow`` hooks implementations must be registered with Kedro. There are three ways of registring [hooks](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html?highlight=hooks).
+``kedro_mlflow`` hooks implementations must be registered with Kedro. There are three ways of registering [hooks](https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html).

 **Note that you must register the two hooks provided by kedro-mlflow** (``MlflowPipelineHook`` and ``MlflowNodeHook``) for the plugin to work.

 #### Declaring hooks through auto-discovery (for `kedro>=0.16.4`) [Default behaviour]

-If you use `kedro>=0.16.4`, `kedro-mlflow` hooks are auto-registered automatically by default without any action from your side. You can [disable this behaviour](https://kedro.readthedocs.io/en/latest/07_extend_kedro/04_hooks.html#disable-auto-registered-plugins-hooks) in your `.kedro.yml` or your `pyproject.toml` file.
+If you use `kedro>=0.16.4`, `kedro-mlflow` hooks are auto-registered automatically by default without any action from your side. You can [disable this behaviour](https://kedro.readthedocs.io/en/latest/07_extend_kedro/02_hooks.html#disable-auto-registered-plugins-hooks) in your `.kedro.yml` or your `pyproject.toml` file.

 #### Declaring hooks through code, in ``ProjectContext`` (for `kedro>=0.16.0, <=0.16.3`)

2 changes: 1 addition & 1 deletion docs/source/02_installation/03_migration_guide.md
@@ -6,7 +6,7 @@ This page explains how to migrate an existing kedro project to a more up to date

 ### Catalog entries

-Replace the follwoing entries:
+Replace the following entries:

 |old |new |
 |:--------------------------------------|:------------------------------------------------|
2 changes: 1 addition & 1 deletion docs/source/03_getting_started/01_example_project.md
@@ -12,7 +12,7 @@ pip install kedro-mlflow==0.4.1

 ## Install the toy project

-For this end to end example, we will use the [kedro starter](https://kedro.readthedocs.io/en/latest/02_getting_started/05_starters.html#creating-new-projects-with-kedro-starters) with the [iris dataset](https://github.com/quantumblacklabs/kedro-starter-pandas-iris).
+For this end to end example, we will use the [kedro starter](https://kedro.readthedocs.io/en/latest/02_get_started/06_starters.html) with the [iris dataset](https://github.com/quantumblacklabs/kedro-starter-pandas-iris).

 We use this project because:

Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ The hook **detects parameters through their prefix ``params:`` or the value ``pa

 ### How can I register a parameter if I use a ``TemplatedConfigLoader``?

-If you [use a ``TemplatedConfigLoader``](https://kedro.readthedocs.io/en/stable/04_user_guide/03_configuration.html?highlight=TemplatedConfigLoader#templating-configuration) to enable dynamic parameters contruction at runtime or dependency between configuration files, and if we assume your ``src/<project-name>/run.py`` file looks like:
+If you [use a ``TemplatedConfigLoader``](https://kedro.readthedocs.io/en/latest/04_kedro_project_setup/02_configuration.html#templating-configuration) to enable dynamic parameters contruction at runtime or dependency between configuration files, and if we assume your ``src/<project-name>/run.py`` file looks like:

 ```python
 from kedro.config import TemplatedConfigLoader # new import
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# Versioning Kedro DataSets

## What is artifact tracking?

Mlflow defines artifacts as "any data a user may want to track during code execution". This includes, but is not limited to:
@@ -13,7 +14,7 @@ Artifacts are a very flexible and convenient way to "bind" any data type to your

 ## How to version data in a kedro project?

-kedro-mlflow introduces a new ``AbstractDataSet`` called ``MlflowArtifactDataSet``. It is a wrapper for any ``AbstractDataSet`` which decorates the underlying dataset ``save`` method and logs the file automatically in mlflow as an artifact each time the ``save`` method is called.
+``kedro-mlflow`` introduces a new ``AbstractDataSet`` called ``MlflowArtifactDataSet``. It is a wrapper for any ``AbstractDataSet`` which decorates the underlying dataset ``save`` method and logs the file automatically in mlflow as an artifact each time the ``save`` method is called.

 Since it is an ``AbstractDataSet``, it can be used with the YAML API. Assume that you have the following entry in the ``catalog.yml``:

Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ my_sklearn_model:
 flavor: mlflow.sklearn
 ```
-More informations on available parameters are available in the [dedicated section](docs\source\05_python_objects\01_DataSets.md#mlflowmodelloggerdataset).
+More informations on available parameters are available in the [dedicated section](../07_python_objects/01_DataSets.md#mlflowmodelloggerdataset).
 You are now able to use ``my_sklearn_model`` in your nodes. Since this model is registered in mlflow, you can also leverage the [mlflow model serving abilities](https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve) or [predicting on batch abilities](https://www.mlflow.org/docs/latest/cli.html#mlflow-models-predict), as well as the [mlflow models registry](https://www.mlflow.org/docs/latest/model-registry.html) to manage the lifecycle of this model.
6 changes: 6 additions & 0 deletions mlc_config.json
@@ -0,0 +1,6 @@
+{
+  "aliveStatusCodes": [
+    429,
+    200
+  ]
+}
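`mlc_config.json` lists the HTTP status codes the link checker should accept as a live link. 429 means "Too Many Requests", the code a server returns when it rate-limits a client; accepting it alongside 200 presumably stops the job from failing when a documentation host throttles a full scan. A minimal Python sketch of that rule, assuming the checker simply tests membership in `aliveStatusCodes` (`load_alive_codes` and `is_alive` are hypothetical helpers, not part of the checker):

```python
import json
from http import HTTPStatus

# The configuration added by this commit (mlc_config.json), inlined as a string.
MLC_CONFIG = '{"aliveStatusCodes": [429, 200]}'


def load_alive_codes(raw_config: str) -> frozenset:
    """Parse the config and return the status codes to treat as alive."""
    return frozenset(json.loads(raw_config)["aliveStatusCodes"])


def is_alive(status: int, alive_codes: frozenset) -> bool:
    """Assumed rule: a link is alive iff its status is in the configured set."""
    return status in alive_codes


if __name__ == "__main__":
    alive = load_alive_codes(MLC_CONFIG)
    for status in (HTTPStatus.OK, HTTPStatus.TOO_MANY_REQUESTS, HTTPStatus.NOT_FOUND):
        verdict = "alive" if is_alive(status, alive) else "dead"
        print(f"{int(status)} {status.phrase}: {verdict}")
```

The trade-off of this choice: a link that answers 429 is accepted without the checker ever learning its real status.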
