
Auto-pickle parameters in pipeline_ml_factory #158

Closed
Galileo-Galilei opened this issue Jan 25, 2021 · 0 comments · Fixed by #165
Labels
enhancement New feature or request
Comments

@Galileo-Galilei

Description

When you create a PipelineML (typically with pipeline_ml_factory), the inference pipeline is automatically logged as an mlflow model at the end of the training pipeline execution. All persisted datasets are automatically logged as artifacts if they are inputs of the inference pipeline. Parameters, however, are not persisted (they are MemoryDatasets from Kedro's point of view) and thus cannot be reused in the inference pipeline. If you want to reuse them for inference, you must convert them to a persistable format (csv, yml, txt, json, pkl...) and add them to the catalog.
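For illustration, the current manual workaround is a catalog entry like the following sketch (the parameter name `vocab_size` and the filepath are made up; `pickle.PickleDataSet` is Kedro's standard pickle dataset type):

```yaml
# catalog.yml -- persist the parameter so it becomes a regular dataset
# that the inference pipeline can consume as an mlflow artifact
vocab_size:
  type: pickle.PickleDataSet
  filepath: data/06_models/vocab_size.pkl
```

The training pipeline then has to save `vocab_size` explicitly, which is exactly the boilerplate this issue proposes to remove.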

Context

Why is this change important to you? How would you use it? How can it benefit other users?

Currently, if your inference pipeline shares parameters with the training pipeline (which is quite common, e.g. the number of words in your vocabulary for NLP tasks), you get an error message asking you to persist your parameters in the catalog.yml. This makes sense from mlflow's point of view, but it is very confusing for users, since the parameters are already logged for tracking. Many people don't know / understand that mlflow only tracks these parameters: it converts them to strings and does not enforce their reuse.

For clarity, it would be much easier if parameters were automatically persisted, e.g. as a PickleDataSet; this would be much more intuitive for end users.

Possible Implementation

Add a check in the MlflowPipelineHook.after_pipeline_run hook to test for the presence of parameters, and convert them to pickle files when saving the model.
