
Enable pipeline_ml to share inputs between inference and training pipelines #71

Closed
Galileo-Galilei opened this issue Sep 29, 2020 · 3 comments · Fixed by #101

@Galileo-Galilei (Owner)

Notations:

  • training is the kedro.pipeline.Pipeline object passed as argument "training" of the pipeline_ml function
  • inference is the kedro.pipeline.Pipeline object passed as argument "inference" of the pipeline_ml function

Currently, inference.inputs() is forced to be a subset of training.all_outputs(). However, in some situations the two pipelines may share some inputs too. For instance, for some NLP models, a preprocessing step takes a list of stopwords to remove, and these stopwords are inputs (the same ones) for both training and inference.
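As a minimal, plain-Python sketch of the constraint described above (the set names mirror the pipeline methods but are illustrative, not the actual kedro-mlflow internals):

```python
# Sketch of the current validation rule: every free input of the inference
# pipeline must be an output of the training pipeline.
training_all_outputs = {"model", "vectorizer"}           # training.all_outputs()
inference_inputs = {"model", "vectorizer", "stopwords"}  # inference.inputs()

# "stopwords" is a genuinely shared input (used by both pipelines) but is
# not produced by training, so the current check rejects this pipeline_ml.
missing = inference_inputs - training_all_outputs
print(missing)  # {'stopwords'}
```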

Important points:

  • parameters are not persisted; if parameters are shared as inputs, they must be persisted when packaging the model (as a PickleDataSet, for instance).
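A hedged sketch of what persisting such a shared parameter amounts to, using only the standard library (this mimics what a PickleDataSet does under the hood; the file path and parameter values are arbitrary):

```python
import os
import pickle
import tempfile

# A shared parameter used by both training and inference, e.g. NLP stopwords.
stopwords = ["the", "a", "an", "of"]

# Persist it the way a PickleDataSet would, so that it can be packaged
# alongside the model artifacts.
path = os.path.join(tempfile.mkdtemp(), "stopwords.pkl")
with open(path, "wb") as f:
    pickle.dump(stopwords, f)

# At inference time, reload it from the packaged artifact.
with open(path, "rb") as f:
    reloaded = pickle.load(f)
```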
@Galileo-Galilei Galileo-Galilei self-assigned this Sep 29, 2020
@Galileo-Galilei Galileo-Galilei added the enhancement New feature or request label Sep 29, 2020
@Galileo-Galilei Galileo-Galilei added this to the Release 0.4.0 milestone Sep 29, 2020
@takikadiri (Collaborator)

takikadiri commented Oct 4, 2020

The concepts of function, input, and output are clearly not sufficient to express an ML pipeline.
An ML pipeline naturally introduces the concept of a model, which is an input and an output at the same time. In your situation here, the parameters are "models" too, because they will be fitted to the data.
An ML pipeline also introduces the concept of, say, an ml_processor, which contains the fit and predict logic in the same node unit.

Actually, the post-construction of the pipeline_ml from regular kedro pipelines works well for advanced users, even if it requires considerable cognitive effort.

We should consider building a pipeline_ml API in the future (probably not 0.4.0) that helps users build their pipeline_ml using ML computing concepts (fit, predict, transform, ...); kedro-mlflow would translate that in the backend into regular kedro pipelines.
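A rough sketch of what such an ml_processor-style unit could look like (entirely hypothetical; this is not an existing kedro-mlflow interface, and the toy fit/predict logic is only there to show the two methods living in one unit):

```python
class MlProcessor:
    """Hypothetical unit holding fit and predict logic together; at pipeline
    build time this would be translated into regular kedro nodes."""

    def fit(self, texts, stopwords):
        # Toy "training": remember the vocabulary, minus the shared stopwords.
        self.vocabulary_ = {w for t in texts for w in t.split()} - set(stopwords)
        return self

    def predict(self, text):
        # Toy "inference": keep only the words seen at fit time.
        return [w for w in text.split() if w in self.vocabulary_]

proc = MlProcessor().fit(["the cat sat", "a dog ran"], stopwords=["the", "a"])
print(proc.predict("the cat ran fast"))  # ['cat', 'ran']
```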

But for now, I agree: we can just add the possibility of adding parameters to inference inputs; kedro-mlflow will pickle them and package them inside the model artifacts at training time.

@Galileo-Galilei (Owner, Author)

I suggest that we don't persist parameters under the hood, to avoid side effects: if a user wants to use a shared parameter as an input for both training and inference, they must persist it voluntarily (either as the input or the output of the shared node).

@takikadiri (Collaborator)

We can drop the raising of KedroMlflowPipelineMLDatasetsError exclusively for "parameters" and "params:xxx" entries, and persist them under the hood.
It's counterintuitive for users to have to persist params. Moreover, there are use cases where shared (training + inference) nodes have params as inputs; the user cannot easily provide a PickleDataSet in these cases.
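The special-casing discussed here could look something like this (a sketch only; the helper name is made up, not the actual implementation):

```python
def is_kedro_parameter(dataset_name: str) -> bool:
    """Sketch of the special case discussed above: treat Kedro's "parameters"
    catalog entry and any "params:<name>" entry as parameters that
    kedro-mlflow may pickle under the hood instead of raising
    KedroMlflowPipelineMLDatasetsError. (Illustrative helper, not the
    actual implementation.)"""
    return dataset_name == "parameters" or dataset_name.startswith("params:")

print(is_kedro_parameter("params:stopwords"))  # True
print(is_kedro_parameter("parameters"))        # True
print(is_kedro_parameter("stopwords"))         # False
```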
