Make the KedroPipelineModel more portable #67

takikadiri · 2020-09-20T14:13:47Z

Hi,

When logging the KedroPipelineModel, we try to log everything the model needs, so that a downstream tool can recreate an appropriate Python environment for running it. The elements it needs are:

python version

Can be easily inferred

pickle version

Can be easily inferred

artifacts

We manage this part well by getting the appropriate ML datasets from pipeline_ml

conda_env

What we can do

The user gives us the path to their requirements file in the mlflow config file, or dynamically creates a dict or list and passes it as a parameter of pipeline_ml (the only two exposed parts of our plugin). If the user does not provide the conda_env, we move on to the second attempt.
We use setuptools to get the project's requirements. If the user does not use a setup file, we move on to the last attempt.
We do a "pip freeze" of their environment, OR we do nothing and leave conda_env blank.
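The three attempts above can be sketched as a simple fallback chain. This is only an illustration under stated assumptions, not the plugin's implementation: the function names `resolve_requirements` and `build_conda_env`, the environment name, and the `defaults` channel are all hypothetical. The returned dict follows the conda environment layout that MLflow accepts for `conda_env`.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Union


def _pip_freeze() -> List[str]:
    """Last resort: snapshot the packages installed in the current environment."""
    out = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def resolve_requirements(
    user_requirements: Optional[Union[str, Path, Sequence[str]]] = None,
    project_requires: Optional[Sequence[str]] = None,
    freeze: Callable[[], List[str]] = _pip_freeze,
) -> List[str]:
    """Hypothetical helper: return pip requirements using the three attempts."""
    # Attempt 1: the user gave a requirements file path or an explicit list.
    if user_requirements is not None:
        if isinstance(user_requirements, (str, Path)):
            lines = Path(user_requirements).read_text().splitlines()
            return [l.strip() for l in lines
                    if l.strip() and not l.strip().startswith("#")]
        return list(user_requirements)
    # Attempt 2: requirements declared by the project's setup (if any).
    if project_requires:
        return list(project_requires)
    # Attempt 3: fall back to "pip freeze".
    return freeze()


def build_conda_env(requirements: Sequence[str],
                    python_version: Optional[str] = None) -> dict:
    """Wrap pip requirements in a conda environment dict."""
    if python_version is None:
        python_version = ".".join(str(v) for v in sys.version_info[:3])
    return {
        "name": "kedro_pipeline_model_env",  # hypothetical name
        "channels": ["defaults"],            # assumption
        "dependencies": [
            f"python={python_version}",
            "pip",
            {"pip": list(requirements)},
        ],
    }
```

For example, `build_conda_env(resolve_requirements(project_requires=["kedro", "pandas"]))` would skip the first attempt and use the project's declared requirements. The `freeze` parameter is injectable so the last attempt can be replaced by a function returning an empty list when leaving the pip section blank is preferred.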

project source code

The mlflow pyfunc model (KedroPipelineModel) needs the project's src package code to be present in the PYTHONPATH / sys.path in order to load the model pickle.

Today we expect users to package and store their source code as a Python package and add it as a dependency in their conda_env, so that they can use their model on another machine (or environment).
Although treating the source code as a Python package is good practice, it adds some setup overhead for users, and some of them just want it to work out of the box. Bundling the source code with the model would streamline the user experience.

We can pass the project package source code at logging time (see here).

MLflow will prepend the source code paths to the system path before the model is loaded.

That will prevent issues like this one
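To illustrate why the source code has to be importable at load time, here is a minimal standard-library sketch (no MLflow involved). It pickles an object whose class lives in a throwaway package, shows that unpickling fails once that package is no longer on `sys.path`, and succeeds again after the source directory is prepended, which is the mechanism described above. The package name `my_project_demo` and the `Model` class are hypothetical.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

PKG = "my_project_demo"  # hypothetical package name used only for this demo


def _forget_package() -> None:
    """Drop the demo package from the import cache to mimic a fresh process."""
    for name in [n for n in sys.modules if n == PKG or n.startswith(PKG + ".")]:
        del sys.modules[name]
    importlib.invalidate_caches()


def demo_unpickle_needs_source() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        pkg_dir = src / PKG
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "model.py").write_text(
            "class Model:\n"
            "    def predict(self, x):\n"
            "        return x * 2\n"
        )

        # "Training" side: the project source is importable, so pickling works.
        sys.path.insert(0, str(src))
        importlib.invalidate_caches()
        module = importlib.import_module(f"{PKG}.model")
        payload = pickle.dumps(module.Model())

        # "Serving" side without the source: the pickle only stores a
        # reference to the class, so loading it fails.
        sys.path.remove(str(src))
        _forget_package()
        try:
            pickle.loads(payload)
            failed_without_source = False
        except ModuleNotFoundError:
            failed_without_source = True

        # "Serving" side with bundled source: prepend the path, then load.
        sys.path.insert(0, str(src))
        importlib.invalidate_caches()
        try:
            prediction = pickle.loads(payload).predict(21)
        finally:
            sys.path.remove(str(src))
            _forget_package()

    return failed_without_source, prediction
```

With MLflow, the equivalent step would presumably be to pass the project's package directory through the `code_path` argument of `mlflow.pyfunc.log_model` (the argument name used in MLflow releases of that period), so that the source is stored next to the model and prepended to the system path on load.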

Galileo-Galilei · 2020-09-29T20:11:50Z

We have to run some tests to see how much it improves portability, but this is indeed a very common problem for users of the pipeline_ml function: they need to install their own package with pip to make the model work and (surprisingly!) they often do not understand this. Your suggestion is reasonable, but I don't know if it is good practice. I suggest that we do some tests on our side to check whether it really enhances the experience. This could also be an argument of pipeline_ml, letting more advanced users be cleaner while helping beginners iterate faster.

I don't put it in the 0.4.0 milestone because I think we really need to think more about this one.


Galileo-Galilei added the enhancement New feature or request label Sep 23, 2020

Galileo-Galilei added this to the Release 0.5.0 milestone Oct 18, 2020

Galileo-Galilei added the need-design-decision Several ways of implementation are possible and one must be chosen label Oct 18, 2020

Galileo-Galilei modified the milestones: Release 0.5.0, Release 0.6.0 Jan 25, 2021

Galileo-Galilei modified the milestones: Release 0.6.0, Release 0.6.1 Feb 21, 2021

Galileo-Galilei modified the milestones: Release 0.7.1, Release 0.7.2 Apr 10, 2021

Galileo-Galilei self-assigned this Sep 2, 2021

Galileo-Galilei mentioned this issue Nov 7, 2021

The kedro-mlflow version used to create a KedroPipelineModel should be enforced #104

Closed

Galileo-Galilei added a commit that referenced this issue Nov 9, 2021

✨ 💥 Change PipelineML signature to pass kwargs to mlflow.pyfunc.log_m…

d6e5e01

…odel and KedroPipelineModel (#67)

Galileo-Galilei mentioned this issue Nov 9, 2021

✨ 💥 Change PipelineML signature to pass kwargs to mlflow.pyfunc.log_model and KedroPipelineModel #265

Merged

6 tasks

Galileo-Galilei added a commit that referenced this issue Nov 10, 2021

✨ 💥 Change PipelineML signature to pass kwargs to mlflow.pyfunc.log_m…

1a174d3

…odel and KedroPipelineModel (#67)

Galileo-Galilei closed this as completed in #265 Nov 11, 2021

Galileo-Galilei added a commit that referenced this issue Nov 11, 2021

✨ 💥 Change PipelineML signature to pass kwargs to mlflow.pyfunc.log_m…

b71ecf2

…odel and KedroPipelineModel (#67)
