Make the KedroPipelineModel more portable #67

Closed
takikadiri opened this issue Sep 20, 2020 · 1 comment · Fixed by #265
Labels
enhancement New feature or request need-design-decision Several ways of implementation are possible and one must be chosen

Comments

@takikadiri
Collaborator

takikadiri commented Sep 20, 2020

Hi,

When logging the KedroPipelineModel, we try to log everything the model needs, so that a downstream tool can recreate an appropriate Python environment for running it. The elements it needs are:

python version

Can be easily inferred

pickle version

Can be easily inferred

artifacts

We handle this part well by getting the appropriate ML datasets from pipeline_ml

conda_env

What we can do

  • The user gives us the path to their requirements file in the mlflow config file, or dynamically creates a dict or list and passes it as a parameter of pipeline_ml (the only two exposed parts of our plugin). If the user does not provide the conda_env, we move on to the second attempt.
  • We use setuptools to get the requirements of the project. If the setup is not used by the user, we move on to the last attempt.
  • We do a "pip freeze" of their environment, OR we do nothing and keep conda_env blank.
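The three attempts above can be sketched as a simple fallback chain. This is only an illustration under stated assumptions: `resolve_requirements` and `to_conda_env` are hypothetical helper names (not part of the plugin), the project requirements are passed in by the caller rather than read from the setup, and the environment name is made up. The resulting dict follows the usual conda environment layout with a nested pip section.

```python
import platform
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Union


def resolve_requirements(
    user_env: Optional[Union[str, Path, List[str]]] = None,
    project_requires: Optional[List[str]] = None,
    use_pip_freeze: bool = False,
) -> List[str]:
    """Hypothetical helper: pick pip requirements with a three-step fallback."""
    # Attempt 1: the user gave a requirements file path or a ready-made list.
    if user_env is not None:
        if isinstance(user_env, (str, Path)):
            lines = Path(user_env).read_text().splitlines()
            stripped = [line.strip() for line in lines]
            # Skip blank lines and comments.
            return [s for s in stripped if s and not s.startswith("#")]
        return list(user_env)
    # Attempt 2: requirements declared by the project (supplied by the caller here).
    if project_requires:
        return list(project_requires)
    # Attempt 3: freeze the current environment, or keep the list blank.
    if use_pip_freeze:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "freeze"],
            capture_output=True,
            text=True,
            check=True,
        )
        return [line for line in result.stdout.splitlines() if line.strip()]
    return []


def to_conda_env(requirements: List[str]) -> dict:
    """Hypothetical helper: wrap pip requirements in a conda environment dict."""
    return {
        "name": "kedro_pipeline_model_env",  # assumed name, for illustration only
        "channels": ["defaults"],
        "dependencies": [
            f"python={platform.python_version()}",  # inferred Python version
            "pip",
            {"pip": requirements},
        ],
    }
```

With this shape, `to_conda_env(resolve_requirements(project_requires=["kedro"]))` would give a dict that could be passed as a conda_env, while `resolve_requirements()` with no input returns an empty list (the "keep conda_env blank" case).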

project source code

The mlflow pyfunc model (KedroPipelineModel) needs the project's src package code to be present in the PYTHONPATH/sys.path in order to load the model pickle.

Today we expect users to package and store their source code as a Python package and add it as a dependency in their conda_env, so that they can use the model on another machine (or in another environment).
Although treating the source code as a Python package is good practice, it adds setup overhead for users, and some of them just want the model to work out of the box. Bundling the source code with the model would streamline the user experience.

We can pass the project package source code at logging time (see here).

MLflow will prepend the source code paths to the system path before the model is loaded.

That will prevent issues like this one
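To make the mechanism concrete, here is a minimal sketch of what prepending bundled code paths to the system path achieves at load time. It is not MLflow's actual implementation: `load_with_code_paths` and the package name `my_kedro_project_demo` are hypothetical, and the "bundle" is simulated with a temporary directory. In MLflow itself, the source paths are presumably passed through the `code_path` argument of `mlflow.pyfunc.log_model` (the exact argument name may differ between MLflow versions).

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import List


def load_with_code_paths(module_name: str, code_paths: List[str]):
    """Hypothetical helper: prepend code_paths to sys.path, then import the module."""
    # Insert in reverse so the first entry of code_paths ends up first in sys.path.
    for path in reversed(code_paths):
        if str(path) not in sys.path:
            sys.path.insert(0, str(path))
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Simulate a project package that was bundled alongside the model.
bundle_dir = tempfile.mkdtemp()
package_dir = Path(bundle_dir) / "my_kedro_project_demo"  # assumed package name
package_dir.mkdir()
(package_dir / "__init__.py").write_text("def predict(x):\n    return x * 2\n")

# The package was never installed with pip, yet it becomes importable.
module = load_with_code_paths("my_kedro_project_demo", [bundle_dir])
print(module.predict(21))  # prints 42
```

Because unpickling the model imports the project's modules by name, making the bundled directory importable this way is what removes the need to install the project package in the target environment.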

@Galileo-Galilei Galileo-Galilei added the enhancement New feature or request label Sep 23, 2020
@Galileo-Galilei
Owner

Galileo-Galilei commented Sep 29, 2020

We have to run some tests to see how much it improves portability, but this is indeed a very common problem for users of the pipeline_ml function: they need to install their own package with pip to make the model work and (surprisingly!) they often do not understand this. Your suggestion is reasonable, but I don't know if it is good practice. I suggest that we do some tests on our side to check whether it really enhances the experience. This could also be an argument of pipeline_ml, so that more advanced users can keep things cleaner while beginners can iterate faster.

I am not putting it in the 0.4.0 milestone because I think we really need to think more about this one.
