KedroPipelineModel requires unnecessary pipeline input dependencies to be executed #273

Debbby57 · 2021-12-20T15:45:42Z

Description

The KedroPipelineModel has an initial_catalog property that causes problems. This initial_catalog can contain Kedro datasets that do not need to be logged when you train your model. Because of this property, I can no longer load my model and have to train it again.

To explain: when I trained my model, I used a home-made Kedro plugin to load a specific dataset (which has no impact on my model). I later updated this plugin independently of my ML project. Now I want to load my model, but I can't, because the load function uses the old Kedro catalog, which refers to the old plugin version that is no longer in my environment.

Context

It would be great if we could update the Kedro catalog (only the datasets, not the model artifacts, of course) without having to retrain our models.

Possible Implementation

Log in MLflow only what is necessary.

I hope my issue is clear.

Thank you.


Galileo-Galilei · 2022-01-03T21:37:02Z

Hi, I can reproduce the issue, thank you very much for the feedback. To clarify, what happens here is the following:

- The input of your inference pipeline is persisted in Kedro because you load it from disk (e.g., with pandas.ExcelDataSet).
- After you log the pipeline in MLflow, this input is converted to a MemoryDataSet, and you pass a pandas DataFrame directly when you want to reuse the model. MLflow nevertheless complains that openpyxl must be installed, although you never use it in your pipeline and do not need it to predict.

As you mention, this extra dependency serves no purpose. I will remove it in a patch release soon.
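To illustrate the idea (not the actual kedro-mlflow code), here is a minimal, library-free sketch of swapping the persisted pipeline input for an in-memory placeholder before the model is logged. All names in it (ExcelLikeDataSet, MemoryPlaceholder, strip_input_dataset, predict, and the "instances" entry) are hypothetical and exist only for this example.

```python
class ExcelLikeDataSet:
    """Hypothetical stand-in for a file-backed dataset with an extra dependency."""

    def __init__(self, filepath):
        self.filepath = filepath

    def load(self):
        # Simulates the failure seen when the optional reader is not installed.
        raise ImportError("optional dependency (e.g. openpyxl) is not installed")


class MemoryPlaceholder:
    """Hypothetical stand-in for an in-memory dataset."""

    def __init__(self, data=None):
        self._data = data

    def load(self):
        return self._data


def strip_input_dataset(catalog, input_name):
    """Return a copy of the catalog whose pipeline input is an in-memory placeholder."""
    cleaned = dict(catalog)
    cleaned[input_name] = MemoryPlaceholder()
    return cleaned


def predict(catalog, input_name, data):
    """Toy inference: inject caller-supplied data as the input, then double each value."""
    runtime_catalog = dict(catalog)
    runtime_catalog[input_name] = MemoryPlaceholder(data)
    return [2 * x for x in runtime_catalog[input_name].load()]


# Catalog as used during training: the input is read from disk.
training_catalog = {"instances": ExcelLikeDataSet("data/instances.xlsx")}

# Catalog as it would be stored with the model: no file-backed input left.
logged_catalog = strip_input_dataset(training_catalog, "instances")
```

With the input replaced this way, predict(logged_catalog, "instances", data) never touches the file-backed dataset, so the environment that loads the model would not need the reader package or plugin that training used.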

Galileo-Galilei · 2022-02-09T22:45:48Z

For anyone with the same issue, note that you can now export a pipeline as an MLflow model with the kedro mlflow modelify command.


Galileo-Galilei self-assigned this Jan 3, 2022

Galileo-Galilei added the bug label Jan 3, 2022

Galileo-Galilei changed the title ~~problem with the initial_catalog property of KedroPipelineModel~~ KedroPipelineModel requires unnecessary pipeline input dependencies to be executed Feb 9, 2022

Galileo-Galilei added a commit that referenced this issue Feb 13, 2022

🐛 Remove unnecessary dependency to input dataset in KedroPipelineModel (#273) f8691f3

Galileo-Galilei mentioned this issue Feb 13, 2022

Remove unnecessary dependency to input dataset in KedroPipelineModel #288

Merged

6 tasks

Galileo-Galilei closed this as completed in #288 Feb 13, 2022

Galileo-Galilei added a commit that referenced this issue Feb 13, 2022

🐛 Remove unnecessary dependency to input dataset in KedroPipelineModel (#273) c630e6b
