
Auto-pickle parameters in pipeline_ml_factory #158

Closed
Galileo-Galilei opened this issue Jan 25, 2021 · 0 comments · Fixed by #165
Labels
enhancement New feature or request
Comments

@Galileo-Galilei

Description

When you create a PipelineML (typically with pipeline_ml_factory), the inference pipeline is automatically logged as an mlflow model at the end of the training pipeline execution. All persisted datasets are automatically logged as artifacts if they are inputs of the inference pipeline. Parameters, however, are not persisted (they are MemoryDatasets from Kedro's point of view) and thus cannot be reused in the inference pipeline. If you want to reuse them for inference, you must convert them to a persistable format (csv, yml, txt, json, pkl...) and add them to the catalog.
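For illustration, the current manual workaround is a catalog entry like the following sketch (the parameter name `vocab_size` and the filepath are made up; `pickle.PickleDataSet` is Kedro's standard pickle dataset type):

```yaml
# catalog.yml -- persist the parameter so it becomes a regular dataset
# that the inference pipeline can consume as an mlflow artifact
vocab_size:
  type: pickle.PickleDataSet
  filepath: data/06_models/vocab_size.pkl
```

The training pipeline then has to save `vocab_size` explicitly, which is exactly the boilerplate this issue proposes to remove.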

Context

Why is this change important to you? How would you use it? How can it benefit other users?

Currently, if your inference pipeline shares parameters with the training pipeline (which is quite common, e.g. the number of words in your vocabulary for NLP tasks), you get an error message asking you to persist your parameters in the catalog.yml. This makes sense from mlflow's point of view, but it is very confusing for users, since the parameters are already logged for tracking. Many people don't know / understand that mlflow only tracks these parameters: it converts them to strings and does not enforce their reuse.

For clarity, it would be much easier if parameters were automatically persisted, e.g. as a PickleDataSet; this would be much more intuitive for end users.

Possible Implementation

Add a check in the MlflowPipelineHook.after_pipeline_run hook to test for the presence of parameters, and convert them to pickle files when saving the model.
