Description

When you create a PipelineML (likely with pipeline_ml_factory), the inference pipeline is automatically logged as an mlflow model at the end of the training pipeline execution. All persisted datasets that are inputs of the inference pipeline are automatically logged as artifacts. Parameters are not persisted (they are MemoryDataset from Kedro's point of view) and thus cannot be reused in the inference pipeline. If you want to reuse them for inference, you must convert them to a persistable format (csv, yml, txt, json, pkl...) and add them to the catalog.
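As a sketch of the current workaround, a shared parameter can be declared as a persisted dataset in catalog.yml so it becomes a regular inference input (the entry name and filepath below are illustrative, not from the original issue):

```yaml
# Hypothetical catalog.yml entry: persist a parameter shared between
# training and inference so it is logged as a model artifact
vocab_size:
  type: pickle.PickleDataSet
  filepath: data/05_model_input/vocab_size.pkl
```

A training node then has to save the parameter value into this dataset explicitly, which is exactly the manual step this feature request proposes to automate.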
Context
Currently, if your inference pipeline shares parameters with the training pipeline (this is quite common, e.g. the number of words in your vocabulary for nlp tasks), you get an error message asking you to persist your parameter in the catalog.yml. This makes sense from mlflow's point of view, but is very confusing for users, since the parameters are already logged for tracking. Many people don't know / understand that mlflow only tracks these parameters: it converts them to strings and does not enforce their reuse.
For clarity, it would be much easier if parameters were automatically persisted, e.g. as a PickleDataSet (and this would be much more intuitive for end users).
Possible Implementation
Add a check in the MlflowPipelineHook.after_pipeline_run hook to detect the presence of parameters, and convert them to pickle when saving the model.
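A minimal sketch of the conversion step such a hook could delegate to, assuming the hook can collect the run's parameters as a plain dict (the function name and signature are hypothetical, not part of kedro-mlflow's API):

```python
import pickle
from pathlib import Path


def persist_params_as_pickle(params: dict, artifacts_dir: str) -> dict:
    """Dump each in-memory parameter to <artifacts_dir>/<name>.pkl and
    return a mapping from parameter name to pickle path, ready to be
    registered as artifacts of the logged mlflow model."""
    out = Path(artifacts_dir)
    out.mkdir(parents=True, exist_ok=True)
    artifact_paths = {}
    for name, value in params.items():
        path = out / f"{name}.pkl"
        with path.open("wb") as f:
            pickle.dump(value, f)
        artifact_paths[name] = str(path)
    return artifact_paths
```

At inference time, the pickled values could then be reloaded and injected in place of the original MemoryDataset parameters, which is what would make the reuse automatic instead of relying on each user to edit catalog.yml.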