-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a MlflowModelDataSet #12
Comments
Hi @Galileo-Galilei, I'd like to work on this issue. I'm thinking that a Data Catalog entry could look like this: my_model:
type: kedro_mlflow.io.MlflowModel
flavor: mlflow.<flavor>
path: data/06_models/my_model
save_model_args:
...
load_model_args:
...
log_model_args:
... What do you think? The first argument to save/log is usually a model and the second is a path so those would be populated automatically by The only tricky parts would be a |
Hello @kaemo, nice to see you back there! I totally agree, this feature is really missing to the package and I was going to address it but please go ahead. Some remarks / questions:
My guess is that end user expect that
Your catalog entry would look like this: my_model:
type: kedro_mlflow.io.MlflowModelDataSet # add DataSet for consistency
flavor: <flavor> # remove "mlflow."
path: data/06_models/my_model
load_args: # arguments to be passed to mlflow.<flavor>.load_model()
...
save_args: # arguments to be passed to mlflow.<flavor>.log_model()
...
# remove the arguments to be passed to save model What do you think? |
Happy to have more time to work on
Agreed, I'll name it
I thought about this parameter as a Python module that provides load/log functions (built-in MLflow or custom flavor). So it could be
I agree, let's go with |
@Galileo-Galilei, there's one thing that is problematic with the above approach. After Suppose we have a node lgbm:
type: kedro_mlflow.io.MlflowModelDataSet
flavor: mlflow.lightgbm
path: data/06_models/lgbm It is logged using I see two possible solutions:
|
Regarding the "mlflow" prefix
Totally agree, I have the very same idea in mind. That said,
I though we could append the Loading model from mlflowI had this problem in mind. This is indeed a design pattern, and we can choose between 2 solutions:
AFAIK, you can reconstruct the model uri by retrieving the run object: RUN_ID = self.load_args.get("run_id") or mlflow.active_run().info.run_id # either run id is provided or use the current one during pipeline execution (you don't know the run_id beffore running the pipeline)
mlflow_client=MlflowClient() # use internally mlflow.get_tracking_uri() previously set by the conf
run = mlflow_client.get_run(RUN_ID)
loaded_model = mlflow.xgboost.load_model(f"{run.info.artifact_uri}/{self.path}") # will download the model locally on a temp path (self.path="remote/model/folder/architecture", likely just "model") (I have just tried it out, it seems to work)
|
I've been writing the documentation for the connected PR #56 and I realized that there's a problematic case that needs some thought - the The challenge lies in passing either a There are two solutions that I see:
It allows for a fully declarative approach but adds complexity to my_custom_model:
type: kedro_mlflow.io.MlflowModelDataSet
flavor: mlflow.pyfunc
path: data/06_models/my_custom_model
save_args:
python_model: my_package.mlflow_models.MyPyfuncModel
This would be rather tricky because the @Galileo-Galilei, @akruszewski Let me know what you think guys. Maybe you have yet another idea. I'm leaning more towards the first solution as it seems less complicated and makes it clear to the user that the whole model specification is contained in the Data Catalog. In the case of the second solution merging Note: I've also checked other built-in flavors for arguments that are not basic data types supported by YAML and there are more like |
It needs some deeper thoughts to see if something better can come out, but the second solution is a no go for me (even if I understand that it should work):
On the other hand, the first solution seems reasonable (after all, it is what kedro itself does to instantiate the dataset). If arguments are needed, a custom |
I totally agree with @Galileo-Galilei and your opinion @kaemo that the first option is a better solution. As @Galileo-Galilei mentions, it would be better to pass constructor arguments in a separate argument ( |
The current version of
kedro-mlflow
only enable to log a a full pipeline as a mlflow model through theKedroPipelineModel
class. It would be useful to create anAbstractDataSet
class in order to enable model logging within the ``catalog.yml`.The text was updated successfully, but these errors were encountered: