Refactor mlflow configs #77
Yes, I totally consider doing this. However, credentials management can also be done at the DataSet level. We had use cases where we wanted, inside a run (in an mlflow development database), to retrieve a model / an artifact from a different mlflow instance (a production one), for instance to combine / compare two models. For this, we need to enable credentials management at the DataSet level, something like:

In `credentials.yml`:

```yaml
# credentials.yml
kedro_mlflow:
  <your idea here>
mlflow_creds1:
  AWS_ACCESS_KEY_ID: <another password>
```

In `catalog.yml`:

```yaml
# catalog.yml
dataset_to_retrieve:
  type: MlflowArtifactDataSet
  load_args:
    run_id: 123456798
  credentials: mlflow_creds1
  data_set:
    type: pandas.CSVDataSet
    load_args:
      sep: ";"
```

This needs to be thoroughly designed before we freeze the way to perform such an operation.
We are in front of two use cases of mlflow:

**1 - mlflow as a tracking engine.** In this use case […]

**2 - mlflow as a database.** Here we just leverage the catalog and DataSet mechanisms of kedro. In a regular […]

We just have to avoid starting an mlflow run when using it as a database. So from a configuration perspective, your use case is natively possible with kedro. All the questions are redirected to the MlflowArtifactDataSet implementation itself. In both cases, credentials management is done at […]
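To make the "database" use case concrete, a catalog entry along the lines of the sketch above would be enough. Here is a minimal, hypothetical variant (the entry name, run id, and filepath are made up) that only reads a model logged in a past run, without starting any run:

```yaml
# catalog.yml -- mlflow used as a database: nothing is tracked, we only read
model_from_production:
  type: MlflowArtifactDataSet
  load_args:
    run_id: 123456798  # the past run that logged the artifact
  data_set:
    type: pickle.PickleDataSet
    filepath: data/06_models/model.pkl
```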
I have refactored the […]
Regarding the refactoring, I'd prefer explicit references to mlflow objects for better comprehension, something like:

```yaml
server:
  mlflow_tracking_uri: xxx
  mlflow_model_registry: xxx
  mlflow_artifact_store: xxx
  credentials: xxx
entities:
  experiment:
    name: your_experiment_name
    create: True
  run:
    id: null # if `id` is None, a new run will be created
    name: null # if `name` is None, the pipeline name will be used for the run name
    nested: True
  tracking:
    params:
      dict_params: # extra level not needed, but it will simplify refactoring in the future?
        flatten: True
        recursive: True
        sep: "-"
    metrics: # maybe one day, for some autologging?
    tags: # maybe one day, if we find a convenient API?
    models: # maybe one day, for autologging?
```

After all, it does not make sense to make a reference to hooks since users are not aware of what they do. The "functional" part is always related to mlflow, not to Kedro.
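One way to read the `credentials: xxx` key in the `server` block above (an assumption about intent, not a settled API) is as the name of an entry in kedro's `credentials.yml` that the plugin resolves at runtime, e.g.:

```yaml
# credentials.yml -- `my_mlflow_credentials` is a hypothetical key name
my_mlflow_credentials:
  MLFLOW_TRACKING_USERNAME: <user>      # environment variables mlflow
  MLFLOW_TRACKING_PASSWORD: <password>  # actually reads for Basic auth
```

with `credentials: my_mlflow_credentials` in `mlflow.yml`.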
As you know, the communication flow between the kedro app and mlflow can be represented as follows:

[figure: communication flow between the kedro app, the mlflow tracking server, and the artifact store]

In addition to what we already have in mlflow.yml, we can add some configuration entries to let users configure their connection to the mlflow tracking server and the artifact store (the red flows in the figure). These inputs may differ depending on the user's installation.

In this example, we suppose that the user has an mlflow tracking server with Basic authentication and an artifact store on S3.

`mlflow.yml`

`credentials.yml`
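A minimal sketch of what these two files could contain under those assumptions. The `server` keys in `mlflow.yml` are illustrative rather than a frozen schema, and the URI is made up, but the environment variable names in `credentials.yml` are the ones mlflow and boto3 actually read:

```yaml
# mlflow.yml
server:
  mlflow_tracking_uri: https://my-tracking-server:5000
  credentials: my_mlflow_credentials  # key looked up in credentials.yml
```

```yaml
# credentials.yml
my_mlflow_credentials:
  # Basic authentication on the tracking server
  MLFLOW_TRACKING_USERNAME: <user>
  MLFLOW_TRACKING_PASSWORD: <password>
  # S3 artifact store access
  AWS_ACCESS_KEY_ID: <key id>
  AWS_SECRET_ACCESS_KEY: <secret key>
```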
That way we leverage the multi-environment configuration mechanism that kedro offers, and at the same time we make the use of mlflow easier and more fluid for our users.
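For instance, with kedro's standard `conf/` layout, the same key could take a different value per environment (the paths follow kedro's convention; the URIs are made up):

```yaml
# conf/base/mlflow.yml -- default environment
server:
  mlflow_tracking_uri: http://localhost:5000
```

```yaml
# conf/prod/mlflow.yml -- picked up with `kedro run --env=prod`
server:
  mlflow_tracking_uri: https://mlflow.my-company.com
```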
kedro-mlflow hooks can easily access those configs and export them as environment variables.
Fix #31 and #15
Let me know what you think