🚧 Adds MLflow materializer #358
how to save a model into MLflow: https://mlflow.org/docs/latest/quickstart.html#store-a-model-in-mlflow
model flavors can be found here or below (but `crate` seems to be missing?):

```python
>>> import mlflow
>>> mlflow.__version__
'2.7.1'
>>> [attr for attr in dir(mlflow) if hasattr(getattr(mlflow, attr), 'log_model')]
[
    'catboost', 'diviner', 'fastai', 'gluon', 'h2o', 'johnsnowlabs', 'langchain',
    'lightgbm', 'mleap', 'onnx', 'openai', 'paddle', 'pmdarima', 'prophet',
    'pyfunc', 'pytorch', 'sentence_transformers', 'sklearn', 'spacy', 'spark',
    'statsmodels', 'tensorflow', 'transformers', 'xgboost'
]
```

top three flavors (probably): sklearn, tensorflow, pytorch. no hard data, just vibes.
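One thing a materializer has to decide is which of these flavor modules to hand a model to. Below is a minimal, dependency-free sketch, assuming the flavor can be guessed from the top-level package of the model's class. The `PACKAGE_TO_FLAVOR` mapping and `guess_flavor` are hypothetical and not part of MLflow or Hamilton.

```python
# Hypothetical mapping from a model class's top-level package to the name of
# the MLflow flavor module (e.g. getattr(mlflow, "pytorch")) that would log it.
PACKAGE_TO_FLAVOR = {
    "sklearn": "sklearn",
    "torch": "pytorch",
    "tensorflow": "tensorflow",
    "keras": "tensorflow",
    "xgboost": "xgboost",
    "lightgbm": "lightgbm",
}


def guess_flavor(model: object) -> str:
    """Guess an MLflow flavor name from the package that defines the model's class."""
    # e.g. "sklearn.ensemble._forest" -> "sklearn"
    package = type(model).__module__.split(".")[0]
    try:
        return PACKAGE_TO_FLAVOR[package]
    except KeyError:
        raise ValueError(f"no MLflow flavor known for package {package!r}") from None


# Usage with a stand-in class, so no ML library is needed:
class FakeForest:
    pass


FakeForest.__module__ = "sklearn.ensemble._forest"
print(guess_flavor(FakeForest()))  # prints "sklearn"
```

A real version would probably lean on `isinstance` checks instead, since a user-defined `torch.nn.Module` subclass reports the user's own module rather than `torch`.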
example of the load/save flow for the sklearn model flavor, adapted from the MLflow quickstart:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

with mlflow.start_run() as run:
    rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
    rf.fit(X_train, y_train)
    save_predictions = rf.predict(X_test)
    # The signature records the model's input/output schema next to the artifact.
    signature = infer_signature(X_test, save_predictions)
    # Log the model under the artifact path "model" within this run.
    mlflow.sklearn.log_model(rf, "model", signature=signature)
    run_id = run.info.run_id

# The URI must include the artifact path passed to log_model ("model").
model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
load_predictions = model.predict(X_test)

# predict() returns NumPy arrays, so compare element-wise instead of with ==.
np.testing.assert_allclose(save_predictions, load_predictions)
```

disclaimer: I have not tested this
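The `runs:/` URI is the piece that is easy to get wrong: it has to name both the run and the artifact path given to `log_model`. Here is a tiny sketch of a helper a materializer could use to build it; `runs_model_uri` is hypothetical and not an MLflow function.

```python
def runs_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a `runs:/<run_id>/<artifact_path>` URI for a model logged in an MLflow run.

    Hypothetical helper, not part of MLflow.
    """
    if not run_id:
        raise ValueError("run_id must be non-empty")
    # Strip stray slashes so "model/" and "/model" both end in ".../model".
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


print(runs_model_uri("abc123"))  # prints "runs:/abc123/model"
```

In MLflow 2.x, `log_model` also returns a `ModelInfo` object whose `model_uri` attribute holds this URI, which avoids rebuilding it by hand.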
@bryangalindo we should come up with the Hamilton UX to help guide this, i.e., what's the API we want to expose for Hamilton?
Ok, let's chat during our sync. Thanks!
High-level tasks:
- Analysis:
- Reader/Writer Development:
- Materializer Development:
Branch force-pushed from d0ee86e to 0fc5dac.
Hey @bryangalindo -- a thought on a feature that might be helpful. Here's an outline of what the API should look like; the data saver/materialization implementation should support this.

```python
from hamilton import driver
from hamilton.function_modifiers import source
from hamilton.io.materialization import to

dr = driver.Driver(...)
dr.materialize(
    # `to.mlflow` is the proposed materializer this PR would add.
    to.mlflow(
        id="mlflow_save",
        dependencies=["my_cool_model"],
        model_input=source("training_data"),
        model_output=source("predictions"),
    )
)
```

Then the materializer would call
and possibly more connections. Does this make sense? This is all supported btw -- materializers can take in
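To make the proposal concrete, here is a dependency-free sketch of what a saver behind the proposed `to.mlflow(...)` might do once Hamilton has resolved the `source(...)` kwargs into plain values. Everything here is hypothetical: `MLflowModelSaverSketch` is not Hamilton's data saver base class, and the `log_model` / `infer_signature` callables stand in for functions like `mlflow.sklearn.log_model` and `mlflow.models.infer_signature` so the sketch runs without MLflow installed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class MLflowModelSaverSketch:
    """Hypothetical saver; the callables stand in for real MLflow functions."""

    log_model: Callable[..., Any]  # e.g. mlflow.sklearn.log_model
    infer_signature: Callable[[Any, Any], Any]  # e.g. mlflow.models.infer_signature
    artifact_path: str = "model"
    model_input: Optional[Any] = None  # would be resolved from source("training_data")
    model_output: Optional[Any] = None  # would be resolved from source("predictions")

    def save_data(self, model: Any) -> Dict[str, Any]:
        # Only infer a signature when both example input and output were wired in.
        signature = None
        if self.model_input is not None and self.model_output is not None:
            signature = self.infer_signature(self.model_input, self.model_output)
        self.log_model(model, self.artifact_path, signature=signature)
        # Return some metadata about what was saved.
        return {"artifact_path": self.artifact_path, "has_signature": signature is not None}


# Usage with recording fakes in place of MLflow:
calls = []
saver = MLflowModelSaverSketch(
    log_model=lambda model, path, signature=None: calls.append((model, path, signature)),
    infer_signature=lambda x, y: {"inputs": len(x), "outputs": len(y)},
    model_input=[[1.0, 2.0]],
    model_output=[3.0],
)
print(saver.save_data("my_cool_model"))
# prints {'artifact_path': 'model', 'has_signature': True}
```

Keeping the signature optional follows the quickstart pattern of passing `infer_signature(X_test, predictions)` to `log_model`; if no data is wired in, the model is still logged, just without a schema.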
Closed in favor of #945
🚧 WIP 🚧
Changes
How I tested this
Notes
Checklist