First, make sure you have followed the setup instructions for ModelDB and have built the client.
import edu.mit.csail.db.ml.modeldb.client._
import edu.mit.csail.db.ml.modeldb.client.ModelDbSyncer._
ModelDbSyncer is the object that logs models and operations to the ModelDB backend. You can initialize the syncer either from a config file (e.g. modeldb/client/syncer.json) or explicitly via arguments.
// initialize syncer from config file
ModelDbSyncer.setSyncer(new ModelDbSyncer(SyncerConfig(path_to_config)))
OR
// initialize syncer explicitly
ModelDbSyncer.setSyncer(
  new ModelDbSyncer(
    // which project you are working on
    projectConfig = NewOrExistingProject(
      "Demo", // project name
      "modeldbuser", // user name
      "Project to hold all models from the demo" // project description
    ),
    experimentConfig = new DefaultExperiment,
    experimentRunConfig = new NewExperimentRun
  )
)
Next, when you want to log an operation to ModelDB, use the ModelDB sync variants of the spark.ml functions: the original fit calls become fitSync, save calls become saveSync, and so on.
val logReg = new LogisticRegression()
val logRegModel = logReg.fitSync(trainDf)
val predictions = logRegModel.transformSync(testDf)
logRegModel.saveSync("simple_lr")
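Under the hood, the *Sync variants are typically brought into scope by the ModelDbSyncer._ import via Scala's implicit-class ("enrich my library") pattern. The sketch below is a hypothetical, Spark-free illustration of that pattern; Estimator and the println are stand-ins, not ModelDB's actual implementation.

```scala
// Toy stand-in for a spark.ml estimator: "fitting" just computes the mean.
class Estimator {
  def fit(data: Seq[Double]): Double = data.sum / data.size
}

object SyncerExample {
  // Bringing this implicit wrapper into scope (import SyncerExample._)
  // is what lets estimator.fitSync(...) compile even though Estimator
  // itself defines no fitSync method.
  implicit class SyncableEstimator(val est: Estimator) extends AnyVal {
    def fitSync(data: Seq[Double]): Double = {
      val model = est.fit(data)                       // run the original operation
      println(s"[syncer] logged fit, model = $model") // stand-in for logging to ModelDB
      model
    }
  }
}
```

With import SyncerExample._ in scope, new Estimator().fitSync(Seq(1.0, 2.0, 3.0)) runs the original fit and additionally "logs" the event, returning 2.0.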
To compute and log metrics, use the ModelDB metrics class (SyncableMetrics), or use the spark.ml Evaluator classes with the evaluateSync method.
val metrics = SyncableMetrics.ComputeMulticlassMetrics(logRegModel, predictions, labelCol, predictionCol)
OR
val evaluator = new BinaryClassificationEvaluator()
val metric = evaluator.evaluateSync(predictions, logRegModel)
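For intuition, the metrics these helpers log boil down to simple aggregations over the label and prediction columns. Below is a hypothetical, dependency-free sketch of one such metric (multiclass accuracy); the real helpers operate on Spark DataFrames and additionally record the result in ModelDB.

```scala
object MetricsSketch {
  // The (label, prediction) pairs stand in for the labelCol and
  // predictionCol columns of a Spark DataFrame.
  def accuracy(labels: Seq[Double], predictions: Seq[Double]): Double = {
    require(labels.size == predictions.size, "columns must be the same length")
    val correct = labels.zip(predictions).count { case (l, p) => l == p }
    correct.toDouble / labels.size
  }
}
```

For example, MetricsSketch.accuracy(Seq(1.0, 0.0, 1.0, 1.0), Seq(1.0, 0.0, 0.0, 1.0)) returns 0.75 (three of four predictions match).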
The full code for this example can be found here.
Be sure to link the client library built above to your code (e.g. by adding it to your classpath).
That's it! Explore the models you built in your workflow at http://localhost:3000.
More complex spark.ml workflows using ModelDB are located here and here.