First, make sure you have followed the setup instructions for ModelDB and have built the client.
import edu.mit.csail.db.ml.modeldb.client._
import edu.mit.csail.db.ml.modeldb.client.ModelDbSyncer._
ModelDbSyncer is the object that logs models and operations to the ModelDB backend. You can initialize the syncer either from a config file (e.g. modeldb/client/syncer.json) or explicitly via arguments.
// initialize syncer from config file
ModelDbSyncer.setSyncer(new ModelDbSyncer(SyncerConfig(path_to_config)))
OR
// initialize syncer explicitly
ModelDbSyncer.setSyncer(
  new ModelDbSyncer(
    // which project you are working on
    projectConfig = NewOrExistingProject(
      "Demo", // project name
      "modeldbuser", // user name
      "Project to hold all models from the demo" // project description
    ),
    experimentConfig = new DefaultExperiment,
    experimentRunConfig = new NewExperimentRun
  )
)
Next, when you want to log an operation to ModelDB, use the ModelDB sync variants of the spark.ml functions: the original fit calls become fitSync, save calls become saveSync, and so on.
val logReg = new LogisticRegression()
val logRegModel = logReg.fitSync(trainDf)
val predictions = logRegModel.transformSync(testDf)
logRegModel.saveSync("simple_lr")
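Under the hood, the *Sync variants are typically brought into scope by the ModelDbSyncer._ import via Scala's implicit-class ("enrich my library") pattern. The sketch below is a hypothetical, Spark-free illustration of that pattern; Estimator and the println are stand-ins, not ModelDB's actual implementation.

```scala
// Toy stand-in for a spark.ml estimator: "fitting" just computes the mean.
class Estimator {
  def fit(data: Seq[Double]): Double = data.sum / data.size
}

object SyncerExample {
  // Bringing this implicit wrapper into scope (import SyncerExample._)
  // is what lets estimator.fitSync(...) compile even though Estimator
  // itself defines no fitSync method.
  implicit class SyncableEstimator(val est: Estimator) extends AnyVal {
    def fitSync(data: Seq[Double]): Double = {
      val model = est.fit(data)                       // run the original operation
      println(s"[syncer] logged fit, model = $model") // stand-in for logging to ModelDB
      model
    }
  }
}
```

With import SyncerExample._ in scope, new Estimator().fitSync(Seq(1.0, 2.0, 3.0)) runs the original fit and additionally "logs" the event, returning 2.0.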
To compute and log metrics, use the ModelDB metrics class (SyncableMetrics), or use the spark.ml Evaluator classes with the evaluateSync method.
val metrics = SyncableMetrics.ComputeMulticlassMetrics(logRegModel, predictions, labelCol, predictionCol)
OR
val evaluator = new BinaryClassificationEvaluator()
val metric = evaluator.evaluateSync(predictions, logRegModel)
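For intuition, the metrics these helpers log boil down to simple aggregations over the label and prediction columns. Below is a hypothetical, dependency-free sketch of one such metric (multiclass accuracy); the real helpers operate on Spark DataFrames and additionally record the result in ModelDB.

```scala
object MetricsSketch {
  // The (label, prediction) pairs stand in for the labelCol and
  // predictionCol columns of a Spark DataFrame.
  def accuracy(labels: Seq[Double], predictions: Seq[Double]): Double = {
    require(labels.size == predictions.size, "columns must be the same length")
    val correct = labels.zip(predictions).count { case (l, p) => l == p }
    correct.toDouble / labels.size
  }
}
```

For example, MetricsSketch.accuracy(Seq(1.0, 0.0, 1.0, 1.0), Seq(1.0, 0.0, 0.0, 1.0)) returns 0.75 (three of four predictions match).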
The full code for this example can be found here.
Be sure to link the client library built above to your code (e.g. by adding it to your classpath).
That's it! Explore the models you built in your workflow at http://localhost:3000.
More complex spark.ml workflows using ModelDB are located here and here.