Readme for VW #29
vsuthichai committed Nov 9, 2016
1 parent 5493368 commit 95e35ec
28 changes: 25 additions & 3 deletions vw/README.md
@@ -14,18 +14,19 @@
space. Spark acts as the distributed computation framework responsible
for parallelizing VW execution on nodes.

The hyperparameter space that is searched over in VW includes but is not
limited to the namespaces, the learning rate, l1, l2, etc. In fact, you
can search over any parameter of your choice, as long as the space
of the parameter can be properly defined within Spotz. The parameter
names are the same as the command line arguments passed to VW.

For example, to search over the learning rate space between 0 and 1, 'l'
specifies the learning rate as if you had passed that same parameter name
to VW on the command line.

```scala
val space = Map(
  ("l", UniformDouble(0, 1)),
  ("q", Combinations(Seq("a", "b", "c"), k = 2, replacement = true))
)
```

@@ -63,5 +64,26 @@
val objective = new SparkVwCrossValidationObjective(
)
val searchResult = optimizer.minimize(objective, space)
val bestPoint = searchResult.bestPoint
println(bestPoint)
val bestLoss = searchResult.bestLoss
```

All the cross validation logic resides in the objective function.
Internally, the dataset is split into K parts. For every i'th part, a
VW cache file is generated for the remaining K - 1 parts and another
for that i'th part. The cache file for the K - 1 parts is used for
training, and the i'th part's cache file is used for testing. These
cache files are distributed through Spark onto the worker nodes
participating in the Spark job. For a single K fold cross validation
run, hyperparameters sampled from the defined space are passed as
arguments to VW, and there are K training and test runs of VW using
the same sampled hyperparameters, one training and test run per fold.
The test losses for all folds are averaged to compute a single loss
for the entire K fold cross validation run. Spotz keeps track of this
loss and its respective sampled hyperparameters.
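The fold construction described above can be sketched roughly as follows. This is an illustrative sketch only, not Spotz's actual implementation, which operates on VW cache files rather than in-memory sequences; the names `folds` and `meanLoss` are hypothetical.

```scala
// Illustrative K-fold split: each element lands in exactly one test fold
// and trains in the other K - 1. Names here are hypothetical, not Spotz APIs.
def folds[T](data: Seq[T], k: Int): Seq[(Seq[T], Seq[T])] =
  (0 until k).map { i =>
    // Round-robin assignment: index mod k picks the test fold.
    val (test, train) = data.zipWithIndex.partition { case (_, idx) => idx % k == i }
    (train.map(_._1), test.map(_._1))
  }

// The per-fold test losses are averaged into a single loss for the run.
def meanLoss(foldLosses: Seq[Double]): Double = foldLosses.sum / foldLosses.size
```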

This K fold cross validation process is repeated with newly sampled
hyperparameter values, each trial producing a newly computed loss, for
as many trials as necessary until the stop strategy's criteria have
been fulfilled. The best loss and its respective hyperparameters are
then returned.
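That outer loop can be sketched roughly as below. This is a hedged illustration, not Spotz's API: `sample` and `evaluate` are hypothetical stand-ins (the latter would run one full K fold cross validation and return its mean loss), and a fixed trial budget stands in for Spotz's stop strategies.

```scala
import scala.util.Random

// Hypothetical driver loop: sample a point, evaluate it, keep the best.
def search(
    sample: Random => Map[String, Double],    // draws one point from the space
    evaluate: Map[String, Double] => Double,  // mean K-fold CV loss for that point
    maxTrials: Int,
    seed: Long = 42L): (Double, Map[String, Double]) = {
  val rng = new Random(seed)
  (1 to maxTrials)
    .map { _ => val point = sample(rng); (evaluate(point), point) }
    .minBy(_._1)                              // lowest loss wins
}
```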
