CreateTrainJob.m
This page documents how to set up CDeep3M training using the CreateTrainJob.m script. CreateTrainJob.m creates the model directories and scripts that run caffe to train models on the provided training data.
WARNING: THIS PAGE MAY CONTAIN ERRORS AND THE AUTHORS ARE NOT LIABLE FOR ANY DAMAGES. FOLLOW THESE INSTRUCTIONS AT YOUR OWN RISK.
This document assumes you have already connected to a p3.2xlarge or p2.xlarge EC2 instance, uploaded training data to the instance, and run the PreprocessTrainingData.m script on that data to generate a set of 16 .h5 files in a directory.
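Before going further, it can help to confirm that the preprocessed directory really holds the 16 .h5 files mentioned above. The following is only a minimal sketch: `count_h5_files` is a hypothetical helper (not part of CDeep3M), and it assumes the .h5 files sit directly in the directory rather than in subdirectories.

```shell
# Sketch: count the .h5 files directly inside a training data directory.
# count_h5_files is a hypothetical helper, not part of CDeep3M.
count_h5_files() {
    local dir="$1"
    # -maxdepth 1 encodes the assumption that the files are not nested.
    find "$dir" -maxdepth 1 -type f -name '*.h5' | wc -l | tr -d ' '
}

# Example use (../foo is a placeholder path):
#   n=$(count_h5_files ../foo)
#   [ "$n" -eq 16 ] || echo "expected 16 .h5 files, found $n" >&2
```

If the count differs from 16, rerunning PreprocessTrainingData.m is probably the first thing to check.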
CreateTrainJob.m should be invoked from the command line and expects two arguments.
-
<Input train data directory>
- This should be a path to the directory containing the augmented training data generated by PreprocessTrainingData.m
-
<output dir>
- Directory where CreateTrainJob.m will put the model directories and scripts to run the training.
Example usage:
cd ~/cdeep3m
./CreateTrainJob.m ../foo ../out
Running the above will output the following upon success:
Verifying input training data is valid ... success
Copying over model files and creating run scripts ... success
A new directory has been created: /home/ubuntu/out
In this directory are 3 directories 1fm,3fm,5fm which
correspond to 3 caffe models that need to be trained as well as two scripts:
caffe_train.sh -- Runs caffe for a single model
run_all_train.sh -- Runs caffe_train.sh serially for all 3 models
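The layout reported in this output (three model directories plus two scripts) can be checked with a few lines of shell. This is a rough sketch, not part of CDeep3M: `check_train_job_dir` is a hypothetical helper, and it only tests that the names exist, not that their contents are valid.

```shell
# Sketch: confirm an output directory has the layout CreateTrainJob.m reports.
# check_train_job_dir is a hypothetical helper, not part of CDeep3M.
check_train_job_dir() {
    local out="$1"
    local status=0
    local name
    # The three model directories named in the success message.
    for name in 1fm 3fm 5fm; do
        [ -d "$out/$name" ] || { echo "missing directory: $out/$name" >&2; status=1; }
    done
    # The two run scripts named in the success message.
    for name in caffe_train.sh run_all_train.sh; do
        [ -f "$out/$name" ] || { echo "missing script: $out/$name" >&2; status=1; }
    done
    return "$status"
}

# Example use (../out is a placeholder path):
#   check_train_job_dir ../out && echo "layout looks complete"
```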
CreateTrainJob.m creates the directory given as the second argument (<output dir>) and populates it with the files and directories described below:
-
1fm, 3fm, and 5fm directories
- Each directory contains the definition of its model (1fm, 3fm, or 5fm) along with configuration files telling caffe how to learn (solver.prototxt)
-
caffe_train.sh
- Shell script that runs caffe to train an individual model (1fm, 3fm, or 5fm). Output from the run is written to the out.log file in the <model>/log directory (example: 1fm/log/out.log). This script is usually run by run_all_train.sh, but can be invoked directly.
-
run_all_train.sh
- Shell script that invokes caffe_train.sh on all three models (1fm, 3fm, and 5fm)
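Since each model writes its output to <model>/log/out.log, training progress can be followed by looking at the tail of those files. The snippet below is a sketch under that assumption only: `show_train_logs` is a hypothetical helper, not part of CDeep3M, and its second argument (number of lines to show, default 5) is likewise illustrative.

```shell
# Sketch: print the last lines of each model's training log.
# show_train_logs is a hypothetical helper, not part of CDeep3M.
# The <model>/log/out.log location is taken from the description above.
show_train_logs() {
    local out="$1"
    local lines="${2:-5}"
    local model
    local log
    for model in 1fm 3fm 5fm; do
        log="$out/$model/log/out.log"
        if [ -f "$log" ]; then
            echo "== $model =="
            tail -n "$lines" "$log"
        else
            echo "== $model: no log yet =="
        fi
    done
}

# Example use (../out is a placeholder path):
#   show_train_logs ../out 20
```

Because the models are trained serially, the later models may show no log until the earlier ones have finished.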