Zooplankton classification software kit
---------------------------------------------------------------

Code for plankton dataset creation and classification. Forked from mbaityje/plankifier.

Contents

| name | run | description |
|---|---|---|
| 0: convnet | python3 convnet.py | builds a CNN and trains it on a given dataset of images |
| 1: features | python3 features.py | builds a model consisting of a CNN for images and an MLP for tabular feature data |
| 2: binary | python3 binary.py | builds a CNN for binary classification of a chosen class against a balanced mix of the other classes |
| 3: analyze | python3 analyze.py | reads the training output of the CNN and visualizes the logarithmic time evolution and the val_loss differences between hyperparameters |
| 4: binary-compare | python3 binary-compare.py | reads the training output of the binary classifier and visualizes the impact of the data set on validation |

----------------------------------------------------------------

0: convnet

The most important parsed parameter is datapath; all other default values should lead to decent results. If an argument is changed from its default value, the output name will contain the change.

Example of a training run with 100 epochs, the Adam optimizer with amsgrad, a specific data directory and live training output written to a log file in the directory the script runs in:

    python3 convnet.py -datapath='~/specific/data/' -totEpochs=100 -opt='adam_2' -verbose=1 >> trainingresults.log &

| argument | type | default | description |
|---|---|---|---|
| cpu | bool | False | performs training only on CPUs |
| gpu | bool | False | performs training on GPUs |
| datapath | str | './data/' | directory that must contain the classes as subdirectories, each with a directory 'training_images' inside |
| outpath | str | './out/' | (created) directory for the training output; a subdirectory is created whose name contains the parameters of the run |
| verbose | int | 1 | one of [0, 1, 2]; controls the amount of training output |
| totEpochs | int | 10 | total number of training epochs |
| opt | str | 'sgd_1' | choice of the minimization algorithm |
| bs | int | 8 | batch size |
| lr | float | 0.0001 | learning rate |
| height | int | 128 | image height, must be the same as width |
| width | int | 128 | image width, must be the same as height |
| depth | int | 3 | number of channels (3 for RGB) |
| testSplit | float | 0.2 | fraction of examples in the validation set |
| aug | bool | True | perform data augmentation |
| augtype | str | 'standard' | augmentation type |
| augparameter | float | 0 | augmentation parameter when testing a single augmentation type; ignored for standard augmentation |

Implemented optimizer choices (the learning rate of each is set by -lr):

| -opt | description |
|---|---|
| 'adam_1' | Adam without amsgrad, beta_1=0.9, beta_2=0.999 |
| 'adam_2' | Adam with amsgrad, beta_1=0.9, beta_2=0.999 |
| 'sgd_1' | stochastic gradient descent without Nesterov |
| 'sgd_2' | stochastic gradient descent with Nesterov |
| 'sgd_3' | stochastic gradient descent with Nesterov and momentum of 0.1 |
| 'sgd_4' | stochastic gradient descent without Nesterov and momentum of 0.1 |
| 'rmsprop' | RMSprop with rho=0.9 |
| 'adagrad' | Adagrad |
| 'adadelta' | Adadelta with rho=0.95 |
| 'adamax' | Adamax with beta_1=0.9, beta_2=0.999 |
| 'nadam' | Nadam with beta_1=0.9, beta_2=0.999 |

Implemented choices for individual data augmentation (a sketch of how these settings could map onto a Keras image generator follows the table):

| -augtype | meaning of -augparameter |
|---|---|
| 'rotate' | degree range for random rotations |
| 'v_shift' | width shift: fraction of total width if < 1, or pixels if >= 1 |
| 'h_shift' | height shift: fraction of total height if < 1, or pixels if >= 1 |
| 'shear' | shear intensity (shear angle in counter-clockwise direction, in degrees) |
| 'zoom' | range for random zoom: [lower, upper] = [1-args.augparameter, 1+args.augparameter] |
| 'h_flip' | enables flipping; no -augparameter required |
| 'v_flip' | enables flipping; no -augparameter required |
| 'brightness' | range for picking a brightness shift value: [lower, upper] = [args.augparameter, 1-args.augparameter] |
| 'rescale' | multiplies the data by the given value after applying all other transformations |
| 'standard' | mixed augmentation with rotation_range=360, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.3, zoom_range=0.2, horizontal_flip=True, vertical_flip=True; no -augparameter required |
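The settings in the 'standard' row use the keyword names of the Keras ImageDataGenerator, so here is a minimal sketch, assuming a Keras implementation, of how the 'standard', 'rotate' and 'zoom' choices could be wired up. The actual construction lives in convnet.py and may differ; build_augmenter is only an illustrative name.

```python
# Minimal sketch, not the actual implementation in convnet.py.
# The 'standard' values are taken from the table above; build_augmenter
# is an illustrative helper name.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def build_augmenter(augtype='standard', augparameter=0.0):
    if augtype == 'standard':
        # mixed augmentation, no -augparameter needed
        return ImageDataGenerator(rotation_range=360,
                                  width_shift_range=0.2,
                                  height_shift_range=0.2,
                                  shear_range=0.3,
                                  zoom_range=0.2,
                                  horizontal_flip=True,
                                  vertical_flip=True)
    if augtype == 'rotate':
        # degree range for random rotations
        return ImageDataGenerator(rotation_range=augparameter)
    if augtype == 'zoom':
        # [lower, upper] = [1 - augparameter, 1 + augparameter]
        return ImageDataGenerator(zoom_range=[1 - augparameter, 1 + augparameter])
    raise ValueError('augtype not handled in this sketch: %s' % augtype)
```

A generator built this way is typically handed to training through its flow method, e.g. datagen.flow(x_train, y_train, batch_size=bs).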
----------------------------------------------------------------

1: features

This script takes not only the image data as input but also the tabular feature files, which have to be in the class directories. The most important parsed parameter is datapath; all other default values should lead to decent results.

| argument | type | default | description |
|---|---|---|---|
| cpu | bool | False | performs training only on CPUs |
| gpu | bool | False | performs training on GPUs |
| datapath | str | './small_data/' | directory that must contain the classes as subdirectories, each with a directory 'training_images' inside |
| outpath | str | './out/' | (created) directory for the training output; a subdirectory is created whose name contains the parameters of the run |
| verbose | int | 1 | one of [0, 1, 2]; controls the amount of training output |
| totEpochs | int | 10 | total number of training epochs |
| bs | int | 8 | batch size |
| lr | float | 0.0001 | learning rate |
| height | int | 128 | image height, must be the same as width |
| width | int | 128 | image width, must be the same as height |
| depth | int | 3 | number of channels (3 for RGB) |
| testSplit | float | 0.2 | fraction of examples in the validation set |

So far only SGD is implemented and no data augmentation is performed.

----------------------------------------------------------------

2: binary

This script takes the key argument, which names the class that is going to be identified. It then takes a balanced mix from all other classes found in datapath. For the CNN, the optimizers 'adam', 'sgd' and 'rmsprop' are implemented; for binary classification, RMSprop seems to yield the best results. Binary crossentropy is the loss function. A sketch of such a binary model follows the argument table.

| argument | type | default | description |
|---|---|---|---|
| cpu | bool | False | performs training only on CPUs |
| gpu | bool | False | performs training on GPUs |
| datapath | str | './data/' | directory that must contain the classes as subdirectories, each with a directory 'training_images' inside |
| outpath | str | './out/' | (created) directory for the training output; a subdirectory is created whose name contains the parameters of the run |
| opt | str | 'sgd' | choice of the minimization algorithm |
| totEpochs | int | 10 | total number of training epochs |
| bs | int | 8 | batch size |
| lr | float | 0.0001 | learning rate |
| height | int | 128 | image height, must be the same as width |
| width | int | 128 | image width, must be the same as height |
| depth | int | 3 | number of channels (3 for RGB) |
| testSplit | float | 0.2 | fraction of examples in the validation set |
| key | str | 'dinobryon' | class to be identified; must be the name of a subdirectory of datapath |
| limit | int | 0 | if 0: takes all images; if != 0: takes only that number of images. The result will be 50/50 class/non-class images |
| number1 | int | 256 | node number of the first CNN layer |
| number2 | int | 128 | node number of the second CNN layer |
| number3 | int | 64 | node number of the third CNN layer |
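The following is a minimal sketch, assuming a Keras implementation, of a binary CNN along the lines described above: three convolutional blocks whose filter counts follow the number1/number2/number3 defaults, a single sigmoid output, binary crossentropy loss and RMSprop. The real architecture is defined in binary.py and may differ; build_binary_model and the Dense(64) hidden layer are illustrative choices.

```python
# Sketch only: a binary CNN along the lines described above. The real
# architecture is defined in binary.py and may differ; the node numbers
# are interpreted here as convolution filter counts.
from tensorflow.keras import layers, models, optimizers

def build_binary_model(height=128, width=128, depth=3,
                       number1=256, number2=128, number3=64, lr=1e-4):
    model = models.Sequential([
        layers.Conv2D(number1, (3, 3), activation='relu',
                      input_shape=(height, width, depth)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(number2, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(number3, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),      # illustrative hidden layer
        layers.Dense(1, activation='sigmoid'),    # single output: class / non-class
    ])
    # RMSprop with binary crossentropy, as recommended above
    model.compile(optimizer=optimizers.RMSprop(learning_rate=lr),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```

A single sigmoid output with binary crossentropy matches the 50/50 class/non-class setup produced by the limit argument.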
----------------------------------------------------------------

3: analyze

This script takes the training output (the epoch log files) and visualizes it. The path argument points to a folder named after the hyperparameter being changed; it should contain subdirectories for the individual runs, with the parameter value in their names, as generated by convnet.py. The epochnumber argument is self-explanatory and not mandatory.

----------------------------------------------------------------

4: binary-compare

This script takes the training output (the epoch log files) of the binary classifier and visualizes it. The path argument points to a folder with subdirectories for the individual runs, named after the key and limit parameters, as generated by binary.py. The epochnumber argument is self-explanatory and not mandatory.
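Both analysis scripts read per-epoch logs from the run subdirectories and compare validation curves. The exact log format is defined by the training scripts; assuming CSV files with epoch, loss and val_loss columns and a file name of epochs.log (both assumptions, not guaranteed by this README), a minimal sketch of this kind of visualization could look like the following.

```python
# Sketch only: plot val_loss curves from several run subdirectories on a
# logarithmic epoch axis, similar in spirit to analyze.py. The log file
# name 'epochs.log' and its CSV layout (epoch, loss, val_loss) are
# assumptions; adapt them to whatever convnet.py actually writes.
import os
import pandas as pd
import matplotlib.pyplot as plt

def plot_val_loss(path, logname='epochs.log'):
    fig, ax = plt.subplots()
    for run in sorted(os.listdir(path)):
        logfile = os.path.join(path, run, logname)
        if not os.path.isfile(logfile):
            continue
        df = pd.read_csv(logfile)
        # shift by one so epoch 0 is plottable on a log axis
        ax.plot(df['epoch'] + 1, df['val_loss'], label=run)
    ax.set_xscale('log')
    ax.set_xlabel('epoch')
    ax.set_ylabel('val_loss')
    ax.legend(fontsize='small')
    fig.savefig('val_loss_comparison.png')

# example: plot_val_loss('./out/lr/')  # one subdirectory per run
```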