
Weight Pruning and Distributed DL for Space Efficiency with TensorFlow Lite

A description of the project
We develop a space-optimized CNN for image classification through synchronous distributed training, weight pruning, and quantization in Vertex AI on Google Cloud Platform.

A description of the repository
This repository contains the code and all relevant training logs and models produced during the project.
The training logs and models for ResNet-20 and ResNet-44 are stored in separate folders.

The /codes folder contains all training code for the exploratory trials. These trials compare constant vs. polynomially decaying sparsity in weight pruning, as well as mixed-precision vs. full-precision training, on both the CIFAR-10 and CIFAR-100 datasets.

The final results are two models, one with a final sparsity of 60% and one with 70%. The notebooks that produced these final results are sixty_sparsity.ipynb and seventy_sparsity.ipynb in the /codes folder.

Hyperparameters were found through 29 trial runs on a purely pruning-based model; we selected the maximal sparsity (60%) that still gives the desired validation accuracy. The training logs and trained models are stored under /Final_Model/super_0.6 and /Final_Model/super_0.7.
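Pruning schedules of this kind are provided by the TensorFlow Model Optimization Toolkit. The sketch below shows how constant and polynomially decaying sparsity can be configured; the hyperparameter values and the build_resnet() helper are illustrative placeholders, not the exact settings from our trials.

```python
import tensorflow_model_optimization as tfmot

prune = tfmot.sparsity.keras.prune_low_magnitude

# Polynomially decaying sparsity: ramp the fraction of zero weights from
# 0% up to 60% between begin_step and end_step, re-pruning every
# `frequency` steps. (Use final_sparsity=0.70 for the 70% trials.)
poly_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.60,
    begin_step=0,
    end_step=10_000,   # illustrative; roughly epochs * steps_per_epoch
    frequency=100,
)

# Constant-sparsity alternative compared against in the exploratory trials.
const_schedule = tfmot.sparsity.keras.ConstantSparsity(
    target_sparsity=0.60, begin_step=0, frequency=100
)

model = build_resnet()  # placeholder for the ResNet-20 / ResNet-44 builder
pruned_model = prune(model, pruning_schedule=poly_schedule)
pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])

# UpdatePruningStep must be passed to fit() so the pruning masks are updated.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
```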

Example commands to execute the code

All code is provided as Jupyter notebooks, and every notebook can be run on a GCP VM. The distributed training notebooks, however, need to run on Vertex AI, which vastly speeds up training. Setting up Vertex AI is described in detail in this video: https://www.youtube.com/watch?v=rAGauhXYgw4&list=WL&index=1

Once logged into the Vertex AI Workbench, press the "JupyterLab" button to launch JupyterLab and upload the notebook through the UI. Then select the machine configuration in the upper-right corner; we used 2 Tesla V100 GPUs, each with 4 CPUs and 15 GB RAM. The distributed training notebook can then be run just like a standard Jupyter notebook.

Run final_demo.ipynb to train a CIFAR-100 model. You must first change the logname in resnet_training() to the file path where you want training data stored. Feel free to experiment with hyperparameters such as initial/final sparsity, pruning frequency, and pruning schedule, as instructed in the demo notebook. A second notebook, mixed_precision.ipynb, demonstrates training with mixed precision, using float16 for computations and float32 for variables.
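For orientation, one common way to do synchronous data-parallel training across the two GPUs in TensorFlow is tf.distribute.MirroredStrategy. The following is a minimal sketch of that setup, not necessarily the exact code in final_demo.ipynb; build_resnet() is a hypothetical stand-in for the notebook's model-building code.

```python
import tensorflow as tf

# Synchronous data parallelism across all visible GPUs (the two V100s
# in the machine configuration above).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Variables must be created inside the strategy scope so they are
# mirrored onto every replica.
with strategy.scope():
    model = build_resnet()  # placeholder for the notebook's model builder
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Scale the global batch size by the replica count; each GPU then
# processes its own slice of every batch.
(x_train, y_train), _ = tf.keras.datasets.cifar100.load_data()
global_batch = 128 * strategy.num_replicas_in_sync
model.fit(x_train / 255.0, y_train, batch_size=global_batch, epochs=10)
```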

Results (including charts/tables) and observations
(Screenshot: results overview)

60% Sparsity Model
When testing polynomially decaying sparsity with a final sparsity of 60%, we find that introducing data augmentation more than offsets the accuracy lost to pruning. Test accuracy is highest with ResNet-44, and both the shallow and deep networks see a roughly 1.5x increase. When quantization is applied with TensorFlow Lite, the same networks can be stored in nearly half the bytes of the original model without hurting accuracy. This model is therefore superior in both size and accuracy.
The best overall model is the ResNet-44 model:
- roughly 1.5x test accuracy improvement
- roughly 1.8x reduction in quantized file size for both networks
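The TensorFlow Lite conversion behind these size reductions is post-training quantization. A minimal sketch, assuming a pruned Keras model named pruned_model (the output filename is illustrative):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Remove the pruning wrappers so only the (now sparse) weights are exported.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

# Post-training dynamic-range quantization with TensorFlow Lite: weights are
# stored in 8-bit precision, shrinking the exported file.
converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("resnet44_sparse60.tflite", "wb") as f:  # illustrative filename
    f.write(tflite_bytes)
```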

(Screenshot: 60% sparsity model results)


70% Sparsity Model
Next, we fine-tuned the model by keeping all other parameters the same and raising the final sparsity to 70%. The goal is to store an even smaller model, with 10% more zero weights, without losing much accuracy. The gain in test accuracy is similar to that of the 60% sparsity model, giving up just 1% accuracy for the extra size reduction.
The best overall model is again the ResNet-44 model, with roughly a 1.5x test accuracy improvement.
The reductions in quantized file size are:
- roughly 1.8x memory reduction for ResNet-20
- roughly 1.4x memory reduction for ResNet-44

(Screenshot: 70% sparsity model results)


The Best Model
Overall, the model that performs best in terms of test accuracy is the ResNet-44 model with data augmentation and 60% polynomially decaying sparsity. However, the difference in accuracy between 60% and 70% sparsity is minimal; if you are willing to give up that one percent of accuracy, the 70% sparsity model is a reasonable alternative. In both cases, quantization roughly halves memory. If you were instead going to train this dataset with ResNet-20, the optimized model with 70% sparsity would be the best choice: a very small model, faster training time, and only 3% less accuracy.
The best accuracy is achieved by the ResNet-44 model with 60% sparsity, by a narrow margin over the 70% sparsity model.

(Screenshots: best model comparison)

Mixed Precision
Storing variables in float32 while doing computations in float16 does speed up training for these two models, even on a GPU like a T4, giving a roughly 1.2x speedup for both networks and a lower average per-epoch time. The method decreases training time without hurting accuracy, and there is even a 2% accuracy improvement for ResNet-20. Though these gains are small, they cannot be discounted when deadlines are short and more models need to be trained. To ensure that any reduction in training time could be attributed solely to synchronous distributed training, we chose not to include mixed precision in our final model.
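For reference, this is a minimal sketch of how mixed precision is typically enabled in Keras; it is not necessarily the exact code in mixed_precision.ipynb, and build_resnet() is a hypothetical model builder assumed to return logits.

```python
import tensorflow as tf

# Run computations in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

backbone = build_resnet()  # placeholder; assumed to output logits

# Keep the final softmax in float32 so the loss is computed at full
# precision, as recommended for mixed precision training.
outputs = tf.keras.layers.Activation("softmax", dtype="float32")(backbone.output)
model = tf.keras.Model(backbone.inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```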

(Screenshot: mixed precision results)

