Benchmarking of Neural Network performance between PyTorch and Flux on Julia for High Performance Computing

Team Members

Pratyush Shukla (ps4534) and Yufeng Duan (yd2284)

Description

Our aim is to test the performance of Julia's deep learning ecosystem, using the Flux.jl package, against Python's PyTorch, and to assess whether Julia's motto of "Looks like Python, feels like Lisp, runs like C/Fortran" is justified. We perform a detailed benchmarking analysis of PyTorch and Flux.jl on neural network training and hyperparameter optimization on a High Performance Computing (HPC) system.

Code Structure

```
dl-benchmark
│   README.md
│   benchmark.jl
│   benchmark.py
│
└───model
    │   lenet.py
    │   resnet.py
    │   lenet.jl
    │   resnet.jl
```

Usage

Pre-requisites

Python requires the following dependencies:

Python >= 3.7.0
PyTorch >= 1.4.0

(The argparse and time modules used by the script are part of the Python standard library and need no separate installation.)

Julia requires the following dependencies:

Julia >= 1.5.1
Flux >= 0.12.4
MLDatasets >= 0.6.0
CUDA >= 3.7.0
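
The Julia packages can be added from the REPL, for example:

```julia
# Install the Julia packages listed above; Pkg resolves compatible versions
using Pkg
Pkg.add(["Flux", "MLDatasets", "CUDA"])
```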

Running the scripts

Clone the repository: git clone https://github.com/the-praxs/dl-benchmark.git

For benchmarking in Python: python benchmark.py

| Arg | Values | Description |
| --- | --- | --- |
| `device` | `cuda` (default) / `cpu` | Select between a CUDA-enabled GPU or the CPU for model training |
| `data` | `mnist` (default) / `fashion` | Use the MNIST or FashionMNIST dataset |
| `num_workers` | 2 (default) / integer | Number of sub-processes to use for data loading |
| `batch_size` | 128 (default) / integer | Batch size for data loading |
| `model` | `resnet` (default) / `lenet` | Select the ResNet-18 or LeNet model to train |
| `optimizer` | `sgd` (default) / `adam` | Use Stochastic Gradient Descent (SGD) or the Adam optimizer |
| `lr` | 0.1 (default) / float | Learning rate for the optimizer |
| `epochs` | 10 (default) / integer | Number of epochs for model training |
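
For example, assuming the usual argparse long-option syntax, a non-default run would look like: python benchmark.py --data fashion --model lenet --batch_size 512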

For benchmarking in Julia: julia benchmark.jl

To run with multiple worker processes: julia -p <number of processes> benchmark.jl

We use Adam as the default optimizer.

| Arg | Values | Description |
| --- | --- | --- |
| `use_cuda` | `true` (default) / `false` | Select between a CUDA-enabled GPU or the CPU for model training |
| `batchsize` | 128 (default) / integer | Batch size for data loading |
| `η` | 0.1 (default) / float | Learning rate for the optimizer |
| `λ` | 5e-4 (default) / float | L2 regularization parameter, implemented as weight decay |
| `epochs` | 10 (default) / integer | Number of epochs for model training |
| `seed` | 42 (default) / integer | Seed for reproducibility |
| `infotime` | 1 (default) / integer | Report metrics every `infotime` epochs |
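
For reference, a minimal sketch of how `η` and `λ` would typically be combined in Flux 0.12, chaining weight decay in front of ADAM (illustrative only; the actual construction in benchmark.jl may differ):

```julia
using Flux
using Flux.Optimise: Optimiser, WeightDecay, ADAM

η = 0.1    # learning rate (default above)
λ = 5e-4   # L2 regularization applied as weight decay
opt = Optimiser(WeightDecay(λ), ADAM(η))  # decay runs before each ADAM update
# `opt` is then passed to Flux.train!(loss, Flux.params(model), data, opt)
```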

Results and Observations

Benchmarking was performed on the MNIST dataset with the LeNet model, using the scripts' default values.
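
The tables below report total data-loading time, total epoch-training time, and total training-function time. A rough sketch of how these quantities can be separated (illustrative only; `train_loader` and `train_step!` are hypothetical stand-ins, not the repo's actual instrumentation):

```julia
using Printf

# Split wall-clock time into batch-fetch (loading) and compute (training)
# components; the whole loop is timed as the "training function" total.
function timed_epochs(train_loader, train_step!; epochs = 10)
    load_t, train_t = 0.0, 0.0
    total_t = @elapsed for epoch in 1:epochs
        t = time()
        for batch in train_loader
            load_t  += time() - t                   # time to fetch this batch
            train_t += @elapsed train_step!(batch)  # forward/backward/update
            t = time()                              # restart clock for next fetch
        end
    end
    @printf("loading: %.3f s  training: %.3f s  total: %.3f s\n",
            load_t, train_t, total_t)
end
```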

Benchmarking TTA (time-to-accuracy) against number of workers

Python:

| Workers | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
| --- | --- | --- | --- | --- | --- |
| 2 | 98.517 | 1.286 | 13.246 | 31.824 | 0.11 |
| 4 | 98.38 | 1.203 | 11.462 | 22.798 | 0.105 |
| 8 | 98.482 | 1.275 | 9.836 | 23.995 | 0.107 |
| 16 | 98.922 | 1.275 | 8.856 | 29.945 | 0.079 |

Julia:

| Workers | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
| --- | --- | --- | --- | --- | --- |
| 2 | 99.37 | 1.198 | 16.939 | 18.624 | 0.031 |
| 4 | 99.407 | 1.122 | 18.04 | 19.61 | 0.029 |
| 8 | 99.417 | 1.139 | 17.414 | 19.051 | 0.029 |
| 16 | 99.41 | 1.228 | 17.945 | 19.647 | 0.03 |

(Figure: epoch training time vs. number of workers)

(Figure: TTA vs. number of workers)

Benchmarking TTA against batch size

Python:

| Batch Size | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
| --- | --- | --- | --- | --- | --- |
| 32 | 68.638 | 2.757 | 41.956 | 70.38 | 0.027 |
| 128 | 99.37 | 1.198 | 16.939 | 18.624 | 1.511 |
| 512 | 98.032 | 0.889 | 3.336 | 16.747 | 0.247 |
| 2048 | 94.113 | 0.807 | 1.109 | 15.968 | 1.426 |
| 8196 | 40.37 | 0.76 | 0.686 | 17.325 | 2.326 |

Julia:

| Batch Size | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
| --- | --- | --- | --- | --- | --- |
| 32 | 99.523 | 1.346 | 50.161 | 52.54 | 0.027 |
| 128 | 99.37 | 1.198 | 16.939 | 18.624 | 0.031 |
| 512 | 98.286 | 1.169 | 6.651 | 8.145 | 0.052 |
| 2048 | 96.7 | 1.128 | 4.455 | 5.849 | 0.099 |
| 8196 | 90.503 | 1.144 | 4.614 | 6.031 | 0.309 |

(Figure: epoch training time vs. batch size)

(Figure: TTA vs. batch size)

From the above results, we observe that the total loop time is higher for Julia than for Python. PyTorch builds on CUDA libraries written in C++ and communicates with other Python libraries whose underlying implementations are also in C++, so the hand-offs between the different parts of the training function are fast. However, Julia outperforms Python in total training time, so overall TTA is better for Julia. Julia also reaches higher accuracy with fewer workers and smaller batch sizes than Python. Because Julia compiles through LLVM with Just-In-Time compilation, its raw computation speed is high. Julia additionally relies on multiple dispatch: at run time it selects, from all of a function's methods, the one specialized for the concrete types of the full argument tuple, and compiles it to specialized native code. This form of polymorphism is a large part of what makes Julia exceptionally fast.
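
As an illustration of multiple dispatch (a generic sketch, not code from this repository), Julia selects a specialized method from the concrete types of all arguments:

```julia
# Each call is dispatched on the concrete types of *all* arguments,
# and the selected method is JIT-compiled to specialized native code.
act(x::Real) = max(zero(x), x)            # scalar ReLU
act(x::AbstractArray) = act.(x)           # elementwise over arrays
act(x::Real, α::Real) = x > 0 ? x : α * x # leaky variant

act(-1.5)          # -> 0.0          (scalar method)
act([-1.0, 2.0])   # -> [0.0, 2.0]   (array method)
act(-2.0, 0.01)    # -> -0.02        (two-argument method)
```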

However, Julia lags behind Python in documentation and community support. Since Julia is a relatively new language, it depends on its communities to extend its usability across domains. Because it focuses on the scientific community, it is better suited to R&D environments than to enterprise production systems. Frequent deprecation of methods also tends to hurt backwards compatibility between module versions.

We conclude that Julia is better suited to those who want to explore Artificial Intelligence as a research topic at a deeper level, while Python remains the more practical choice for applied Artificial Intelligence in industry.

Dataset and Code:

