Benchmarking of Neural Network performance between PyTorch and Flux on Julia for High Performance Computing
Pratyush Shukla (ps4534) and Yufeng Duan (yd2284)
Our aim is to test the performance of Julia's deep learning ecosystem, via the Flux.jl package, against Python's PyTorch, and to assess whether Julia's motto of "Looks like Python, feels like Lisp, runs like C/Fortran" is justified. We perform a detailed benchmarking analysis of PyTorch and Flux.jl on neural network training and hyperparameter optimization on a High Performance Computing system.
dl-benchmark
│ README.md
│ benchmark.jl
│ benchmark.py
│
└───model
│ lenet.py
│ resnet.py
│ lenet.jl
│ resnet.jl
Python requires the following dependencies:
Python >= 3.7.0
pytorch >= 1.4.0
argparse >= 1.2.0
time >= 1.6.0
Julia requires the following dependencies (they can be installed through Pkg, as sketched after this list):
Julia >= 1.5.1
Flux >= 0.12.4
MLDatasets >= 0.6.0
CUDA >= 3.7.0
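As a convenience, the Julia packages listed above can be added through the built-in package manager. The snippet below is a minimal sketch; Pkg resolves the exact versions for your environment.

```julia
# Install the Julia dependencies listed above; Pkg resolves compatible versions.
using Pkg
Pkg.add(["Flux", "MLDatasets", "CUDA"])
```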
Clone the repository: `git clone https://github.com/the-praxs/dl-benchmark.git`
For benchmarking in Python: `python benchmark.py` (see the example invocation after the argument table below)
Args | Values | Description |
---|---|---|
device | cuda (default) / cpu | Select between CUDA-enabled GPU or CPU for model training |
data | mnist (default) / fashion | Use MNIST or FashionMNIST dataset |
num_workers | 2 (default) / Integer | Number of sub-processes to use for data loading |
batch_size | 128 (default) / Integer | Batch size for data loading |
model | resnet (default) / lenet | Select ResNet-18 or LeNet model to train |
optimizer | sgd (default) / adam | Use Stochastic Gradient Descent (SGD) or Adam optimizer |
lr | 0.1 (default) / Float | Learning rate for the optimizer |
epochs | 10 (default) / Integer | Number of epochs for model training |
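As an example, assuming `benchmark.py` exposes the arguments above as standard `--` flags via argparse, a run training LeNet on FashionMNIST with a larger batch size might look like `python benchmark.py --data fashion --model lenet --batch_size 512 --epochs 10`.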
For benchmarking in Julia: `julia benchmark.jl`
To use multiple worker processes: `julia -p <number of processes> benchmark.jl`
We use Adam as the default optimizer in Julia. A minimal sketch of how the settings below map onto a Flux training loop follows the argument table.
Args | Values | Description |
---|---|---|
use_cuda | true (default) / false | Select between CUDA-enabled GPU or CPU for model training |
batchsize | 128 (default) / Integer | Batch size for data loading |
η | 0.1 (default) / Float | Learning rate for the optimizer |
λ | 5e-4 (default) / Float | L2 regularization parameter, implemented as weight decay |
epochs | 10 (default) / Integer | Number of epochs for model training |
seed | 42 (default) / Integer | Seed for data reproducibility |
infotime | 1 (default) / Integer | Report every infotime epochs |
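The sketch below illustrates how these settings would typically drive a Flux training loop (Flux 0.12-style API). The model and data here are stand-ins so the example is self-contained; the actual benchmark.jl builds LeNet/ResNet and loads MNIST through MLDatasets.

```julia
# Minimal sketch (Flux 0.12-style API) of how the parameters above could drive
# a training loop; the model and data are stand-ins, not the real benchmark.
using Flux, CUDA, Random

use_cuda, batchsize, η, λ, epochs, seed, infotime = true, 128, 0.1, 5e-4, 10, 42, 1
Random.seed!(seed)
device = (use_cuda && CUDA.functional()) ? gpu : cpu

model = Chain(Flux.flatten, Dense(784, 256, relu), Dense(256, 10)) |> device  # stand-in for LeNet
opt   = Flux.Optimise.Optimiser(WeightDecay(λ), ADAM(η))  # λ as weight decay, Adam with learning rate η

loss(x, y) = Flux.logitcrossentropy(model(x), y)
ps = Flux.params(model)

# Dummy batch so the sketch runs on its own; benchmark.jl loads MNIST via MLDatasets.
data = [(device(rand(Float32, 28, 28, 1, batchsize)),
         device(Flux.onehotbatch(rand(0:9, batchsize), 0:9)))]

for epoch in 1:epochs
    t = @elapsed Flux.train!(loss, ps, data, opt)
    epoch % infotime == 0 && @info "epoch $epoch" train_time = t  # report every `infotime` epochs
end
```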
Benchmarking was performed on the MNIST dataset with the LeNet model, using the scripts' default values except for the parameter being varied: the first pair of tables varies the number of workers, the second pair varies the batch size.
Python:
Workers | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
---|---|---|---|---|---|
2 | 98.517 | 1.286 | 13.246 | 31.824 | 0.11 |
4 | 98.38 | 1.203 | 11.462 | 22.798 | 0.105 |
8 | 98.482 | 1.275 | 9.836 | 23.995 | 0.107 |
16 | 98.922 | 1.275 | 8.856 | 29.945 | 0.079 |
Julia:
Workers | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
---|---|---|---|---|---|
2 | 99.37 | 1.198 | 16.939 | 18.624 | 0.031 |
4 | 99.407 | 1.122 | 18.04 | 19.61 | 0.029 |
8 | 99.417 | 1.139 | 17.414 | 19.051 | 0.029 |
16 | 99.41 | 1.228 | 17.945 | 19.647 | 0.03 |
Python:
Batch Size | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
---|---|---|---|---|---|
32 | 68.638 | 2.757 | 41.956 | 70.38 | 0.027 |
128 | 99.37 | 1.198 | 16.939 | 18.624 | 1.511 |
512 | 98.032 | 0.889 | 3.336 | 16.747 | 0.247 |
2048 | 94.113 | 0.807 | 1.109 | 15.968 | 1.426 |
8196 | 40.37 | 0.76 | 0.686 | 17.325 | 2.326 |
Julia:
Batch Size | Best Training Accuracy (%) | Total Data Loading Time (s) | Total Epoch Training Time (s) | Total Training Function Time (s) | Average Training Loss |
---|---|---|---|---|---|
32 | 99.523 | 1.346 | 50.161 | 52.54 | 0.027 |
128 | 99.37 | 1.198 | 16.939 | 18.624 | 0.031 |
512 | 98.286 | 1.169 | 6.651 | 8.145 | 0.052 |
2048 | 96.7 | 1.128 | 4.455 | 5.849 | 0.099 |
8196 | 90.503 | 1.144 | 4.614 | 6.031 | 0.309 |
From the above results, we observe that the total epoch training time is higher for Julia than for Python. PyTorch calls into CUDA libraries written in C++ and communicates with other Python libraries whose underlying implementations are also in C++, so communication between the different parts of the training function is faster in Python than in Julia. However, Julia outperforms Python in total training function time, so the overall time-to-accuracy (TTA) is better for Julia. Julia also reaches higher accuracy with fewer workers and a smaller batch size than Python. Because Julia compiles to native code through LLVM with Just-In-Time compilation, its raw computation speed is high. Julia also relies on multiple dispatch: the method that runs is selected at run time from the concrete types of all of its arguments, which lets the compiler generate specialized code for each combination of argument types. This form of polymorphism is a large part of what makes Julia exceptionally fast.
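As a toy illustration of multiple dispatch (not taken from the benchmark code), the most specific method matching the run-time argument types is selected and compiled into specialized native code:

```julia
# Toy example of multiple dispatch: the most specific method for the concrete
# argument types is chosen and compiled into specialized native code.
area(w::Int, h::Int)         = w * h                # integer-specialized method
area(w::Float64, h::Float64) = w * h                # Float64-specialized method
area(w, h)                   = float(w) * float(h)  # generic fallback

area(3, 4)       # dispatches to the Int method
area(2.5, 4.0)   # dispatches to the Float64 method
area(2, 4.0)     # dispatches to the generic fallback
```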
However, Julia lags behind Python in documentation and community support. Since Julia is a relatively young language, it depends on its communities to improve its usability across domains. Because it focuses on the scientific community, it is better suited to R&D environments than to enterprise production systems. Frequent deprecation of methods in Julia also hurts backwards compatibility with modules.
We conclude that Julia is better suited to those who want to explore Artificial Intelligence as a research topic in depth than to those who want to apply Artificial Intelligence in industrial settings.
- Deep Learning with Julia using Flux.jl
- THE MNIST DATABASE of handwritten digits
- Flux.jl on MNIST — Variations of a theme
- Flux.jl on MNIST — A performance analysis
- A Swift Introduction To Flux For Julia (With CUDA)
- Torch-TensorRT
- Accelerating PyTorch with CUDA Graphs
- Neural Network Benchmarks
- High-Performance GPU Computing in the Julia Programming Language
- PyTorch from a Flux ML Perspective