C3-ASV-Team/pytorch-cifar

 
 


Tutorial: Train CIFAR10 with PyTorch on NERSC Cori-GPU

This tutorial shows how to train deep learning models on the CIFAR10 dataset on the NERSC Cori-GPU platform using PyTorch.

Submit an interactive job

First, request one or more GPUs with the following commands. See this page for further details.

module load cgpu
salloc -C gpu -N 1 -t 60 -c 10 -G 1 -A m3691

Then run the following commands to kick off training.

module load pytorch/v1.5.0-gpu
srun python main.py
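
Before starting a long run, it can help to confirm that PyTorch actually sees the GPU requested with -G. The following is a minimal sketch, not part of this repository; the function name describe_gpus and the file name check_gpu.py are hypothetical.

```python
"""Sanity check: report which GPUs PyTorch can see inside the allocation."""


def describe_gpus():
    """Return a list of human-readable lines describing the visible GPUs."""
    try:
        import torch  # expected to come from the pytorch module loaded above
    except ImportError:
        return ["PyTorch is not importable; was the pytorch module loaded?"]

    if not torch.cuda.is_available():
        return ["PyTorch is installed but no CUDA device is visible."]

    # One line per device visible to this process (e.g. one line for -G 1).
    return [
        f"GPU {i}: {torch.cuda.get_device_name(i)}"
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    for line in describe_gpus():
        print(line)
```

Saved as check_gpu.py, it would be run the same way as the training script: srun python check_gpu.py.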

Submit a batch job

Run the following command to submit a batch job.

sbatch train_cgpu.sh

The dashboard on my.nersc.gov sometimes fails to display jobs running on the GPU cluster correctly, so running jobstats in the terminal is a more reliable way to view the job status. When the job starts running, its status changes from PENDING to RUNNING.

In batch mode, the output is redirected to <job_id>.out in your working directory by default.
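
The PENDING to RUNNING transition described above can also be read from a script. The sketch below uses the standard Slurm squeue command rather than jobstats, which is a substitution on my part; the helper names parse_job_state and query_job_state are hypothetical. It assumes squeue is on PATH and can see the GPU cluster, presumably in a shell where module load cgpu has been run.

```python
import subprocess


def parse_job_state(squeue_output):
    """Extract the job state from the output of `squeue -h -j <id> -o %T`.

    Returns None when squeue printed nothing on stdout, which is what
    happens once the job is no longer in the queue.
    """
    lines = [line.strip() for line in squeue_output.splitlines() if line.strip()]
    return lines[0] if lines else None


def query_job_state(job_id):
    """Ask Slurm for the state of one job, e.g. 'PENDING' or 'RUNNING'."""
    result = subprocess.run(
        # -h: no header line, -j: select the job, -o %T: print only the state
        ["squeue", "-h", "-j", str(job_id), "-o", "%T"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # return str instead of bytes (Python 3.6 compatible)
    )
    return parse_job_state(result.stdout)
```

Calling query_job_state with the job ID printed by sbatch returns the current state, or None once the job has finished.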

Continuous training on NERSC

Run the following command to train continuously on NERSC.

python -u train_nersc.py --name cifar --interval 60 > cifar.log &

--interval is the number of minutes between two status checks that decide whether the job needs to be relaunched. -u forces unbuffered output, so cifar.log is updated as the script runs.
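
The check-and-relaunch idea can be sketched as follows. This is not the code of train_nersc.py, which is not shown here; it is a minimal illustration under stated assumptions. The names watch_and_relaunch, is_active, submit, and max_checks are hypothetical, and the status check and the submission are passed in as callables so the loop can be tried without Slurm.

```python
import time


def watch_and_relaunch(is_active, submit, interval_minutes, max_checks,
                       sleep=time.sleep):
    """Resubmit a job whenever a status check finds it missing from the queue.

    is_active: callable returning True while the job is pending or running.
    submit:    callable that submits the job again
               (for example, one that wraps `sbatch train_cgpu.sh`).
    Returns the number of submissions made.
    """
    submissions = 0
    for check in range(max_checks):
        if not is_active():
            submit()
            submissions += 1
        # Wait before the next status check, except after the last one.
        if check < max_checks - 1:
            sleep(interval_minutes * 60)
    return submissions
```

For relaunching to make progress, the submitted job has to pick up from the last checkpoint, presumably through the --resume flag of main.py.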

To quickly test the script, set the time limit in train_cgpu.sh to 3 minutes and run

python train_nersc.py --interval 1

You can build your own script based on this one.


Prerequisites

  • Python 3.6+
  • PyTorch 1.0+

Accuracy

Model               Acc.
VGG16               92.64%
ResNet18            93.02%
ResNet50            93.62%
ResNet101           93.75%
RegNetX_200MF       94.24%
RegNetY_400MF       94.29%
MobileNetV2         94.43%
ResNeXt29(32x4d)    94.73%
ResNeXt29(2x64d)    94.82%
DenseNet121         95.04%
PreActResNet18      95.11%
DPN92               95.16%

Learning rate adjustment

I manually change the learning rate (lr) during training:

  • 0.1 for epoch [0,150)
  • 0.01 for epoch [150,250)
  • 0.001 for epoch [250,350)

To move to the next learning rate, resume training from the last checkpoint, for example with python main.py --resume --lr=0.01
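
The schedule above can be written as a small lookup function. This is a sketch for illustration only; the name lr_for_epoch is hypothetical and the repository itself changes the rate by hand through --lr.

```python
def lr_for_epoch(epoch):
    """Return the learning rate for a 0-based epoch in the schedule above."""
    if not 0 <= epoch < 350:
        raise ValueError("the schedule only covers epochs [0, 350)")
    if epoch < 150:
        return 0.1    # epochs [0, 150)
    if epoch < 250:
        return 0.01   # epochs [150, 250)
    return 0.001      # epochs [250, 350)
```

If the steps were automated instead, PyTorch's torch.optim.lr_scheduler.MultiStepLR with milestones=[150, 250] and gamma=0.1 would express the same drops, assuming an initial rate of 0.1 and one scheduler step per epoch.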
