An example use case for the PAdam optimizer: training ResNet18 on CIFAR10.
Ensure you have the following dependencies installed:
- PyTorch
- torchvision
- NumPy
- pickle (part of the Python standard library; no separate installation needed)
- pandas
- rich
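To install the third-party dependencies in one step, a command along these lines should work (package names are assumed to match their PyPI distributions; pin versions as needed for your CUDA setup):
pip install torch torchvision numpy pandas rich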
- Clone the repository:
git clone https://github.com/Nadav-out/PAdam
- Navigate to the project directory:
cd PAdam
- Run the training script (the optional arguments are listed below, with an example invocation after the list):
python python/train_CIFAR10.py [optional arguments]
- --progress_bar: Enable a rich live layout progress bar for visual feedback during training.
- --verbose: Enable verbose output for detailed logging information.
- --absolute_paths: Use absolute paths for data and output directories.
- --data_dir <path>: Set the directory for data (default: ../data/CIFAR10).
- --out_dir <path>: Specify the output directory for results (default: ../results/CIFAR10).
- --save_checkpoints: Save checkpoints during training to allow resuming.
- --save_summary: Save a comprehensive summary of the training session, including training and validation losses, accuracies, learning rates, and model sparsity details.
- --num_workers <number>: Number of workers for data loading (default: 4).
- --batch_size <size>: Batch size for training (default: 400).
- --epochs <number>: Number of training epochs (default: 100).
- --optimizer_name <name>: Name of the optimizer to use (AdamW or PAdam, default: PAdam).
- --max_lr <rate>: Maximum learning rate (default: 1e-3).
- --lambda_p <value>: Lambda parameter value for the PAdam optimizer (default: 1e-3).
- --p_norm <value>: p-norm value for the PAdam optimizer (default: 0.8).
- --beta1 <value>: Beta1 for the Adam optimizer (default: 0.9).
- --beta2 <value>: Beta2 for the Adam optimizer (default: 0.999).
- --grad_clip <value>: Gradient clipping value (default: 0.0).
- --warmup_epochs <number>: Number of warmup epochs; can be fractional (default: 2).
- --lr_decay_frac <fraction>: Fraction of max_lr to decay to (default: 1.0).
- --min_lr <rate>: Minimum learning rate (default: 1e-5).
- --small_weights_threshold <value>: Threshold for considering weights as small (default: 1e-13).
- --device <device>: Device to use for training (cuda or cpu, default: cuda).
- --compile: Compile the model for faster training (requires PyTorch > 2.0).
- --wandb_log: Enable logging to Weights & Biases for experiment tracking.
- --wandb_project <name>: Specify the Weights & Biases project name (default: PAdam).
- --wandb_run_name <name>: Set the Weights & Biases run name (default includes ResNet18 with a timestamp).
- --wandb_group <name>: Group related runs in Weights & Biases.
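As an illustration, a run that trains with PAdam for 50 epochs using the sparsity-related settings above and a live progress bar could look like this (the flag values are illustrative choices, not tuned recommendations):
python python/train_CIFAR10.py --optimizer_name PAdam --epochs 50 --batch_size 400 --max_lr 1e-3 --lambda_p 1e-3 --p_norm 0.8 --progress_bar --save_summary
Appending --wandb_log (optionally with --wandb_project and --wandb_run_name) would additionally stream the same metrics to Weights & Biases.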
For detailed explanations and additional parameters, refer to the argument parser setup in the training script.