Trainer for audio-diffusion-pytorch
audio-diffusion-pytorch-trainer notebook:
(Optional) Create virtual environment and activate it
python3 -m venv venv
source venv/bin/activate
Install requirements
pip install -r requirements.txt
Add environment variables: rename .env.tmp to .env and replace the values with your own (the example values are random)
DIR_LOGS=/logs
DIR_DATA=/data
# Required if using wandb logger
WANDB_PROJECT=audioproject
WANDB_ENTITY=johndoe
WANDB_API_KEY=a21dzbqlybbzccqla4txa21dzbqlybbzccqla4tx
# Required if using Common Voice dataset
HUGGINGFACE_TOKEN=hf_NUNySPyUNsmRIb9sUC4FKR2hIeacJOr4Rm
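Before training, it can be useful to confirm that the variables above are actually set. The following is a minimal sketch using only the standard library. `parse_env` and `missing_vars` are hypothetical helpers, not part of this repository. The parser handles only simple KEY=VALUE lines and `#` comments, and it assumes DIR_LOGS and DIR_DATA are always needed.

```python
from pathlib import Path

# Hypothetical helpers for illustration; the trainer may load .env differently.


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_vars(env: dict, use_wandb: bool = False, use_common_voice: bool = False) -> list:
    """Return the names of expected variables that are absent or empty."""
    required = ["DIR_LOGS", "DIR_DATA"]  # assumed to be needed for every run
    if use_wandb:
        required += ["WANDB_PROJECT", "WANDB_ENTITY", "WANDB_API_KEY"]
    if use_common_voice:
        required += ["HUGGINGFACE_TOKEN"]
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        env = parse_env(env_file.read_text())
        print("Missing:", missing_vars(env, use_wandb=True))
```

An empty list means every variable expected for the chosen options was found.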
Run a test experiment. See the exp folder for other experiments (create your own .yaml file there to run a custom experiment!)
python train.py exp=base_test
Run on GPU(s)
python train.py exp=base_test trainer.gpus=1
Resume run from a checkpoint
python train.py exp=base_test +ckpt=/logs/ckpts/2022-08-17-01-22-18/'last.ckpt'
How do I use the CommonVoice dataset?
Before running an experiment on the Common Voice dataset you have to:
- Create a Hugging Face account if you don't already have one.
- Accept the terms of the Common Voice dataset version you will be using by opening its page and selecting "Access repository".
- Add your access token to the .env file, for example HUGGINGFACE_TOKEN=hf_NUNySPyUNsmRIb9sUC4FKR2hIeacJOr4Rm.
How do I load the model once I'm done training?
If you want to load the checkpoint to resume training with the trainer, run python train.py exp=my_experiment +ckpt=/logs/ckpts/2022-08-17-01-22-18/'last.ckpt'.
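Checkpoint directories in the example path are named by timestamp, so the most recent last.ckpt can be found by sorting directory names. The sketch below assumes the `<logs>/ckpts/<timestamp>/last.ckpt` layout seen in that path. `latest_checkpoint` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(logs_dir: str, name: str = "last.ckpt") -> Optional[Path]:
    """Return the newest <logs_dir>/ckpts/<timestamp>/<name>, or None if absent."""
    ckpt_root = Path(logs_dir) / "ckpts"
    if not ckpt_root.is_dir():
        return None
    # Names like 2022-08-17-01-22-18 sort chronologically as plain strings.
    run_dirs = sorted(
        (d for d in ckpt_root.iterdir() if d.is_dir()),
        key=lambda d: d.name,
        reverse=True,
    )
    for run_dir in run_dirs:
        candidate = run_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # "/logs" mirrors the example DIR_LOGS value; adjust to your own setup.
    print(latest_checkpoint("/logs"))
```

The printed path can then be passed as the +ckpt= argument.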
Otherwise if you want to instantiate a model from the checkpoint:
from main.mymodule import Model

model = Model.load_from_checkpoint(
    checkpoint_path='my_checkpoint.ckpt',
    learning_rate=1e-4,
    beta1=0.9,
    beta2=0.99,
    in_channels=1,
    patch_size=16,
    # ...all other parameters here
)
To get only the PyTorch .pt checkpoint, you can save the internal model weights with torch.save(model.model.state_dict(), 'torchckpt.pt').
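A .pt file saved this way holds only a state dict, so loading it back requires constructing the same network architecture first. The following is a minimal sketch of that round trip, assuming PyTorch is installed. `TinyNet` and `roundtrip` are stand-ins used purely for illustration and are not the network or helpers defined in this repository.

```python
import os
import tempfile

import torch
from torch import nn


class TinyNet(nn.Module):
    """Stand-in for the inner network (model.model); not the repository's class."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def roundtrip(net: nn.Module, fresh: nn.Module, path: str) -> nn.Module:
    """Save net's weights to path, then load them into a freshly built module."""
    torch.save(net.state_dict(), path)
    state = torch.load(path, map_location="cpu")
    fresh.load_state_dict(state)  # raises if keys or shapes do not match
    fresh.eval()
    return fresh


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        restored = roundtrip(TinyNet(), TinyNet(), os.path.join(tmp, "torchckpt.pt"))
        print(sorted(restored.state_dict().keys()))
```

In practice the fresh module would be built with the same hyperparameters that were used for training, otherwise load_state_dict reports mismatched keys or shapes.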
Why is no checkpoint created at the end of the epoch?
If the epoch is shorter than log_every_n_steps, the checkpoint is not saved at the end of the epoch but after the provided number of steps. If you want to checkpoint more frequently, you can add every_n_train_steps to the ModelCheckpoint, e.g.:
model_checkpoint:
  _target_: pytorch_lightning.callbacks.ModelCheckpoint
  monitor: "valid_loss" # name of the logged metric which determines when model is improving
  save_top_k: 1 # save k best models (determined by above metric)
  save_last: True # additionally always save model from last epoch
  mode: "min" # can be "max" or "min"
  verbose: False
  dirpath: ${logs_dir}/ckpts/${now:%Y-%m-%d-%H-%M-%S}
  filename: '{epoch:02d}-{valid_loss:.3f}'
  every_n_train_steps: 10
Note that saving a checkpoint this frequently is not recommended in general, since writing the file takes some time.
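To get a feel for how every_n_train_steps spaces checkpoints across short epochs, one can list the global steps at which a step-based trigger would fire. The sketch below is a simplified model. It assumes a checkpoint is written whenever the global step count is a multiple of every_n_train_steps, and it does not call PyTorch Lightning. `checkpoint_steps` is a hypothetical helper.

```python
def checkpoint_steps(steps_per_epoch: int, epochs: int, every_n_train_steps: int) -> list:
    """Return (epoch, global_step) pairs at which a step-based checkpoint would trigger."""
    if every_n_train_steps < 1:
        raise ValueError("every_n_train_steps must be at least 1")
    triggers = []
    global_step = 0
    for epoch in range(epochs):
        for _ in range(steps_per_epoch):
            global_step += 1  # one optimizer step completed
            if global_step % every_n_train_steps == 0:
                triggers.append((epoch, global_step))
    return triggers


if __name__ == "__main__":
    # 4 steps per epoch, 3 epochs, checkpoint every 10 steps
    print(checkpoint_steps(4, 3, 10))  # → [(2, 10)]
```

With epochs of only 4 steps and every_n_train_steps set to 10, the first two epochs (indices 0 and 1) end without any checkpoint in this model; the first one appears at global step 10, during the third epoch.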