https://github.com/voicepaw/so-vits-svc-fork
- Make a folder named after the voice you're training and `cd` into it
- Put voice samples in a subdirectory named `dataset_raw_raw`
- Run `svc pre-split`; this will generate `dataset_raw`
- Run `post_split.sh`; this will split all samples from `dataset_raw` into clips of at most 30 seconds and put them in `dataset/44k/$(pwd)`
- Run `svc pre-config`; this will generate `configs/44k/config.json`
- Edit `configs/44k/config.json`: set `epochs` to some reasonable value like 500, and set `keep_ckpts` to the number of checkpoints you want to keep. I usually pick 5.
- Run `svc pre-hubert`; this will generate the preprocessed files in `dataset`. Note that clips were split to a max length of 30 seconds because hubert takes more VRAM the longer the clip is. If you run out of VRAM, you'll have to reduce the max clip length in `post_split.sh`
- Run `svc train`; this will generate the models in `logs/44k`. Use the `G_` models and `config.json` for inference.
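
The config edit in the steps above can also be scripted. This is only a rough sketch: the helper name `set_training_params` is made up for illustration, and it assumes that `epochs` and `keep_ckpts` both sit under a top-level `"train"` object in the generated `config.json`. Check your own file before relying on it.

```python
import json
from pathlib import Path


def set_training_params(config_path, epochs=500, keep_ckpts=5):
    """Set epochs and keep_ckpts in a config.json and write it back.

    Assumption: both keys live under the top-level "train" object.
    A KeyError is raised if that object is missing, so a config with a
    different layout is never silently modified.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    train = config["train"]            # fails loudly if the layout differs
    train["epochs"] = epochs           # total number of training epochs
    train["keep_ckpts"] = keep_ckpts   # how many checkpoints to keep

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

From the voice folder, calling `set_training_params("configs/44k/config.json")` would apply the values suggested above (500 epochs, 5 checkpoints) and leave every other setting untouched.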
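
To make the 30-second limit from the splitting step concrete, here is a minimal sketch of capping clip length. It is not the contents of `post_split.sh`, which isn't shown here. The function name `split_wav` and the output naming scheme are hypothetical, and it assumes uncompressed PCM WAV input so that Python's standard `wave` module can read it. Lowering `max_seconds` is the knob you would turn if hubert runs out of VRAM.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, max_seconds=30):
    """Split one PCM WAV file into clips of at most max_seconds each.

    Returns the list of clip paths written, in order. Illustrative
    stand-in only; not the actual post_split.sh logic.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        # Number of audio frames that fit in one clip.
        frames_per_clip = int(reader.getframerate() * max_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break  # reached the end of the source file
            clip_path = out_dir / f"{src.stem}_{index:03d}.wav"
            with wave.open(str(clip_path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count
                # in the header is corrected when the writer closes.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(clip_path)
            index += 1
    return written
```

With this sketch, a 70-second file would come out as three clips of 30, 30, and 10 seconds.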