Hello, I'm considering training the AudioLDM model using my own dataset, and I'm curious about the necessity of training the CLAP component along with the LDM and VAE.
From my understanding of the "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models" paper, CLAP is used for contrastive language-audio pretraining, which learns the joint embedding space between text and audio. However, my dataset is quite specific and I'm wondering if it's feasible to train only the LDM and VAE parts for my use case.
Or maybe I missed the CLAP training code somewhere in this repo; could you help me point it out?
Hi @songbh56, I would suggest using the pretrained CLAP checkpoint, which should work fine for your use case. CLAP is unlikely to be the bottleneck of the system.
If you do want to fine-tune CLAP, I suggest you check out this repo: https://github.com/LAION-AI/CLAP. Thanks
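For reference, a minimal sketch of what "keep CLAP pretrained and frozen, train only the LDM/VAE" looks like, assuming the `laion_clap` package from that repo; the default checkpoint downloaded by `load_ckpt()` and the embedding dimension shown are assumptions and may not match the exact checkpoint AudioLDM uses internally:

```python
# Minimal sketch: use a pretrained CLAP model as a frozen text encoder,
# so only the LDM / VAE parameters receive gradients during training.
# Assumes `pip install laion_clap`; the default checkpoint fetched by
# load_ckpt() may differ from the one AudioLDM was trained with.
import torch
import laion_clap

clap = laion_clap.CLAP_Module(enable_fusion=False)
clap.load_ckpt()  # download/load a default pretrained checkpoint

# Freeze every CLAP parameter so it is excluded from the optimizer.
for p in clap.parameters():
    p.requires_grad = False
clap.eval()

with torch.no_grad():
    text_embed = clap.get_text_embedding(
        ["a dog barking in the distance"], use_tensor=True
    )
print(text_embed.shape)  # e.g. (1, 512) in the joint text-audio embedding space
```

The frozen text embedding would then be passed as conditioning to the diffusion model you are training, while the CLAP weights stay untouched.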