Is it Necessary to Train CLAP Alongside LDM and VAE for my own Data? #46

Open
songbh56 opened this issue Aug 26, 2024 · 2 comments
@songbh56

Hello, I'm considering training the AudioLDM model using my own dataset, and I'm curious about the necessity of training the CLAP component along with the LDM and VAE.

From my understanding of the "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models" paper, CLAP is used for contrastive language-audio pretraining, which learns the joint embedding space between text and audio. However, my dataset is quite specific and I'm wondering if it's feasible to train only the LDM and VAE parts for my use case.

Or maybe I missed the CLAP training code somewhere in the repo; could you help me point it out?
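
For context, the "contrastive language-audio pretraining" mentioned above is a symmetric CLIP-style objective over paired audio/text embeddings. Below is a minimal illustrative sketch in PyTorch; the function name, shapes, and temperature value are placeholders for exposition, not code from this repo.

```python
# Illustrative only: a symmetric contrastive (CLIP-style) objective of the
# kind CLAP is pretrained with. Names and shapes are placeholders.
import torch
import torch.nn.functional as F

def clap_contrastive_loss(audio_embed, text_embed, temperature=0.07):
    """audio_embed, text_embed: (batch, dim) embeddings of paired audio/text."""
    a = F.normalize(audio_embed, dim=-1)
    t = F.normalize(text_embed, dim=-1)
    logits = a @ t.T / temperature                   # (batch, batch) similarities
    labels = torch.arange(len(a), device=a.device)   # matched pairs on the diagonal
    # Symmetric cross-entropy over audio->text and text->audio directions.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
```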

@haoheliu
Owner

Hi @songbh56, I would suggest using the pretrained CLAP checkpoint, which should work fine for your use case. I don't think CLAP is likely to be the bottleneck of the system.
If you do want to fine-tune CLAP, I suggest you check out this repo: https://github.com/LAION-AI/CLAP. Thanks
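
As a starting point, here is a minimal sketch of loading a pretrained CLAP checkpoint and extracting embeddings with the laion_clap package from that repo (installed via `pip install laion_clap`); the example texts and the my_audio.wav path are placeholders, not part of this thread.

```python
# Minimal sketch, assuming the `laion_clap` package from
# https://github.com/LAION-AI/CLAP is installed and `my_audio.wav`
# is a placeholder path for your own data.
import laion_clap

# Load the default pretrained checkpoint (downloaded if missing).
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()

# Text embeddings for conditioning, shape (n_texts, 512).
# Note: passing at least two texts avoids batch-size-1 edge cases.
text_embed = model.get_text_embedding(
    ["a dog barking in the distance", "rain falling on a tin roof"]
)

# Audio embeddings from files, shape (n_files, 512).
audio_embed = model.get_audio_embedding_from_filelist(
    x=["my_audio.wav"], use_tensor=False
)

print(text_embed.shape, audio_embed.shape)
```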

@songbh56
Author

Okay, I understand. Thank you very much for your reply and suggestions.
