Some questions regarding training #26

Open
Respaired opened this issue Jan 9, 2024 · 0 comments


Respaired commented Jan 9, 2024

1- Should I, or can I, train a model at a higher sample rate (say 48 kHz) instead of 16 kHz, or does this architecture simply not support that? I was wondering what would happen if I simply started training at a higher sample rate with the provided model.

2- How about style transfer from one audio or music clip to another, alongside prompts? I mean conditioning the generated audio on another clip, for example on the melody or the sounds of a reference audio sample. I'm asking because models trained with this code appear to have their own inference regime compared to the original repo.

3- Is there going to be an implementation of stereo generation?

4- The paper mentions that the model can be trained on just a single RTX 3090 (you also used 8x A100 for AudioLDM2). How long did training take? I don't think it was mentioned in the paper, or perhaps I'm missing it.

5- Last but not least, how much should I increase unet_in_channels, the number of res blocks, the embed dim, etc. in the config file to get a model as large as the largest AudioLDM?
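
To make question 1 concrete, here is a minimal sketch of the adjustment I have in mind: if the sample rate triples, the hop and window sizes would presumably have to triple too, so that the spectrogram keeps the same number of frames per second. The function name `scale_frontend` and the 16 kHz values (hop 160, window 1024) are my own assumptions for illustration, not taken from the repo's config.

```python
def scale_frontend(sr_old, sr_new, hop_old, win_old):
    """Scale hop and window sizes so each frame covers the same duration."""
    ratio = sr_new / sr_old
    hop_new = round(hop_old * ratio)
    win_new = round(win_old * ratio)
    return {
        "sampling_rate": sr_new,
        "hop_length": hop_new,
        "win_length": win_new,
        # Frames per second stays constant when hop scales with the rate.
        "frames_per_second": sr_new / hop_new,
        # Highest frequency the signal can represent at the new rate.
        "nyquist_hz": sr_new / 2,
    }


# Hypothetical 16 kHz settings (assumed for illustration only).
cfg = scale_frontend(16000, 48000, hop_old=160, win_old=1024)
print(cfg)
```

With these assumed numbers the frame rate is unchanged while the representable bandwidth rises from 8 kHz to 24 kHz. Whether the provided VAE and vocoder can cope with that without retraining is exactly what I'm unsure about.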

Thanks in advance. Looking forward to AudioLDM2's code as well, if it is going to be released.
