1- Should I, or can I, train a model at a higher sample rate (say, 48 kHz) instead of 16 kHz, or does this architecture simply not support that? I was wondering what would happen if I simply started training at a higher sample rate with the provided model.
2- How about style transfer from one audio/music clip to another, alongside prompts? I mean conditioning the generation on another clip, or on the melody or sounds of a reference audio sample. I'm asking because models trained with this code appear to have their own specific inference regime compared to the original repo.
3- Is there going to be an implementation of stereo generation?
4- Also, the paper mentions that a single RTX 3090 can be used to train the model (you also used 8x A100 for AudioLDM2). I was wondering how long training took, since I don't think it was mentioned in the paper, or perhaps I'm missing it.
5- Last but not least, how much should I increase unet_in_channels, the number of res blocks, embed dim, etc. in the config file to get a model as large as the largest AudioLDM?
Thanks in advance. Looking forward to AudioLDM2's code as well, if there's going to be any.
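On the last question: whichever config keys you end up scaling, a quick way to verify you've matched a target model size is to instantiate the model and count trainable parameters. A minimal sketch (the `toy` network below is a hypothetical stand-in; swap in the real UNet built from your config):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters, for comparing config variants."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical stand-in for the UNet; replace with the model built from
# your actual config to compare against the largest AudioLDM's size.
toy = nn.Sequential(
    nn.Conv2d(8, 256, kernel_size=3, padding=1),   # e.g. wider base channels
    nn.Conv2d(256, 8, kernel_size=3, padding=1),
)
print(count_params(toy))  # prints 37128
```

Scaling channel widths roughly quadruples the parameter count when you double them (conv weights grow with in_channels × out_channels), so checking the count after each config change is faster than trial-and-error training runs.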