1- Should I, or can I, train a model at a higher sample rate (say, 48 kHz) instead of 16 kHz, or does this architecture simply not support that? I was wondering what would happen if I simply started training at a higher sample rate with the provided model.
2- How about style transfer from one audio/music clip to another, alongside prompts? I mean conditioning the generation on another clip, or on the melody or sounds of a reference audio sample. I'm asking because models trained with this code appear to have their own specific inference regime compared to the original repo.
3- Is there going to be an implementation of stereo generation?
4- Also, the paper mentions that a single RTX 3090 can be used to train the model (you also used 8x A100 for AudioLDM2). I was wondering how long training took, since I don't think it was mentioned in the paper, or perhaps I'm missing it.
5- Last but not least, how much should I increase unet_in_channels, the number of res blocks, embed dim, etc. in the config file to get a model as large as the largest AudioLDM?
Thanks in advance. Looking forward to AudioLDM2's code as well, if there's going to be any.
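On the last question: whichever config keys you end up scaling, a quick way to verify you've matched a target model size is to instantiate the model and count trainable parameters. A minimal sketch (the `toy` network below is a hypothetical stand-in; swap in the real UNet built from your config):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable parameters, for comparing config variants."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical stand-in for the UNet; replace with the model built from
# your actual config to compare against the largest AudioLDM's size.
toy = nn.Sequential(
    nn.Conv2d(8, 256, kernel_size=3, padding=1),   # e.g. wider base channels
    nn.Conv2d(256, 8, kernel_size=3, padding=1),
)
print(count_params(toy))  # prints 37128
```

Scaling channel widths roughly quadruples the parameter count when you double them (conv weights grow with in_channels × out_channels), so checking the count after each config change is faster than trial-and-error training runs.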