Audio-to-audio translation #14

I don't think it would make sense to mix diffusion with GANs. If you only need the UNet1d, it makes sense to use it the way you are using it. However, I'm quite confident you could solve this with a pure diffusion-based approach (and get much better results) as follows:

import torch
from audio_diffusion_pytorch import AudioDiffusionModel

model = AudioDiffusionModel(
    in_channels=1,
    context_channels=[1]
)

# Train the model with pairs of audio clips, i.e. predict target given source
source = torch.randn(1, 1, 2 ** 18)  # [batch, in_channels, samples]; 2**18 samples ≈ 12s at 22050 Hz
target = torch.randn(1, 1, 2 ** 18)
loss = model(target, channels_list=[source])
loss.backward()  # Do this inside a training loop
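
At inference time, you would then sample a target waveform from noise, conditioned on a new source clip. A minimal sketch, assuming that sample forwards extra keyword arguments such as channels_list to the underlying UNet1d (as the training call above does); the step count of 50 is illustrative, not prescribed by the library:

import torch
from audio_diffusion_pytorch import AudioDiffusionModel

model = AudioDiffusionModel(
    in_channels=1,
    context_channels=[1]
)

# Condition on a new source clip and denoise starting from random noise
source = torch.randn(1, 1, 2 ** 18)  # [batch, in_channels, samples]
noise = torch.randn(1, 1, 2 ** 18)

sampled = model.sample(
    noise=noise,
    num_steps=50,             # more steps generally trades speed for quality
    channels_list=[source],   # assumed to be forwarded to the UNet1d
)  # -> [1, 1, 2 ** 18] translated audio

In practice you would load a trained checkpoint before sampling; the freshly initialized model above is only a placeholder.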

Answer selected by cameronfabbri