Variational Autoencoder in the mel-spectrogram domain for one-shot audio synthesis

moiseshorta/MelSpecVAE

MelSpecVAE

Author: Moisés Horta Valenzuela, 2021

Website: moiseshorta.audio

Twitter: @hexorcismos

Colab notebooks are available in Spanish and English.

MelSpecVAE is a Variational Autoencoder that synthesizes mel-spectrograms, which can then be inverted into raw audio waveforms. Currently you can train it on any dataset of .wav audio files with a 44.1 kHz sample rate and 16-bit depth.
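Since training expects 44.1 kHz, 16-bit .wav files, it can help to check a dataset before training. The following is a minimal sketch using only Python's standard `wave` module; it is not part of MelSpecVAE, and the function name `check_wav` is hypothetical.

```python
import wave

EXPECTED_RATE = 44100        # 44.1 kHz, as stated above
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit


def check_wav(path):
    """Return a list of problems found in the WAV header (empty if it conforms).

    Hypothetical helper for illustration; not part of the MelSpecVAE code.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        if rate != EXPECTED_RATE:
            problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE}")
        if width != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"bit depth is {8 * width}, expected 16")
    return problems
```

Note that the `wave` module only reads uncompressed PCM files and raises `wave.Error` for other encodings, so a file that fails to open would also need converting.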

Listen to audio examples here: https://soundcloud.com/h-e-x-o-r-c-i-s-m-o-s/sets/melspecvae-variational

Features:

  • Interpolate between two points in the latent space and synthesize the 'in-between' sounds.
  • Generate short one-shot audio samples.
  • Synthesize arbitrarily long audio samples by generating seeds and sampling from the latent space. Noise types for generating Z-vectors are uniform, Perlin and fractal.
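
The latent-space features above can be illustrated with a small sketch in plain Python. It is not the repository's code: linear interpolation, the value range of -1 to 1, the latent dimension of 4, and the names `lerp`, `interpolate`, and `uniform_seeds` are all assumptions made for illustration. Only the uniform noise type is shown; Perlin and fractal noise are omitted.

```python
import random


def lerp(z_a, z_b, t):
    """Linearly blend two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced latent vectors from z_a to z_b, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


def uniform_seeds(count, latent_dim, low=-1.0, high=1.0, seed=None):
    """Draw `count` latent vectors with uniformly distributed components.

    The range [low, high] is an assumed default, not a value from the repo.
    """
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(latent_dim)]
            for _ in range(count)]


# Two random seed points and a 5-step path between them.
# latent_dim=4 is a placeholder; a trained model defines the real size.
seeds = uniform_seeds(count=2, latent_dim=4, seed=0)
path = interpolate(seeds[0], seeds[1], steps=5)
```

Each vector in `path` would then be passed to the trained decoder to produce one mel-spectrogram, and the resulting spectrograms inverted to audio. Chaining many seeds this way is one plausible route to longer samples.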

Credits:

  • VAE neural network architecture coded following 'The Sound of AI' YouTube tutorial series by Valerio Velardo
  • Some utility functions from Marco Pasini's MelGAN-VC Jupyter Notebook.
