ArchiSound

A collection of pre-trained audio models in PyTorch from audio-encoders-pytorch and audio-diffusion-pytorch.

Install

pip install archisound


Autoencoders

  • dmae1d-ATC32-v3

    Usage and Info
    import torch
    from archisound import ArchiSound
    
    autoencoder = ArchiSound.from_pretrained("dmae1d-ATC32-v3")
    
    x = torch.randn(1, 2, 2**18)             # [1, 2, 262144]
    z = autoencoder.encode(x)                # [1, 32, 512]
    y = autoencoder.decode(z, num_steps=20)  # [1, 2, 262144]
    Info
    Input type: Audio (stereo @ 48 kHz)
    Number of parameters: 86M
    Compression factor: 32x
    Downsampling factor: 512x
    Bottleneck type: Tanh
  • dmae1d-ATC64-v2

    Usage and Info
    import torch
    from archisound import ArchiSound
    
    autoencoder = ArchiSound.from_pretrained("dmae1d-ATC64-v2")
    
    x = torch.randn(1, 2, 2**18)             # [1, 2, 262144]
    z = autoencoder.encode(x)                # [1, 32, 256]
    y = autoencoder.decode(z, num_steps=20)  # [1, 2, 262144]
    Info
    Input type: Audio (stereo @ 48 kHz)
    Number of parameters: 185M
    Compression factor: 64x
    Downsampling factor: 1024x
    Bottleneck type: Tanh
  • autoencoder1d-AT-v1

    Usage and Info
    import torch
    from archisound import ArchiSound
    
    autoencoder = ArchiSound.from_pretrained("autoencoder1d-AT-v1")
    
    x = torch.randn(1, 2, 2**18)    # [1, 2, 262144]
    z = autoencoder.encode(x)       # [1, 32, 8192]
    y = autoencoder.decode(z)       # [1, 2, 262144]
    Info
    Input type: Audio (stereo @ 48 kHz)
    Number of parameters: 20.7M
    Compression factor: 2x
    Downsampling factor: 32x
    Bottleneck type: Tanh
    Known limitations: Slight blurriness in high-frequency spectrogram reconstruction
  • dmae1d-ATC64-v1

    Usage and Info

    A diffusion-based autoencoder with a high compression ratio. Requires audio_diffusion_pytorch==0.0.92.

    import torch
    from archisound import ArchiSound
    
    autoencoder = ArchiSound.from_pretrained("dmae1d-ATC64-v1")
    
    x = torch.randn(1, 2, 2**18)             # [1, 2, 262144]
    z = autoencoder.encode(x)                # [1, 32, 256]
    y = autoencoder.decode(z, num_steps=20)  # [1, 2, 262144]
    Info
    Input type: Audio (stereo @ 48 kHz)
    Number of parameters: 234.2M
    Compression factor: 64x
    Downsampling factor: 1024x
    Bottleneck type: Tanh
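
The latent shapes in the usage comments follow from the figures listed for each model. The sketch below is not part of archisound; it is a plain-Python check under two assumptions that the README does not state explicitly: the latent length is the input length divided by the downsampling factor, and the compression factor is the ratio of input elements to latent elements. The names `AutoencoderSpec`, `latent_shape`, and `compression_factor` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoencoderSpec:
    """Figures copied from the model entries above (hypothetical container)."""
    name: str
    latent_channels: int      # channel dimension of z in the usage comments
    downsampling_factor: int  # "Downsampling Factor" entry


SPECS = [
    AutoencoderSpec("dmae1d-ATC32-v3", 32, 512),
    AutoencoderSpec("dmae1d-ATC64-v2", 32, 1024),
    AutoencoderSpec("autoencoder1d-AT-v1", 32, 32),
    AutoencoderSpec("dmae1d-ATC64-v1", 32, 1024),
]


def latent_shape(spec, batch=1, num_samples=2**18):
    # Assumption: the time axis shrinks by exactly the downsampling factor,
    # so num_samples should be a multiple of it.
    return (batch, spec.latent_channels, num_samples // spec.downsampling_factor)


def compression_factor(spec, in_channels=2, num_samples=2**18):
    # Assumption: compression = input elements / latent elements.
    _, channels, length = latent_shape(spec, num_samples=num_samples)
    return (in_channels * num_samples) / (channels * length)


for spec in SPECS:
    print(spec.name, latent_shape(spec), compression_factor(spec))
```

With the stereo input of 2**18 samples used throughout, these assumptions reproduce the shapes and compression factors listed for all four models, for example [1, 32, 512] and 32x for dmae1d-ATC32-v3.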