- Code release. (Jul. 27, 2024)
- Online demo on GitHub: see here. (Aug. 13, 2023)
- Supports 16-48 kHz at variable bitrates. (Jul. 27, 2024)
In this paper, we present SuperCodec, a neural speech codec that replaces the standard feedforward up- and downsampling layers with Selective Up-sampling Back Projection (SUBP) and Selective Down-sampling Back Projection (SDBP) modules. The proposed method preserves information efficiently while extracting rich features from the lower to the higher layers of the network. Additionally, we propose a selective feature fusion block in the SUBP and SDBP modules to consolidate the input feature maps
[Figure: Supercodec]
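To make the back-projection idea behind SUBP and SDBP more concrete, here is a minimal sketch of one generic up-sampling back-projection step on a 1-D signal. It is not the SuperCodec implementation: the fixed `upsample2` and `downsample2` operators are hypothetical stand-ins for the learned convolutions a real module would use, the selective feature fusion block is omitted, and all function names are assumptions made for illustration.

```python
def upsample2(x):
    """Linear-interpolation upsampling by 2 (stand-in for a learned transposed conv)."""
    out = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v
        out.extend([v, 0.5 * (v + nxt)])
    return out


def downsample2(y):
    """Pairwise-average downsampling by 2 (stand-in for a learned strided conv)."""
    return [0.5 * (y[i] + y[i + 1]) for i in range(0, len(y), 2)]


def up_back_projection(low):
    """One up-sampling back-projection step (illustrative, not the paper's SUBP)."""
    high0 = upsample2(low)                             # initial high-rate estimate
    low0 = downsample2(high0)                          # project back to the low rate
    residual = [a - b for a, b in zip(low, low0)]      # information the estimate lost
    correction = upsample2(residual)                   # lift the residual to the high rate
    return [h + c for h, c in zip(high0, correction)]  # corrected high-rate estimate


def max_error(low, high):
    """Largest mismatch between `low` and the down-sampled `high`."""
    return max(abs(a - b) for a, b in zip(low, downsample2(high)))


low = [0.0, 1.0, 4.0, 9.0]
plain = upsample2(low)
refined = up_back_projection(low)
print(max_error(low, plain), max_error(low, refined))  # prints 1.25 0.3125
```

In this toy setting the corrected output is more consistent with the input after down-sampling than a single feedforward up-sampling pass, which is the intuition behind replacing plain resampling layers with back-projection modules.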
- Clone this repo:
  `git clone https://github.com/exercise-book-yq/Supercodec.git`
- Change into the repo directory:
  `cd Supercodec`
- Install the Python requirements:
  `pip install -r requirements.txt`
# train
python train.py --config config_v1.json
# inference
python inference.py --checkpoint_file [generator checkpoint file path]
Objective evaluation on our VCTK test set at a 16 kHz sampling rate. We compare our proposed method with various existing codecs trained with the same configuration.
Model | Bitrate | ViSQOL | STOI(%) | WARP-Q(↓) |
---|---|---|---|---|
Supercodec | 1 kbps | 3.118 | 84.80 | 2.219 |
TiCodec | 1 kbps | 2.490 | 80.21 | 2.578 |
HiFiCodec | 1 kbps | 2.060 | 75.19 | 2.840 |
EnCodec | 1 kbps | 2.202 | 76.53 | 2.687 |
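The gaps in the 16 kHz table are easier to read as differences against each baseline. The sketch below recomputes them from the numbers in the table; the dictionary layout and the helper name `gains_over` are illustrative assumptions, not part of the repository.

```python
# Rows copied from the 16 kHz table: (ViSQOL, STOI %, WARP-Q).
# Higher ViSQOL and STOI are better; lower WARP-Q is better.
results_16k = {
    "Supercodec": (3.118, 84.80, 2.219),
    "TiCodec": (2.490, 80.21, 2.578),
    "HiFiCodec": (2.060, 75.19, 2.840),
    "EnCodec": (2.202, 76.53, 2.687),
}


def gains_over(baseline, ours="Supercodec", table=results_16k):
    """Return how much `ours` improves on `baseline` for each metric."""
    o_visqol, o_stoi, o_warpq = table[ours]
    b_visqol, b_stoi, b_warpq = table[baseline]
    return {
        "visqol_gain": round(o_visqol - b_visqol, 3),
        "stoi_gain_pts": round(o_stoi - b_stoi, 2),
        "warpq_reduction": round(b_warpq - o_warpq, 3),
    }


for name in results_16k:
    if name != "Supercodec":
        print(name, gains_over(name))
```

Against TiCodec, the strongest baseline in this table, this gives a ViSQOL gain of 0.628, a STOI gain of 4.59 points, and a WARP-Q reduction of 0.359.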
Objective evaluation on our VCTK test set at a 24 kHz sampling rate. We compare our proposed method with various existing codecs trained with the same configuration.
Model | Bitrate | ViSQOL | STOI(%) | WARP-Q(↓) |
---|---|---|---|---|
Supercodec | 1.5 kbps | 3.322 | 85.61 | 2.147 |
TiCodec | 1.5 kbps | 2.639 | 79.03 | 2.539 |
HiFiCodec | 1.5 kbps | 2.026 | 76.80 | 2.761 |
EnCodec | 1.5 kbps | 2.202 | 79.81 | 2.569 |
All models are non-causal and trained on LibriTTS.
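The 1 kbps and 1.5 kbps operating points in the tables above can be related to codec hyperparameters with a little arithmetic. The sketch below assumes a residual vector quantizer with a fixed hop size; the specific values (a hop of 320 samples, 2 codebooks of 1024 entries) are hypothetical, chosen only because they reproduce the listed bitrates, and are not taken from this repository's configuration files.

```python
import math


def rvq_bitrate_bps(sample_rate_hz, hop_samples, n_codebooks, codebook_size):
    """Bitrate of a residual VQ codec: frames per second times bits per frame."""
    frames_per_second = sample_rate_hz / hop_samples
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame


# Hypothetical settings, not read from config_v1.json.
print(rvq_bitrate_bps(16_000, 320, 2, 1024))  # 1000.0 bps = 1 kbps
print(rvq_bitrate_bps(24_000, 320, 2, 1024))  # 1500.0 bps = 1.5 kbps
```

Under these assumptions, keeping the hop size fixed while raising the sampling rate from 16 kHz to 24 kHz raises the frame rate from 50 Hz to 75 Hz, which accounts for the step from 1 kbps to 1.5 kbps.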