GigaAM (Giga Acoustic Model) is a Conformer-based, wav2vec2-style foundation model with roughly 240M parameters. We pre-trained GigaAM on nearly 50,000 hours of diverse Russian speech audio.
Resources:
GigaAM-CTC is an Automatic Speech Recognition model. We fine-tuned the GigaAM encoder with a Connectionist Temporal Classification (CTC) head using the NeMo toolkit on publicly available labeled Russian data:
| dataset | size, hours | weight |
|---|---|---|
| Golos | 1227 | 0.6 |
| SOVA | 369 | 0.2 |
| Russian Common Voice | 207 | 0.1 |
| Russian LibriSpeech | 93 | 0.1 |
Resources:
The following table summarizes the performance of different models in terms of Word Error Rate (WER, %; lower is better) on open Russian datasets:
| model | parameters | Golos Crowd | Golos Farfield | OpenSTT YouTube | OpenSTT Phone calls | OpenSTT Audiobooks | Mozilla Common Voice | Russian LibriSpeech |
|---|---|---|---|---|---|---|---|---|
| Whisper-large-v3 | 1.5B | 17.4 | 14.5 | 11.1 | 31.2 | 17.0 | 5.3 | 9.0 |
| NeMo Conformer-RNNT | 120M | 2.6 | 7.2 | 24.0 | 33.8 | 17.0 | 2.8 | 13.5 |
| GigaAM-CTC | 242M | 3.1 | 5.7 | 18.4 | 25.6 | 15.1 | 1.7 | 8.1 |
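WER is the word-level Levenshtein (edit) distance between the hypothesis and the reference, divided by the number of reference words. A minimal self-contained implementation for checking numbers like those above:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer("привет мир", "привет весь мир"))  # 0.5 (one insertion, two ref words)
```

In practice, published WER numbers also depend on text normalization (casing, punctuation, number spelling), so exact reproduction requires the same normalization pipeline.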
GigaAM-Emo is an acoustic model for Emotion Recognition. We fine-tuned the GigaAM Encoder on the Dusha dataset.
Resources:
The following table summarizes the performance of different models on the Dusha dataset:
| model | Crowd: unweighted accuracy | Crowd: weighted accuracy | Crowd: macro F1-score | Podcast: unweighted accuracy | Podcast: weighted accuracy | Podcast: macro F1-score |
|---|---|---|---|---|---|---|
| DUSHA baseline (MobileNetV2 + Self-Attention) | 0.83 | 0.76 | 0.77 | 0.89 | 0.53 | 0.54 |
| АБК (TIM-Net) | 0.84 | 0.77 | 0.78 | 0.90 | 0.50 | 0.55 |
| GigaAM-Emo | 0.90 | 0.87 | 0.84 | 0.90 | 0.76 | 0.67 |
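A sketch of how these three metrics can be computed from predictions, under one common reading of the definitions (weighted accuracy = overall sample accuracy, unweighted accuracy = mean per-class recall); the exact conventions used for the Dusha benchmark are an assumption here:

```python
def emotion_metrics(y_true, y_pred):
    """Return (unweighted accuracy, weighted accuracy, macro F1).

    Assumed conventions (hypothetical, for illustration):
      weighted accuracy   = fraction of all samples classified correctly,
      unweighted accuracy = mean of per-class recalls,
      macro F1            = mean of per-class F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    weighted_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recalls.append(recall)
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(recalls) / len(recalls), weighted_acc, sum(f1s) / len(f1s)

# Toy example with hypothetical emotion labels:
ua, wa, f1 = emotion_metrics(
    ["angry", "sad", "neutral", "neutral"],
    ["angry", "neutral", "neutral", "neutral"],
)
print(ua, wa, f1)  # ≈ 0.667, 0.75, 0.6
```

The gap between weighted and unweighted accuracy in the Podcast column reflects class imbalance: a model can score well on the dominant class while missing rarer emotions.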