Godot Whisper

Features

Real-time audio transcription.
Transcription of recorded audio.
Runs on a separate thread.
Metal acceleration on Apple devices.
OpenCL acceleration on other platforms.

How to install

Download a GitHub release and copy its addons folder into the samples folder, then restart the Godot editor.

Requirements

SCons (only if you want to build locally).
A language model, which can be downloaded from within the Godot editor.

AudioStreamToText

This node can be used in the editor to check transcription. Add a WAV audio source and click the start_transcribe button.

With the tiny.en model, transcription typically takes about 0.3 s. This node performs transcription only.

NOTE: This node currently supports only some WAV files. The transcribe function takes a PackedFloat32Array buffer as input, and the only supported formats are AudioStreamWAV.FORMAT_8_BITS and AudioStreamWAV.FORMAT_16_BITS. Other formats will not work, and you will have to write a custom decoder for the WAV file. Godot does support decoding at runtime; see how the CaptureStreamToText node works.
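To illustrate what turning 8-bit or 16-bit PCM into a float buffer involves, here is a minimal Python sketch. It is not the plugin's code: the function names are hypothetical, and it assumes mono, signed PCM data (little-endian for 16-bit), which is how AudioStreamWAV stores these two formats.

```python
import struct


def pcm16_to_float32(data: bytes) -> list[float]:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack("<%dh" % count, data[: count * 2])
    return [s / 32768.0 for s in samples]


def pcm8_to_float32(data: bytes) -> list[float]:
    """Convert signed 8-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = struct.unpack("%db" % len(data), data)
    return [s / 128.0 for s in samples]
```

In GDScript, the resulting values would go into a PackedFloat32Array before being passed to the transcribe function. Stereo data would additionally need to be mixed down to a single channel first.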

CaptureStreamToText

This node also resamples the audio: if the mix rate is not exactly 16000 Hz, the audio is converted to 16000 Hz. It then runs the transcribe function every transcribe_interval.
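As a rough illustration of the resampling step, the Python sketch below converts a mono float buffer to 16000 Hz using linear interpolation. This is an assumption chosen for brevity, not necessarily the method the plugin uses, and the function name is hypothetical.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int = 16000) -> list[float]:
    """Resample a mono float buffer from src_rate to dst_rate by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)  # nothing to do
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring source samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, one second of audio captured at 44100 Hz becomes 16000 samples. Linear interpolation applies no low-pass filter, so downsampling this way can introduce aliasing; a production resampler would normally filter first.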

Initial Prompt

For Chinese, to choose between Traditional and Simplified, provide an initial prompt written in the variant you want; the model should then keep using that variant. See Whisper Discussion #277.

If you have problems with punctuation, you can also provide an initial prompt that contains punctuation. See Whisper Discussion #194.

Language Model

Go to any StreamToText node, select a Language Model to Download, and click Download. You might have to alt-tab out of the editor and back, or restart it, for the asset to appear. Then set the language_model property.

Global settings

Go to Project -> Project Settings -> General -> Audio -> Input (enable Advanced Settings).

You will see a bunch of settings there.

Microphone transcription requires audio at a 16000 Hz sampling rate, so you can also set the audio driver mix rate (audio/driver/mix_rate) to 16000. The resampler then has no work to do, which saves roughly 50-100 ms on larger audio, at the cost of audio quality.

Video Tutorial

How to build

scons target=template_release generate_bindings=no arch=universal precision=single
rm -rf samples/godot_whisper/addons
cp -rf bin/addons samples/godot_whisper/addons

Contributors ✨

Thanks go to these wonderful people (emoji key):

Dragos Daian 💻

K. S. Ernest (iFire) Lee 💻

This project follows the all-contributors specification. Contributions of any kind welcome!
