Cloud AI live transcription and translation service plugin

locaal-ai/cloudvocal

CloudVocal - Professional Cloud AI Transcription & Translation Plugin

Introduction

CloudVocal brings professional-grade cloud transcription and translation to your OBS streams and recordings. Powered by industry-leading cloud providers, it delivers exceptional accuracy and real-time performance for your live streaming needs. ✅ Professional-grade accuracy, ✅ support for 100+ languages, ✅ enterprise-level reliability, and ✅ blazing-fast performance!

If this plugin has been valuable to you, consider adding a ⭐ to this GitHub repo, rating it on OBS, subscribing to my YouTube channel where I post updates, and supporting my work on GitHub or Patreon 🙏

CloudVocal integrates seamlessly with leading cloud providers to deliver enterprise-grade speech recognition and translation services. Simply configure your API credentials and start streaming with professional-quality captions and translations.

Features

Current Features:

  • Professional-grade transcription with industry-leading accuracy
  • Real-time translation using enterprise cloud translation services
  • Support for 100+ languages with dialect recognition
  • Streaming-optimized performance with minimal latency
  • Multiple cloud provider options for transcription and translation
  • Caption output in multiple formats (.txt, .srt)
  • Captions synced with OBS recording timestamps
  • Direct streaming to platforms (YouTube, Twitch) with embedded captions
  • Partial transcriptions for a streaming-captions experience
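
For reference, the .srt format mentioned above numbers each caption and stamps it with a start and end time written as HH:MM:SS,mmm. The sketch below only illustrates that layout. It is not taken from CloudVocal's code, and the function names `srt_timestamp` and `srt_cue` are hypothetical.

```shell
#!/bin/sh
# Illustrative only: these helpers are hypothetical and not part of CloudVocal.

# Convert a millisecond offset into an SRT timestamp (HH:MM:SS,mmm).
srt_timestamp() {
    ms=$1
    printf '%02d:%02d:%02d,%03d' \
        $((ms / 3600000)) \
        $((ms / 60000 % 60)) \
        $((ms / 1000 % 60)) \
        $((ms % 1000))
}

# Print one SRT cue: index, time range, caption text, then a blank line.
# Arguments: index, start in ms, end in ms, text.
srt_cue() {
    printf '%d\n%s --> %s\n%s\n\n' \
        "$1" "$(srt_timestamp "$2")" "$(srt_timestamp "$3")" "$4"
}

srt_cue 1 0 2500 "Hello and welcome to the stream."
```

Running the script prints a single cue spanning the first 2.5 seconds, which is the shape a video player expects when it loads a .srt file alongside a recording.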

Roadmap:

  • Custom vocabulary and pronunciation support
  • Professional terminology handling for specific industries
  • Advanced text filtering and customization options
  • Speaker diarization for multi-speaker environments
  • Advanced profanity filtering options
  • Custom translation glossaries
  • Additional subtitle format support
  • Enhanced analytics and caption quality metrics

Usage

Tutorial videos and screenshots - coming soon!

Download and Installation

Check out the latest releases for downloads and install instructions.

Configuration

  1. Download and install the appropriate version for your operating system
  2. Add CloudVocal as a filter to your audio source
  3. Configure your cloud provider credentials in the plugin settings
  4. Select your desired transcription and translation options
  5. Select an output text source for the captions and translations, or send the captions to the stream or to a file

Building

The plugin can be built on Windows, macOS, and Linux platforms. The build process is straightforward as all processing happens in the cloud.

Both macOS and Linux rely on Conan for dependencies. Install Conan (e.g. with pip install conan), then install the dependencies:

$ conan profile detect --force
$ conan install . --output-folder=./build_conan --build=missing -g CMakeDeps

macOS

$ ./.github/scripts/build-macos --config Release

Linux

$ ./.github/scripts/build-linux
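
The macOS and Linux steps above can be chained in a small wrapper. This is a sketch, not a script shipped with the repository: the helpers `pick_build_script` and `step` and the `RUN` variable are hypothetical, and it assumes Conan is already installed and that it is run from the repository root. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: chain the Conan and build steps for macOS or Linux.
# Prints the commands by default; set RUN=1 to actually execute them.

# Map `uname -s` output to the build command given in this README.
pick_build_script() {
    case "$1" in
        Darwin) echo "./.github/scripts/build-macos --config Release" ;;
        Linux)  echo "./.github/scripts/build-linux" ;;
        *)      return 1 ;;
    esac
}

# Echo a command, and run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

main() {
    build_cmd=$(pick_build_script "$(uname -s)") || {
        echo "unsupported platform: $(uname -s)" >&2
        return 1
    }
    step conan profile detect --force
    step conan install . --output-folder=./build_conan --build=missing -g CMakeDeps
    # Intentionally unquoted so the script path and its flags split into words.
    step $build_cmd
}

main
```

Saved under a name of your choice, it can be checked with a plain run first and then executed for real with `RUN=1` set in the environment.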

Windows

Windows also needs Conan for OpenSSL. Run Conan to fetch the dependency, making sure to point it at conanfile_win.txt:

> pip install conan
> conan profile detect --force
> conan install .\conanfile_win.txt --output-folder=./build_conan --build=missing -g CMakeDeps 

Build the plugin:

> .\.github\scripts\Build-Windows.ps1 -Configuration Release

If you're developing the plugin, I find this command useful for deploying directly into OBS after building (the && chaining requires PowerShell 7 or later):

> .\.github\scripts\Build-Windows.ps1 -Configuration RelWithDebInfo -SkipDeps && Copy-Item -Force -Recurse .\release\RelWithDebInfo\* "C:\Program Files\obs-studio\"

Other Plugins

Check out our other plugins:

  • LocalVocal for on-device real-time transcription and translation
  • Background Removal for removing background from live portrait video
  • Detect for real-time on-device object detection
  • CleanStream for real-time profanity filter
  • URL/API Source for real-time data integrations
  • Squawk for real-time on-device speech generation (text-to-speech)