Skip to content
This repository has been archived by the owner on Aug 8, 2024. It is now read-only.
/ svxlink Public archive
forked from sm0svx/svxlink

Enhanced svxlink Reflector fork with Voice Activity Detection (VAD) to eliminate 'Kerchunking' and improve transmission clarity for specified clients.

License

Notifications You must be signed in to change notification settings

BrainicHQ/svxlink

 
 

Repository files navigation

SvxLink Reflector with Anti Kerchunking VAD (Voice Activity Detection) Feature

This fork of the SvxLink project introduces an innovative Voice Activity Detection (VAD) functionality, specifically designed to enhance the user experience and operational efficiency for repeater and simplex channel operations. This feature adeptly addresses the prevalent issue of "kerchunking"—the act of unintentionally keying up a transmitter without transmitting meaningful voice communication, which leads to unnecessary repeater use and potential interference.

This addition represents a significant enhancement to the SvxLink project, leveraging sophisticated machine learning models to discern between intentional voice communications and unintentional or noise-induced transmissions. By doing so, it not only optimizes bandwidth usage but also ensures clearer and more reliable communication channels for the ham radio community.

Note: This feature is my unique contribution to the SvxLink project, providing advanced, user-friendly voice activity detection for the amateur radio community’s benefit.

Implementation

The VAD feature is implemented using a sophisticated machine learning model, leveraging the ONNX runtime for efficient real-time audio processing. This model analyzes incoming audio streams to detect the presence of human speech with high accuracy, thereby enabling the system to differentiate between intentional voice communication and unintentional transmissions or background noise.

Key components of the VAD implementation include:

  • Opus Audio Decoding: Converts Opus-encoded audio data into PCM format, preparing it for analysis by the VAD model.

  • PCM to Float Conversion: Transforms PCM audio data into floating-point format, normalizing the audio samples for consistent processing.

  • Machine Learning Model: A pre-trained ONNX model that evaluates audio data to determine the presence of voice activity.

  • Dynamic Thresholding: Utilizes a configurable threshold to determine voice activity, allowing for fine-tuning based on environmental conditions and user preferences.

Benefits

Integrating VAD into SvxLink provides several key benefits:

  • Reduced Bandwidth Usage: By filtering out non-voice transmissions, the system conserves bandwidth, allowing for more efficient use of the available spectrum.

  • Enhanced Audio Quality: Minimizes the transmission of background noise and other non-speech elements, resulting in clearer communication channels.

  • Improved User Experience: Reduces the occurrence of unintentional transmissions, ensuring that the repeater is available for meaningful communications.

  • Operational Efficiency: Automates the process of monitoring and managing voice transmissions, reducing the need for manual intervention by repeater operators.

Installation

  1. Download the Silero VAD model file from the official repository: https://github.com/snakers4/silero-vad/blob/master/files/silero_vad.onnx

  2. Download the Microsoft ONNX Runtime library for your platform from the official repository releases: https://github.com/microsoft/onnxruntime/releases

  3. Extract the ONNX Runtime library directory

  4. Set the ONNXRUNTIME_ROOT_DIR environment variable to the path of the extracted ONNX Runtime library directory by running the following command:

export ONNXRUNTIME_ROOT_DIR=/path/to/onnxruntime

Configuration

Edit the svxreflector.conf file to enable and configure the VAD feature. The following parameters can be adjusted to customize the VAD functionality:

[VAD_SETTINGS]
# Enable or disable the VAD feature
IS_VAD_ENABLED=true
# Path to the Silero VAD model file
SILERO_MODEL_PATH=/home/silviu/silero-vad/files/silero_vad.onnx
# Sample rate of the audio stream in Hz for the VAD model (Do not change this value unless you know what you are doing)
SAMPLE_RATE=16000
# Number of samples in the audio stream that the VAD model processes at once (should be the same as the model's input size)
WINDOW_SIZE_SAMPLES=1536
# Threshold for the VAD model to consider a frame as speech (0.0 - 1.0) - the higher the value, the more strict the VAD model is
THRESHOLD=0.3
# Number of samples sent to the VAD model at once (should be a multiple of WINDOW_SIZE_SAMPLES) - the higher the value, the more accurate the VAD model is
PROCESSED_SAMPLE_BUFFER_SIZE=7680
# The gate sample size is the number of samples that the VAD model uses to determine if the audio stream is speech or not (should be a multiple of PROCESSED_SAMPLE_BUFFER_SIZE)
VAD_GATE_SAMPLE_SIZE=46080
# List of callsigns for which the VAD is enabled (comma-separated)
VAD_ENABLED_CALLSIGNS=client1,client2,client3
# Number of first milliseconds in buffer that are replaced with silence before sending the audio stream to the VAD model to minimize the false positives
START_SILENCE_REPLACEMENT_BUFFER_MS=90

OSS software used in this project:

Known issues:

As the application is single-threaded, the VAD model processing might block the audio stream processing, causing audio artifacts. Currently this can be improved by keeping the VAD model processing time as low as possible by using the smallest possible PROCESSED_SAMPLE_BUFFER_SIZE value and by using a powerful CPU.

Acknowledgements and Contributions

This Anti Kerchunking VAD feature is a unique contribution by Silviu Stroe (YO6SAY) (brainic.io) as part of a fork of the original SvxLink project after extensive research and many hours of debugging and testing.

The development and integration of this feature reflect my commitment to enhancing the amateur radio community’s experience and operational efficiency and the main reason of getting me as a licensed radio amateur.

Thanks for the testers that helped me in testing the feature during the development process: YO3DEL (Dan N.), YO6NAM (Răzvan M.)


SvxLink

Build Status Join%20Chat

The SvxLink Server is a general purpose voice services system, which when connected to a transceiver, can act as both an advanced repeater system and can also operate on a simplex channel. One could call it a radio operating system.

SvxLink is very extensible and modular. Voice services are implemented as modules which are isolated from each other. Modules can be implemented in either C++ or TCL. Examples of modules are:

  • Help  — A help system

  • Parrot  — Play back everything that is received

  • EchoLink  — Connect to other EchoLink stations

  • DtmfRepeater  — Repeater received DTMF digits

  • TclVoiceMail  — Send voice mail to other local users

  • PropagationMonitor — Announce propagation warnings from dxmaps.com

  • SelCall  — Send selective calling sequences by entering DTMF codes

  • MetarInformation  — Play airport weather information

  • Frn  — Connect to Free Radio Network (FRN) servers

Qtel

Qtel, the Qt EchoLink client, is a graphical application used to access the EchoLink network.

Resources

These are some of the resources connected to SvxLink:

About

Enhanced svxlink Reflector fork with Voice Activity Detection (VAD) to eliminate 'Kerchunking' and improve transmission clarity for specified clients.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 81.7%
  • Roff 5.0%
  • Tcl 4.4%
  • C 3.6%
  • CMake 2.1%
  • Shell 0.9%
  • Other 2.3%