A .NET Standard adapter for the WebRTC voice activity detection (VAD) component. The WebRTC VAD uses a Gaussian Mixture Model to detect speech, which typically gives better detection accuracy than the more common energy-threshold approach.
See below for a brief overview, or visit the wiki for more in-depth documentation.
WebRtcVadSharp is available on NuGet:
Install-Package WebRtcVadSharp
This will install the .NET Standard adapter (WebRtcVadSharp.dll) and an unmanaged library (WebRtcVad.dll) containing the supporting WebRTC algorithms.
In the simplest case, you just need to instantiate a WebRtcVad object and supply it with a byte[] of raw audio.
    bool DoesFrameContainSpeech(byte[] audioFrame)
    {
        using var vad = new WebRtcVad();
        return vad.HasSpeech(audioFrame, SampleRate.Is8kHz, FrameLength.Is10ms);
    }
ℹ️ Note that WebRtcVad implements IDisposable, so create it with a using declaration or block (or otherwise call Dispose) to release its unmanaged resources.
ℹ️ This library (and WebRTC itself) only supports raw, 16-bit linear PCM audio, and will not work with WAV files or other container formats. For hints on converting your audio, see issue #6.
The underlying VAD code can be configured along three axes:
- Frame length: 10ms, 20ms and 30ms frames are supported.
- Sample rate: 8kHz, 16kHz, 32kHz and 48kHz sample rates are supported.
- Operating mode: four levels of "aggressiveness" are supported.
These options may be set via properties on the WebRtcVad object. More documentation on each is available in the wiki.
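Because the input is raw 16-bit PCM, the size of a frame in bytes follows directly from the sample rate and frame length listed above. The sketch below works through that arithmetic and splits a buffer into frames. It is an illustration, not part of WebRtcVadSharp: the PcmFrames class and its methods are hypothetical names, and it assumes mono audio (two bytes per sample).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WebRtcVadSharp.
public static class PcmFrames
{
    // 16-bit linear PCM uses two bytes per sample (mono is assumed here).
    private const int BytesPerSample = 2;

    // Bytes in one frame: samples per millisecond * milliseconds * bytes per sample.
    public static int FrameSizeInBytes(int sampleRateHz, int frameLengthMs)
    {
        if (sampleRateHz != 8000 && sampleRateHz != 16000 &&
            sampleRateHz != 32000 && sampleRateHz != 48000)
            throw new ArgumentOutOfRangeException(nameof(sampleRateHz));
        if (frameLengthMs != 10 && frameLengthMs != 20 && frameLengthMs != 30)
            throw new ArgumentOutOfRangeException(nameof(frameLengthMs));

        return sampleRateHz / 1000 * frameLengthMs * BytesPerSample;
    }

    // Splits raw PCM into whole frames; a trailing partial frame is dropped.
    public static IEnumerable<byte[]> Split(byte[] pcm, int sampleRateHz, int frameLengthMs)
    {
        int frameSize = FrameSizeInBytes(sampleRateHz, frameLengthMs);
        for (int offset = 0; offset + frameSize <= pcm.Length; offset += frameSize)
        {
            var frame = new byte[frameSize];
            Buffer.BlockCopy(pcm, offset, frame, 0, frameSize);
            yield return frame;
        }
    }
}
```

For example, an 8kHz, 10ms frame comes out at 160 bytes, and a 48kHz, 30ms frame at 2880 bytes. Each frame returned by Split could then be passed to a call such as vad.HasSpeech(frame, SampleRate.Is8kHz, FrameLength.Is10ms), provided the arguments match the values used for splitting.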
The code in this repository is dual licensed. The .NET code and the DLL exports are covered by the MIT license, while the WebRTC code (imported from the WebRTC repository) is licensed under Google's WebRTC license.