audio.vadwebrtc

This repository contains an R package which is an Rcpp wrapper around the webrtc Voice Activity Detection module.

example-vad.mp4

The main goal of the package is to remove non-speech audio segments before running an automatic transcription with audio.whisper, in order to avoid transcription hallucinations. It contains

  • functions to detect the location of voice in audio using a Gaussian Mixture Model implemented in webrtc
  • functions to extract the voiced or silent parts of the audio into a new audio file
  • functionality to rewrite the timepoints of transcribed sentences when sections without voice were removed before transcription, so that the timepoints of the transcription align with the original audio signal
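
The idea behind the timepoint rewriting in the last item can be sketched in base R. This is only an illustration under stated assumptions, not the package's own implementation: `map_to_original()` is a hypothetical helper, and `segments` is a hand-built table in the same shape as a voice activity segment listing (`start`, `end`, `has_voice`). It assumes the transcription was run on audio in which only the voiced segments were kept, concatenated in order.

```r
# Hand-built segment table (seconds), for illustration only
segments <- data.frame(
  start     = c(0, 1, 4, 5),
  end       = c(1, 4, 5, 8),
  has_voice = c(FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: map timepoints measured in the voiced-only audio
# back to the timeline of the original recording
map_to_original <- function(t, segments) {
  stopifnot(all(t >= 0))
  voiced   <- segments[segments$has_voice, ]
  duration <- voiced$end - voiced$start
  # Position at which each voiced segment begins in the voiced-only audio
  offset <- cumsum(c(0, head(duration, -1)))
  # Index of the voiced segment that contains each timepoint
  idx <- findInterval(t, offset)
  voiced$start[idx] + (t - offset[idx])
}

map_to_original(c(0.5, 3.5), segments)
# 1.5 5.5
```

In this toy table, 0.5 seconds into the voiced-only audio falls in the first voiced segment, which starts at second 1 of the original, and 3.5 seconds falls half a second into the second voiced segment, which starts at second 5.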

Installation

  • The package is currently not on CRAN
  • For the development version of this package: remotes::install_github("bnosac/audio.vadwebrtc")

Look at the documentation of the functions: help(package = "audio.vadwebrtc")

Example

Get an audio file with 16-bit mono PCM samples (pcm_s16le codec) and a sampling rate of 8 kHz, 16 kHz or 32 kHz
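
If a recording is not yet in that format, one possible way to convert it is with the av package, which is also used for plotting further down. The sketch below is an assumption-laden illustration: `to_vad_wav()` is a hypothetical helper that is not part of audio.vadwebrtc, the file names are placeholders, and it relies on FFmpeg writing pcm_s16le by default for WAV output.

```r
# Hypothetical helper: convert an audio file to mono WAV at a supported
# sampling rate. Assumes the av package is installed.
to_vad_wav <- function(input, output, sample_rate = 16000) {
  # Only these sampling rates are listed as supported in this README
  stopifnot(sample_rate %in% c(8000, 16000, 32000))
  av::av_audio_convert(input, output = output, format = "wav",
                       channels = 1, sample_rate = sample_rate)
}

# Usage with placeholder paths:
# to_vad_wav("recording.mp3", "recording.wav")
```

The example that follows uses a file shipped with the package, which is already in a supported format.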

library(audio.vadwebrtc)
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad  <- VAD(file, mode = "normal")
vad
Voice Activity Detection 
  - file: D:/Jan/R/win-library/4.1/audio.vadwebrtc/extdata/test_wav.wav 
  - sample rate: 16000 
  - VAD type: webrtc-gmm, VAD mode: normal, VAD by milliseconds: 10, VAD frame_length: 160
    - Percent of audio containing a voiced signal: 90.2% 
    - Seconds voiced: 6.3 
    - Seconds unvoiced: 0.7
vad$vad_segments
 vad_segment start  end has_voice
           1  0.00 0.08     FALSE
           2  0.09 3.30      TRUE
           3  3.31 3.71     FALSE
           4  3.72 6.78      TRUE
           5  6.79 6.99     FALSE
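
The segment table is a regular data frame, so it can be summarised with base R. The sketch below rebuilds the table by hand from the printed output above, so that it runs without the package, and sums the duration per class. Because the durations are taken as `end - start` from the rounded values shown, the totals are approximate.

```r
# Segment table copied by hand from the printed output above
segs <- data.frame(
  vad_segment = 1:5,
  start       = c(0.00, 0.09, 3.31, 3.72, 6.79),
  end         = c(0.08, 3.30, 3.71, 6.78, 6.99),
  has_voice   = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Duration of each segment in seconds
segs$duration <- segs$end - segs$start

# Total seconds per class (FALSE = unvoiced, TRUE = voiced)
totals <- tapply(segs$duration, segs$has_voice, sum)
round(totals, 1)

# Keep only the voiced segments, e.g. to pass them on to a transcription step
voiced <- segs[segs$has_voice, ]
```

The rounded totals come out at 6.3 seconds voiced and 0.7 seconds unvoiced, in line with the summary printed by `vad`.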

Example of a simple plot of these audio and voice segments

library(av)
x <- read_audio_bin(file)
plot(seq_along(x) / 16000, x, type = "l", xlab = "Seconds", ylab = "Signal")
abline(v = vad$vad_segments$start, col = "red", lwd = 2)
abline(v = vad$vad_segments$end, col = "blue", lwd = 2)

Or show it interactively using the R package wavesurfer

library(wavesurfer)
library(shiny)
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad  <- VAD(file, mode = "lowbitrate")
anno <- data.frame(audio_id = vad$file, 
                   region_id = vad$vad_segments$vad_segment, 
                   start = vad$vad_segments$start, 
                   end = vad$vad_segments$end, 
                   label = ifelse(vad$vad_segments$has_voice, "Voiced", "Silent"))
anno <- subset(anno, label %in% "Silent")
  
wavs_folder <- system.file(package = "audio.vadwebrtc", "extdata")
shiny::addResourcePath("wav", wavs_folder)
ui <- fluidPage(
  wavesurferOutput("my_ws", height = "128px"),
  tags$p("Press spacebar to toggle play/pause.")
)
server <- function(input, output, session) {
  output$my_ws <- renderWavesurfer({
    wavesurfer(audio = paste0("wav/", "test_wav.wav"), annotations = anno) %>%
      ws_set_wave_color('#5511aa') %>%
      ws_cursor()
  })
}
shinyApp(ui = ui, server = server)

Support in text mining

Need support in text mining? Contact BNOSAC: http://www.bnosac.be