chromecast_vad

A Keras implementation of an RNN voice activity detector that controls the volume of a Chromecast device.

The model, a two-layer bidirectional LSTM followed by a dense layer, takes a spectrogram as input and outputs a single value (speech or no speech).

The client app connects to the Chromecast device of your choice, waits for music to start playing on that device, and then starts listening on the host computer. It records 2 seconds of audio at a time and sends each recording to a Flask web app, which passes it to the neural network for inference. Based on the network's prediction, the client app decreases or increases the Chromecast device's volume.
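
The decision step of that loop can be sketched as a small state machine. This is only an illustration, not the code in chromecast_live_volume.py: the class name, the 0.5 probability threshold, the 0.15 volume reduction and the three quiet recordings needed before restoring are all assumptions, chosen so that the sketch reproduces the log in the Live Demo Output section.

```python
from dataclasses import dataclass


@dataclass
class VolumeController:
    """Lower the volume while speech is heard, restore it after a quiet spell."""

    normal_volume: float              # volume read from the device at start-up
    duck_by: float = 0.15             # assumed reduction while someone is speaking
    threshold: float = 0.5            # assumed cut-off on the speech probability
    quiet_steps_to_restore: int = 3   # assumed number of quiet recordings before restoring
    steps_without_speech: int = 0
    ducked: bool = False

    def update(self, speech_probability: float):
        """Return the new volume to set, or None if the volume should not change."""
        if speech_probability >= self.threshold:
            self.steps_without_speech = 0
            if not self.ducked:
                self.ducked = True
                return round(max(0.0, self.normal_volume - self.duck_by), 2)
            return None
        self.steps_without_speech += 1
        if self.ducked and self.steps_without_speech >= self.quiet_steps_to_restore:
            self.ducked = False
            return self.normal_volume
        return None
```

In a client loop, each prediction returned by the web app would be passed to `update`, and any non-None result would be forwarded to whatever call sets the device volume.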

Dependencies

Preprocessing

python run_preprocessing.py
Converts variable-length mp3/wav files into 2-second wav files (audio sampled at 44100 Hz, mono channel).
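
As a rough illustration of the chunking step only, the sketch below splits a wav file into consecutive 2-second files using the standard library. It assumes the input is already a 44100 Hz mono wav, so mp3 decoding, resampling and down-mixing are left out, and the function name and output file naming are hypothetical rather than taken from run_preprocessing.py.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=2.0):
    """Write consecutive fixed-length chunks of `src` into `out_dir`; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        bytes_per_chunk = frames_per_chunk * params.sampwidth * params.nchannels
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            # Drop a trailing fragment shorter than one full chunk.
            if len(frames) < bytes_per_chunk:
                break
            path = out_dir / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                # Same sample rate, width and channel count as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

A 5-second input would yield two 2-second files, with the final second discarded.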

Training Set

python dataset.py
Converts the preprocessed audio files into 2-second training samples, stored as X and Y numpy arrays.

X is a numpy array holding a spectrogram with 101 frequency bins.
Y is a numpy array of 0/1 labels (speech or not).
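
One way to end up with 101 frequency bins is a one-sided FFT over 200-sample windows, since 200 / 2 + 1 = 101. The sketch below shows that arithmetic with NumPy. The window length, the 80-sample hop and the Hann window are assumptions made for illustration; dataset.py may compute its spectrogram differently.

```python
import numpy as np


def spectrogram(samples, nfft=200, hop=80):
    """Return a power spectrogram of shape (n_frames, nfft // 2 + 1)."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < nfft:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(samples) - nfft) // hop
    window = np.hanning(nfft)
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack(
        [samples[i * hop : i * hop + nfft] for i in range(n_frames)]
    ) * window
    # rfft keeps only the non-negative frequencies: nfft // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2
```

With these assumed settings, a 2-second clip at 44100 Hz (88200 samples) gives an array of 1101 time frames by 101 frequency bins.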

Model Training

python run_experiments.py

Running

Build the Docker image of the Flask web service (app.py).

This web service takes a 2-second audio file as input, uses the RNN model to predict whether it contains speech, and returns a true/false prediction.

docker_build_image.sh
Creates a Docker image with app.py on top of ufoym/deepo:keras-py36-cpu (an image with Python 3.6 and Keras on CPU).

docker_run_webapp.sh
Starts the web server we just built.

python chromecast_live_volume.py
Starts listening and controlling the volume of the Chromecast device of your choice.

Dataset

A synthesized dataset created by merging background noise, music and speech from the following sources:

Common Voice by Mozilla
An open, publicly available dataset of voices that everyone can use to train speech-enabled applications.
QUT-NOISE-SRE Databases
D. Dean, A. Kanagasundaram, H. Ghaemmaghami, M. Hafizur, S. Sridharan (2015) “The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition”. In Proceedings of Interspeech 2015, September, Dresden, Germany.
Music played by a home speaker recorded using Audacity
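
Merging the three kinds of audio can be pictured as a sample-by-sample sum. The sketch below is a simplified assumption about that step, not the project's synthesis code: the function name, the `speech_gain` parameter, the use of floating-point audio in [-1, 1] and the single label per clip are all hypothetical.

```python
import numpy as np


def mix_example(background, music, speech=None, speech_gain=1.0):
    """Overlay equal-length clips sample by sample; return (mixed_audio, label)."""
    mixed = np.asarray(background, dtype=np.float64) + np.asarray(music, dtype=np.float64)
    label = 0
    if speech is not None:
        mixed = mixed + speech_gain * np.asarray(speech, dtype=np.float64)
        label = 1  # the clip contains speech
    # Keep the sum inside the valid [-1, 1] range of floating-point audio.
    return np.clip(mixed, -1.0, 1.0), label
```

Clips built without a speech track serve as negative examples (label 0), and clips with speech overlaid serve as positive examples (label 1).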

Live Demo

Live Demo Output

$ python chromecast_live_volume.py

Looking for chromecast devices...

Connected to: Cuisine
initial volume = 0.45
no music playing
no music playing
no music playing
no music playing
> recording
  speech probability = 0.01
  steps_without_speech = 1
> recording
  speech probability = 0.01
  steps_without_speech = 2
> recording
  speech probability = 0.00
  steps_without_speech = 3
> recording
  speech probability = 0.05
  steps_without_speech = 4
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
  set volume to 0.30
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 1.00
  *** SPEECH DETECTED ***
> recording
  speech probability = 0.09
  steps_without_speech = 1
> recording
  speech probability = 0.07
  steps_without_speech = 2
> recording
  speech probability = 0.03
  steps_without_speech = 3
  set volume to 0.45
> recording
  speech probability = 0.01
  steps_without_speech = 4
> recording
  speech probability = 0.01
  steps_without_speech = 5
