ActivateAI

ActivateAI is a real-time detection system for the trigger word 'activate'. Trigger word detection is the technology that allows devices like Amazon Alexa and Google Home to wake up upon hearing a certain word. The detector is a natural language processing model built with recurrent neural networks. Every time it hears you say the word 'activate', it produces a chiming sound.

🤖 Technology Stack

Frameworks: Keras, TensorFlow
Libraries: pydub, PyAudio, SciPy, NumPy, Matplotlib

🎧 Generation of Data

Three types of audio recordings are used:
- Positives
- Negatives
- Backgrounds
Positives are clips that contain the trigger word 'activate', which has to be detected.
Negatives are clips that contain words other than the trigger word.
Backgrounds are clips that contain random noise.
To generate a training example, we insert positives and negatives into a background clip so that the inserted clips do not overlap.
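
The non-overlapping insertion can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the 10-second background length, and the clip lengths are all assumptions made for the example.

```python
import random


def overlaps(segment, taken):
    """Return True if (start, end) intersects any already placed segment."""
    start, end = segment
    return any(start <= t_end and end >= t_start for t_start, t_end in taken)


def place_clips(clip_lengths_ms, background_ms=10_000, seed=0, max_tries=1000):
    """Pick a random start for each clip so that no two clips overlap.

    background_ms is an assumed background length, not a value from the project.
    Returns a list of inclusive (start_ms, end_ms) segments, one per clip.
    """
    rng = random.Random(seed)
    taken = []
    for length in clip_lengths_ms:
        for _ in range(max_tries):
            start = rng.randint(0, background_ms - length)
            segment = (start, start + length - 1)
            if not overlaps(segment, taken):
                taken.append(segment)
                break
        else:
            # Every attempt collided with an existing clip.
            raise RuntimeError("could not place clip without overlap")
    return taken


# Hypothetical clip lengths in milliseconds.
segments = place_clips([900, 700, 1200, 800])
```

With pydub, each chosen start could then be passed to `background.overlay(clip, position=start)`, and the labels following the end of each positive clip set to 1.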

📈 Data preprocessing

A microphone records small variations in air pressure over time, and it is these same variations that your ear perceives as sound. You can think of an audio recording as a long list of numbers measuring the air pressure changes detected by the microphone.
It is quite difficult to figure out from this "raw" representation of audio whether the word "activate" was said. To help our sequence model learn to detect trigger words more easily, we compute a spectrogram of the audio.
A spectrogram is a visual representation of the frequencies in a signal over time. In a spectrogram plot, one axis represents time, the other axis represents frequency, and the color represents the magnitude (amplitude) of each frequency at a particular time.
We compute the following spectrogram from our training example.
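
As a rough sketch of what computing a spectrogram involves, the snippet below slices a signal into overlapping windows and takes the magnitude of each window's Fourier transform using NumPy. The frame length (200 samples), hop (80 samples), sample rate, and test tone are assumptions chosen for illustration. The project may rely on a library routine with different settings.

```python
import numpy as np


def spectrogram(samples, frame_len=200, hop=80):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack(
        [samples[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames * window, axis=1)).T


rate = 8000                          # assumed sample rate in Hz
t = np.arange(rate) / rate           # one second of time stamps
tone = np.sin(2 * np.pi * 1000 * t)  # a pure 1000 Hz tone

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * rate / 200      # bin index converted to frequency
```

Here `spec` has one row per frequency bin and one column per time frame, and the strongest row corresponds to the 1000 Hz tone. Under the same framing, a 10-second clip sampled at 44.1 kHz (again an assumption) would yield 1 + (441000 - 200) // 80 = 5511 frames.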

🌐 Recurrent Neural Network model

The architecture of the model consists of 1D convolutional layers, GRU layers, and dense layers.
The bottom-most layer is a 1D convolution layer. It converts an input of 5511 time steps into 1375 output time steps.
The convolution layer is followed by batch normalization, an activation, and a dropout layer.
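
The reduction from 5511 to 1375 time steps follows from the usual output-length formula for an unpadded 1D convolution. The kernel size of 15 and stride of 4 used below are assumptions that happen to reproduce these numbers. The source does not state the actual hyperparameters.

```python
def conv1d_output_length(input_length, kernel_size, stride):
    """Output length of a 1D convolution with no padding ("valid")."""
    return (input_length - kernel_size) // stride + 1


# Assumed hyperparameters: kernel_size=15, stride=4.
steps = conv1d_output_length(5511, kernel_size=15, stride=4)
print(steps)  # 1375
```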
GRUs (gated recurrent units) are an improved version of the standard recurrent neural network. They aim to mitigate the vanishing gradient problem that affects standard recurrent neural networks.
A unidirectional RNN is used rather than a bidirectional RNN, since we want to detect the trigger word immediately after it is said.
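
A model along these lines could be sketched in Keras as shown below. This is an assumed layout rather than the contents of model.py: the number of frequency bins (101), the filter and unit counts, the dropout rates, and the single GRU layer are all placeholders chosen for illustration.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers


def build_model(time_steps=5511, n_freq=101):
    """Conv1D front end, a unidirectional GRU, and a per-step sigmoid output."""
    x_in = Input(shape=(time_steps, n_freq))

    # Convolution block: shortens the sequence from 5511 to 1375 steps
    # (kernel size 15 and stride 4 are assumptions).
    x = layers.Conv1D(filters=64, kernel_size=15, strides=4)(x_in)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(0.5)(x)

    # Unidirectional GRU: each output depends only on past inputs.
    x = layers.GRU(64, return_sequences=True)(x)
    x = layers.Dropout(0.5)(x)
    x = layers.BatchNormalization()(x)

    # One probability per time step: was the trigger word just said?
    out = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    return Model(inputs=x_in, outputs=out)


model = build_model()
probs = model(np.zeros((1, 5511, 101), dtype="float32"))
```

The output holds one probability for each of the 1375 time steps, so a detection can be raised as soon as the value at the current step crosses a threshold.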

🛠️ Project Setup

Clone the repository:
git clone https://github.com/HarshShah03325/ActivateAI.git

Install the dependencies:
pip install -r requirements.txt

Run the application:
python main.py

Results

The model reaches an accuracy of 92.3 percent on the dev set.
The relatively lower values of recall, precision, and f1_score indicate that accuracy is not a good evaluation metric here.
Since the labels are heavily skewed toward 0's, a neural network that predicts only 0's would still reach an accuracy of around 90 percent.
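
The effect of label skew can be seen in a small worked example. The labels below are made-up toy data (10 positives out of 100), not the project's dev set, and `binary_metrics` is a hypothetical helper written for this illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when there are no positive predictions.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data: 10 positive labels, 90 negative labels.
y_true = [1] * 10 + [0] * 90
always_zero = [0] * 100

scores = binary_metrics(y_true, always_zero)
```

A predictor that always outputs 0 scores 0.9 accuracy on this toy data while its recall and F1 are both 0, which is why precision, recall, and F1 are more informative on skewed labels.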
