ActivateAI

ActivateAI is a real-time detection system for the trigger word 'activate'. Trigger word detection is the technology that lets devices such as Amazon Alexa and Google Home wake up when they hear a certain word. It is a natural language processing model built with recurrent neural networks. Every time it hears you say the word 'activate', it plays a chiming sound.

🤖 Technology Stack

  • Frameworks: Keras, TensorFlow
  • Libraries: Pydub, PyAudio, SciPy, NumPy, Matplotlib
  • Three types of audio recordings are present:
    • Positives
    • Negatives
    • Backgrounds
  • Positives are audio clips that contain the trigger word 'activate', which has to be detected.
  • Negatives are audio clips that contain words other than the trigger word.
  • Backgrounds are audio clips that contain random noise.
  • To generate a training example, we insert positives and negatives into a background so that the inserted clips do not overlap one another.
  • A microphone records small variations in air pressure over time, and it is these variations that your ear also perceives as sound. You can think of an audio recording as a long list of numbers measuring the small air pressure changes detected by the microphone.
  • It is quite difficult to tell from this "raw" representation of audio whether the word "activate" was said. To help our sequence model learn to detect trigger words more easily, we compute a spectrogram of the audio.
  • A spectrogram is a visual representation of the frequencies in a signal over time. In a spectrogram plot, one axis represents time, the other axis represents frequency, and the colors represent the magnitude (amplitude) of each frequency at a particular time.
  • We compute the following spectrogram from our training example.
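
The two data-preparation steps above, placing clips without overlap and turning raw samples into a spectrogram, can be sketched as follows. This is a minimal illustration and not the repository's code: the helper names (`overlaps`, `place_clip`, `spectrogram`), the clip lengths, the 10,000 ms background length, and the window and hop sizes are all assumptions made for the example. The spectrogram uses a naive discrete Fourier transform for clarity; in practice a library routine such as `scipy.signal.spectrogram` would likely be used.

```python
import cmath
import math
import random


def overlaps(segment, taken):
    """Return True if the inclusive (start, end) segment intersects any taken segment."""
    start, end = segment
    return any(start <= t_end and end >= t_start for t_start, t_end in taken)


def place_clip(clip_len, background_len, taken, rng, max_tries=100):
    """Pick a random position for a clip that does not overlap earlier clips.

    Returns the (start, end) segment and records it in `taken`,
    or returns None if no free slot is found within `max_tries` attempts.
    """
    for _ in range(max_tries):
        start = rng.randint(0, background_len - clip_len)
        segment = (start, start + clip_len - 1)
        if not overlaps(segment, taken):
            taken.append(segment)
            return segment
    return None


def spectrogram(samples, window=16, hop=8):
    """Naive short-time DFT: one list of frequency magnitudes per time frame."""
    frames = []
    for begin in range(0, len(samples) - window + 1, hop):
        chunk = samples[begin:begin + window]
        frame = []
        for k in range(window // 2 + 1):
            total = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / window)
                for n in range(window)
            )
            frame.append(abs(total))
        frames.append(frame)
    return frames


# Hypothetical example: three clips (lengths in ms) placed on a 10,000 ms background.
rng = random.Random(0)
taken = []
for clip_len in (900, 700, 1100):
    place_clip(clip_len, 10_000, taken, rng)

# A pure tone with 4 cycles per 16-sample window should peak in frequency bin 4.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spec = spectrogram(tone)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each row of `spec` corresponds to one time frame and each column to one frequency bin, which is the same time-by-frequency layout the spectrogram plot shows with colors.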

  • The architecture of the model consists of 1-D convolutional layers, GRU layers, and dense layers.
  • The bottommost layer is a 1-D convolution layer. It converts an input of 5511 time steps into 1375 output time steps.
  • The convolution layer is followed by batch normalization, an activation, and a dropout layer.
  • GRUs (gated recurrent units) are an improved version of the standard recurrent neural network. They aim to mitigate the vanishing gradient problem that affects standard recurrent neural networks.
  • A unidirectional RNN is used rather than a bidirectional RNN, since we want to detect the trigger word immediately after it is said.
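
The reduction from 5511 to 1375 time steps follows from the usual output-length formula for an unpadded 1-D convolution, floor((n - k) / s) + 1. The README does not state the kernel size or stride, so the values below (kernel size 15, stride 4) are an assumption: they are one combination that reproduces the stated figures, not a confirmed setting of this model. The function name `conv1d_output_length` is likewise hypothetical.

```python
def conv1d_output_length(n_in, kernel_size, stride=1):
    """Number of output steps of a 1-D convolution with no padding ('valid')."""
    if n_in < kernel_size:
        return 0
    return (n_in - kernel_size) // stride + 1


# Assumed hyperparameters (not given in the README): kernel_size=15, stride=4.
n_out = conv1d_output_length(5511, kernel_size=15, stride=4)
print(n_out)  # 1375
```

Shortening the sequence in this way means the GRU layers process roughly a quarter as many time steps as the spectrogram contains.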

git clone https://github.com/HarshShah03325/ActivateAI.git
pip install -r requirements.txt
python main.py
  • The model shows an accuracy of 92.3 percent on the dev set.
  • The relatively low values of recall, precision, and F1 score indicate that accuracy is not a good evaluation metric for this task.
  • Since the labels are heavily skewed towards 0, a neural network that predicts only 0s would reach an accuracy of around 90 percent.
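
The point about skewed labels can be checked with a small worked example. The sketch below uses made-up labels (10 positives out of 100) rather than the project's data, and `binary_metrics` is a hypothetical helper written for this illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Define the ratios as 0.0 when their denominators are zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 10 percent positives, and a "model" that always predicts 0.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The always-zero predictor scores 0.9 accuracy while its precision, recall, and F1 are all 0.0, which is why the latter metrics are more informative here.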
