
Google AIY Voice kit for listening to Japanese music using raspberry pi, scikit-learn, and Google Speech API


yuibi/homemade_pi3


Streaming Japanese Music on Google AIY Voice Kit

The main objective of this project is to listen to Japanese music on a Raspberry Pi 3, both at home and while driving. The motivation is that Google Home and Amazon Alexa offer very limited options for Japanese music subscription services outside Japan, and I wanted my kids to be exposed to Japanese music on a regular basis.

To start the smart speaker, either say your own wake word (e.g. "Alexa") or push the arcade button. Once you finish talking, Google Speech API converts your speech to text, my custom gradient boosting model predicts the intent (e.g. stream a certain radio station, search and stream music on YouTube, increase the volume, skip to the next song, etc.), and the matching command is executed.

I used Google Speech API for ASR/speech-to-text and scikit-learn for the gradient boosting model that captures intent. Open JTalk is used for text-to-speech. The actual music streaming depends on other people's hard work (e.g. the Radiko script, youtube-dl, etc.).
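The flow described above (transcribed text, predicted intent, executed command) can be sketched roughly as follows. This is only an illustration under my own assumptions: the intent labels, the handler functions, and the keyword-based `predict_intent` stand-in are all hypothetical and are not taken from this repo, where the prediction comes from the trained gradient boosting model.

```python
from typing import Callable, Dict

# Hypothetical handlers: each one only describes the action it would take.
def play_radio(text: str) -> str:
    return "radio: " + text

def search_youtube(text: str) -> str:
    return "youtube: " + text

def volume_up(text: str) -> str:
    return "volume up"

def skip_song(text: str) -> str:
    return "next song"

# Hypothetical intent labels mapped to their handlers.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "play_radio": play_radio,
    "search_youtube": search_youtube,
    "volume_up": volume_up,
    "skip_song": skip_song,
}

def predict_intent(text: str) -> str:
    """Stand-in for the trained model: simple keyword rules (an assumption)."""
    lowered = text.lower()
    if "radio" in lowered:
        return "play_radio"
    if "youtube" in lowered:
        return "search_youtube"
    if "louder" in lowered or "volume" in lowered:
        return "volume_up"
    if "next" in lowered or "skip" in lowered:
        return "skip_song"
    return "unknown"

def handle_utterance(text: str,
                     predict: Callable[[str], str] = predict_intent) -> str:
    """Predict the intent of the transcribed text and run its handler."""
    intent = predict(text)
    handler = HANDLERS.get(intent)
    if handler is None:
        return "unknown intent: " + intent
    return handler(text)
```

Keeping the predictor as a parameter means a trained classifier can be passed in place of the keyword rules without touching the dispatch table.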

Requirements

Hardware

(I spent about $40 at Micro Center)

Software

  • Raspbian
  • Python >3.4
  • Google AIY
  • Google Cloud Platform subscription for Google Speech API
  • Open JTalk

Instructions

Set up Google AIY Voice Kit

  1. Follow this tutorial to assemble hardware and set up Google Speech API.
  2. Clone the Google AIY repo into your home directory.
    git clone https://github.com/google/aiyprojects-raspbian.git
  3. Overwrite aiyprojects-raspbian/src with the contents of raspbian_aiy_smart_speaker from this repo, which adds Japanese language support for Google Speech API, text-to-speech, and my smart speaker code.
  4. Enable the service:
    sudo mv my_cloudspeech.service /lib/systemd/system/
    sudo systemctl enable my_cloudspeech.service


Configure Open JTalk

  1. Install Open JTalk:
    sudo apt-get update
    sudo apt-get install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001
  2. Download a different voice:
    wget https://sourceforge.net/projects/mmdagent/files/MMDAgent_Example/MMDAgent_Example-1.6/MMDAgent_Example-1.6.zip/download -O MMDAgent_Example-1.6.zip
    unzip MMDAgent_Example-1.6.zip MMDAgent_Example-1.6/Voice/*
    sudo cp -r MMDAgent_Example-1.6/Voice/mei/ /usr/share/hts-voice
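Once Open JTalk and the voice are installed, text-to-speech can be driven from Python along these lines. This is a minimal sketch, not the code in this repo: the dictionary path, the voice file name `mei_normal.htsvoice`, and the function names are assumptions based on a typical Debian/Raspbian install, so check them against your system. The `-x`, `-m`, and `-ow` options select the dictionary directory, the voice file, and the output WAV file.

```python
import subprocess
from typing import List

# Assumed install locations; verify them on your own system.
DEFAULT_DICTIONARY = "/var/lib/mecab/dic/open-jtalk/naist-jdic"
DEFAULT_VOICE = "/usr/share/hts-voice/mei/mei_normal.htsvoice"

def build_open_jtalk_command(wav_path: str,
                             voice: str = DEFAULT_VOICE,
                             dictionary: str = DEFAULT_DICTIONARY) -> List[str]:
    """Build the open_jtalk command line; the text is supplied on stdin."""
    return [
        "open_jtalk",
        "-x", dictionary,  # dictionary directory
        "-m", voice,       # HTS voice file
        "-ow", wav_path,   # output WAV file
    ]

def synthesize(text: str, wav_path: str) -> None:
    """Write the spoken form of `text` to `wav_path` (requires open_jtalk)."""
    command = build_open_jtalk_command(wav_path)
    subprocess.run(command, input=text.encode("utf-8"), check=True)
```

The resulting WAV file can then be played with any audio player available on the Pi.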

Set up Radiko script and YouTube add-on

  1. Install dependencies for Radiko:
    sudo apt-get install rtmpdump swftools libxml2-utils libav-tools
  2. Install mplayer for Radiko playback:
    sudo apt-get install mplayer
  3. Install YouTube add-on:
    sudo pip3 install mps-youtube youtube-dl
  4. Install vlc for YouTube playback:
    sudo apt-get install vlc
  5. Set vlc as the default player for mps-youtube:
    mpsyt set player vlc, set playerargs, exit

Deploy machine learning model

  1. Install dependencies for scikit-learn:
    source env/bin/activate
    sudo apt-get install liblapack-dev
    sudo apt-get install build-essential python-dev python-setuptools python-numpy python-scipy libatlas-dev libatlas3gf-base
    sudo pip3 install --user --install-option="--prefix=" -U scipy scikit-learn
    sudo pip3 install pandas janome
  2. Run gbt.py to build the model (alternatively, train the model on another 32-bit machine).
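For readers who want a feel for what building such a model involves, here is a minimal sketch of training and serializing a gradient boosting intent classifier with scikit-learn. It is not the contents of gbt.py: the training utterances, the intent labels, and the `build_model` helper are made up for illustration, and it uses character n-grams in place of a Japanese tokenizer such as janome (my substitution, to keep the example self-contained).

```python
import pickle

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Made-up utterances and intent labels, for illustration only.
TRAINING_DATA = [
    ("play the radio", "radio"),
    ("turn on the radio station", "radio"),
    ("search youtube for a song", "youtube"),
    ("find this artist on youtube", "youtube"),
    ("make it louder", "volume_up"),
    ("increase the volume", "volume_up"),
]

def build_model(samples):
    """Fit a bag-of-n-grams + gradient boosting pipeline on (text, label) pairs."""
    texts = [text for text, _ in samples]
    labels = [label for _, label in samples]
    model = make_pipeline(
        # Character n-grams are an assumption that avoids a tokenizer here.
        CountVectorizer(analyzer="char", ngram_range=(1, 3)),
        GradientBoostingClassifier(random_state=0),
    )
    model.fit(texts, labels)
    return model

model = build_model(TRAINING_DATA)

# Serialize the fitted pipeline, then restore it as the speaker would at startup.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored.predict(["play the radio"])[0])
```

A pickled scikit-learn model is generally only safe to load with the same scikit-learn version and a compatible platform as the one it was trained on.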

DONE!


Machine learning model comparison

For capturing intent, I tried Gradient Boosting (scikit-learn), XGBoost, and LSTM (keras/tensorflow). While the LSTM with word embeddings (trained on Japanese Wikipedia) had slightly higher accuracy, the model was too large to deploy to a Raspberry Pi 3. After trial and error, I settled on the Gradient Boosting model because it was simpler to deploy.
