The main objective of this project is to listen to Japanese music on a Raspberry Pi 3 at home and while driving my car. The motivation: both Google Home and Amazon Alexa have very limited options for Japanese music subscription services outside Japan, and I wanted my kids to be exposed to Japanese music on a regular basis.
To start the smart speaker, either say your wake word (e.g. "Alexa") or push the arcade button. Once you finish talking, the Google Speech API converts your speech to text, my custom gradient boosting model predicts the intent (e.g. stream a certain radio station, search and stream music on YouTube, increase the volume, skip to the next song), and the corresponding command is executed.
I used the Google Speech API for ASR (speech-to-text) and a scikit-learn gradient boosting model to capture intent. Open JTalk handles text-to-speech. The actual music-streaming pieces depend on other people's hard work (e.g. a Radiko script, youtube-dl).
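For orientation, the request loop looks roughly like this. It is a minimal sketch built on the aiy.cloudspeech API from the 2017 Voice Kit; predict_intent() and handle() are hypothetical stubs standing in for the actual code in raspbian_aiy_smart_speaker.

import aiy.audio
import aiy.cloudspeech
import aiy.voicehat

def predict_intent(text):
    # Stand-in for the pickled gradient boosting model built by gbt.py
    return 'youtube'

def handle(intent, text):
    # Stand-in dispatch; the real handlers shell out to the Radiko script,
    # mpsyt, the mixer, etc.
    print(intent, text)

def main():
    recognizer = aiy.cloudspeech.get_recognizer()
    button = aiy.voicehat.get_button()
    aiy.audio.get_recorder().start()   # the recognizer reads from this recorder
    while True:
        button.wait_for_press()        # or trigger on the wake word instead
        text = recognizer.recognize()  # Google Speech API: speech -> text
        if text:
            handle(predict_intent(text), text)

if __name__ == '__main__':
    main()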
Hardware
- Raspberry Pi 3
- Google AIY Voice Kit
- Micro USB charger (1.5A)
- Micro SD card
(I spent about $40 at Micro Center)
Software
- Raspbian
- Python 3.4+
- Google AIY
- Google Cloud Platform subscription for Google Speech API
- Open JTalk
Set up Google AIY Voice Kit
- Follow the official AIY Voice Kit tutorial to assemble the hardware and set up the Google Speech API.
- Clone the Google AIY repo into your home directory:
git clone https://github.com/google/aiyprojects-raspbian.git
- Overwrite aiyprojects-raspbian/src with the contents of raspbian_aiy_smart_speaker from this repo, which adds Japanese language support for the Google Speech API, text-to-speech, and my smart speaker code.
- Enable the service so the speaker starts automatically on boot:
sudo mv my_cloudspeech.service /lib/systemd/system/
sudo systemctl enable my_cloudspeech.service
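- Optionally, start the service right away and check that it is running:
sudo systemctl start my_cloudspeech.service
sudo systemctl status my_cloudspeech.service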
Configure Open JTalk
- Install Open JTalk:
sudo apt-get update
sudo apt-get install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001
- Download a different voice (the "Mei" voice from MMDAgent):
wget https://sourceforge.net/projects/mmdagent/files/MMDAgent_Example/MMDAgent_Example-1.6/MMDAgent_Example-1.6.zip/download -O MMDAgent_Example-1.6.zip
unzip MMDAgent_Example-1.6.zip MMDAgent_Example-1.6/Voice/*
sudo cp -r MMDAgent_Example-1.6/Voice/mei/ /usr/share/hts-voice
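- To test Open JTalk, synthesize a short phrase from the command line (the dictionary path below is where the Debian package installs it, and the mei voice is the one copied above):
echo "こんにちは" | open_jtalk -x /var/lib/mecab/dic/open-jtalk/naist-jdic -m /usr/share/hts-voice/mei/mei_normal.htsvoice -ow /tmp/test.wav
aplay /tmp/test.wav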
Set up Radiko script and YouTube add-on
- Install dependencies for Radiko:
sudo apt-get install rtmpdump swftools libxml2-utils libav-tools
- Install mplayer for Radiko playback:
sudo apt-get install mplayer
- Install YouTube add-on:
sudo pip3 install mps-youtube youtube-dl
- Install vlc for YouTube playback:
sudo apt-get install vlc
- Set vlc as the default player for mps-youtube. At the mpsyt prompt, run:
mpsyt
set player vlc
set playerargs
exit
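- To confirm the YouTube setup end to end, search and play from the mpsyt prompt (a leading / searches YouTube, and a number plays that result; the artist below is just an example):
mpsyt
/米津玄師
1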
Deploy machine learning model
- Install dependencies for scikit-learn:
source env/bin/activate
sudo apt-get install liblapack-dev
sudo apt-get install build-essential python-dev python-setuptools python-numpy python-scipy libatlas-dev libatlas3gf-base
sudo pip3 install --user --install-option="--prefix=" -U scipy scikit-learn
sudo pip3 install pandas janome
- Run gbt.py to build the model (alternatively, train it on another 32-bit machine; a model pickled on a 64-bit machine may not unpickle correctly on the Pi's 32-bit OS).
DONE!
For capturing intent, I tried Gradient Boosting (scikit-learn), XGBoost, and an LSTM (Keras/TensorFlow). While the LSTM with word embeddings (trained on Japanese Wikipedia) had slightly higher accuracy, the model was too large to deploy on a Raspberry Pi 3. After some trial and error, I settled on the Gradient Boosting model because it was the simplest to deploy.
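For reference, here is a minimal sketch of what an intent classifier along these lines could look like: janome for Japanese tokenization, TF-IDF features, and scikit-learn's GradientBoostingClassifier. The phrases, intent labels, and the intent_model.pkl filename are made-up examples, not the actual contents of gbt.py.

import pickle
from janome.tokenizer import Tokenizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

janome_tokenizer = Tokenizer()

def tokenize(text):
    # Split Japanese text into word surface forms with janome
    return [token.surface for token in janome_tokenizer.tokenize(text)]

# Made-up training examples: spoken phrase -> intent label
phrases = ['ラジオをつけて', 'TBSラジオを流して', 'ユーチューブで音楽をかけて', '音量を上げて', '次の曲']
intents = ['radio', 'radio', 'youtube', 'volume_up', 'skip']

model = make_pipeline(
    TfidfVectorizer(tokenizer=tokenize, lowercase=False),
    GradientBoostingClassifier(),
)
model.fit(phrases, intents)

# Pickle the fitted pipeline so the speaker can load it at startup
with open('intent_model.pkl', 'wb') as f:
    pickle.dump(model, f)

print(model.predict(['音量を上げて']))  # expected: ['volume_up']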