Skip to content

slifty/tvarchive-duplitron

Repository files navigation

The Duplitron 5000

This repository contains the Duplitron, a RESTful API that makes it possible to discover and manage repeated audio from a larger corpus. It was originally created to support the Internet Archive's Political Ad Archive.

The Duplitron takes in media (audio or video), compares it with existing media, and categorizes sections of the media based on the result of that comparison. It uses ffmpeg to manipulate the media files and then uses audfprint (via a docker image) to process the media files and identify segments of repeated content.

The API itself is powered by Laravel.

The code is written to support PSR-2

Dependencies

To run this code you need:

  • PHP >= 5.5.9
  • OpenSSL PHP Extension
  • PDO PHP Extension
  • Mbstring PHP Extension
  • Tokenizer PHP Extension
  • Composer for dependency management
  • Docker for running audfprint
  • ffmpeg
  • PSQL

Installing

  1. Install ffmpeg

  2. Install Docker for running audfprint

  3. Make it possible for your web provider to talk to docker. You can either create a docker group and add your web user to that group (easier), or set up TLS verification for docker (harder).

  4. Install Composer

  5. Clone the audfprint docker image and make sure it runs

    docker pull slifty/audfprint
    docker run slifty/audfprint
  6. Clone this repository into a living breathing live php enabled directory mod rewrite should be enabled

  7. Install composer dependencies

    	cd /path/to/your/clone/here
    	composer install
  8. Copy .env.example to .env in the root directory, then edit it

    	cd /path/to/your/clone/here
    	cp .env.example .env
    	vi .env
    • RSYNC_IDENTITY_FILE: a path to a private key that web root has 500 access to, with any media files you plan on importing
    • FPRINT_STORE: a path to the /storage/audfprint director inside of the repository
    • DOCKER_HOST: the location and port of the docker you set up for audfprint
  9. Install supervisor and enable the job queue.

    	cp duplitron-worker.conf.example /etc/supervisor/conf.d/fingerprinting-worker.conf
    	vi /etc/supervisor/conf.d/fingerprinting-worker.conf
    	sudo supervisorctl reread
    	sudo supervisorctl update
    	sudo supervisorctl start fingerprinting-worker:*

// Run ssh -fN -o"ControlPath none" -L 9999:tv-se.archive.org:8983 vm-home1.archive.org // so that requests are proxied through

Installing on a fresh copy of ubuntu 14.04...

sudo apt-get update
sudo apt-get -y install git apache2 postgresql php5-common libapache2-mod-php5 php5-cli curl php5-pgsql
sudo apt-get -y --force-yes install autoconf automake build-essential libass-dev libfreetype6-dev libsdl1.2-dev libtheora-dev libtool libva-dev libvdpau-dev libvorbis-dev libxcb1-dev libxcb-shm0-dev libxcb-xfixes0-dev pkg-config texinfo zlib1g-dev
sudo apt-get install yasm libx264-dev cmake mercurial libmp3lame-dev libopus-dev
mkdir ~/ffmpeg_sources
cd ~/ffmpeg_sources ; hg clone https://bitbucket.org/multicoreware/x265 ; cd ~/ffmpeg_sources/x265/build/linux ; PATH="$HOME/bin:$PATH" cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX="$HOME/ffmpeg_build" -DENABLE_SHARED:bool=off ../../source ; make ; make install ; make distclean
cd ~/ffmpeg_sources ; wget -O fdk-aac.tar.gz https://github.com/mstorsjo/fdk-aac/tarball/master ; tar xzvf fdk-aac.tar.gz ; cd mstorsjo-fdk-aac* ; autoreconf -fiv ; ./configure --prefix="$HOME/ffmpeg_build" --disable-shared ; make ; make install ; make distclean
cd ~/ffmpeg_sources ; wget http://storage.googleapis.com/downloads.webmproject.org/releases/webm/libvpx-1.5.0.tar.bz2 ; tar xjvf libvpx-1.5.0.tar.bz2 ; cd libvpx-1.5.0 ; PATH="$HOME/bin:$PATH" ./configure --prefix="$HOME/ffmpeg_build" --disable-examples --disable-unit-tests ; PATH="$HOME/bin:$PATH" make ; make install ; make clean
cd ~/ffmpeg_sources ; wget http://ffmpeg.org/releases/ffmpeg-snapshot.tar.bz2 ; tar xjvf ffmpeg-snapshot.tar.bz2 ; cd ffmpeg ; PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure  --prefix="$HOME/ffmpeg_build" --pkg-config-flags="--static" --extra-cflags="-I$HOME/ffmpeg_build/include" --extra-ldflags="-L$HOME/ffmpeg_build/lib" --bindir="$HOME/bin" --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree ; PATH="$HOME/bin:$PATH" make ; make install ; make distclean ; hash -r
sudo cp ~/bin/ffmpeg /usr/local/bin
sudo cp ~/bin/ffprobe /usr/local/bin
sudo apt-get install -y frei0r-plugins git python python-scipy python-pip python-matplotlib software-properties-common wget libfreetype6-dev libpng-dev pkg-config python-dev
sudo pip install -U distribute
sudo pip install docopt git+git://github.com/bmcfee/librosa.git joblib
cd /usr/local/src ; sudo wget --no-check-certificate http://www.mega-nerd.com/SRC/libsamplerate-0.1.8.tar.gz ; sudo tar xvfz libsamplerate-0.1.8.tar.gz ; cd libsamplerate-0.1.8 && sudo ./configure && sudo make && sudo make install
sudo pip install scikits.samplerate
cd /usr/local/src ; sudo git clone https://github.com/dpwe/audfprint.git
curl -sS https://getcomposer.org/installer | sudo php -- --install-dir=/usr/local/bin --filename=composer
cd /var/www ; sudo git clone https://github.com/slifty/tvarchive-fingerprinting.git
sudo chown -R $(whoami):www-data /var/www/tvarchive-fingerprinting/
cd /var/www/tvarchive-fingerprinting ; composer install
printf "<VirtualHost *:80>\n\tDocumentRoot /var/www/tvarchive-fingerprinting/public\n\tErrorLog \${APACHE_LOG_DIR}/error.log\n\tCustomLog \${APACHE_LOG_DIR}/access.log combined\n</VirtualHost>" | sudo tee /etc/apache2/sites-available/tvarchive-fingerprinting.conf
sudo chown -R $(whoami):www-data /var/www/tvarchive-fingerprinting/
sudo apt-get install supervisor
sudo cp /var/www/tvarchive-fingerprinting/fingerprinting-worker.conf.example /etc/supervisor/conf.d
sudo a2dissite 000-default
sudo a2ensite tvarchive-fingerprinting
sudo a2enmod rewrite
sudo apachectl restart
sudo -upostgres createdb tvarchive_fingerprinting
sudo -upostgres createuser tvarchive_fingerprinting -P
sudo -upostgres psql -c "GRANT ALL PRIVILEGES ON DATABASE tvarchive_fingerprinting to tvarchive_fingerprinting"
sudo chmod 777 -R /var/www/tvarchive-fingerprinting/storage

Next you have to update your apache config to AllowOverride all in the /var/www directory.

sudo vi /etc/apache2/apache2.conf

Next you have to set up your .env as appropriate.

cp /var/www/tvarchive-fingerprinting/.env.example /var/www/tvarchive-fingerprinting/.env
php /var/www/tvarchive-fingerprinting/artisan key:generate
vi /var/www/tvarchive-fingerprinting/.env

Some key values:

AUDFPRINT_PATH=/usr/local/src/audfprint/audfprint.py
DB_DATABASE=tvarchive_fingerprinting
DB_USERNAME=tvarchive_fingerprinting
FFMPEG_BINARY_PATH=/usr/local/bin/ffmpeg
FFMPEG_BINARY_PATH=/usr/local/bin/ffprobe
FPRINT_STORE=/var/www/tvarchive-fingerprinting/storage/audfprint/

Now migrate...

php /var/www/tvarchive-fingerprinting/artisan migrate:refresh

Finally, set up your supervisor processes

	cp duplitron-worker.conf.example /etc/supervisor/conf.d/fingerprinting-worker.conf
	vi /etc/supervisor/conf.d/fingerprinting-worker.conf
	sudo supervisorctl reread
	sudo supervisorctl update
	sudo supervisorctl start fingerprinting-worker:*

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages