
How to create tasks for Amazon Mechanical Turk

Log in.

Click on Create.
Find an existing project and click Copy at its right.
Edit the copy as needed.

Deprecated

Most of the rest of these instructions are obsolete. Instead, see the revised mcasr version.

Get a collection of recordings of speech, in .wav or .flac format.

Concatenate them into /tmp/a.wav as mono audio with a consistent sampling rate. In the examples, channels 1 mixes stereo channels down to mono, and -r 22050 sets the sampling rate.

Examples:

sox in/*.wav -r 22050 /tmp/a.wav channels 1
for f in in/*.flac; do sox "$f" "${f%.flac}.wav"; done; sox in/*.wav ...

(There's a bug in sox when directly concatenating .flac files, so convert those to .wav first.)
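
The same two steps can be sketched in Ruby, which avoids shell quoting problems with unusual filenames. This is only an illustration: sox_commands is a hypothetical helper, not part of this repository, and it builds the commands without running them. It assumes each .flac is converted to a .wav beside it, as in the shell example above.

```ruby
# Hypothetical helper (not part of this repository).
# Builds, but does not run, the sox commands for converting .flac inputs to
# .wav and then concatenating every .wav into one mono file.
# Each command is an argument array, suitable for system(*cmd).
def sox_commands(inputs, output: "/tmp/a.wav", rate: 22050)
  conversions = []
  wavs = inputs.map do |path|
    # Non-.flac inputs are passed through unchanged.
    next path unless File.extname(path).downcase == ".flac"
    wav = path.sub(/\.flac\z/i, ".wav")
    conversions << ["sox", path, wav]
    wav
  end
  # Sorted, like the shell glob in/*.wav; uniq guards against a .flac and a
  # .wav that share a name.
  concat = ["sox", *wavs.uniq.sort, "-r", rate.to_s, output, "channels", "1"]
  conversions + [concat]
end
```

To run them, something like sox_commands(Dir.glob("in/*.{wav,flac}")).each { |cmd| system(*cmd) or abort } would work, assuming sox is on the PATH.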

Remove silent intervals, and split the concatenation into monophonic clips of 1.25 seconds, in .mp3 and .ogg format.

./split.rb

This script reads /tmp/a.wav and writes /tmp/turkAudio.tar. It takes about 1 minute per hour of input.

On ifp-serv-03:

cd /workspace/speech_web/mc/
mkdir myTest; cd myTest
Copy the turkAudio.tar that split.rb made into this directory.
tar xf turkAudio.tar

Create a "Batch file."

./make-csv.rb 5001 > foo.csv

(The number 5001 is one more than the highest-numbered clip filename, e.g., 05000.mp3.)
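
If counting by hand is error-prone, the argument can be derived from the clip filenames. The sketch below is a hypothetical helper, not part of this repository, and it assumes the clips are named with zero-padded decimal numbers such as 05000.mp3.

```ruby
# Hypothetical helper (not part of this repository).
# Returns one more than the highest-numbered .mp3 clip, i.e. the argument
# that make-csv.rb expects. Files that are not purely numeric .mp3 names
# are ignored.
def next_clip_number(filenames)
  numbers = filenames.map do |name|
    base = File.basename(name, ".mp3")
    # Base 10 is explicit so that a leading zero is not read as octal.
    Integer(base, 10) if base.match?(/\A\d+\z/)
  end.compact
  raise ArgumentError, "no numbered .mp3 clips found" if numbers.empty?
  numbers.max + 1
end
```

In the directory where turkAudio.tar was unpacked, puts next_clip_number(Dir.glob("**/*.mp3")) would print the number to pass to make-csv.rb.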

Submit the batch file to Mechanical Turk's Publish Batch.

If needed, first split the file into quarters (each starting with the original's first line), and submit it only one quarter at a time. That yields intermediate results more quickly, because one quarter of the clips get completed before any other clips start. It also lets you fund the account a little at a time.
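
A minimal sketch of that split follows. split_batch is a hypothetical helper, not part of this repository, and it assumes the batch file holds one row per line with no newlines embedded in quoted fields.

```ruby
# Hypothetical helper (not part of this repository).
# Splits the lines of a batch file into up to `parts` pieces of nearly equal
# size, each starting with the original's first line (the header).
def split_batch(lines, parts = 4)
  header, *rows = lines
  return [] if header.nil?
  base, extra = rows.length.divmod(parts)
  chunks = []
  start = 0
  parts.times do |i|
    # The first `extra` pieces get one additional row.
    size = base + (i < extra ? 1 : 0)
    next if size.zero?
    chunks << [header, *rows[start, size]]
    start += size
  end
  chunks
end

# Example use (the output file names are hypothetical):
#   split_batch(File.readlines("foo.csv")).each_with_index do |part, i|
#     File.write("foo-#{i + 1}.csv", part.join)
#   end
```

Row order is preserved, so submitting the pieces in sequence covers the clips in their original order.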

If you like, as transcriptions trickle in, collect them and run PTgen on the results so far.
Click on Mechanical Turk's Manage results, Download csv.
mv Batch*.csv PTgen/test/myTest/batchfiles-raw
cat PTgen/test/myTest/batchfiles-raw/Batch*.csv > PTgen/test/myTest/data/batchfiles/myLanguageCode/batchfile/Batches-downloaded.txt
