Getting started YandexToloka

Aleksandr Nedorezov edited this page Jul 12, 2017 · 13 revisions

Getting-Started with LabelThem on Yandex.Toloka

Introduction

This guide walks you through setting up and running an instance of LabelThem on the Yandex Toloka crowdsourcing platform.

Create a requester account

We will start by creating an account in Toloka. If you already have an account, please skip to configuration.

There are two types of users in Yandex Toloka: performers and crowdsourcers. Performers are the workforce that completes the tasks, and crowdsourcers are the ones who post them. Since we want to host a task, we will create a crowdsourcer account.

  1. Go to Yandex Toloka and click on "Crowdsourcers" at the top. NOTE: Yandex Toloka provides a sandbox version where you can test and verify your projects. We suggest that you start there and then transfer your project to the production environment later. Link to Yandex Toloka Sandbox.

  2. Click on "Register". You can sign in if you already have a Yandex account; otherwise, click on "Create account" and complete the account registration process. After that, you will be taken automatically to the Toloka sign-up page.

  3. On the Toloka sign-up page, select the account type that is relevant for you (either individual or legal entity). NOTE: This choice cannot be changed later! It also determines which methods of funding your account are available.

  4. Complete the rest of the fields and click "Next". You will be taken to the user agreement. Read it through and accept it.

  5. Finally, you are ready to create your first application!

Deploying LabelThem to Yandex Toloka

Create and configure a new project in Yandex Toloka:

Now that you have an account on Toloka, it is time to create the task and deploy LabelThem.

  1. Click on "Create project": new_prj.png

  2. In the template selection view, scroll to the bottom and select Blank: blank_prj.png

  3. Choose a name for your project and add a description. HINT: Try adding a short guideline to make your project more user-friendly.

  4. Now, scroll down to the "SPECIFICATIONS" section and click on the angle brackets icon: brackets.png

  5. Paste the following code into the "Input Data" field:

{
  "image_rel": {
    "type": "string",
    "required": true
  },
  "json_params": {
    "type": "string",
    "required": true
  }
}
  1. Paste the following code into the "Output Data" field:
{
  "result": {
    "type": "string",
    "required": true
  }
}
  1. Scroll to the "Task Interface" section. Replace the contents of the html field with the contents of the main.html file located in the PROJECT_ROOT\front folder:

NOTE: You can download the source files that are needed to run the system on the Yandex Toloka crowdsourcing platform (main.html, app.js, and concat.min.css) from the "releases" repository page.

NOTE: If you want the latest committed features and bugfixes, you can retrieve the system's source code from the develop-toloka branch and generate the source files for Yandex.Toloka on your own. For instructions on this process, please refer to this guide.

html.png

  1. Similarly, replace the contents of the js field with the contents of the app.js file located in the PROJECT_ROOT\build folder, and the contents of the css field with the contents of concat.min.css located in the PROJECT_ROOT\build\css folder.

  2. If you click on "Preview", you should see something similar to this: toloka_preview.png Don't worry about the empty blocks and the absence of images; we will fix this later.

  3. Click "Save" at the bottom of the page.
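The input and output specifications pasted in the steps above can also be mirrored locally to sanity-check task records before uploading them. The following is a minimal sketch, not part of LabelThem or Toloka: `check_record` and the sample values are hypothetical names introduced here, and the checks cover only the `type` and `required` keys that appear in the specifications above.

```python
import json

# Field specifications copied from the "Input Data" and "Output Data" steps.
INPUT_SPEC = {
    "image_rel": {"type": "string", "required": True},
    "json_params": {"type": "string", "required": True},
}
OUTPUT_SPEC = {
    "result": {"type": "string", "required": True},
}


def check_record(record, spec):
    """Return a list of problems found when comparing `record` to `spec`.

    Hypothetical helper: it only checks that required fields are present
    and that fields declared as "string" hold Python strings.
    """
    problems = []
    for name, rules in spec.items():
        if name not in record:
            if rules.get("required"):
                problems.append("missing required field: " + name)
            continue
        if rules.get("type") == "string" and not isinstance(record[name], str):
            problems.append("field is not a string: " + name)
    return problems


# Illustrative record; the values are made up for this example.
sample = {"image_rel": "example.jpg", "json_params": json.dumps({"classes": []})}
print(check_record(sample, INPUT_SPEC))  # an empty list means no problems were found
```

Note that `json_params` is declared as a string, so structured parameters have to be serialized (for example with `json.dumps`) rather than passed as a nested object.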

The next step is to connect Yandex Disk to your project.

Connect Yandex Disk

Your data will be hosted on Yandex Disk, so it has to be connected to your Toloka account first.

  1. Open the "Profile" tab at the top of the page

  2. Open "External Services Integration"

  3. Click on "Connect Yandex Disk". You will be taken to the confirmation window.

  4. Click on "Add proxy" and fill in all necessary fields:

add_proxy.png

Unique Name: this name will be used later in TSV generation, so make a note of it.

Folder Name: a folder with this name will be created on Yandex Disk. More specifically, this folder will be located at /Applications/Яндекс.Толока/Folder Name (/Applications/Яндекс.Толока/LT-DataFolder in the example).

  1. Click "Save" and make sure that a folder with the given name was created on Yandex Disk

disk_folder.png

  1. Upload your data to the newly created folder. Keep in mind that this folder must not contain any subfolders.
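Because the folder on Yandex Disk should not contain subfolders, it can be worth checking the local copy of the data before uploading it. The sketch below is an illustration under that assumption; `find_subfolders` is a hypothetical helper, not part of LabelThem.

```python
import os


def find_subfolders(data_dir):
    """Return the sorted names of directories directly inside `data_dir`."""
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )


# Example usage (the path is a placeholder for your own data folder):
#   nested = find_subfolders("path/to/data/folder")
#   if nested:
#       print("Move or flatten these subfolders before uploading:", nested)
```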

The next step is to add a new pool

Adding a pool

  1. Go to "Projects" tab

  2. Select your project

  3. Select "Add Pool":

add_pool.png

  1. Here you can specify a name and description for your pool and configure its parameters. The key parameters are the price per task and the time allowed for a single task. Leave "Training" and "Level Required" empty.

pool_cong.png

  1. Finally, configure the Speed/Quality trade-off by dragging the slider to the left (more speed) or to the right (more users)

  2. Hit "Save" once you are satisfied with your settings

The next step is to connect the data on Yandex Disk with the newly created pool.

Generating a TSV

A TSV file is required to connect the data that you uploaded to the folder on Yandex Disk with the pool used by the task.

To generate a TSV file, we provide a script in Python (the script is tested with Python 2.7 and Python 3.5). To use the script, you need to specify the location of your data folder and the location of your JSON file with parameters. Note that each time you modify the data or modify the parameters, you will need to re-run the script and create a new pool.

NOTE: The script supports only names written in Latin letters.
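Both inputs to the script can be checked before it is run. The sketch below (written for Python 3) is a hedged illustration: `non_ascii_names` and `params_file_is_valid_json` are hypothetical helpers, and "ASCII only" is used as a stricter stand-in for "Latin letters", which is an assumption about what the script accepts.

```python
import json


def non_ascii_names(names):
    """Return the names that contain at least one non-ASCII character."""
    bad = []
    for name in names:
        try:
            name.encode("ascii")
        except UnicodeEncodeError:
            bad.append(name)
    return bad


def params_file_is_valid_json(path):
    """Return True if the file at `path` parses as JSON, False otherwise."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except ValueError:  # covers JSON syntax errors and invalid UTF-8
        return False
    return True


# Example usage (both paths are placeholders):
#   import os
#   print(non_ascii_names(os.listdir("path/to/data/folder")))
#   print(params_file_is_valid_json("path/to/json/params"))
```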

Next, we will walk you through the process of setting up the TSV file and uploading it to Yandex Toloka.

  1. To generate the TSV file, run the generate_tsv.py script located in the PROJECT_ROOT\utils folder. To do so, execute the following command in your terminal (it assumes that you have changed into the PROJECT_ROOT folder and have Python 2.7 or 3.5 installed):

python utils/generate_tsv.py -d path/to/data/folder -p path/to/json/params -i toloka-id

For the script to work, you need to supply three parameters:

-d: the path to the data. It may be relative and must point to the folder on your local machine whose contents you uploaded to Yandex Disk. All of this folder's contents will be listed in the generated .tsv file.

-p: the path to the JSON file with parameters, which may also be relative. The structure of the JSON with classes and parameters is described on the "Classes and parameters JSON description" wiki page.

-i: the id of the folder that you created when connecting Yandex Disk to Toloka.

The first two parameters are straightforward; the last one is generated when you connect Yandex Disk to Toloka. If you forgot the id of your folder, navigate to your profile and click on "External Services Integration". Find the "Proxy list" section and select the id that you generated:

toloka_id.png

Keep in mind that you don't need the whole string, only the highlighted part!

A sample call of the script may look as follows:

python utils/generate_tsv.py -d ../archive/ -p front/json/classesandparameters.json -i LT-Data

This script will generate a tasks.tsv file in the folder from which it was executed.
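Before uploading, the generated file can be inspected quickly. The following sketch (Python 3) only reads the header row and counts the data rows; `summarize_tsv` is a hypothetical helper, and it makes no assumption about the column names that generate_tsv.py writes.

```python
import csv


def summarize_tsv(path):
    """Return (header, row_count) for a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # first row, or [] for an empty file
        row_count = sum(1 for row in reader if row)  # ignore blank lines
    return header, row_count


# Example usage, run from the folder where tasks.tsv was generated:
#   header, rows = summarize_tsv("tasks.tsv")
#   print(header, rows)
```

The row count should match the number of files in the data folder if the script emits one task per file, which is an assumption worth verifying against your own data.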

  1. Upload the TSV. Navigate to the "Projects" tab in Yandex Toloka and select your project

  2. Select a pool or create a new one

  3. In the pool view, click on "Upload"

ul_task_2.png

  1. In the pop-up menu select Set Manually and specify 1 task per page

ul_task_1.png

  1. The task will be uploaded to Yandex Toloka. To verify that everything works, click on the "Preview" button and make sure that your images and params are displayed correctly:

NOTE: you cannot add annotations while in preview mode!

pool_preview.png

  1. Now you are ready to run your first task! Simply click on the "Play" button in the pool view and your task will be available to users in Yandex Toloka!

NOTE: Keep in mind that if a user starts a task and then abandons it without completing it before the time runs out, the task remains active for that user for the remainder of the time. If you decide to make the task unavailable (e.g. by closing an active pool), it will still be accessible to users who have it as an active task.

  1. If you want to download the results, click on the "Download results" button in the upper right corner and select which fields to include
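Once the results are downloaded, the `result` field defined in the output specification can be read back with a short script. The sketch below (Python 3) rests on several assumptions that you should check against your own file: that the download is tab-separated, that the output column is named `OUTPUT:result`, and that LabelThem stores its annotations in that string as JSON. `load_results` is a hypothetical helper.

```python
import csv
import json


def load_results(path, column="OUTPUT:result"):
    """Yield the decoded value of `column` for each row that has one.

    Assumption: the column holds JSON text. Values that do not parse
    as JSON are yielded unchanged as plain strings.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            raw = row.get(column)
            if not raw:
                continue  # skip rows without a value in this column
            try:
                yield json.loads(raw)
            except ValueError:
                yield raw


# Example usage (the file name is a placeholder):
#   for annotation in load_results("results.tsv"):
#       print(annotation)
```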

Happy annotating!