Getting Started Amazon AWS

Davide edited this page Apr 13, 2016 · 25 revisions

Prerequisites

Make sure that the vagrant-aws plugin is installed ($ vagrant plugin install vagrant-aws).

Download and extract the latest version of Idomaar

Go to https://github.com/crowdrec/idomaar/releases/tag/v3.5.1-aws, download the package, and unzip its contents.

Prerequisites to run Idomaar on AWS:

You can either use the prebuilt images (AMIs) publicly available in the Oregon region or provision the instance starting from a plain Ubuntu image. The default behaviour is to start from a basic Ubuntu image. To start from a prebuilt image instead (and speed up the process), set the following environment variable:
AMI_PREBUILT=true

The following parameters can be overridden using environment variables:
KEYPAIR_NAME: default value 'idomaar'. Specifies the key pair name; it must be one of the key pairs listed under "Key pairs" in the "Network & Security" section of the EC2 console.
REGION: default value 'us-west-2'. Specifies the region where the instance will be launched; the list of available regions is at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions
INSTANCE_TYPE: default value 't2.small'. Specifies the instance type; the list of instance types is at https://aws.amazon.com/ec2/instance-types/
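As an illustration of the overrides listed above, the variables can be set in the shell before running the vagrant commands. This is a minimal sketch: the key pair name and the alternative instance type are hypothetical examples, and REGION is left at its documented default because the prebuilt AMIs are described as available in Oregon.

```shell
# Override the documented defaults for this shell session.
export KEYPAIR_NAME="my-idomaar-key"   # hypothetical; must exist under "Key pairs" in the EC2 console
export REGION="us-west-2"              # documented default (Oregon)
export INSTANCE_TYPE="t2.medium"       # example alternative to the default t2.small

# Show what the vagrant commands will pick up from the environment.
echo "KEYPAIR_NAME=$KEYPAIR_NAME REGION=$REGION INSTANCE_TYPE=$INSTANCE_TYPE"
```

Any variable left unset keeps its default value.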

In the following, ROOTDIR is the directory where you unzipped the Idomaar package.

  1. Start the computing environment
    A simplified computing environment using the 0MQ interface can be started with:
    1. cd ROOTDIR/computingenvironments/01.linux/01.centos/01.mahout
    2. AMI_PREBUILT=true AWS_ACCESS_KEY_ID=<<YOUR ACCESS KEY>> AWS_SECRET_ACCESS_KEY=<<YOUR SECRET KEY>> AWS_PRIVATEKEY_PATH=~/PUBLICKEY.pem vagrant up --provider=aws

  2. Retrieve the IP of the created machines.
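The page does not say how to retrieve the address. One possibility, besides looking it up in the EC2 console, is to read it from the output of `vagrant ssh-config`, which prints a `HostName` entry for each machine. The sketch below assumes that the AWS provider fills `HostName` with the public address (it may be a DNS name rather than a numeric IP); `extract_hostname` is a hypothetical helper name.

```shell
# Print the value of every HostName entry found in `vagrant ssh-config`
# output read from stdin (one line per machine).
extract_hostname() {
  awk '$1 == "HostName" { print $2 }'
}

# Usage, from the directory containing the Vagrantfile and with the same
# environment variables that were used for `vagrant up`:
#   vagrant ssh-config | extract_hostname
```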

  3. Start the recommendation engine on the computing environment (from the same directory as step 1):
    1. AMI_PREBUILT=true AWS_ACCESS_KEY_ID=<<YOUR ACCESS KEY>> AWS_SECRET_ACCESS_KEY=<<YOUR SECRET KEY>> AWS_PRIVATEKEY_PATH=~/PUBLICKEY.pem vagrant ssh -c 'sudo /vagrant/algorithms/01.example/itembasedrec.sh start'

  4. Execute the Orchestrator
    The template evaluation, with Mahout and MovieTweetings data, can be executed with:
    1. cd ROOTDIR/datastreammanager
    2. AMI_PREBUILT=true AWS_ACCESS_KEY_ID=<<YOUR ACCESS KEY>> AWS_SECRET_ACCESS_KEY=<<YOUR SECRET KEY>> AWS_PRIVATEKEY_PATH=~/PUBLICKEY.pem vagrant up --provider=aws
    3. cd ROOTDIR
    4. AMI_PREBUILT=true AWS_ACCESS_KEY_ID=<<YOUR ACCESS KEY>> AWS_SECRET_ACCESS_KEY=<<YOUR SECRET KEY>> AWS_PRIVATEKEY_PATH=~/PUBLICKEY.pem ./idomaar.sh --new-topic --comp-env-address tcp://<<COMPUTING_ENV_IP>>:2760 --training-uri https://raw.githubusercontent.com/crowdrec/datasets/master/01.MovieTweetings/datasets/snapshots_10K/evaluation/training/data.dat --test-uri https://raw.githubusercontent.com/crowdrec/datasets/master/01.MovieTweetings/datasets/snapshots_10K/evaluation/test/data.dat
    where COMPUTING_ENV_IP is the IP address of the computing environment retrieved earlier.

You can also load data from S3 by passing an s3:// URI, as in the following example:
--training-uri s3://idomaar-test/demo/training-data.dat
--test-uri s3://idomaar-test/demo/test-data.dat
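Each command above repeats the same four variable assignments. Assuming the scripts only read these variables from the environment (an assumption, not something this page states), they could instead be exported once per shell session. The placeholders are the same as in the steps above and must be replaced with real values.

```shell
# Export the settings once so later commands in this shell inherit them.
export AMI_PREBUILT=true
export AWS_ACCESS_KEY_ID="<<YOUR ACCESS KEY>>"        # placeholder, replace with your key
export AWS_SECRET_ACCESS_KEY="<<YOUR SECRET KEY>>"    # placeholder, replace with your secret
export AWS_PRIVATEKEY_PATH="$HOME/PUBLICKEY.pem"      # same path as in the steps above

# After this, the commands shorten to, for example:
#   vagrant up --provider=aws
```

The exports last only for the current shell session, so they need to be repeated in every new terminal.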

  5. Evaluation result
    At the end of the process, the evaluation results should be printed, similar to the following:
    INFO [datastream] Mode: recall@N
    INFO [datastream] Number of total elements in the GT: ...
    ...
    ...
    INFO [datastream] Mode: precision
    INFO [datastream] Precision: ...
    INFO [datastream] Mode: precision@N
    INFO [datastream] Precision@1: ...
    ...
    ...
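To pull only the evaluation lines out of a long run, the output can be filtered on the `[datastream]` tag shown above. This is a minimal sketch that assumes idomaar.sh writes the results to the terminal and that every result line carries that tag; `run.log` and `datastream_lines` are hypothetical names.

```shell
# Keep only the lines that contain the literal "[datastream]" tag (stdin to stdout).
datastream_lines() {
  grep -F '[datastream]'
}

# Usage, with the run captured beforehand via: ./idomaar.sh ... 2>&1 | tee run.log
#   datastream_lines < run.log
```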

The general form of the command is:
./idomaar.sh --new-topic --comp-env-address tcp://0MQHOSTNAME:0MQPORT --training-uri TRAINING_URI --test-uri TEST_URI

--new-topic creates the topic in the Kafka orchestrator
--comp-env-address is the hostname and port of the 0MQ instance of the computing environment
--training-uri is the URI of the training data in Idomaar format
--test-uri is the URI of the test data in Idomaar format
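The general form can be wrapped in a small helper that fills in the four values. The sketch below only prints the resulting command so it can be checked before running it; `idomaar_cmd` is a hypothetical name, and the address in the usage line is a documentation placeholder, not a real host.

```shell
# Print the idomaar.sh invocation for the given 0MQ host, 0MQ port,
# training URI and test URI, without executing it.
idomaar_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: idomaar_cmd HOST PORT TRAINING_URI TEST_URI" >&2
    return 1
  fi
  printf './idomaar.sh --new-topic --comp-env-address tcp://%s:%s --training-uri %s --test-uri %s\n' \
    "$1" "$2" "$3" "$4"
}

# Usage (port 2760 and the S3 URIs are the ones used in the examples above):
#   idomaar_cmd 203.0.113.10 2760 s3://idomaar-test/demo/training-data.dat s3://idomaar-test/demo/test-data.dat
```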

Troubleshooting

AWS authorization error
To decode and display the description of the error on Linux, use the AWS command line interface (awscli; Python is a prerequisite). With the encoded message from the error saved in errorcode.txt, run:
$ sudo pip install awscli
$ aws configure
$ aws sts decode-authorization-message --encoded-message "$(cat errorcode.txt)"
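The decode command returns a JSON document whose `DecodedMessage` field holds the actual description. As a sketch, the AWS CLI's global `--query` and `--output` options can extract that field directly; `decode_auth_error` is a hypothetical helper name, and the sketch assumes the encoded message has been saved to a file such as errorcode.txt.

```shell
# Decode an AWS authorization error.
# $1: path to a file containing the encoded message copied from the error text.
decode_auth_error() {
  aws sts decode-authorization-message \
    --encoded-message "$(cat "$1")" \
    --query DecodedMessage \
    --output text
}

# Usage (requires awscli configured with credentials allowed to call
# sts:DecodeAuthorizationMessage):
#   decode_auth_error errorcode.txt
```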