Alice Minotto edited this page Sep 13, 2017 · 14 revisions

Register and run apps with Agave API


Registering an App

Preliminary steps

To perform the following steps you need to sign up for a CyVerse account here.

1 - Setting up Agave API Access

See the FAQ.

2 - System registration (skip if using EI hardware)

Agave is a RESTful API, meaning that we can interact with it using HTTP requests such as GET and POST. The cyverse-cli tools are essentially wrappers around these requests that make them shorter to type.

Agave tracks two kinds of resources: Systems and Apps. There are 2 types of Systems: Storage and Execution. Apps run on Execution Systems using data from Storage Systems to produce desired results. So to run an app, we need an Execution system first. Systems are described using JSON files, which are then posted to the API. An Execution System JSON consists of 4 parts, which will be described below.

Execution System JSON - System Basics

The first part consists of system basics: id, type etc. See an example below:


"id"            : "myTutorialMachine",
"name"          : "A machine for the EI Agave tutorial",
"type"          : "EXECUTION",
"executionType" : "CLI",
"scheduler"     : "FORK",

The variables mostly speak for themselves. The executionType variable can be either CLI, CONDOR or HPC depending on the type of scheduler running on the system. In this case, we assume there is no scheduler running and so we choose CLI, with FORK as a scheduler. See the Agave docs for more details on the Scheduler variables.

Execution System JSON - Storage

All execution systems need to define storage to use as scratch space. For this example, we'll assume you have a scratch directory mounted somewhere under /mnt/ (an SSHFS mount, for example):


"storage": {
  "host" : "yourhost.example.org",
  "port" : 22,
  "protocol" : "SFTP",
  "homedir" : "/mnt/scratch/username",
  "rootdir" : "/mnt/scratch",
  "auth" : {
    "type": "PASSWORD",
    "username"  : "username",
    "password" : "password"
  }
}

If you are uncomfortable with putting your password in plaintext, see below for specifying an auth object with SSHKEYS.

Execution System JSON - Queues

All execution systems need a default Queue to which jobs are submitted. In our example, we are using a simple CLI system, so there are no scheduler queues that we need to deal with. This means we can get away with a simple specification like this:


"queues": [ { 
    "name": "normal", 
    "default": true,
    "maxRequestedTime": "24:00:00",
    "maxJobs": 10, 
    "maxUserJobs": 5, 
    "maxNodes": 1,
    "maxMemoryPerNode": "4GB", 
    "maxProcessorsPerNode": 12,
    "customDirectives": null 
} ]  

You'll want to change the variables to suit your system.

Execution System JSON - Login

Lastly, Agave needs to know how to log in to the execution system. This is specified using a login object:


"login": {
  "host"    : "yourhost.example.org",
  "port"    : "22",
  "protocol": "SSH",
  "auth"    : {
    "type"      : "PASSWORD",
    "username"  : "username",
    "password"  : "changethis"
  }
}

As mentioned before, putting your password in plaintext is usually a bad idea. We can also specify a login object using a public/private key pair. To do this, we'll change the auth part of the object as follows:

"auth": {
  "type"       : "SSHKEYS",
  "username"   : "username",
  "publicKey"  : "ssh-rsa AAAA...your public key... username@yourhost.example.org",
  "privateKey" : "-----BEGIN RSA PRIVATE KEY-----*private key here*-----END RSA PRIVATE KEY-----"
}

An important thing to note when using key pairs is that your private key must be JSON-encoded before you paste it into the JSON file. The jsonpki command does this:
jsonpki --private /path/to/private/id_rsa
If necessary, a password for the file can be specified using --password.
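For illustration only, the sketch below shows roughly what that encoding amounts to: the key's line breaks become literal \n sequences, so the whole key fits into one JSON string. The function name json_escape_key is hypothetical and it only handles line breaks (PEM keys contain no quotes or backslashes); for real keys, prefer jsonpki itself.

```shell
#!/bin/bash
# Hypothetical helper: print a PEM key file as a single line in which every
# line break is replaced by the two characters '\' and 'n', so it can be
# pasted into a JSON string. Only newlines are escaped (PEM bodies are
# base64 plus header lines, without quotes or backslashes).
# Illustration of the idea behind jsonpki; use jsonpki for real keys.
json_escape_key() {
  local keyfile="$1"
  # In awk, "\\n" is a backslash followed by n, not a newline
  awk '{ printf "%s\\n", $0 }' "$keyfile"
}
```

Usage: json_escape_key ~/.ssh/id_rsa, then paste the output between the quotes of "privateKey".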

Registering the execution system

Putting the four parts together gives the completed JSON file:

{
  "id"            : "myTutorialMachine",
  "name"          : "A machine for the EI Agave tutorial",
  "type"          : "EXECUTION",
  "executionType" : "CLI",
  "scheduler"     : "FORK",
  "storage": {
    "host": "yourhost.example.org", "port": 22, "protocol": "SFTP",
    "homedir": "/mnt/scratch/username", "rootdir": "/mnt/scratch",
    "auth": { "type": "PASSWORD", "username": "username", "password": "changethis" }
  },
  "queues": [ {
    "name": "normal", "default": true, "maxRequestedTime": "24:00:00",
    "maxJobs": 10, "maxUserJobs": 5, "maxNodes": 1,
    "maxMemoryPerNode": "4GB", "maxProcessorsPerNode": 12, "customDirectives": null
  } ],
  "login": {
    "host": "yourhost.example.org", "port": "22", "protocol": "SSH",
    "auth": { "type": "PASSWORD", "username": "username", "password": "changethis" }
  }
}

Save it as TutSystem.json and use it to register the system on Agave:
systems-addupdate -v -F TutSystem.json
A large amount of JSON describing our new system will be returned to confirm the registration. Now that we have an execution system, let's move on to registering our workflow as an App in the next part.

App registration

An App in the Agave API is a workflow wrapped into a single unit that a user can execute. Like a system, it is described in JSON. In this part we'll register a test app that runs a simple BLAST job.

App JSON - Front matter

The first thing we'll need to describe are some basic parameters of our app:


"name"          : "blastapp-tutorial",
"label"         : "EI tutorial BLAST app",
"version"       : "0.0.1",
"executionType" : "CLI",

The App ID is generated from the name and version number, and this combination must be unique. If an app with the same ID already exists, you can either delete it (if it was registered with an error) or increase the version number (if you are making an updated version).
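As a small sketch of that rule (the ID is the name and the version joined by a hyphen, the same form used later as appId and in apps-pems-update), the lines below compute it from the front-matter values. The variable names are only for illustration; bumping APP_VERSION is what gives an updated app a new, unique ID.

```shell
#!/bin/bash
# Front-matter values from the JSON above
APP_NAME="blastapp-tutorial"
APP_VERSION="0.0.1"

# The App ID is <name>-<version>; this is the value used as "appId"
# when submitting jobs.
APP_ID="${APP_NAME}-${APP_VERSION}"
echo "$APP_ID"
```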
Next, we'll specify where and how the app will run:

"executionSystem"  : "myTutorialMachine",
"deploymentPath"   : "username/apps/EI_tutorial",
"templatePath"     : "wrapper.sh",
"testPath"         : "test.sh",
"parallelism"      : "SERIAL",

When you specify only an executionSystem, as above, you must make sure your app's assets are already present on that system, which means you need admin access to it. Often this is not the case. To remedy this, we can store our app's assets on the CyVerse Data Store and specify a deploymentSystem parameter like so:
"deploymentSystem" : "data.iplantcollaborative.org",
If you are planning to publish your app with CyVerseUk, please add the "ontology" field with a list of EDAM URIs, and "tags" : [ "CyverseUK" ]. Complete JSON examples can be found in this organization's repositories.
Finally, we'll specify our app's inputs:

"inputs" : [ {
    "id": "query",
    "details" : {
      "label": "Query",
      "description": "FASTA file with query sequence(s)"
    },
    "value": { "required" : true }
  },
  {
    "id": "database",
    "details" : {
      "label": "Database",
      "description": "FASTA file with sequences to search (database)"
    },
    "value": { "required" : true }
  }
],
"parameters" : [ ]

We're leaving parameters empty, but we could add any BLAST command line parameters here. Now that we have specified this, we'll have to actually upload our app's assets to CyVerse.
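For reference, the fragments above can be assembled into the complete TutApp.json used later. This sketch writes it with a quoted heredoc (so nothing inside is expanded). It assumes the Data Store deployment variant, with executionSystem set to the system id registered earlier; make sure deploymentPath matches the Data Store folder where you actually upload the assets.

```shell
#!/bin/bash
# Write the complete app description assembled from the fragments above.
# Replace "username" with your CyVerse username before registering.
cat > TutApp.json <<'EOF'
{
  "name"             : "blastapp-tutorial",
  "label"            : "EI tutorial BLAST app",
  "version"          : "0.0.1",
  "executionType"    : "CLI",
  "executionSystem"  : "myTutorialMachine",
  "deploymentSystem" : "data.iplantcollaborative.org",
  "deploymentPath"   : "username/apps/EI_tutorial",
  "templatePath"     : "wrapper.sh",
  "testPath"         : "test.sh",
  "parallelism"      : "SERIAL",
  "inputs" : [
    {
      "id": "query",
      "details": { "label": "Query", "description": "FASTA file with query sequence(s)" },
      "value": { "required": true }
    },
    {
      "id": "database",
      "details": { "label": "Database", "description": "FASTA file with sequences to search (database)" },
      "value": { "required": true }
    }
  ],
  "parameters" : [ ]
}
EOF
```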

Storing App assets with CyVerse

We'll upload data to the Data Store using the Discovery Environment (DE). The CyVerse Data Store uses iRODS under the hood, so you could use icommands as well. For more details, see the CyVerse wiki.

First, log in to the DE at [https://de.iplantcollaborative.org/]. You'll be presented with a desktop-like environment. Click on the "Data" button. This opens a file manager window with a file tree on the left-hand side. Here, click on the folder with your username (at the top). We'll create a new folder to hold our app first: go to "File", select "New Folder...", name the new folder "EI_tutorial" and click "OK" to confirm. Navigate to the newly created folder by clicking on it. This is where our app's assets will live; we'll create them in the next sections.

If you are developing an app for the CyVerseUK system, it is a good idea to keep its assets on our systems.

Creating the App assets

For our minimal BLAST app, we'll need three files: a wrapper script, a test script and an executable. Because of the way BLAST works, we'll actually need two executables for this app. First we'll create the wrapper script:


#!/bin/bash

QUERY="${query}"
DATABASE="${database}"

# These two lines are necessary because permissions get lost in the Agave transfer
chmod u+x lib/makeblastdb
chmod u+x lib/blastn

lib/makeblastdb -dbtype nucl -in "$DATABASE" -out db
lib/blastn -query "$QUERY" -db db

exit $?

As you can see from the first line, this is a plain bash script that runs our pipeline. The next two lines set up our two main parameters: the query and the database. Before the script is executed, Agave replaces the ${query} directive with the input we supplied. Note that the word query is the id we specified in our JSON file earlier. The next line does the same for the database file.
The two lib/ commands run our actual BLAST 'pipeline': first we create our database with makeblastdb, and then we run the search with blastn. The lib/ prefix is due to the way we will set up our app assets: Agave convention requires that all of an App's executables are stored in a separate lib directory. The first command writes the database under the simple name db, and the second command searches that database.
The last line exits with the exit status of the last command, blastn, so the script passes on BLAST's exit value as its own.
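To try the wrapper outside Agave, you can mimic the substitution step yourself. The sketch below uses a hypothetical helper, render_wrapper, which is not part of Agave or cyverse-cli; it simply replaces the ${query} and ${database} placeholders with file names using sed.

```shell
#!/bin/bash
# Hypothetical helper: mimic Agave's template substitution by replacing the
# ${query} and ${database} placeholders in a wrapper with concrete values.
# Usage: render_wrapper <template> <query_file> <database_file>
# File names must not contain '|' or '&' (they are special in sed replacements).
render_wrapper() {
  local template="$1" query="$2" database="$3"
  # In sed's basic regular expressions, \$ is a literal dollar sign and
  # unescaped braces are literal characters.
  sed -e 's|\${query}|'"$query"'|g' \
      -e 's|\${database}|'"$database"'|g' \
      "$template"
}
```

For example, render_wrapper wrapper.sh testquery.fa testdb.fa > run_local.sh followed by bash run_local.sh runs the pipeline locally, assuming the BLAST executables are already in lib/.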
Next, we'll need a test script that tests our app with some default data. This is useful, but we'll skip it for now as it is a bit beyond the scope of this tutorial. Instead, we'll just write a script that exits successfully and call it done:

#!/bin/bash
exit 0

Finally we'll need to provide the BLAST executables. These can be obtained from the NCBI ftp server.
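As a sketch, assuming you downloaded a Linux BLAST+ tarball from the blast+/LATEST folder of the NCBI FTP server (these archives are typically named like ncbi-blast-<version>+-x64-linux.tar.gz and unpack to a directory containing bin/), the two executables can be copied into lib/ as follows. The function name extract_blast is hypothetical; check the archive layout against the file you actually downloaded.

```shell
#!/bin/bash
# Hypothetical helper: copy makeblastdb and blastn from a BLAST+ tarball
# into the lib/ directory expected by wrapper.sh.
# Usage: extract_blast <tarball> [destination_dir]
extract_blast() {
  local tarball="$1" dest="${2:-lib}"
  local tmpdir bindir status=0
  tmpdir=$(mktemp -d)
  if tar -xzf "$tarball" -C "$tmpdir"; then
    # The archive unpacks to ncbi-blast-<version>+/ with a bin/ subfolder
    bindir=$(find "$tmpdir" -type d -name bin | head -n 1)
    if [ -n "$bindir" ]; then
      mkdir -p "$dest"
      cp "$bindir/makeblastdb" "$bindir/blastn" "$dest/" &&
        chmod u+x "$dest/makeblastdb" "$dest/blastn" || status=1
    else
      echo "no bin/ directory found in $tarball" >&2
      status=1
    fi
  else
    status=1
  fi
  rm -rf "$tmpdir"
  return "$status"
}
```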
Now that we have everything, let's set up our assets in the Data Store. Go back to your DE window and go to the EI_tutorial folder under your username (if you weren't already there). Create a folder called lib and navigate to it. We'll put our BLAST executables here. Open the "Upload" menu in the top left-hand corner of the file navigation window. The easiest way is to upload the executables from this repo directly, so choose "Import from URL...".
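If you prefer the icommands mentioned earlier, the same layout can be created from a terminal. This is a sketch assuming iinit has already been run and that your Data Store home is /iplant/home/<username>; imkdir -p and iput -f are standard icommands, while the run helper and the DRY_RUN switch are hypothetical additions that print the commands instead of executing them.

```shell
#!/bin/bash
# Upload the tutorial app assets to the CyVerse Data Store with icommands.
# Assumes iinit has been run. Set DRY_RUN=1 to only print the commands.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: upload_assets <cyverse_username>
upload_assets() {
  local user="$1"
  local target="/iplant/home/${user}/EI_tutorial"
  run imkdir -p "${target}/lib"                  # create EI_tutorial/lib
  run iput -f wrapper.sh "${target}"             # -f overwrites existing files
  run iput -f test.sh "${target}"
  run iput -f lib/makeblastdb "${target}/lib"
  run iput -f lib/blastn "${target}/lib"
}
```

For example, DRY_RUN=1 upload_assets yourusername shows the five commands; drop DRY_RUN=1 to run them.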

The wrapper script should perform all the checks that the Agave API doesn't support (for example, mutually inclusive or exclusive parameters) and ideally return the proper error before running the Docker container. It may also be useful to use the wrapper script to delete any files that are not needed from the working directory, so that they are not archived.
In our case there is some additional logic in the wrapper scripts so that some automatic tasks in the virtual machines perform as expected and so that the system integrates with the webapp (CyVerseUk). You don't usually have to worry about this.
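A minimal sketch of both ideas follows. The two mutually exclusive options (fast and sensitive) are hypothetical and not part of the BLAST app above; the cleanup removes the nucleotide database files written by makeblastdb -out db, on the assumption that you only want the BLAST results archived.

```shell
#!/bin/bash
# Hypothetical check: fail early if two mutually exclusive options are both set.
# Arguments are the values Agave substituted for the two parameters.
check_exclusive() {
  local first="$1" second="$2"
  if [ -n "$first" ] && [ -n "$second" ]; then
    echo "Error: options 'fast' and 'sensitive' cannot be used together" >&2
    return 1
  fi
}

# Remove intermediate files so they are not archived: the database files
# written by "makeblastdb -out db" (db.nhr, db.nin, db.nsq, ...).
cleanup_intermediate() {
  rm -f db.n*
}

# In wrapper.sh this could be used as:
#   check_exclusive "${fast}" "${sensitive}" || exit 1
#   ... run the tools ...
#   cleanup_intermediate
```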

Registering App in Agave

Now that our assets are in place, we can register our app in Agave using the JSON file we wrote earlier. (If needed, refresh your access tokens with auth-tokens-refresh). Navigate to where the file is stored (we'll assume you've named it TutApp.json) and run the apps-addupdate command:
apps-addupdate -v -F TutApp.json
A bunch of JSON describing your app will be returned, confirming the registration of our app.


Additional notes on the JSON file

After the introductory part, the JSON file lists inputs and parameters. Good documentation of the available fields and their usage can be found here.
For the application (if you wish to publish it) to display a proper information window in the Discovery Environment, the following fields need to be present in the JSON file: helpURI, datePublished, author, longDescription.
In the ontology field, a list of IRIs from the topic and operation branches of the EDAM ontology has to be specified to properly categorize the App.

If you encounter problems registering your application, I'd suggest first checking that the JSON file is valid. A good way to do this is to copy-paste it into AgaveToGo.
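If you prefer to check locally, Python's standard json.tool module makes a quick validator (this assumes python3 is installed). It reports the line and column of the first error, and it rejects trailing commas such as }, }, a common slip when assembling these files by hand. validate_json is a hypothetical helper name.

```shell
#!/bin/bash
# Report whether each JSON file given as an argument parses.
# python3 -m json.tool exits non-zero and prints the error position
# to stderr when a file is not valid JSON.
validate_json() {
  local file status=0
  for file in "$@"; do
    if python3 -m json.tool "$file" > /dev/null; then
      echo "OK: $file"
    else
      echo "INVALID: $file" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: validate_json TutSystem.json TutApp.json.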

If details.showArgument (boolean) is set to true, details.argument is passed before the value (e.g. if we want to pass --kmer 31 on the command line). Note that the argument is put before the value without a space, so we usually want to include one at the end of the argument string (e.g. "--kmer ").
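The concatenation rule is easy to get wrong, so here is a small illustration in bash. build_flag only mimics the behaviour described above (argument and value joined with nothing in between); it is not Agave code.

```shell
#!/bin/bash
# Illustration only: join details.argument and the value with no separator,
# as described above.
build_flag() {
  local argument="$1" value="$2"
  printf '%s%s\n' "$argument" "$value"
}

build_flag "--kmer" 31    # prints --kmer31  (not what the program expects)
build_flag "--kmer " 31   # prints --kmer 31 (trailing space in the argument)
```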
value.validator can supply a check on the format of the submitted value, as a Perl-style regular expression (pay particular attention to the escapes).
Example case: JSON value.type doesn't distinguish between integers and floating-point values; it only provides number. To check that the input is an integer we may use "validator": "^\\d+$" (or "^[0-9]+$" to avoid the escapes). The same field also allows accepting only even or odd numbers, setting a maximum value, etc.
Also note that it may be useful to define numerical variables as strings, with an appropriate validator, if we don't want to define a default value: otherwise both the Discovery Environment and the CyVerseUk web interface will pass 0.
We usually don't want the user to write to a folder outside the job's working directory, so if the program run by the App has an --output_directory option (or similar), we may want to add a validator ensuring that the string doesn't start with '/', or just hide the option and give it a default name (e.g. output, which also makes the wrapper script easier to write and maintain).
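Validators can be tried out locally before registering. Bash's [[ =~ ]] uses POSIX extended regular expressions rather than Perl ones, so \d is not available there; the sketch sticks to bracket expressions, which mean the same in both dialects. The three patterns are examples, not required values. Note that a * quantifier (as in ^\d*$) would also accept an empty string, while + requires at least one digit.

```shell
#!/bin/bash
# Return success if the value matches the regular expression.
# Bash uses POSIX ERE, so write [0-9] instead of Perl's \d here.
matches() {
  local value="$1" regex="$2"
  [[ $value =~ $regex ]]
}

INTEGER='^[0-9]+$'              # one or more digits
EVEN_INTEGER='^[0-9]*[02468]$'  # digits ending in an even digit
RELATIVE_PATH='^[^/]'           # must not start with '/' (does not catch ../)

for v in 31 3.1 ""; do
  if matches "$v" "$INTEGER"; then echo "'$v' accepted"; else echo "'$v' rejected"; fi
done
```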

IMPORTANT:

"value": {
    ...
    "visible": false,
    "default": "default_value",
    ...
}

is NOT supported. The default value must be provided in the wrapper script if we don't want the user to be able to change it.
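A sketch of that pattern in a wrapper, using a hypothetical optional kmer parameter with a default of 31: the wrapper keeps whatever Agave substituted and falls back to the default when the value is empty. This assumes the parameter is defined as a string (see the note above about numbers arriving as 0) and that an unset optional parameter reaches the script as an empty string; check this for your app.

```shell
#!/bin/bash
# Hypothetical optional parameter "kmer": Agave replaces ${kmer} with the
# submitted value before the script runs; fall back to 31 if it is empty.
KMER="${kmer}"
KMER="${KMER:-31}"

# Values the user must not change are simply fixed in the script.
OUTPUT_DIR="output"

echo "kmer=${KMER} output=${OUTPUT_DIR}"
```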


Docker integration

It's not possible to run an App in CyVerse interactively. Therefore, to run multiple commands in a Docker container, we need the following syntax in the wrapper.sh script:

docker run <image_name[:tag]> /bin/bash -c "command1;command2...;"

/bin/bash is not strictly necessary, but depending on the base image, bash may not be the default shell: adding it to the command line takes care of this problem.
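A sketch of such a line inside wrapper.sh. The image name is hypothetical, and mounting the job's working directory at /data (docker's -v and -w options) is an assumption about how your container should see the inputs. The DOCKER variable is also an addition for illustration: setting DOCKER=echo prints the command instead of running it, which is handy for checking the quoting.

```shell
#!/bin/bash
# Hypothetical image; replace with your own image and tag.
IMAGE="my_image:1.0"
DOCKER="${DOCKER:-docker}"

# Run several commands in one non-interactive container. The job's working
# directory is mounted at /data so the commands see the inputs and leave
# their outputs where Agave will archive them.
run_in_container() {
  "$DOCKER" run --rm -v "$PWD":/data -w /data "$IMAGE" \
    /bin/bash -c "$1"
}
```

For example: run_in_container "command1; command2".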

IMPORTANT UPDATE: Docker 1.12 added the SHELL instruction. It allows the default shell used for the shell form of commands in a Dockerfile to be overridden (this also applies at build time, so it may make the build a bit slower). Use it as follows:
SHELL ["/bin/bash", "-c"]


Condor integration

The HPC on the CyVerseUk infrastructure uses the HTCondor scheduler, so wrapper.sh alone is not enough to run the app: an HTCondorSubmit.htc submit file is needed as well.
The HTCondorSubmit.htc file will be in the following form:


universe                = docker
docker_image            = <image_name>[:tag]
executable              = 
should_transfer_files   = YES
arguments               = 
transfer_input_files    = 
transfer_output_files   = 
when_to_transfer_output = ON_EXIT
request_memory          = 100G
output                  = out.$(Process)
error                   = err.$(Process)
log                     = log.$(Process)

queue 1

This HTCondor submit file has to be generated by wrapper.sh, since the arguments and input files aren't known in advance.
transfer_output_files is not needed if the output is in the working directory. A good idea is to create, when possible, all the output files in a subdirectory (e.g. output) of the working directory, so that the transfer is easier.
If transferring executables in transfer_input_files, make sure to restore the right permissions in the wrapper script (e.g. chmod u+x <file_name>).
It's also possible that the Docker image has to be updated to give 777 permissions to scripts, because of how Condor handles Docker.
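A minimal sketch of a wrapper.sh fragment that writes the submit file at run time. The image name, executable (run_tool.sh), arguments and input list are hypothetical placeholders; the keys are the ones from the template above, and outputs go to an output/ subdirectory as suggested. The heredoc is unquoted so the variables expand, which is why $(Process) is escaped.

```shell
#!/bin/bash
# Hypothetical values; in a real wrapper they come from Agave-substituted inputs.
IMAGE="my_image:1.0"
EXECUTABLE="run_tool.sh"
ARGUMENTS="--in query.fa --out output"
INPUT_FILES="query.fa,run_tool.sh"

# Write the HTCondor submit description for this job.
write_submit_file() {
  cat > HTCondorSubmit.htc <<EOF
universe                = docker
docker_image            = ${IMAGE}
executable              = ${EXECUTABLE}
should_transfer_files   = YES
arguments               = ${ARGUMENTS}
transfer_input_files    = ${INPUT_FILES}
transfer_output_files   = output
when_to_transfer_output = ON_EXIT
request_memory          = 100G
output                  = out.\$(Process)
error                   = err.\$(Process)
log                     = log.\$(Process)

queue 1
EOF
}

write_submit_file
# Then submit it, e.g.: condor_submit HTCondorSubmit.htc
```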


Publishing an App

To be published, the App has to be made public. This step has to be performed by a tenant admin, so please contact them if you have a ready-to-publish application. The admin runs:

apps-pems-update -v -u <username> -p ALL <app_name>-<version>

Once public, the App can be found both in the DE, under Apps>High-Performance Computing, and in the CyVerseUk web interface. The App interface is automatically generated based on the submitted JSON file.


Running an App from command line

Finally, we can run our App! We'll need one more (short) JSON file to run a new job:

{
  "name"    : "blasttest",
  "appId"   : "blastapp-tutorial-0.0.1",
  "archive" : true,
  "inputs": {
    "query"   : "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testquery.fa",
    "database": "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testdb.fa"
  }
}

We'll save this file as RunApp.json and submit it as a job with the jobs-submit command:
jobs-submit -v -W -F RunApp.json
The -W flag tells the command to keep watching the job in the current window; you can stop watching with Ctrl-C.
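To reuse the same job file with other inputs, it can be generated by a small shell function. make_job_json is a hypothetical helper; it only fills the job name and the two input URLs into the structure shown above (values containing double quotes would need escaping).

```shell
#!/bin/bash
# Hypothetical helper: print a job description for the tutorial app.
# Usage: make_job_json <job_name> <query_url> <database_url> > RunApp.json
make_job_json() {
  local name="$1" query="$2" database="$3"
  cat <<EOF
{
  "name"    : "${name}",
  "appId"   : "blastapp-tutorial-0.0.1",
  "archive" : true,
  "inputs": {
    "query"   : "${query}",
    "database": "${database}"
  }
}
EOF
}
```

For example: make_job_json blasttest <query_url> <database_url> > RunApp.json, then submit it with jobs-submit as above.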
After your job has completed, your outputs, logs and error messages will be in a folder generated automatically on your app's storage system (the CyVerse Data Store in our case, but you can change this at run time in a JSON field). To view them on the CyVerse Data Store, check the "archive" folder under your username. Each job's output will be in a separate subfolder under the "jobs" folder.

Alternatively you will be able to run your jobs through one of the available web interfaces.

Known problems with the DE: not all teams build apps the same way, and as a result some functionality is not available for Agave apps. In particular, you may define an input field as accepting multiple files, but the GUI will not allow multiple file selection. The same appears to happen with AgaveToGo. In this case you will have to submit a JSON file via the command line or use http://cyverseuk.herokuapp.com/ (only for apps hosted on the EI system).

Known problem with AgaveToGo: some disabled apps keep showing up in the list (they can't be used, though).