updated README
kausmees committed Jan 22, 2018
1 parent 770c840 commit d2318eb
Showing 1 changed file with 61 additions and 54 deletions.
115 changes: 61 additions & 54 deletions README.md
@@ -33,46 +33,52 @@ WORKER_NODES denote the hostnames of the machines running filtering workers

It is assumed that the resources are configured so that the WORKER_NODES can resolve the hostnames of MASTER_NODE and STORAGE_NODE and reach them over the network.

BAMSI should work on any cloud provider supporting Ubuntu VMs. You will need Python, and the following packages:

* java
* git
* pip
* rabbitmq-server (only on MASTER_NODE)
* SAMtools
BAMSI should work on any cloud provider supporting Ubuntu VMs.

### Setup and configuration ####

Examples will be shown for Ubuntu using apt.

```
sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install git
sudo apt-get install python-pip
sudo apt-get install python-dev
sudo apt-get install zlib1g-dev
sudo apt-get install samtools
```
1. Initial requirements:

* python 2
* java
* git
* pip

```
$ sudo apt-get update
$ sudo apt-get install default-jre
$ sudo apt-get install git
$ sudo apt-get install python-pip
$ sudo apt-get install python-dev
$ sudo apt-get install zlib1g-dev
```


### Setup and configuration ####
2. Clone the BAMSI git repository

1. Clone the BAMSI git repository
```
$ git clone https://github.com/NGDSG/BAMSI.git
```
2. Change the directory, and install python packages with pip

3. Change the directory, and install python packages with pip

```
$ cd BAMSI
$ sudo apt-get install python-pip
$ sudo pip install -r requirements.txt
```
3. Install and start the Celery broker - RabbitMQ (Only on MASTER_NODE)


4. On all WORKER_NODES: install SAMtools version 1.x

See http://www.htslib.org/download/ for download instructions.


5. Only on MASTER_NODE: install and start the Celery broker - RabbitMQ

(See https://www.rabbitmq.com/download.html for instructions)

Example (on Ubuntu using apt):
```
$ sudo apt-get install rabbitmq-server
```
@@ -89,50 +95,50 @@ sudo apt-get install samtools



4. On the MASTER_NODE and all the WORKER_NODES: modify the configuration file config.cfg according to your setup:

Example configuration file:
6. On the MASTER_NODE and all the WORKER_NODES: modify the configuration file config.cfg according to your setup:

```
[bamsi]
CELERY_BROKER_URL = amqp://celery:stalk@MASTER_NODE/celery-host
CELERY_RESULT_BACKEND = amqp
DATA_PATH=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/{individual}/alignment/{filename}
MASTER_IP=MASTER_NODE
MASTER_PORT=8888
Example configuration file:

[storage]
WEBHDFS_IP=STORAGE_NODE
WEBHDFS_PORT=50070
WEBHDFS_PUBLIC_IP=STORAGE_NODE
WEBHDFS_PUBLIC_PORT=14000
WEBHDFS_USER=ubuntu
RESULTS_PATH=/filtered/
```
```
[bamsi]
CELERY_BROKER_URL = amqp://celery:stalk@MASTER_NODE/celery-host
CELERY_RESULT_BACKEND = amqp
DATA_PATH=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/{individual}/alignment/{filename}
MASTER_IP=MASTER_NODE
MASTER_PORT=8888
[storage]
WEBHDFS_IP=STORAGE_NODE
WEBHDFS_PORT=50070
WEBHDFS_PUBLIC_IP=STORAGE_NODE
WEBHDFS_PUBLIC_PORT=14000
WEBHDFS_USER=ubuntu
RESULTS_PATH=/filtered/
```
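As a quick sanity check before starting any services, the file can be parsed with Python's standard INI parser. This is only a sketch and not part of BAMSI: the function names `load_config` and `missing_bamsi_settings` are hypothetical, and the list of required keys is simply assumed to be the ones shown under `[bamsi]` in the example above.

```python
try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2

# Assumption: the required keys are those listed under [bamsi] in the example.
BAMSI_KEYS = [
    "CELERY_BROKER_URL",
    "CELERY_RESULT_BACKEND",
    "DATA_PATH",
    "MASTER_IP",
    "MASTER_PORT",
]


def load_config(path):
    """Read an INI-style file into a dict of {section: {option: value}}."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names upper-case instead of lower-casing them
    if not parser.read(path):
        raise IOError("could not read config file: %s" % path)
    return dict((s, dict(parser.items(s))) for s in parser.sections())


def missing_bamsi_settings(config):
    """Return the [bamsi] keys that are absent from a loaded config."""
    bamsi = config.get("bamsi", {})
    return [key for key in BAMSI_KEYS if key not in bamsi]
```

Calling `missing_bamsi_settings(load_config("config.cfg"))` on each node returns an empty list when every key from the example is present.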

The settings in the section 'bamsi' are mandatory. They define the location and protocol of the celery host, and define the location from which
each WORKER_NODE streams the data. To use the public Amazon S3 mirror, use:
The settings in the section 'bamsi' are mandatory. They define the location and protocol of the celery host, and define the location from which
each WORKER_NODE streams the data. To use the public Amazon S3 mirror, use:

```
DATA_PATH=http://s3.amazonaws.com/1000genomes/phase3/data/{individual}/alignment/{filename}
```
```
DATA_PATH=http://s3.amazonaws.com/1000genomes/phase3/data/{individual}/alignment/{filename}
```
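The `{individual}` and `{filename}` placeholders in DATA_PATH have the shape of Python format fields. A minimal sketch of how such a template expands into a concrete URL is given below; whether BAMSI performs the substitution in exactly this way is an assumption, and both the helper name `build_data_url` and the sample values are hypothetical.

```python
# Template copied from the S3 mirror setting above.
DATA_PATH = "http://s3.amazonaws.com/1000genomes/phase3/data/{individual}/alignment/{filename}"


def build_data_url(template, individual, filename):
    """Fill the two named placeholders of a DATA_PATH template."""
    return template.format(individual=individual, filename=filename)


# Illustrative values only; substitute a real sample ID and BAM file name.
url = build_data_url(DATA_PATH, "SAMPLE_ID", "example.bam")
print(url)
# http://s3.amazonaws.com/1000genomes/phase3/data/SAMPLE_ID/alignment/example.bam
```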



The settings in the section 'storage' relate to the storage repository and can be changed accordingly when adding support for other systems.
Adding a different type of storage repository requires writing a class that implements the interface defined by the class StorageRepositoryBase in tapp.py.
If no other functionality is required, creating such a class and changing the line
The settings in the section 'storage' relate to the storage repository and can be changed accordingly when adding support for other systems.
Adding a different type of storage repository requires writing a class that implements the interface defined by the class StorageRepositoryBase in tapp.py.
If no other functionality is required, creating such a class and changing the line

StorageRepository = HDFS()
StorageRepository = HDFS()

in tapp.py to initialize your new class instead, and modifying the config file accordingly, should suffice.
in tapp.py to initialize your new class instead, and modifying the config file accordingly, should suffice.
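The general shape of such a replacement class might look like the sketch below. Note that the real interface is whatever StorageRepositoryBase in tapp.py defines: the base class here is only a stand-in so the example runs on its own, and the method name `save`, its arguments, and the class `LocalDirectoryStorage` are all hypothetical.

```python
import os


class StorageRepositoryBase(object):
    """Stand-in for the base class in tapp.py; its real methods may differ."""

    def save(self, filename, data):  # hypothetical method name and signature
        raise NotImplementedError


class LocalDirectoryStorage(StorageRepositoryBase):
    """Example repository that writes results to a local directory."""

    def __init__(self, root):
        self.root = root

    def save(self, filename, data):
        # Create the target directory on first use, then write the bytes.
        if not os.path.isdir(self.root):
            os.makedirs(self.root)
        target = os.path.join(self.root, filename)
        with open(target, "wb") as handle:
            handle.write(data)
        return target


# The corresponding change in tapp.py would then be along the lines of:
# StorageRepository = LocalDirectoryStorage("/some/results/dir")
```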



For the current HDFS implementation it is assumed that the namenode is listening on STORAGE_NODE:50070 and
that an HttpFS server is listening on STORAGE_NODE:14000. These may have to be modified depending on your setup of HDFS.
(E.g. if the WORKER_NODES are on the same network as the HDFS datanodes, then they can push results to the storage via the namenode on port 50070.
Otherwise, they should interact with HDFS via the HttpFS server, and WEBHDFS_PORT should also be 14000.)
For the current HDFS implementation it is assumed that the namenode is listening on STORAGE_NODE:50070 and
that an HttpFS server is listening on STORAGE_NODE:14000. These may have to be modified depending on your setup of HDFS.
(E.g. if the WORKER_NODES are on the same network as the HDFS datanodes, then they can push results to the storage via the namenode on port 50070.
Otherwise, they should interact with HDFS via the HttpFS server, and WEBHDFS_PORT should also be 14000.)
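To make the port choice concrete, the sketch below picks a port according to the rule above and builds a WebHDFS REST URL for it. The `/webhdfs/v1` prefix and the `op` and `user.name` query parameters come from the WebHDFS REST API, which HttpFS also exposes; the helper names `choose_webhdfs_port` and `webhdfs_url` are hypothetical, and whether BAMSI assembles its requests this way is an assumption.

```python
def choose_webhdfs_port(same_network_as_datanodes, namenode_port=50070, httpfs_port=14000):
    """Use the namenode port when workers can reach the datanodes, else HttpFS."""
    return namenode_port if same_network_as_datanodes else httpfs_port


def webhdfs_url(host, port, hdfs_path, op, user):
    """Build a WebHDFS REST URL for an absolute HDFS path."""
    if not hdfs_path.startswith("/"):
        hdfs_path = "/" + hdfs_path
    return "http://{0}:{1}/webhdfs/v1{2}?op={3}&user.name={4}".format(
        host, port, hdfs_path, op, user)


# Workers outside the HDFS network go through HttpFS on port 14000.
port = choose_webhdfs_port(same_network_as_datanodes=False)
print(webhdfs_url("STORAGE_NODE", port, "/filtered/result.bam", "CREATE", "ubuntu"))
# http://STORAGE_NODE:14000/webhdfs/v1/filtered/result.bam?op=CREATE&user.name=ubuntu
```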


### Starting the application ####
@@ -167,3 +173,4 @@ There is also a Python API (https://github.com/NGDSG/BAMSI-API) that is possible



contact: kristiina.ausmees@it.uu.se
