Adding Documentation to deploy annotator #495

Open
wants to merge 6 commits into
base: master
165 changes: 165 additions & 0 deletions AMAZON2023.md
@@ -0,0 +1,165 @@
# Bystro Annotator Installation Guide

### Table of Contents
1. [Installing the Bystro Perl Annotator](#installing-the-bystro-perl-annotator)
2. [Installing Bystro Python Libraries](#installing-bystro-python-libraries)
3. [Configuring the Bystro Annotator](#configuring-the-bystro-annotator)
4. [Databases](#databases)
5. [Running Your First Annotation](#running-your-first-annotation)
6. [EFS Joint Drive](#efs-joint-drive)
7. [FAQ](#faq)

## Installing the Bystro Perl Annotator
#### Amazon 2023

1. Clone and install the Bystro repository:
```sh
git clone https://github.com/bystrogenomics/bystro.git && cd bystro && source ./install-rpm.sh
```

2. Install dependencies:
```sh
cpanm --quiet https://github.com/bystrogenomics/msgpack-perl.git
cpanm --quiet --notest MouseX::Getopt
git clone --depth 1 --recurse-submodules https://github.com/salortiz/LMDB_File.git \
&& cd LMDB_File \
&& cpanm --quiet . \
&& cd .. \
&& rm -rf LMDB_File
# mysql-devel must be installed first: DBD::mysql needs mysql_config to build
cpanm --quiet DBD::mysql@4.051
```
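
After these installs finish, a quick load test can confirm that the modules are visible to the active Perl. This is a minimal sketch and not part of the Bystro install scripts: `perl_has_module` is a hypothetical helper name, and the module list is an assumption that mirrors the packages installed above (`Data::MessagePack` is assumed to be the module provided by the msgpack-perl repository).

```sh
# Hypothetical helper: succeeds if the given Perl module can be loaded
# by whichever perl is first on PATH.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Module names are assumptions based on the install commands above.
for mod in Data::MessagePack MouseX::Getopt LMDB_File DBD::mysql; do
    if perl_has_module "$mod"; then
        echo "ok       $mod"
    else
        echo "MISSING  $mod"
    fi
done
```

Any module reported as `MISSING` points at the corresponding `cpanm` step to rerun.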

## Installing Bystro Python Libraries
We recommend using Miniconda to manage Python dependencies for Bystro. After installing Conda, proceed with the following steps:

1. Install Rust:
```sh
curl https://sh.rustup.rs -sSf | sh -s -- -y
echo -e "\n### Bystro: Done installing Rust! Now sourcing .cargo/env for use in the current shell ###\n"
source "$HOME/.cargo/env"
```

2. Set up the Bystro environment:
```sh
source .initialize_conda_env.sh
```

3. To manage local operations:
- Run `make run-local` to install the Bystro library and Bystro-API CLI tool, and start a local Ray server.
- If you are also setting up the Bystro webapp and API server, run `make serve-local` to start queue listeners for annotation, ancestry, PRS, and proteomics.
- If you are planning to contribute to Bystro and need faster iteration time, use `make develop` or `make serve-dev` to create a Bystro wheel that is installed into your local environment.


## Configuring the Bystro annotator

Once Bystro is installed, it needs to be configured. The easiest step is choosing the species/assemblies to annotate.

1. Download the Bystro database for your species/assembly

- **Example:** hg38 (human reference GRCh38): `wget https://s3.amazonaws.com/bystro-db/hg38_v7.tar.gz`
- You need ~700GB of free space for hg38 and ~400GB of free space for hg19, including the space for the tar.gz archives

2. To install the database:

**Example:**

```shell
cd /mnt/annotator/
wget https://s3.amazonaws.com/bystro-db/hg38_v7.tar.gz
bgzip -d -c --threads 32 hg38_v7.tar.gz | tar xvf -
```

In this example, the hg38 database would be located in `/mnt/annotator/hg38`.

3. Update the YAML configuration for the species/assembly to point to the database.

For human genome assemblies, we provide pre-configured hg19.yml and hg38.yml, which assume `/mnt/annotator/hg19_v9` and `/mnt/annotator/hg38_v7` database directories respectively.

If using a different mount point, a different database folder name, or a different (or custom-built) database altogether,
you will need to update the `database_dir` property of the YAML config.
- Note: for a custom database, you also need to ensure that the track `outputOrder` lists all tracks and that each track lists all desired `features`.

For instance, you can use `yq` to set `database_dir`, and to set `temp_dir` so that in-progress annotations are written to local disk. The commands below use the yq v3 `write` syntax; yq v4 replaced it with expressions of the form `yq -i '.database_dir = "/mnt/my_fast_local_storage/hg38_v7"' config/hg38.yml`.

```shell
yq write -i config/hg38.yml database_dir /mnt/my_fast_local_storage/hg38_v7
yq write -i config/hg38.yml temp_dir /mnt/my_fast_local_storage/tmp
```
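
Before running an annotation, it can help to confirm that the configured `database_dir` actually exists on disk. The following is a minimal sketch, not a Bystro tool: `get_database_dir` and `check_database_dir` are hypothetical helper names, and the parsing assumes `database_dir` is a top-level key written on a single unquoted line (`database_dir: /some/path`).

```sh
# Hypothetical helper: print the value of the top-level `database_dir` key.
# Assumes a single-line, unquoted `database_dir: /some/path` entry.
get_database_dir() {
    awk -F': *' '$1 == "database_dir" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the configured directory exists.
check_database_dir() {
    dir=$(get_database_dir "$1")
    if [ -n "$dir" ] && [ -d "$dir" ]; then
        echo "database_dir OK: $dir"
    else
        echo "database_dir missing or not a directory: '$dir'" >&2
        return 1
    fi
}
```

For example, `check_database_dir config/hg38.yml` returns non-zero when the path in the config does not point at an extracted database directory.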

## Databases

1. Human (hg38): https://s3.amazonaws.com/bystro-db/hg38_v7.tar.gz
2. Human (hg19): https://s3.amazonaws.com/bystro-db/hg19_v9.tar.gz
3. There are no restrictions on species support, but we currently only build human genomes. Please create a GitHub issue if you would like us to support others.
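
Because these archives are large (the configuration section estimates ~700GB of free space for hg38 and ~400GB for hg19, archives included), checking free space before downloading can save a failed extraction. This is a sketch under stated assumptions: `has_free_gb` is a hypothetical helper, and it relies on the POSIX `df -Pk` output format, where the fourth column is the available space in 1024-byte blocks.

```sh
# Hypothetical helper: succeed if the filesystem holding DIR has at
# least MIN_GB gibibytes available.
has_free_gb() {
    dir=$1
    min_gb=$2
    # df -Pk prints one header line, then one line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example (not run here):
#   has_free_gb /mnt/annotator 700 || echo "not enough free space for hg38"
```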

## Running your first annotation

Example: running an hg38 annotation

```sh
bin/bystro-annotate.pl --config config/hg38.yml --in /path/in.vcf.gz --out /path/outPrefix --run_statistics [0,1] --compress
```

The outputs will be:

- Annotation (compressed, due to --compress flag): `outPrefix.annotation.tsv.gz`
- Annotation log: `outPrefix.log.txt`
- Statistics JSON file: `outPrefix.statistics.json`
- Statistics tab-separated file: `outPrefix.statistics.tsv`
- Removing the `--run_statistics` flag will skip the generation of `outPrefix.statistics.*` files
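
A small check can confirm that a run produced the files listed above. This is a minimal sketch rather than a Bystro command: `check_outputs` is a hypothetical helper, and the file names assume the run used `--compress`, as in the example command.

```sh
# Hypothetical helper: report which expected outputs exist for a prefix.
# Pass "stats" as the second argument if statistics were generated.
check_outputs() {
    prefix=$1
    with_stats=${2:-}
    missing=0
    # Reuse the positional parameters as the list of expected files.
    set -- "$prefix.annotation.tsv.gz" "$prefix.log.txt"
    if [ "$with_stats" = "stats" ]; then
        set -- "$@" "$prefix.statistics.json" "$prefix.statistics.tsv"
    fi
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "found    $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    return "$missing"
}
```

For the example above, `check_outputs /path/outPrefix stats` returns non-zero if any of the four files is absent.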

## EFS Joint Drive

- If you are installing this for the Bystro API server integration, you need a shared drive that is accessible to both the Bystro API server and the instance running the annotator.

1. Create the shared drive: in Amazon Web Services, create an EFS file system.
2. Create a directory to serve as the mount point, for example `/seqant`.
3. Mount the EFS file system at that directory; the last argument of the `mount` command is the mount point:

```sh
sudo mount -t efs -o tls fs-xxxxxxxxxx:/ efs
```

NOTE: You might need to set the directory's permissions before use
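
The steps above, including the permissions note, can be sketched as follows. The values are assumptions for illustration: `/seqant` is the example mount point from step 2, `fs-xxxxxxxxxx` is the placeholder file system ID from the command above, and `mount_efs` and `is_writable_dir` are hypothetical helper names. Changing ownership with `chown` is only one possible way to handle permissions.

```sh
# Assumed values; replace with your own file system ID and mount point.
EFS_ID="fs-xxxxxxxxxx"
MOUNT_POINT="/seqant"

# Hypothetical helper: create the mount point and mount the EFS on it.
# Requires amazon-efs-utils and sudo; defined here but not invoked.
mount_efs() {
    sudo mkdir -p "$MOUNT_POINT" &&
        sudo mount -t efs -o tls "$EFS_ID:/" "$MOUNT_POINT"
}

# Hypothetical helper: succeed if DIR exists and the current user can
# write to it.
is_writable_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Example follow-up once mounted (not run here):
#   is_writable_dir "$MOUNT_POINT" ||
#       sudo chown "$(id -un):$(id -gn)" "$MOUNT_POINT"
```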

## FAQ

This section outlines common issues encountered while deploying Bystro on Amazon Linux 2023, along with the solutions that resolved them at the time.

- **Perl Version Requirement**
- Installing `LMDB_File` requires Perl 5.34.0 or newer.

- **Database Configuration Error**
- **Error:** [fatal] dbCleanUp LMDB error: 13 at /home/ec2-user/bystro/perl/lib/Seq/DBManager.pm line 1177.
- **Cause:** The `database_dir` in `config/hg19.yml` or `config/hg38.yml` might not be configured correctly, or the hg19 and hg38 databases were not downloaded and extracted.
- **Reference:** [Configuring the Bystro Annotator](#configuring-the-bystro-annotator).

- **Missing mysql_config**
- **Error:** Can't exec "mysql_config": No such file or directory at Makefile.PL line 89. Cannot find the file 'mysql_config'! Your execution PATH doesn't seem to contain the path to mysql_config.
- **Solution:**
```sh
sudo wget https://dev.mysql.com/get/mysql80-community-release-el9-1.noarch.rpm
sudo dnf install mysql80-community-release-el9-1.noarch.rpm -y
sudo rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2023
sudo rm mysql80-community-release-el9-1.noarch.rpm
sudo yum install mysql-devel -y;
```

- **PerlIO::gzip Installation Failure**
- **Error:**
```
PerlIO/gzip/gzip.bs 644
"/home/ec2-user/perl5/perlbrew/perls/perl-5.34.0/bin/perl" "/home/ec2-user/perl5/perlbrew/perls/perl-5.34.0/lib/5.34.0/ExtUtils/xsubpp" -typemap '/home/ec2-user/perl5/perlbrew/perls/perl-5.34.0/lib/5.34.0/ExtUtils/typemap' gzip.xs > gzip.xsc
mv gzip.xsc gzip.c
cc -c -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2 -O2 -DVERSION="0.20" -DXS_VERSION="0.20" -fPIC "-I/home/ec2-user/perl5/perlbrew/perls/perl-5.34.0/lib/5.34.0/x86_64-linux/CORE" gzip.c
gzip.xs:16:10: fatal error: zlib.h: No such file or directory
16 | #include <zlib.h>
| ^~~~~~~~
compilation terminated.
make: *** [Makefile:338: gzip.o] Error 1
```
- **Solution:** Install zlib-devel to resolve this and other related installation issues:
```sh
sudo yum install zlib-devel
```
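
The three prerequisites behind the issues above can be checked up front. This is a minimal preflight sketch, not part of the Bystro installer: `perl_at_least` and `have_cmd` are hypothetical helpers, and the `/usr/include/zlib.h` location is an assumption about where `zlib-devel` places the header.

```sh
# Hypothetical helper: succeed if the active perl is at least the given
# numeric version, e.g. 5.034 for Perl 5.34.0.
perl_at_least() {
    perl -e 'exit($] >= $ARGV[0] ? 0 : 1)' "$1" 2>/dev/null
}

# Hypothetical helper: succeed if a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

perl_at_least 5.034 || echo "Perl is older than 5.34.0, or perl was not found"
have_cmd mysql_config || echo "mysql_config is not on PATH: install mysql-devel"
# Header path is an assumption; adjust if your system installs it elsewhere.
[ -f /usr/include/zlib.h ] || echo "zlib.h not found: install zlib-devel"
```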
2 changes: 1 addition & 1 deletion install-rpm.sh
@@ -16,7 +16,7 @@ fi
. ~/.bash_profile;

# Perlbrew simplifies version management
-. ./install/install-perlbrew-linux.sh $INSTALL_DIR perl-5.30.1;
+. ./install/install-perlbrew-linux.sh $INSTALL_DIR perl-5.34.0;
. ./install/install-perl-libs.sh;

. ~/.bash_profile;
4 changes: 4 additions & 0 deletions install/install-go-linux.sh
@@ -16,12 +16,16 @@ fi
echo -e "\n\nInstalling Go in /usr/local\n"

# Clean in case something left over from old installation
mkdir BYSTRO_GO_INSTALL && cd BYSTRO_GO_INSTALL

GOFILE=go1.21.4.linux-amd64.tar.gz
wget https://dl.google.com/go/$GOFILE;
tar -xf $GOFILE;
echo "Deleting go in /usr/local"
sudo rm -rf /usr/local/go
sudo mv go /usr/local;
rm $GOFILE;
cd ../
rm -rf BYSTRO_GO_INSTALL

. install/export-go-path-linux.sh $DIR $PROFILE
12 changes: 12 additions & 0 deletions install/install-rpm-deps.sh
@@ -2,6 +2,7 @@

echo -e "\n\nInstalling Debian (rpm) dependencies\n";

sudo yum update -y;
sudo yum install gcc -y;
sudo yum install openssl -y;
sudo yum install openssl-devel -y;
@@ -11,6 +12,11 @@ sudo yum install git-all -y;
sudo yum install pigz -y;
sudo yum install unzip -y;
sudo yum install wget -y;
# Need to download and install mysql rpm to install mysql-devel
sudo wget https://dev.mysql.com/get/mysql80-community-release-el9-1.noarch.rpm
sudo dnf install mysql80-community-release-el9-1.noarch.rpm -y
sudo rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2023
sudo rm mysql80-community-release-el9-1.noarch.rpm
# For tests involving querying ucsc directly
sudo yum install mysql-devel -y;
# For Search::Elasticsearch::Client::5_0::Direct
@@ -33,5 +39,11 @@ sudo npm install -g pm2;

sudo yum install awscli -y;

# amazon-efs-utils is required to mount efs
sudo yum install amazon-efs-utils -y;

# for installing PerlIO::gzip, Net-SSLeay, Alien::gmake, Alien::LMDB
sudo yum install zlib-devel -y;

# pkg-config is required for building the wheel
sudo yum install -y pkg-config;