A web application for viewing, filtering, and searching CRISPR annotation TSV data.
The MHcut Browser was developed by the Canadian Centre for Computational Genomics to serve data generated by the MHcut tool.
- David Lougheed (david.lougheed@mail.mcgill.ca)
- Wrote the original web application.
The MHcut browser, like the MHcut tool, is licensed under the GPLv3 license. Additional terms of use apply (see the website), particularly with regards to data access and usage.
MHcut is a tool by Grajcarek et al. used to identify "11 million
naturally-occurring deletion mutations flanked by microhomologies", which are
candidates for precise deletions using microhomology-mediated end-joining
repair.
The data from this project can be accessed using this browser, which is hosted online at https://mhcut-browser.genap.ca/.
Python 3.6 or later is needed to run the application server and database-
generating script. See the requirements.txt
file for required packages.
NPM version 6 or later is required to install front-end dependencies for the web application.
All the sub-steps outlined here are contained in a single script, setup.bash
,
which can be run with the following command in the root of the project
directory:
bash ./setup.bash
This script will prompt the user for a choice of web server software to be installed; either Apache 2 or NGINX can be chosen.
If some of the dependencies are already installed on the host machine, or if different install paths or installation methods are required, it is suggested that the installation steps are completed manually (see directly below).
Make sure the python3
, python3-dev
, and python3-pip
packages are
installed on the host system. They can be installed on a Debian-based system
with the following commands:
sudo apt update
sudo apt install python3 python3-dev python3-pip
A web server software is used to serve the application's static files and as a proxy for the web application itself. This project supports both the NGINX and Apache 2 server software, although it may be possible for others to work as well.
To install NGINX on a Debian-based system:
sudo apt install nginx
To install Apache 2 on a Debian-based system, and enable the required modules:
sudo apt install apache2 libapache2-mod-wsgi-py3
sudo a2enmod wsgi
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod rewrite
sudo systemctl restart apache2
MHcut Browser uses Postgres as a database software in order to efficiently perform complex queries on the data.
To install Postgres, use the following command:
sudo apt install postgresql postgresql-contrib
Install NPM via your method of choice (caution; the version in Aptitude repositories is often out of date) and make sure it is updated to at least version 6.
Using external NodeJS Aptitude repositories, this can be done with the following commands:
curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
sudo apt install nodejs
sudo npm install -g npm # Update NPM to the latest version
npm -v # The version should be at least 6.x.x
Create a Python 3 virtual environment in the main project directory with the following commands, ran from the root project directory:
sudo -H pip3 install virtualenv # If not done so already, install Python virtualenv
virtualenv -p python3 ./env
source env/bin/activate
pip install -r requirements.txt
Install all dependencies required for the web application with the following commands, starting from the root project directory:
cd web
npm install
cd ..
First, create a Postgres user for the application to use in order to access the
data by opening a new psql
session and issuing a CREATE ROLE
command,
specifying the database role's password:
sudo -u postgres psql
CREATE ROLE mhcut LOGIN PASSWORD 'some_password';
This will prompt the user for a username (for example, one could enter
mhcut
) and whether the new user should be a super user (it should not).
Then, create two databases (one for the Cas dataset, and one for the xCas
dataset) in the psql
session and enable the trigram extension on both, for
indexing purposes (\q
exits the session):
CREATE DATABASE mhcut_db WITH OWNER mhcut; -- For the Cas dataset
CREATE DATABASE mhcut_db_2 WITH OWNER mhcut; -- For the xCas dataset
\c mhcut_db
CREATE EXTENSION IF NOT EXISTS pg_trgm;
\q
Finally, edit the pg_hba.conf
file (usually found in the
/etc/postgresql/11/main/
directory), adding the following line, and
restart the database:
Before:
local all postgres peer
# TYPE DATABASE USER ADDRESS METHOD
After:
local all postgres peer
# TYPE DATABASE USER ADDRESS METHOD
local all mhcut md5
Restart:
sudo systemctl restart postgresql
To build the databases, run the tsv_to_postgres.py
script twice as follows,
passing in the desired TSV files to convert as arguments, as well as the name
of the created databases and database user:
python ./tsv_to_postgres.py variants.tsv guides.tsv cartoons.tsv mhcut_db mhcut
python ./tsv_to_postgres.py variants_xCas.tsv guides_xCas.tsv cartoons_xCas.tsv mhcut_db_2 mhcut
This will prompt the user for the database user's password before building the MHcut Browser databases.
Warning: The database construction process will take quite a while (~30 minutes per database). The resulting databases are typically around 20-60 gigabytes each.
If not already in the virtual environment, activate it by running the following command from within the root project directory:
source env/bin/activate
Either alter the default configuration file (for localhost
) or create a new
virtual host specifically for the application and register a local domain name
in the /etc/hosts
file on the development machine.
Example configuration file, located in /etc/nginx/sites-available/
:
server {
listen 80;
root /path/to/mhcut/browser/web;
index index.html index.htm;
server_name your-domain-name.local;
location /api/ {
proxy_pass http://localhost:5000/;
}
location / {
try_files $uri $uri/ /index.html;
}
}
If a new virtual host configuration file has been created, enable the virtual
host by running the following commands, replacing conf-name
with the name of
the newly-created configuration file, and restarting NGINX:
ln -s /etc/nginx/sites-available/conf-name /etc/nginx/sites-enabled/conf-name
nginx -t # Optional, tests the configuration and informs the user if it's OK
sudo systemctl restart nginx
Either alter the default configuration file (for localhost
) or create a new
virtual host specifically for the application and register a local domain name
in the /etc/hosts
file on the development machine.
Example configuration file, located in /etc/apache2/sites-available/
:
<VirtualHost *:80>
ServerName your-domain-name.local
ServerAdmin webmaster@localhost
DocumentRoot /path/to/mhcut/browser/web
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
<Directory /path/to/mhcut/browser/web>
Require all granted
RewriteEngine on
# Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^ index.html [L]
</Directory>
ProxyPass /api/ http://127.0.0.1:5000/
ProxyPassReverse /api/ http://127.0.0.1:5000/
</VirtualHost>
If a new virtual host configuration file (for example mhcut.conf
)
has been created, enable the virtual host by running the following commands,
replacing mhcut
with the name of the newly-created configuration
file, and restarting Apache:
sudo a2ensite mhcut
sudo systemctl restart apache2
In this case, the development-environment bundle is built, to aid in debugging. This should be ran from the main project directory.
cd web
npm run build-dev
cd ..
If continuous development is being done on the JavaScript parts of the web
application, the webpack watcher (which continuously compiles the code as it is
changed) can be started with the following command, within the web
directory:
npm run watch
This will run continuously, waiting for changes in the web/src
directory,
until it is killed with Ctrl-c
or similar.
Run the following commands from the root project directory to start the development server:
export FLASK_APP=application.py
export FLASK_ENV=development
DB_NAME=mhcut_db DB_USER=mhcut DB_PASSWORD=your_db_password flask run
To check if the server is running, visit localhost:5000. The URL should give a JSON response with some of the entries from the database.
If it is not running, check the terminal in which the server is running for any possible error messages.
In production, the MHcut Browser web application is designed to be deployed with uWSGI and NGINX as a systemd service.
See Step 0 for details.
In production, it is recommended to create a new virtual host specifically for the application.
Example configuration, located in /etc/nginx/sites-available/
:
server {
listen 80;
root /path/to/mhcut/browser/web;
index index.html index.htm;
server_name your-domain-name.com;
location /api/ {
include uwsgi_params;
uwsgi_pass unix:/path/to/mhcut/browser/mcb.sock;
uwsgi_param SCRIPT_NAME /api/;
}
location / {
try_files $uri $uri/ /index.html;
}
}
If a new virtual host configuration file has been created, enable the virtual
host by running the following command, replacing conf-name
with the name of
the newly-created configuration file, and restarting NGINX:
ln -s /etc/nginx/sites-available/conf-name /etc/nginx/sites-enabled/conf-name
nginx -t # Optional, tests the configuration and informs the user if it's OK
sudo systemctl restart nginx
In production, it is recommended to create a new virtual host specifically for the application.
Example configuration, located in /etc/apache2/sites-available/
:
<VirtualHost *:80>
ServerName your-domain-name.com
ServerAdmin webmaster@localhost
DocumentRoot /path/to/mhcut/browser/web
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
<Directory /path/to/mhcut/browser/web>
Require all granted
RewriteEngine on
# Don't rewrite files or directories
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
# Rewrite everything else to index.html to allow html5 state links
RewriteRule ^ index.html [L]
</Directory>
<Directory /path/to/mhcut/browser>
<Files wsgi.py>
Require all granted
</Files>
</Directory>
WSGIDaemonProcess mcb python-home=/path/to/mhcut/browser/env python-path=/path/to/mhcut/browser
WSGIProcessGroup mcb
WSGIScriptAlias /api /path/to/mhcut/browser/wsgi.py
</VirtualHost>
If a new virtual host configuration file (for example mhcut.conf
)
has been created, enable the virtual host by running the following commands,
replacing mhcut
with the name of the newly-created configuration
file, and restarting Apache:
sudo a2ensite mhcut
sudo systemctl restart apache2
In order to run the server in either production environment, the wsgi.py
file must be edited to contain the production database name and credentials.
Change the following variables to contain the production values, within the
quotes:
os.environ["DB_NAME"] = "your_production_db_name"
os.environ["DB_USER"] = "your_production_db_user"
os.environ["DB_PASSWORD"] = "your_production_db_password"
os.environ["GMAIL_SENDER_EMAIL"] = "your_production_gmail_sender_email"
os.environ["GMAIL_SENDER_PASSWORD"] = "your_production_gmail_sender_password"
os.environ["BUG_REPORT_EMAIL"] = "your_production_bug_report_email"
Restart Apache with the following command:
sudo systemctl restart apache2
In this case, the production-environment bundle is built. This must be run from the product directory.
cd static
npm install
npm run build
cd ..
First, edit the example mcb.example.service
file to match the paths the
application is being served out of and set a correct service user.
Then, copy the example mcb.example.service
file to the systemd
services
folder as follows, and change any paths and user specifications in the service:
sudo cp ./mcb.example.service /etc/systemd/system/mcb.service
Finally, start the service and enable it to start at boot time:
sudo systemctl start mcb
sudo systemctl enable mcb
To check if the software is running, visit localhost:5000. The URL should give a JSON response with some of the entries from the database.
If it is not running, make sure the service started correctly with the following command:
sudo systemctl status mcb
If a 502
HTTP error appears, check if the socket path is correct in the NGINX
configuration and the service user is correct in the systemd service file.