Running BioRels
At this stage, you should have:

- Set up the environment and sourced setenv.sh
- Created the container
- Created the database and the tables
- Configured the global variables and the data sources you wish to process
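As a quick sanity check (a suggestion, not a required step), you can confirm the environment is actually loaded before going further. This assumes setenv.sh exports $TG_DIR, as the paths used throughout this page suggest:

```sh
# Load the BioRels environment, then verify $TG_DIR points somewhere real.
source setenv.sh
echo "$TG_DIR"            # should print your BioRels root directory
ls "$TG_DIR/BACKEND"      # should list SCRIPT, INSTALL, CONFIG, LOG, ...
```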
Great! You are ready to run BioRels!
There are currently two ways to run BioRels: either via an SGE cluster or using a single CPU.
Note
Make sure you have set the SGE cluster configuration in setenv.sh and sourced it before running BioRels.
To do so, nothing could be easier: just run the following command:
biorels_monitor
Note
biorels_monitor is an alias for biorels_php $TG_DIR/BACKEND/SCRIPT/monitor_job.php
This will trigger the master job, which looks at each job, checks whether it meets all its criteria and, if so, submits it.
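In practice, an SGE run therefore boils down to the following (a minimal sketch, assuming setenv.sh defines both $TG_DIR and the biorels_* aliases, as described above):

```sh
# Load the environment, then start the master monitoring job.
source setenv.sh
biorels_monitor    # expands to: biorels_php $TG_DIR/BACKEND/SCRIPT/monitor_job.php
```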
Thank you for using BioRels. We hope you enjoy it!
If you don’t have an SGE cluster available, or you just want to test BioRels without additional headache, that’s alright. In this case, go to $TG_DIR/BACKEND/INSTALL. In that directory, there is a script called gen_single_script.php. It will generate a list of shell commands executing the different scripts.
Warning
You will need PHP and the PostgreSQL client tools installed locally on your machine. The PostgreSQL database itself can be located elsewhere.
cd $TG_DIR/BACKEND/INSTALL
php gen_single_script.php > ../commands.sh
And to execute that script:
cd $TG_DIR/BACKEND/
sh commands.sh
gen_single_script.php looks at your CONFIG_USER file and the dependencies between the different scripts to generate a list of scripts in the order they are supposed to run. When executed, commands.sh runs the individual scripts in that order, one at a time.
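To give an idea of what to expect, a generated commands.sh might look roughly like this (a hypothetical excerpt; the actual script file names and their order depend entirely on your CONFIG_USER and the dependency graph, so do not copy these lines literally):

```sh
# Hypothetical excerpt of a generated commands.sh.
# Each line runs one BioRels script; job names such as dl_gene and
# wh_taxonomy are borrowed from the examples later on this page.
biorels_php SCRIPT/dl_gene.php
biorels_php SCRIPT/wh_taxonomy.php
biorels_php SCRIPT/db_gene.php
```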
BioRels is constantly being improved and we will work hard to make it fully compatible with single-CPU runs. However, there are currently some limitations associated with this configuration.
- commands.sh needs to be triggered manually
- If a script fails for any reason, the next script will still be triggered, resulting in undefined behavior (see the sketch after this list for one way to guard against this)
- In the case of parallelized jobs, nothing will be triggered
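Since commands.sh itself does not stop on errors, one workaround (a sketch, not part of BioRels) is to execute it line by line and abort at the first failure:

```sh
#!/bin/sh
# Run each command from commands.sh in order and stop on the first
# failure, so later scripts don't run against incomplete data.
# Assumes commands.sh contains one shell command per line.
cd "$TG_DIR/BACKEND" || exit 1
while IFS= read -r cmd; do
  [ -z "$cmd" ] && continue            # skip blank lines
  echo ">> $cmd"
  sh -c "$cmd" || { echo "Failed, stopping: $cmd" >&2; exit 1; }
done < commands.sh
```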
A lot will be happening at the beginning, since everything needs to be loaded into the database. You have a few ways to monitor what’s happening.
biorels_monitor is the main script in charge of running the different scripts. It provides an output that looks like the image below:
| Column | Description |
| --- | --- |
| A | Header |
| B | Timestamp |
| C | List of failed jobs: BioRels job ID followed by job name |
| D | Currently running jobs, with their SGE job ID, BioRels job ID and job name |
| E | Summary of jobs running and ended |
| Latest job run: | |
| F | BioRels job ID |
| G | Job name |
| H | Working directory in [PRIVATE_]PROCESS/[JOB_DIR]/ |
| I | Date the job was processed and generated new data. A script that checks for a new release and doesn’t find one will update run_date and last_check_date, but not processed_date. |
| J | Date the job was ready to run |
| K | Date the job was run |
| L | Run time |
| M | T: job ran successfully. F: job failed |
| N | Cause of the job failure |
The output shows the list of currently running jobs, failed jobs and the latest jobs run. However, it does not explain why a job has or has not been triggered.
MONITOR_STATUS files are located in $TG_DIR/BACKEND/LOG. While biorels_monitor is running, it generates up to 20 versions of that file, one per iteration of its checks. Therefore, MONITOR_STATUS_20 represents a snapshot of the decision making from 20 iterations ago.
| Column | Description |
| --- | --- |
| A | Dependency level |
| B | BioRels job ID |
| C | Job name |
| D | Working directory. -1 if never run |
| E | For level-1 jobs, which are triggered on a time basis, the next submission time |
| Dependency level 2 and above: | |
| F | Dependency level |
| G | Ruleset. Example: db_gene is triggered when dl_gene and wh_taxonomy both have a newer date than db_gene, OR when pp_uniprot or dl_chembl has a newer date than db_gene. |
| H | Example: wh_go, at dependency level 3, requires db_pubmed, which is already up to date, and ck_go_rel. Therefore wh_go is waiting on ck_go_rel to run successfully. |
| I | Example: wh_reactome requires 4 data sources. 3 are up to date, but it is waiting on ck_reactome_rel. |
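To inspect these snapshots, something along these lines works (a suggestion; it assumes the files are literally named MONITOR_STATUS_1 through MONITOR_STATUS_20, with higher numbers being older, as described above):

```sh
# List snapshots and compare the two most recent monitor iterations.
cd "$TG_DIR/BACKEND/LOG"
ls MONITOR_STATUS_*                      # MONITOR_STATUS_1 is the newest
diff MONITOR_STATUS_2 MONITOR_STATUS_1   # what changed in the last iteration
```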
To disable a script, you will need to:
- open $TG_DIR/BACKEND/CONFIG/CONFIG_USER
- locate the script of interest
- replace T by F (this section is tab-delimited)

If you are using an SGE cluster, you will have to stop and restart the job monitoring. If you are running on a single CPU, you will have to regenerate the commands.sh file.
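For example, disabling a hypothetical dl_gene entry can be done in any editor, or with GNU sed (a sketch; it assumes the line looks like "dl_gene<TAB>T" with the flag in the second field, so check your file's actual columns first):

```sh
# Flip the flag from T to F for the dl_gene line (tab-separated fields).
# Writes a CONFIG_USER.bak backup before editing in place.
sed -i.bak 's/^dl_gene\tT/dl_gene\tF/' "$TG_DIR/BACKEND/CONFIG/CONFIG_USER"
```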
To disable a data source, you can either disable all scripts of that data source in CONFIG_USER (see the section above) OR run the prep_config_job.php script again, located in BACKEND/INSTALL/. For the latter, simply do not provide that data source in the list of data sources to process.
To enable a data source, it is strongly recommended to run the prep_config_job.php script again, located in BACKEND/INSTALL. The main reason is to ensure that all dependencies are also enabled.
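In other words, something like the following (a sketch; the exact invocation and the way data sources are passed depend on your BioRels version, so check the installation documentation):

```sh
# Re-run the configuration script; provide the desired list of data
# sources when prompted (interface details may vary by version).
cd "$TG_DIR/BACKEND/INSTALL"
php prep_config_job.php
```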
If a job has failed, because of a network issue for instance, and you wish to run it again, you have two options:

- Run it manually:

biorels_php SCRIPT_PATH/SCRIPT_NAME

This will run your process, and the job monitoring will automatically trigger the next scripts, if any.

- Modify the database: run

biorels_sql

to start psql and connect to your database, then run the following query:

UPDATE [schema_name].biorels_timestamp set processed_date=null, last_check_date=null where job_name='[YOUR_JOB_NAME]'
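Put together, option 2 might look like this (a sketch; it assumes biorels_sql forwards standard input to psql the way a plain psql invocation would, and you must substitute your own schema and job name for the placeholders):

```sh
# Clear the job's timestamps so the monitor considers it for re-submission.
# Replace [schema_name] and [YOUR_JOB_NAME] with your actual values.
biorels_sql <<'SQL'
UPDATE [schema_name].biorels_timestamp
   SET processed_date = NULL, last_check_date = NULL
 WHERE job_name = '[YOUR_JOB_NAME]';
SQL
```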