
Running Biorels

jdesaphy edited this page Oct 21, 2024 · 1 revision

At this stage, you should have:

  • Set up the environment and sourced setenv.sh

  • Created the container

  • Created the database and the tables

  • Configured the global variables and the data sources you wish to process.

Great! You are ready to run BioRels!

There are currently two ways to run BioRels: on an SGE cluster or on a single CPU.

Running BioRels

Running on an SGE cluster:

Note

Make sure you have configured the SGE cluster settings in setenv.sh and sourced it before running BioRels.

Once that is done, simply run the following command:

biorels_monitor

Note

biorels_monitor is an alias for biorels_php $TG_DIR/BACKEND/SCRIPT/monitor_job.php.

This starts the master job, which checks each job in turn and submits those that meet all of their criteria.
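Because biorels_monitor relies on the environment defined by setenv.sh, a quick pre-flight check can catch a forgotten `source` before anything is submitted. The helper below is a hypothetical sketch, not part of BioRels; it only assumes what this page states, namely that TG_DIR is set and that monitor_job.php lives in $TG_DIR/BACKEND/SCRIPT.

```shell
# Hypothetical pre-flight helper (not part of BioRels).
# Checks that TG_DIR is set (i.e. setenv.sh was sourced) and that the
# monitor script referenced by the biorels_monitor alias exists.
biorels_preflight() {
    if [ -z "${TG_DIR:-}" ]; then
        echo "TG_DIR is not set: source setenv.sh first" >&2
        return 1
    fi
    if [ ! -f "$TG_DIR/BACKEND/SCRIPT/monitor_job.php" ]; then
        echo "monitor_job.php not found under $TG_DIR/BACKEND/SCRIPT" >&2
        return 1
    fi
    echo "Environment looks ready"
}
```

For example, in an interactive shell: biorels_preflight && biorels_monitor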

Thank you for using BioRels. We hope you enjoy it!

Running on a single CPU – Experimental

If you don’t have an SGE cluster available, or you just want to test BioRels without the extra setup, that’s fine. Go to $TG_DIR/BACKEND/INSTALL, which contains a script called gen_single_script.php. This script generates a list of shell commands that execute the different BioRels scripts.

Warning

You will need PHP and the PostgreSQL client tools installed locally on your machine. The PostgreSQL database itself can be hosted elsewhere.
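A small check like the one below reports missing tools before you generate commands.sh. It is a sketch, not part of BioRels, and it assumes your PHP and PostgreSQL client executables are named php and psql, which are their usual names.

```shell
# Report which of the given commands are missing from PATH.
# Returns non-zero if at least one is missing.
# Assumption: PHP and the PostgreSQL client are installed as "php" and "psql".
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_tools php psql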

cd $TG_DIR/BACKEND/INSTALL

php gen_single_script.php > ../commands.sh

Then, to execute the generated script:

cd $TG_DIR/BACKEND/

sh commands.sh

gen_single_script.php reads your CONFIG_USER file and the dependencies between the scripts, and produces the list of scripts in the order they must run. commands.sh then executes the individual scripts in that order, one at a time.
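The idea of ordering scripts by their dependencies can be illustrated with the standard tsort utility, which prints a topological order from "prerequisite dependent" pairs. This is only an illustration, not how gen_single_script.php is implemented: the pairs reuse job names from the MONITOR_STATUS examples on this page rather than the real BioRels dependency graph, and tsort treats every prerequisite as mandatory, whereas BioRels rulesets can also contain OR conditions.

```shell
# Illustrative only: each line is "prerequisite dependent", meaning the left
# job must run before the right one. The pairs are a hand-picked subset of
# job names, not the actual BioRels configuration.
job_order() {
    tsort <<'EOF'
dl_gene db_gene
wh_taxonomy db_gene
db_pubmed wh_go
ck_go_rel wh_go
EOF
}
```

Running job_order prints one job per line. The exact order can differ between tsort implementations, but every prerequisite always appears before the jobs that depend on it.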

Limitations of the single-CPU mode:

BioRels is constantly being improved, and we are working to make it fully compatible with single-CPU runs. However, this configuration currently has some limitations:

  • commands.sh needs to be triggered manually.

  • If a script fails for any reason, the next script will still be triggered, resulting in undefined behavior.

  • Parallelized jobs are not triggered at all.
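Until the single-CPU mode handles failures itself, one possible workaround (a suggestion, not a BioRels feature) is to run commands.sh line by line and stop at the first failing command, so that downstream scripts are not started on incomplete data. The sketch below assumes commands.sh holds one complete shell command per line. If that is not the case, sh -e commands.sh is a simpler alternative: -e makes the shell exit on the first failing simple command, with the usual set -e exceptions.

```shell
# Run each non-empty, non-comment line of a command file in order and stop
# at the first failure. Assumes one complete shell command per line.
run_until_failure() {
    n=0
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        n=$((n + 1))
        case $cmd in
            ''|'#'*) continue ;;   # skip blank lines, comments and shebang
        esac
        echo "[$n] $cmd"
        # </dev/null keeps the command from consuming the loop's input
        if ! sh -c "$cmd" </dev/null; then
            echo "Command on line $n failed; stopping." >&2
            return 1
        fi
    done < "$1"
}
```

Usage, from $TG_DIR/BACKEND: run_until_failure commands.sh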

Monitoring BioRels

A lot happens during the first run, since everything needs to be loaded into the database. There are a few ways to monitor progress.

biorels_monitor output:

biorels_monitor is the main script in charge of running the different scripts. Its output looks like the image below:

[Image: example biorels_monitor output, annotated with labels A to N]

| Column | Description |
|---|---|
| A | Header |
| B | Timestamp |
| C | List of failed jobs: BioRels job ID followed by job name |
| D | Currently running jobs, with their SGE job ID, BioRels job ID and job name |
| E | Summary of jobs running and ended |
| | Latest job run: |
| F | BioRels job ID |
| G | Job name |
| H | Working directory in [PRIVATE_]PROCESS/[JOB_DIR]/ |
| I | Date the job was processed and generated new data. A script that checks for a new release and doesn’t find one updates run_date and last_check_date, but not processed_date. |
| J | Date the job was ready to run |
| K | Date the job was run |
| L | Run time |
| M | T: job ran successfully; F: job failed |
| N | Cause of the job failure |

The output shows the currently running jobs, the failed jobs and the latest job run. However, it does not explain why a job has or has not been triggered; the MONITOR_STATUS files described below provide that information.

MONITOR_STATUS

MONITOR_STATUS files are located in $TG_DIR/BACKEND/LOG. While biorels_monitor runs, it keeps up to 20 versions of this file, one per iteration of its checks. MONITOR_STATUS_20 is therefore a snapshot of the decision-making 20 iterations ago.
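To follow how the decision for one job evolved over time, you can extract its line from every snapshot. This is a sketch with assumptions: the files are named MONITOR_STATUS_1 to MONITOR_STATUS_20 (only MONITOR_STATUS_20 is confirmed on this page), higher numbers are older, and each job appears on its own line.

```shell
# Print the lines mentioning a given job in each MONITOR_STATUS snapshot,
# oldest first. Assumed naming: MONITOR_STATUS_1 .. MONITOR_STATUS_20,
# where MONITOR_STATUS_20 is the oldest snapshot.
job_history() {
    log_dir=$1
    job=$2
    i=20
    while [ "$i" -ge 1 ]; do
        f="$log_dir/MONITOR_STATUS_$i"
        if [ -f "$f" ]; then
            # -w avoids matching longer names such as wh_go_extra for wh_go
            grep -w -- "$job" "$f" | sed "s|^|MONITOR_STATUS_$i: |"
        fi
        i=$((i - 1))
    done
}
```

Usage: job_history "$TG_DIR/BACKEND/LOG" wh_go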

[Image: example MONITOR_STATUS file, annotated with labels A to I]

| Column | Description |
|---|---|
| A | Dependency level |
| B | BioRels job ID |
| C | Job name |
| D | Working directory (-1 if never run) |
| E | For dependency level 1 jobs, which are triggered on a time basis: next submission |
| | Dependency level 2 and above: |
| F | Dependency level |
| G | Ruleset. For example, db_gene is triggered when either dl_gene and wh_taxonomy both have a newer date than db_gene, OR pp_uniprot or dl_chembl has a newer date than db_gene. |
| H | wh_go (dependency level 3) requires db_pubmed, which is already up to date, and ck_go_rel. wh_go is therefore waiting for ck_go_rel to run successfully. |
| I | wh_reactome requires 4 data sources. 3 are up to date, but it is still waiting for ck_reactome_rel. |
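The db_gene ruleset in the table above boils down to a boolean expression over dates. The function below expresses it in shell as a simplified illustration, not the monitor’s actual code: the real checks use the timestamps stored in the database, while here the dates are passed as YYYYMMDD integers, a hypothetical encoding chosen so that -gt compares them chronologically.

```shell
# Illustration of the example db_gene ruleset:
#   (dl_gene newer AND wh_taxonomy newer) OR pp_uniprot newer OR dl_chembl newer
# Arguments: dates of db_gene, dl_gene, wh_taxonomy, pp_uniprot, dl_chembl
# as YYYYMMDD integers (hypothetical encoding). Returns 0 if triggered.
db_gene_triggered() {
    db_gene=$1 dl_gene=$2 wh_taxonomy=$3 pp_uniprot=$4 dl_chembl=$5
    if [ "$dl_gene" -gt "$db_gene" ] && [ "$wh_taxonomy" -gt "$db_gene" ]; then
        return 0
    fi
    [ "$pp_uniprot" -gt "$db_gene" ] || [ "$dl_chembl" -gt "$db_gene" ]
}
```

For instance, a new dl_gene alone is not enough: wh_taxonomy must also be newer than db_gene, unless pp_uniprot or dl_chembl has been updated.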

Manual changes

Disable a script

To disable a script, you will need to:

  • open $TG_DIR/BACKEND/CONFIG/CONFIG_USER
  • locate the script of interest
  • replace T with F (this section is tab-delimited)

If you are using an SGE cluster, you will have to stop and restart the job monitoring. If you are running on a single CPU, you will have to regenerate the commands.sh file.
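If you toggle scripts often, the edit can be scripted. The helper below is a hypothetical sketch: this page only says that the relevant section of CONFIG_USER is tab-delimited, so the column holding the script name (NAME_COL) and the column holding the T/F flag (FLAG_COL) are placeholders you must check against your own file before using it. Lines in other sections are left untouched, and a backup copy is kept.

```shell
# Hypothetical helper: set the T/F flag of one script in CONFIG_USER.
# NAME_COL / FLAG_COL default to 2 and 3 as placeholders only; verify which
# tab-delimited columns hold the script name and flag in your CONFIG_USER.
set_job_flag() {
    file=$1 job=$2 flag=$3
    name_col=${NAME_COL:-2}
    flag_col=${FLAG_COL:-3}
    case $flag in
        T|F) ;;
        *) echo "flag must be T or F" >&2; return 1 ;;
    esac
    cp "$file" "$file.bak" || return 1   # keep a backup of the original
    # Rewrite only the lines whose name column matches the job
    awk -F'\t' -v OFS='\t' -v job="$job" -v flag="$flag" \
        -v nc="$name_col" -v fc="$flag_col" \
        '$nc == job { $fc = flag } { print }' "$file.bak" > "$file"
}
```

Usage: set_job_flag "$TG_DIR/BACKEND/CONFIG/CONFIG_USER" db_gene F disables a script, and the same call with T enables it again. Restarting the monitoring or regenerating commands.sh is still required afterwards.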

Enable a script

To enable a script, you will need to:

  • open $TG_DIR/BACKEND/CONFIG/CONFIG_USER
  • locate the script of interest
  • replace F with T (this section is tab-delimited)

If you are using an SGE cluster, you will have to stop and restart the job monitoring. If you are running on a single CPU, you will have to regenerate the commands.sh file.

Disable a data source

To disable a data source, you can either disable all scripts of that data source in CONFIG_USER (see the section above) or run the prep_config_job.php script again, located in BACKEND/INSTALL/. In the latter case, simply leave that data source out of the list of data sources to process.

Enable a data source

To enable a data source, it is strongly recommended to run the prep_config_job.php script again (located in BACKEND/INSTALL), mainly to ensure that all of its dependencies are enabled as well.

Reset a job

If a job has failed, for instance because of a network issue, and you wish to run it again, you have two options:

  • Run it manually with biorels_php SCRIPT_PATH/SCRIPT_NAME. This runs the job, and the job monitoring then automatically triggers the downstream scripts, if any.
  • Modify the database: run biorels_sql to start psql and connect to your database, then run the following query: UPDATE [schema_name].biorels_timestamp SET processed_date=NULL, last_check_date=NULL WHERE job_name='[YOUR_JOB_NAME]';
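To avoid typos when resetting a job through the database, the query can be generated from the schema and job names. The function below is a sketch, not part of BioRels: it prints the same UPDATE statement given on this page and rejects names containing anything other than letters, digits and underscores, so they can be safely inserted into the SQL text.

```shell
# Print the reset query for one job. Schema and job names are restricted to
# [A-Za-z0-9_] so they can be interpolated into the SQL safely.
reset_job_sql() {
    schema=$1 job=$2
    for name in "$schema" "$job"; do
        case $name in
            ''|*[!A-Za-z0-9_]*) echo "invalid name: '$name'" >&2; return 1 ;;
        esac
    done
    printf "UPDATE %s.biorels_timestamp SET processed_date=NULL, last_check_date=NULL WHERE job_name='%s';\n" "$schema" "$job"
}
```

Paste the printed statement into the psql session opened by biorels_sql. If biorels_sql forwards standard input to psql the way a plain psql call does, reset_job_sql my_schema ck_go_rel | biorels_sql may also work, but that behavior is an assumption; my_schema is a placeholder for your schema name.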