To see the commands:
$ python3 disorderly.py -h
Put your query sequences in a file in FASTA format.
Your database is the set of sequences that you want to compare against. It is also in FASTA format, but it must be converted to a .disorderdb database before it can be searched. Generate a .disorderdb file from your database using the following command:
$ python3 disorderly.py -v -fb path/to/your_database.fasta
-v Verbose flag
-fb Database FASTA file
This will generate your_database.fasta.disorderdb in the same folder as your_database.fasta.
Each of your queries is compared only to sequences of the same length in the database. Once a same-length sequence is found, the Euclidean distance between the compositions of your query and the database sequence is computed. The output contains all the same-length sequences sorted by the Euclidean distance (low to high).
This search is distributed over all the available CPUs!
$ python3 disorderly.py -v -i path/to/query.fasta -db path/to/your_database.fasta.disorderdb
-i Your query sequences in FASTA
-db The converted .disorderdb database
This will generate a .csv file named after your query, with a suffix appended (e.g. for query.fasta, the result will be query_search-20180816190934-ABCD.csv). The -v verbose flag will tell you where your result is, which will be in the same directory as your query.
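The same-length comparison described above can be sketched in a few lines of Python. This is only an illustration and not the code in disorderly.py: it assumes that "composition" means the fraction of each of the 20 standard amino acids in a sequence, and the names `composition`, `euclidean` and `search` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Assumption: composition is measured over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the fraction of each amino acid in seq, in a fixed order."""
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(seq.upper())
    return [counts[aa] / n for aa in AMINO_ACIDS]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query, database):
    """Compare query only to same-length database sequences.

    database maps a sequence ID to its sequence.
    Returns (ID, distance) pairs sorted from low to high distance.
    """
    q = composition(query)
    hits = [
        (db_id, euclidean(q, composition(seq)))
        for db_id, seq in database.items()
        if len(seq) == len(query)  # skip sequences of a different length
    ]
    return sorted(hits, key=lambda hit: hit[1])

# Toy example: "s3" is ignored because its length differs from the query.
database = {"s1": "GGAA", "s2": "AAAG", "s3": "AAGGG", "s4": "AAAA"}
hits = search("AAGG", database)
for db_id, dist in hits:
    print(f"{db_id}\t{dist:.3f}")
```

A database sequence with the same composition as the query (here "s1") gets a distance of 0, even if the residues are in a different order.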
$ python3 disorderly.py -v -i query.fasta -fb your_database.fasta
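The command above takes the query and the database FASTA file together, so the conversion and the search run in one go. The step-by-step instructions before it are meant to help you understand what is really going on.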
The output format is (sequence IDs are the FASTA headers):
Queries | Hits | Distances |
---|---|---|
query-seq-1 | database-seq-9 | 0.000 |
query-seq-1 | database-seq-5 | 0.135 |
query-seq-1 | database-seq-14 | 0.246 |
query-seq-2 | database-seq-3 | 0.000 |
query-seq-2 | database-seq-75 | 0.321 |
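If you want to post-process the result, the closest hit per query can be pulled out with the standard csv module. This is a minimal sketch that assumes the .csv is comma-separated with a header row named Queries, Hits, Distances as in the table above; check your actual file before relying on it. The function name `best_hits` and the inline sample data are hypothetical.

```python
import csv
import io

def best_hits(csv_file):
    """Return {query ID: (hit ID, distance)} for the lowest distance per query.

    Assumption: the file has a header row with the columns
    Queries, Hits and Distances.
    """
    best = {}
    for row in csv.DictReader(csv_file):
        query = row["Queries"]
        hit = row["Hits"]
        dist = float(row["Distances"])
        if query not in best or dist < best[query][1]:
            best[query] = (hit, dist)
    return best

# Inline sample standing in for a real result file.
sample = io.StringIO(
    "Queries,Hits,Distances\n"
    "query-seq-1,database-seq-9,0.000\n"
    "query-seq-1,database-seq-5,0.135\n"
    "query-seq-2,database-seq-3,0.000\n"
    "query-seq-2,database-seq-75,0.321\n"
)
result = best_hits(sample)
for query, (hit, dist) in result.items():
    print(query, hit, dist)
```

For a real result, open the file with `open(path, newline="")` and pass the file object to `best_hits` in place of the sample.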
$ sbatch bash_run.sh -v -i query.fasta -fb your_database.fasta
NOTE: bash_run.sh must be in the same folder as disorderly.py.
ALSO: It is currently configured to use the DPB partition and 24 cores (1 node on MEMEX). Edit the file with any editor to change this, e.g.:
#SBATCH -p dge # To use the DGE partition
#SBATCH -c 12 # for 12 cores