BLAST and extract the sequences
The SeqsExtractor-blast-and-extract option runs a BLAST search with NCBI BLAST+ and then extracts every query sequence whose hits against the subject database match a percentage of identity specified by the user.
EXAMPLE: After a BLAST run, the tabular BLAST output can be used to extract from your query dataset only the sequences whose hits match a specific percentage, such as 100%.
USAGE:
Example: SeqsExtractor-blast-and-extract -i query.fa -o /home/user/test -b n -d mouse.preformated.blastdb.fa -p 90-100 -e 1e-20 -t 10 -a '-max_target_seqs 1'
Required arguments:
-i <string> | Query fasta
-o <string> | Output directory
-b <n/x> | Blast+ algorithm (n or x)
-d <string> | Pre-formatted Blast+ database
-p <string> | Pct. of identity of the sequences to extract
Optional arguments:
-e <string> | E-value threshold. Default: 1e-3
-t <integer> | Number of threads. Default: all available threads
-a <string> | Blast+ optional parameters. E.g. '-max_target_seqs 1 -import_search_strategy filename' (Use between quotes!)
SeqsExtractor-blast-and-extract -i M.musculus_NCBI_entire_genome.fasta -o /home/user/test -b x -d Mus_musculus_uniprot_swisprot.fasta -p 90-100 -e 1e-20 -t 10 -a '-max_target_seqs 1'
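The database must be formatted beforehand with makeblastdb, using -dbtype prot for a protein database or -dbtype nucl for a nucleotide database:
makeblastdb -in name_of_your_database_to_BLAST.fasta -dbtype prot
makeblastdb -in name_of_your_database_to_BLAST.fasta -dbtype nucl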
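The two makeblastdb commands above differ only in -dbtype. As a minimal sketch, assuming that -b n stands for blastn (nucleotide database) and -b x for blastx (protein database), the matching -dbtype could be picked as shown here. The helper name dbtype_for is hypothetical and is not part of SeqsExtractor.

```shell
#!/bin/sh
# Hypothetical helper: map the SeqsExtractor -b value to a makeblastdb -dbtype.
# Assumption: n = blastn (nucleotide subject), x = blastx (protein subject).
dbtype_for() {
    case "$1" in
        n) echo nucl ;;
        x) echo prot ;;
        *) echo "unknown algorithm: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the makeblastdb command instead of executing it.
echo "makeblastdb -in name_of_your_database_to_BLAST.fasta -dbtype $(dbtype_for x)"
```

Printing the command first makes it easy to check the database type before spending time on formatting a large FASTA file.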
Enter the fasta file to be used as a query
-i /home/me/M.musculus_NCBI_entire_genome.fasta
Enter the output directory
-o
Available options: x or n
-b x
or
-b n
Here you need to provide a pre-formatted BLAST+ database
-d Mus_musculus_uniprot_swisprot.fasta
Now you can choose a specific percentage to extract your sequences. All available options are listed below:
10 to get only the sequences that match with 10%
20 to get only the sequences that match with 20%
30 to get only the sequences that match with 30%
40 to get only the sequences that match with 40%
50 to get only the sequences that match with 50%
60 to get only the sequences that match with 60%
70 to get only the sequences that match with 70%
80 to get only the sequences that match with 80%
90 to get only the sequences that match with 90%
100 to get only the sequences that match with 100%
10-100 to get only the sequences that match with 10% to 100% of hits
20-100 to get only the sequences that match with 20% to 100% of hits
30-100 to get only the sequences that match with 30% to 100% of hits
40-100 to get only the sequences that match with 40% to 100% of hits
50-100 to get only the sequences that match with 50% to 100% of hits
60-100 to get only the sequences that match with 60% to 100% of hits
70-100 to get only the sequences that match with 70% to 100% of hits
80-100 to get only the sequences that match with 80% to 100% of hits
90-100 to get only the sequences that match with 90% to 100% of hits
Or type all to apply no filter and get all sequences that match in the BLAST search.
Example:
-p 90-100
This will extract the sequences that match with 90% to 100% identity
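To illustrate what a range such as -p 90-100 selects, the sketch below applies the same kind of filter to BLAST+ tabular output (-outfmt 6), where column 1 is the query ID and column 3 is the percent identity. It is not SeqsExtractor's own code: the function names filter_by_identity and sample_hits and the sample rows are made up for illustration, and treating a single value such as 100 as an exact match is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read -outfmt 6 rows on stdin and print the IDs of
# queries that have a hit with percent identity (column 3) inside MIN-MAX.
# Assumption: a single value such as "100" means MIN = MAX; "all" keeps every hit.
filter_by_identity() {
    range=$1
    if [ "$range" = "all" ]; then
        min=0; max=100
    else
        min=${range%-*}   # text before the dash (or the whole value)
        max=${range#*-}   # text after the dash (or the whole value)
    fi
    awk -F '\t' -v min="$min" -v max="$max" \
        '$3 + 0 >= min + 0 && $3 + 0 <= max + 0 { print $1 }' | sort -u
}

# Made-up sample rows in the 12-column -outfmt 6 layout (tab-separated).
sample_hits() {
    printf 'seq1\tsubjA\t100.000\t50\t0\t0\t1\t50\t1\t50\t1e-30\t99\n'
    printf 'seq2\tsubjB\t85.500\t50\t7\t0\t1\t50\t1\t50\t1e-10\t60\n'
    printf 'seq3\tsubjC\t92.000\t50\t4\t0\t1\t50\t1\t50\t1e-22\t80\n'
}

sample_hits | filter_by_identity 90-100   # prints seq1 and seq3, one per line
```

With the same sample rows, the value 100 would keep only seq1 and all would keep the three IDs.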
Example:
-e 1e-20
If you do not use this option, the default value (1e-3) is used
NOTE: In the Linux Mint/Ubuntu environment, the command nproc shows the total number of threads available on the machine
Example:
-t 12
If you do not use this option, SeqsExtractor automatically uses the maximum number of cores of the machine
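The default described above can be pictured with the following sketch: use the value given to -t when there is one, otherwise fall back to the count reported by nproc. The function name resolve_threads is hypothetical, and this is an assumption about the behavior, not SeqsExtractor's actual code.

```shell
#!/bin/sh
# Hypothetical helper: print the requested thread count, or the number of
# processing units reported by nproc (GNU coreutils) when none is given.
resolve_threads() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    else
        nproc
    fi
}

resolve_threads 12   # prints 12
resolve_threads      # prints whatever nproc reports on this machine
```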
Here you can pass additional BLAST+ parameters. Write them inside single quotes, each starting with a dash and separated by spaces.
Example:
-a '-max_target_seqs 1 -max_hsps 1'
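The reason for the single quotes can be seen with a small shell sketch: quoted, the extra parameters reach the program as one argument after -a; unquoted, the shell splits them into separate arguments. The function count_args is a made-up stand-in used only to count arguments and has nothing to do with SeqsExtractor itself.

```shell
#!/bin/sh
# Hypothetical stand-in for any command: print how many arguments it received.
count_args() { echo "$#"; }

# Quoted: the extra BLAST+ options travel as ONE argument after -a.
count_args -a '-max_target_seqs 1'    # prints 2

# Unquoted: the shell splits them, so "-max_target_seqs" and "1" arrive
# as separate arguments instead of as the value of -a.
count_args -a -max_target_seqs 1      # prints 3
```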