seqpull

seqpull uses a file of gene queries to pull similar sequences from short-read nucleotide datasets, assembles them into gene contigs, and then reverse-complements and trims the sequences based on BLAST results.

Functions:

Pulls gene sequences from datasets using a high-throughput Snakemake pipeline.
Assembles pulled sequences with CAP3 (useful for short-read sequences, not long-read).
Automatically reverse-complements sequences based on BLAST results against the top gene queries.
Trims sequence ends based on BLAST results against the top gene queries.
- Note: the more robust your query sequences (taxonomy and size), the more robust the trimming will be.
Filters sequences by size (default: sequences shorter than 500 bp are removed).
Names sequences and output files automatically from the input file names and gene file names.
Produces easy-to-parse directories and file names.
Writes useful logs.
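
The orientation, trimming, and size-filter steps listed above can be pictured with a minimal sketch. This is not seqpull's own code: the function names are hypothetical, and it assumes the top BLAST hit is given as 1-based coordinates on the contig, with start greater than end meaning the hit lies on the reverse strand. Only the 500 bp default comes from the list above.

```python
# Illustrative sketch only; not taken from the seqpull source.
# Assumption: hit coordinates are 1-based and inclusive on the contig,
# and hit_start > hit_end marks a reverse-strand hit.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_and_trim(seq, hit_start, hit_end):
    """Keep only the region covered by the hit, in the query's orientation."""
    low, high = sorted((hit_start, hit_end))
    trimmed = seq[low - 1:high]  # convert 1-based inclusive to a Python slice
    if hit_start > hit_end:      # reverse-strand hit: flip the sequence
        trimmed = reverse_complement(trimmed)
    return trimmed


def passes_size_filter(seq, min_len=500):
    """Sequences shorter than min_len are dropped (500 bp default per the list above)."""
    return len(seq) >= min_len
```

In this sketch a contig would be kept only if passes_size_filter returns True for the sequence being checked; whether seqpull applies its filter before or after trimming is not stated here.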

Example uses:

Pull marker genes for organism identification (18S, 16S, COI).
Find virus-related genes in organism transcriptomes.

Quickstart

Python environment

Use the provided YAML file to create the seqpull conda environment:

git clone https://github.com/SingleEukaryote/seqpull
cd seqpull/code
conda env create -f env/seqpull.yaml

Running the seqpull pipeline

Put the FASTA files you want to pull sequences from in the data/DNAinputs directory. File names must end with .fas or .fasta to be recognized.
Put the FASTA files you want to use as gene queries in the data/gene_queries directory. File names must end with .fas or .fasta to be recognized.
- Provided queries include 18S, 16S, and actin genes. Please curate your own queries to suit your target organisms and genes.
Move to the code directory.
Do a dry run: snakemake -n
- This lets you preview every step of the pipeline and every file that will be generated.
Run the bash script: bash seqpull.sh
- Modify the number of cores/threads first.
- Optionally, submit the script to a queue system.
From the terminal, you can run Snakemake directly:
- snakemake --cores 4 --keep-going > log.txt
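
Because inputs are only picked up when their names end with .fas or .fasta, it can help to check the input directories before the dry run. The sketch below is a hypothetical helper, not part of seqpull; it assumes the match is case-sensitive and applies to the end of the file name.

```python
# Hypothetical pre-flight check; not part of the seqpull repository.
from pathlib import Path

# Extensions the README says are recognized (case-sensitivity is an assumption).
RECOGNIZED_SUFFIXES = (".fas", ".fasta")


def recognized_fastas(directory):
    """Return the sorted names of files in `directory` ending in .fas or .fasta."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith(RECOGNIZED_SUFFIXES)
    )
```

From the code directory, something like recognized_fastas("../data/DNAinputs") and recognized_fastas("../data/gene_queries") would list the files the pipeline should see, assuming the data directory sits next to code as in the repository layout.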
