Assembly and annotation pipeline of the whole genome of Zygogramma bicolorata, commonly known as the 'Parthenium beetle' or 'Mexican beetle'.
This repository has two primary aims: first, to serve as supplementary material to our research publication, Sahoo et al 2023, on the beetle genome assembly; second, to help non-bioinformaticians get acquainted with the genome assembly pipeline.
If you find this pipeline useful, please consider citing our publication, Sahoo et al 2023. Thanks to Shivakumara Manu and Naveen Kumar Chandrakumaran for their contributions to compiling the code.
To make sure that the syntax used in the analysis pipeline is clear to the reader, a few key notations need to be clarified:
~/path/to/dir
denotes the path to the directory of interest.
~/path/to/dir/in
denotes the path to the directory where the input file resides.
~/path/to/dir/out
denotes the path to the directory where the output file will be placed.
./
indicates the current directory, from which the program of interest is executed.
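As a rough illustration of the notation above, the following Python sketch shows how such placeholders resolve to concrete locations. The helper `resolve_placeholder` and the file names `reads.fastq` and `reads.filtered.fastq` are hypothetical and appear nowhere in the pipeline; they are only there to show how `~` and `./` are interpreted.

```python
from pathlib import Path
from typing import Optional


def resolve_placeholder(placeholder: str, cwd: Optional[Path] = None) -> Path:
    """Turn a README-style placeholder into an absolute path.

    '~' expands to the user's home directory; './' and other relative
    paths are taken relative to `cwd` (default: the current directory).
    """
    path = Path(placeholder).expanduser()
    if not path.is_absolute():
        base = cwd if cwd is not None else Path.cwd()
        path = base / path
    return path


# Hypothetical file names, used only to show how the placeholders combine.
input_file = resolve_placeholder("~/path/to/dir/in") / "reads.fastq"
output_file = resolve_placeholder("~/path/to/dir/out") / "reads.filtered.fastq"
here = resolve_placeholder("./")

print(input_file)   # input file under the home directory
print(output_file)  # output file under the home directory
print(here)         # directory the script was started from
```

In practice, replace `path/to/dir` with the actual location of your data before running any command from the pipeline.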
The pipeline follows a simple route. Preassembly.md records the quality assessment steps used to check the raw sequence data and to filter out low-quality reads and biological or methodological contaminants of minor interest. Once the raw data are cleaned, the series of methods in Assembly.md establishes the most probable connections among the reads, forming contigs, scaffolds or, where possible, chromosomes. The assembled fragments are then checked for quality and completeness against measurable parameters in Postassembly.md.
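To give a feel for what a "measurable parameter" of assembly quality can look like, here is a minimal sketch of N50, a widely used contiguity statistic. It is an assumption that N50 is among the metrics reported in Postassembly.md; this page does not name them, and the contig lengths below are made-up toy values.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (not real data): total 290, half 145.
# 80 + 70 = 150 >= 145, so the N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A higher N50 generally indicates a more contiguous assembly, but it says nothing about correctness or gene completeness, which is why assemblies are usually judged on several metrics together.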
Best wishes!
Note: This pipeline is not exhaustive in nature. Training programs at the Galaxy Project are comprehensive and resourceful. Readers may gain further insight into the pipeline by following those tutorials.