assembly_stats

A Python library that takes a scaffold FASTA file as input and calculates both scaffold and contig statistics (N50, L50, etc.). It does this by breaking each scaffold into contigs wherever more than one N occurs in a row, then calculating statistics for both the scaffolds and the contigs.
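The splitting step and the N50/L50 calculation described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the package's actual implementation: the function names `split_contigs` and `n50_l50` are hypothetical, and the sketch assumes that "more than one N" means a run of two or more consecutive N characters.

```python
import re


def split_contigs(scaffold):
    """Split a scaffold into contigs at every run of two or more N characters.

    Assumption: a single isolated N is kept inside the contig.
    """
    return [piece for piece in re.split(r"[Nn]{2,}", scaffold) if piece]


def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that covers at least half of the total length;
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty input


# Toy scaffolds, made up for illustration.
scaffolds = ["ACGTACGTNNNNACGT", "ACGTAC", "ACNGT"]
contigs = [c for s in scaffolds for c in split_contigs(s)]

scaffold_stats = n50_l50([len(s) for s in scaffolds])
contig_stats = n50_l50([len(c) for c in contigs])
```

Sorting the lengths once and walking the running total keeps the calculation simple; the same function serves both the scaffold and the contig length lists.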

This is a rewrite of fasta_metadata_parser, intended to speed up the old implementation and, most importantly, to learn how to install Python scripts on the Smithsonian HPC.

Installation

pip install assembly_stats

Usage

  $ assembly_stats -h

    usage: assembly_stats [-h] filename

    Calculate statistics about genome assemblies.

    positional arguments:
      filename    Genome file in FASTA format.

    optional arguments:
      -h, --help  show this help message and exit

Once the statistics for the genome assembly have been calculated, they are printed in JSON format.
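Because the output is JSON, it can be captured and parsed from another Python script. The sketch below is one possible way to do that with the standard library; the helper name `run_json_command` and the file name `genome.fasta` are hypothetical, and the keys of the resulting dictionary are not assumed here, so inspect the output to see what is available.

```python
import json
import subprocess


def run_json_command(command):
    """Run a command and parse its standard output as JSON.

    `command` is a list of arguments, as accepted by subprocess.run.
    Raises CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


# Example usage (requires assembly_stats to be installed and a FASTA file
# at the hypothetical path "genome.fasta"):
#
#     stats = run_json_command(["assembly_stats", "genome.fasta"])
#     print(json.dumps(stats, indent=2))
```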

How to Cite

The assembly_stats package has been added to Zenodo in order to receive a stable DOI for citation: "10.5281/zenodo.3968774".

Please go to the Zenodo page for assembly_stats to find style-specific formatting options: https://zenodo.org/record/3968774.

Next steps

Add the ability to save the NumPy sequence length arrays for further visualization, since generating them is what takes the most time.
