create a signature by downloading and sketching a genome sequence #11

Open
ctb opened this issue May 12, 2022 · 0 comments
Labels
fasta working with FASTA files genome analyzing genomes intro introductory examples

Comments

@ctb
Contributor

ctb commented May 12, 2022

First, download a genome:

curl -JLO https://osf.io/bjh2y/download

This will create a 1.4 MB file, GCF_000005845.2_ASM584v2_genomic.fna.gz, containing the genome of E. coli K-12 strain MG1655 (see GenBank entry).

Next, calculate the signature using sourmash sketch dna:

sourmash sketch dna -p abund GCF_000005845.2_ASM584v2_genomic.fna.gz

Here, -p abund tells sourmash sketch to also retain the abundance (frequency) of each k-mer.
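
To make "abundance" concrete, here is a toy sketch in plain shell that counts how often each k-mer occurs in a short sequence. It is only an illustration: the function name kmer_counts and the example sequence are made up for this note, and sourmash itself works on hashed k-mers rather than printing raw counts like this.

```shell
# Toy illustration only (not how sourmash is implemented):
# list every k-mer in a sequence together with how many times it occurs.
# kmer_counts is a hypothetical helper name; $1 = sequence, $2 = k.
kmer_counts() {
    printf '%s\n' "$1" |
        awk -v k="$2" '{
            # slide a window of width k along the sequence
            for (i = 1; i + k - 1 <= length($0); i++)
                print substr($0, i, k)
        }' |
        sort | uniq -c |
        awk '{ print $2, $1 }'   # reorder to "kmer count"
}

kmer_counts ATGGCATGGC 3
# ATG 2
# CAT 1
# GCA 1
# GGC 2
# TGG 2
```

Without abundance tracking, only the presence of ATG, GGC, and TGG would be recorded, not the fact that each appears twice.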

This will produce a signature file, GCF_000005845.2_ASM584v2_genomic.fna.gz.sig, that is much smaller than the original genome file (86 KB vs. 1.4 MB).
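
If you want to check the size difference yourself, a small helper along these lines should work. The name size_ratio is an assumption made for this note; it relies only on the standard wc, tr, and awk tools, and the exact byte counts you see may differ from the rounded figures above.

```shell
# Hypothetical helper: print two file sizes in bytes and how many
# times larger the first file is than the second.
size_ratio() {
    # tr strips the padding some wc implementations add
    big=$(wc -c < "$1" | tr -d ' ')
    small=$(wc -c < "$2" | tr -d ' ')
    awk -v b="$big" -v s="$small" \
        'BEGIN { printf "%d bytes vs %d bytes (%.1fx)\n", b, s, b / s }'
}
```

After running the sketch step, it could be called as:

size_ratio GCF_000005845.2_ASM584v2_genomic.fna.gz GCF_000005845.2_ASM584v2_genomic.fna.gz.sig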

You can view the metadata properties of this signature with sourmash sig describe:

sourmash sig describe GCF_000005845.2_ASM584v2_genomic.fna.gz.sig

This example was taken from Large scale sequence comparisons with sourmash, Pierce et al., 2019.

@ctb ctb changed the title from "creating a signature by downloading and sketching a genome sequence" to "create a signature by downloading and sketching a genome sequence" May 12, 2022
@ctb ctb added intro introductory examples fasta working with FASTA files genome analyzing genomes labels May 12, 2022