Simple_Minimizer_Algorithmn

A Simple Minimizer Algorithm that matches reads based on shared minimizers, disregarding order and multiplicity.

Overview: The program takes two FASTA files. The first file holds the reference sequences and is used to build a HashMap of the minimal k-mers (minimizers) found in each reference. The second file holds the query sequences; for each query, the program decides which reference sequence it most likely comes from.
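As a rough illustration of the input step, the sketch below parses FASTA text into a map from record id to sequence. It is not the code in `src`; the function name `parse_fasta`, the choice to take the id as the first token after `>`, and the upper-casing of bases are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Parse FASTA text into a map from record id to sequence.
/// Assumption: the id is the first whitespace-delimited token after '>'.
fn parse_fasta(text: &str) -> HashMap<String, String> {
    let mut records: HashMap<String, String> = HashMap::new();
    let mut current: Option<String> = None;
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A header line starts a new record.
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            records.insert(id.clone(), String::new());
            current = Some(id);
        } else if let Some(id) = &current {
            // Sequence lines may be wrapped; append them to the current record.
            if let Some(seq) = records.get_mut(id) {
                seq.push_str(&line.to_ascii_uppercase());
            }
        }
    }
    records
}
```

To read a file such as HepatitisC.fasta, the text could first be loaded with `std::fs::read_to_string` and then passed to this function.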

Detailed Steps

The program first reads the two genomes from the reference file and, for each genome, builds a vector of minimizer k-mers, one for each l-sized window of the sequence
- The k-mer selected for a window is the one with the smallest hash value in that window
The program then reads the query FASTA file and converts each query sequence into a vector of minimizer k-mers
Each query vector is compared with the vectors representing the reference genomes, and the number of hits is counted
- Remark: At the current stage a hit only counts a matching k-mer; the multiplicity and order of the k-mers are ignored
The program then returns a string stating, for each query, which reference it most likely comes from, or that it comes from none of them
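The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src`: the names `hash_kmer`, `minimizers`, `count_hits`, and `classify` are hypothetical, the hash is Rust's `DefaultHasher` (the README does not say which hash is used), and "none of them" is taken to mean zero shared minimizers, which may differ from the real threshold.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash a k-mer. DefaultHasher is an assumption made for this sketch.
fn hash_kmer(kmer: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    kmer.hash(&mut h);
    h.finish()
}

/// For every window of `l` bases, keep the k-mer with the smallest hash.
fn minimizers(seq: &str, k: usize, l: usize) -> Vec<String> {
    let bytes = seq.as_bytes();
    let mut out = Vec::new();
    if k == 0 || l < k || bytes.len() < l {
        return out;
    }
    for window in bytes.windows(l) {
        let best = window
            .windows(k)
            .min_by_key(|kmer| hash_kmer(kmer))
            .expect("window holds at least one k-mer");
        out.push(String::from_utf8_lossy(best).into_owned());
    }
    out
}

/// Count distinct minimizers shared by query and reference,
/// ignoring multiplicity and order (a set intersection).
fn count_hits(query: &[String], reference: &[String]) -> usize {
    let q: HashSet<&str> = query.iter().map(String::as_str).collect();
    let r: HashSet<&str> = reference.iter().map(String::as_str).collect();
    q.intersection(&r).count()
}

/// Return the reference with the most hits, or None when no reference
/// shares a minimizer with the query (assumed meaning of "none of them").
/// On a tie, `max_by_key` keeps the last reference with the maximum.
fn classify<'a>(query: &[String], references: &'a [(String, Vec<String>)]) -> Option<&'a str> {
    references
        .iter()
        .map(|(name, mins)| (name.as_str(), count_hits(query, mins)))
        .filter(|&(_, hits)| hits > 0)
        .max_by_key(|&(_, hits)| hits)
        .map(|(name, _)| name)
}
```

Using sets for the comparison is what makes the count independent of how often, or in which order, a minimizer appears. The values of `k` and `l` are left as parameters here because the README does not state them.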

Example FASTA files provided

- HepatitisC.fasta: The complete genomes of genotype 1 and genotype 7 of Hepatitis C, obtained from NCBI
- query.fasta: Subsequences of the two genomes in HepatitisC.fasta, with slight modifications to the ends
  - id1: from genotype 7
  - id2: from genotype 1
  - id3: from genotype 7
  - id4: a random sequence
- error_query.fasta: Sequences from query.fasta, but with more errors introduced
  - id5: id1 with substitutions {T, A, 4}, {C, T, 19}, and {C, G, 46}
  - id6: id2 with an insertion of CATCGT at position 10
  - id7: id2 with a deletion of 10 nucleotides at position 10

Possible Improvements:

Handle ties, where more than one reference has the maximum number of hits
Write the results to a file
Use enums to choose how minimizers are selected
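Two of these ideas could look something like the sketch below. It is only one possible shape, not planned code from the repository: the enum `MinimizerOrder`, its variants, and the functions `pick_minimizer` and `best_references` are hypothetical names, and lexicographic ordering is just one example of an alternative to hashing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical enum selecting how the minimizer of a window is chosen.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MinimizerOrder {
    /// Smallest hash value (DefaultHasher is an assumption).
    Hash,
    /// Lexicographically smallest k-mer.
    Lexicographic,
}

/// Pick the minimizer of one window under the chosen ordering.
/// Returns None when the window is shorter than k or k is zero.
fn pick_minimizer(window: &[u8], k: usize, order: MinimizerOrder) -> Option<&[u8]> {
    if k == 0 || window.len() < k {
        return None;
    }
    match order {
        MinimizerOrder::Hash => window.windows(k).min_by_key(|kmer| {
            let mut h = DefaultHasher::new();
            kmer.hash(&mut h);
            h.finish()
        }),
        MinimizerOrder::Lexicographic => window.windows(k).min(),
    }
}

/// Return every reference that shares the maximum hit count,
/// so that ties are reported instead of silently broken.
/// An empty vector means no reference had any hits.
fn best_references<'a>(hits: &[(&'a str, usize)]) -> Vec<&'a str> {
    let max = hits.iter().map(|&(_, h)| h).max().unwrap_or(0);
    if max == 0 {
        return Vec::new();
    }
    hits.iter()
        .filter(|&&(_, h)| h == max)
        .map(|&(name, _)| name)
        .collect()
}
```

Returning a vector of names leaves it to the caller to decide whether a tie should be reported as ambiguous or resolved by some further rule.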

AfZheng126/Simple_Minimizer_Algorithmn
