Bioinformatics-toolspy Code Documentation

Introduction

This document describes the functions in the `bioinformatics` module. They cover sequence analysis, alignment, and motif finding, and each entry includes a short usage example.

Functions

`levenshtein_distance(string1, string2)`

Calculates the Levenshtein distance between two input strings.

Example:

import bioinformatics
print(bioinformatics.levenshtein_distance("kitten","sitting"))  # Output: 3
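
For readers who want to see how such a distance can be computed, here is a minimal standalone sketch of the usual dynamic-programming approach. The name `levenshtein_sketch` is hypothetical and the code is not taken from the module; it assumes unit cost for insertions, deletions, and substitutions.

```python
def levenshtein_sketch(a, b):
    """Edit distance with unit-cost insert, delete, and substitute."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1,     # delete char_a
                               current[j - 1] + 1,  # insert char_b
                               substitution))
        previous = current
    return previous[-1]


print(levenshtein_sketch("kitten", "sitting"))  # 3
```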

`sequence_alignment(sequence1, sequence2, gap_penalty, mismatch_penalty)`

Performs sequence alignment between two sequences with specified gap and mismatch penalties.

Example:

import bioinformatics
print(bioinformatics.sequence_alignment("AGGGCT", "AGGCA", 3, 2))  # Output: (3, 'AGGGCT-', '-AGGCA')

`longest_common_subsequences(sequence_list)`

Finds the longest common subsequence among a list of input sequences.

Example:

import bioinformatics
print(bioinformatics.longest_common_subsequences(["ACCGAAGG","ACCGAACC","CCACCGAAGG","GGACCGAACC"]))  # Output: 'ACCGAA'

`longest_common_subsequence(sequence1, sequence2)`

Finds the longest common subsequence between two input sequences.

Example:

import bioinformatics
print(bioinformatics.longest_common_subsequence("ACCGAAGG", "ACCGAACC"))  # Output: 'ACCGAA'
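
The pairwise case is commonly solved with a length table followed by a traceback. The sketch below illustrates that idea under the assumption that any one longest subsequence is acceptable; `lcs_sketch` is a hypothetical name, not the module's implementation.

```python
def lcs_sketch(a, b):
    # table[i][j] = length of the longest common subsequence of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, char_a in enumerate(a, start=1):
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one subsequence.
    i, j, result = len(a), len(b), []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


print(lcs_sketch("ACCGAAGG", "ACCGAACC"))  # ACCGAA
```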

`commun_patterns(pattern_list)`

Finds common patterns among a list of input patterns.

Example:

import bioinformatics
print(bioinformatics.commun_patterns(["XAaXV","XAsXV","XAcXV"]))  # Output: 'XAXV'

`reconstruct_from_kmers(k, kmers)`

Reconstructs a string from a collection of k-mers.

Example:

import bioinformatics
print(bioinformatics.reconstruct_from_kmers(3,["AAT","ATG", "TGC", "GCT", "CTA"]))  # Output: 'AATGCTA'

`translate_rna_to_aminoacid(rna_sequence)`

Translates an RNA sequence into an amino acid sequence.

Example:

import bioinformatics
print(bioinformatics.translate_rna_to_aminoacid("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))  # Output: 'MAMAPRTEINSTRING*'
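
As an illustration of codon-by-codon translation, the following sketch builds the standard genetic code table and reads the RNA three bases at a time. It is an assumption-laden example rather than the module's code: `translate_sketch` is a hypothetical name, and this version stops at the first stop codon without emitting a marker for it.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(rna):
    protein = []
    # Read complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
print(translate_sketch(rna))  # MAMAPRTEINSTRING
```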

`de_bruijn_collection(kmers, prefix_func, suffix_func)`

Constructs a de Bruijn graph from a collection of k-mers.

Example:

import bioinformatics
print(bioinformatics.de_bruijn_collection(["ATG", "ATG", "TGT", "TGG", "CAT", "GGA", "GAT", "AGA"], lambda kmer: kmer[:-1], lambda kmer: kmer[1:]))  # Output: {'AT': ['TG', 'TG'], 'TG': ['GT', 'GG'], 'CA': ['AT'], 'GG': ['GA'], 'GA': ['AT'], 'AG': ['GA']}
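
In a de Bruijn graph built from k-mers, each k-mer contributes one edge from its prefix of length k - 1 to its suffix of length k - 1. A minimal sketch of that construction follows; the name `de_bruijn_from_kmers_sketch` is hypothetical, the prefix and suffix rules are hard-coded instead of passed in, and the dictionary-of-lists result format is an assumption.

```python
from collections import defaultdict


def de_bruijn_from_kmers_sketch(kmers):
    graph = defaultdict(list)
    for kmer in kmers:
        # One edge per k-mer: prefix node -> suffix node.
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


kmers = ["ATG", "ATG", "TGT", "TGG", "CAT", "GGA", "GAT", "AGA"]
print(de_bruijn_from_kmers_sketch(kmers))
# {'AT': ['TG', 'TG'], 'TG': ['GT', 'GG'], 'CA': ['AT'], 'GG': ['GA'], 'GA': ['AT'], 'AG': ['GA']}
```

Repeated k-mers produce repeated edges, which is why `'AT'` lists `'TG'` twice.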

`find_eulerian_cycle(graph)`

Finds an Eulerian cycle in a graph.

Example:

import bioinformatics
print(bioinformatics.find_eulerian_cycle({"AAT": ["ATG"],"ATG": ["TGC"],"TGC": ["GCT"],"GCT": ["CTA"],"CTA": ["TAC"],"TAC": ["ACG"],"ACG": ["CGA"],"CGA": ["GAT"],"GAT": ["ATG"]}))  # Output: ['AAT', 'ATG', 'TGC', 'GCT', 'CTA', 'TAC', 'ACG', 'CGA', 'GAT', 'ATG']
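
One common way to find such a cycle is Hierholzer's algorithm, sketched below with an explicit stack. This is an illustration, not the module's code: `eulerian_cycle_sketch` is a hypothetical name, and the sketch assumes a connected directed graph, given as a dictionary of adjacency lists, in which every node has equal in-degree and out-degree.

```python
def eulerian_cycle_sketch(graph):
    # Copy the adjacency lists so the caller's graph is not modified.
    remaining = {node: list(neighbors) for node, neighbors in graph.items()}
    start = next(iter(remaining))
    stack, cycle = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused edge out of the current node.
            stack.append(remaining[node].pop())
        else:
            # No unused edges left here: the node is final in the cycle.
            cycle.append(stack.pop())
    return cycle[::-1]


small_graph = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["B"]}
print(eulerian_cycle_sketch(small_graph))
```

The returned list starts and ends at the same node and uses every edge exactly once; which of several valid cycles is returned depends on the order of the adjacency lists.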

`de_bruijn(k, sequence)`

Constructs a de Bruijn graph from a sequence.

Example:

import bioinformatics
print(bioinformatics.de_bruijn(3, "ATGATCAAG"))  # Output: {'ATG': ['TGA'], 'TGA': ['GAT'], 'GAT': ['ATC'], 'ATC': ['TCA'], 'TCA': ['CAA', 'CAG'], 'CAA': ['AAG'], 'AAG': ['AG']}

`grph_kmers(kmers)`

Constructs a graph from a collection of k-mers.

Example:

import bioinformatics
print(bioinformatics.grph_kmers(["ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"]))  # Output: {'ACCGA': ['CCGAA'], 'CCGAA': ['CGAAG'], 'CGAAG': ['GAAGC'], 'GAAGC': ['AAGCT']}

`reconstruct(kmers)`

Reconstructs a string from a collection of overlapping k-mers.

Example:

import bioinformatics
print(bioinformatics.reconstruct(["ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"]))  # Output: 'ACCGAAGCT'
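
When the k-mers are already ordered so that each one overlaps the previous one by k - 1 characters, the string can be spelled out by taking the first k-mer and appending the last character of each later one. The sketch below shows this; `path_to_string_sketch` is a hypothetical name, and the ordering and a non-empty input are assumptions.

```python
def path_to_string_sketch(kmers):
    # Assumes a non-empty list in which consecutive k-mers overlap by k - 1.
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


print(path_to_string_sketch(["ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"]))  # ACCGAAGCT
```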

`kmer_composition(k, sequence)`

Finds the k-mer composition of a sequence.

Example:

import bioinformatics
print(bioinformatics.kmer_composition(2,'CAATCCAAC'))  # Output: ['CA', 'AA', 'AT', 'TC', 'CC', 'CA', 'AA', 'AC']
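
The composition is simply every window of length k, read left to right. A one-line sketch, assuming the result keeps the order of appearance and includes duplicates (`kmer_composition_sketch` is a hypothetical name):

```python
def kmer_composition_sketch(k, sequence):
    # A sequence of length n has n - k + 1 windows of length k.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmer_composition_sketch(2, "CAATCCAAC"))
# ['CA', 'AA', 'AT', 'TC', 'CC', 'CA', 'AA', 'AC']
```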

`distance_between_pattern_and_strings(pattern, string_list)`

Calculates the total Hamming distance between a pattern and a list of strings.

Example:

import bioinformatics
print(bioinformatics.distance_between_pattern_and_strings("AA",['AAATTGACGCAT','GACGAAAAACGTT','CGTCAGCGCCTG','GCTGAGCAAAGG','AGTACGGGACAG']))  # Output: 2
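
Under the usual definition, the distance between a pattern and one string is the smallest Hamming distance between the pattern and any window of the same length, and the total is the sum over all strings. The sketch below follows that definition; it is an assumption that the module uses the same one, and `hamming` and `distance_sketch` are hypothetical names.

```python
def hamming(a, b):
    # Number of positions at which two equal-length strings differ.
    return sum(x != y for x, y in zip(a, b))


def distance_sketch(pattern, strings):
    k = len(pattern)
    total = 0
    for text in strings:
        # Best-matching window of length k in this string.
        total += min(hamming(pattern, text[i:i + k])
                     for i in range(len(text) - k + 1))
    return total


strings = ["AAATTGACGCAT", "GACGAAAAACGTT", "CGTCAGCGCCTG",
           "GCTGAGCAAAGG", "AGTACGGGACAG"]
print(distance_sketch("AA", strings))  # 2
```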

`gibbs(k, t, n, dna, iterations)`

Performs Gibbs sampling for motif finding. The sampler is randomized, so the returned motifs can differ between runs.

Example:

import bioinformatics
print(bioinformatics.gibbs(4, 5, 10,["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC", "CACGTCAATCAC", "CAATAATATTCG"],1000))  # Output: ['TCAG', 'TCAG', 'TCAG', 'TCAG', 'TCAG']

`randomized_motif_search(k, t, dna, iterations)`

Performs randomized motif search. Because the starting motifs are chosen at random, results can differ between runs.

Example:

import bioinformatics
print(bioinformatics.randomized_motif_search(3,5,["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC", "CACGTCAATCAC", "CAATAATATTCG"],1))  # Output: ['TTC', 'TTC', 'TTC', 'TTC', 'TTC']

`greedy_motif_search(k, t, dna)`

Performs greedy motif search for motif finding.

Example:

import bioinformatics
print(bioinformatics.greedy_motif_search(3,5,["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC", "CACGTCAATCAC", "CAATAATATTCG"]))  # Output: ['TTC', 'TTC', 'TTC', 'TTC', 'TTC']

`most_probable(dna_string, k, profile_matrix)`

Finds the most probable k-mer in a DNA string given a profile matrix.

Example:

import bioinformatics
print(bioinformatics.most_probable("ACCTGTTTATTGCCTAAGTTCCGAACAAACCCAATATAGCCCGAGGGCCT",5,[[0.2, 0.2, 0.3, 0.2, 0.3],[0.4, 0.3, 0.1, 0.5, 0.1],[0.3, 0.3, 0.5, 0.2, 0.4],[0.1, 0.2, 0.1, 0.1, 0.2]]))  # Output: 'CCGAG'

`median_string(dna, k)`

Finds the median string in a collection of DNA strings.

Example:

import bioinformatics
print(bioinformatics.median_string(['AAATTGACGCAT','GACGACCACGTT','CGTCAGCGCCTG','GCTGAGCACCGG','AGTACGGGACAG'],6))  # Output: 'GACGAC'

`enumerate_motifs(dna, k, d)`

Enumerates all motifs of length k with at most d mismatches in a collection of DNA strings.

Example:

import bioinformatics
print(bioinformatics.enumerate_motifs(["ATTTGGC","TGCCTTA","CGGTATC","GAAAATT"],3,1))  # Output: ['ATA', 'ATT', 'GTT', 'TTT']
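
A brute-force way to enumerate such motifs is to test every possible k-mer over the alphabet ACGT and keep those that occur, with at most d mismatches, in every string. The sketch below does exactly that; it is practical only for small k, and the names `appears_with_mismatches` and `enumerate_motifs_sketch` are hypothetical.

```python
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def appears_with_mismatches(pattern, text, d):
    k = len(pattern)
    return any(hamming(pattern, text[i:i + k]) <= d
               for i in range(len(text) - k + 1))


def enumerate_motifs_sketch(dna, k, d):
    motifs = []
    # product() yields candidates in lexicographic order.
    for letters in product("ACGT", repeat=k):
        pattern = "".join(letters)
        if all(appears_with_mismatches(pattern, text, d) for text in dna):
            motifs.append(pattern)
    return motifs


print(enumerate_motifs_sketch(["ATTTGGC", "TGCCTTA", "CGGTATC", "GAAAATT"], 3, 1))
# ['ATA', 'ATT', 'GTT', 'TTT']
```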

`pattern_to_number(pattern)`

Converts a DNA pattern to its corresponding integer.

Example:

import bioinformatics
print(bioinformatics.pattern_to_number("CC"))  # Output: 5

`number_to_pattern(number, k)`

Converts an integer to its corresponding DNA pattern of length k.

Example:

import bioinformatics
print(bioinformatics.number_to_pattern(5,2))  # Output: 'CC'
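
Both conversions treat a DNA pattern as a base-4 number. The sketch below assumes the conventional digit assignment A=0, C=1, G=2, T=3 with the leftmost symbol most significant; the function names are hypothetical and the module may differ in detail.

```python
SYMBOLS = "ACGT"  # assumed digit order: A=0, C=1, G=2, T=3


def pattern_to_number_sketch(pattern):
    number = 0
    for symbol in pattern:
        # Shift the accumulated value one base-4 digit and add the new digit.
        number = 4 * number + SYMBOLS.index(symbol)
    return number


def number_to_pattern_sketch(number, k):
    symbols = []
    for _ in range(k):
        number, remainder = divmod(number, 4)
        symbols.append(SYMBOLS[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(symbols))


print(pattern_to_number_sketch("CC"))   # 5
print(number_to_pattern_sketch(5, 2))   # CC
```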

`generate_frequency_array(dna_string, k)`

Generates the frequency array of k-mers in a DNA string.

Example:

import bioinformatics
print(bioinformatics.generate_frequency_array("AAACAGATCACCCGCTGAGCGGGTTATCTGTT",1))  # Output: [8, 8, 8, 8]
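
A frequency array has one slot per possible k-mer, 4^k in total, ordered lexicographically. The following sketch assumes that ordering over the alphabet ACGT; `frequency_array_sketch` is a hypothetical name.

```python
from itertools import product


def frequency_array_sketch(text, k):
    # Map each possible k-mer to its lexicographic index.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * (4 ** k)
    for i in range(len(text) - k + 1):
        counts[index[text[i:i + k]]] += 1
    return counts


print(frequency_array_sketch("AAACAGATCACCCGCTGAGCGGGTTATCTGTT", 1))  # [8, 8, 8, 8]
```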

`reverse_complement(dna_string)`

Finds the reverse complement of a DNA string.

Example:

import bioinformatics
print(bioinformatics.reverse_complement("AAAAAGCATAAACATTAAAGAG"))  # Output: 'CTCTTTAATGTTTATGCTTTTT'
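
The reverse complement swaps A with T and C with G, then reverses the result. A compact sketch using `str.translate` follows; it assumes uppercase input containing only A, C, G, and T, and `reverse_complement_sketch` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_sketch(dna):
    # Complement every base, then reverse the string.
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement_sketch("AAAAAGCATAAACATTAAAGAG"))  # CTCTTTAATGTTTATGCTTTTT
```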

`frequent_words_mismatch(dna_string, k, d)`

Finds the most frequent k-mers with at most d mismatches in a DNA string.

Example:

import bioinformatics
print(bioinformatics.frequent_words_mismatch("ACGTTGCATGTCGCATGATGCATGAGAGCT",4,1))  # Output: ['ATGT', 'GATG', 'ATGC']

`approximate_pattern_matching(pattern, text, d)`

Finds all approximate occurrences of a pattern in a text with at most d mismatches.

Example:

import bioinformatics
print(bioinformatics.approximate_pattern_matching("AAAAA","AAAAAGCATAAACATTAAAGAG",0))  # Output: [0]

`approximate_pattern_count(pattern, text, d)`

Counts the number of approximate occurrences of a pattern in a text with at most d mismatches.

Example:

import bioinformatics
print(bioinformatics.approximate_pattern_count("AAAAA","AAAAAGCATAAACATTAAAGAG",0))  # Output: 1
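
Approximate matching slides a window of the pattern's length across the text and keeps every start position whose Hamming distance to the pattern is at most d; the count is the number of such positions. The sketch below assumes 0-based positions and uses hypothetical function names.

```python
def approximate_positions_sketch(pattern, text, d):
    k = len(pattern)
    return [i for i in range(len(text) - k + 1)
            if sum(a != b for a, b in zip(pattern, text[i:i + k])) <= d]


def approximate_count_sketch(pattern, text, d):
    return len(approximate_positions_sketch(pattern, text, d))


print(approximate_positions_sketch("AAAAA", "AAAAAGCATAAACATTAAAGAG", 0))  # [0]
print(approximate_count_sketch("GAGG", "TTTAGAGCCTTCAGAGG", 2))            # 4
```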

`min_skew(genome)`

Finds the positions in a genome where the skew diagram attains its minimum value.

Example:

import bioinformatics
print(bioinformatics.min_skew("CATGGGCATCGGCCATACGCC"))  # Output: [21]
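
The skew at position i is the number of G's minus the number of C's among the first i nucleotides. The sketch below tracks that running difference and reports where it is lowest. It assumes the convention that position i refers to a prefix of length i, so position 0 is the empty prefix; `min_skew_sketch` is a hypothetical name.

```python
def min_skew_sketch(genome):
    skew, values = 0, [0]  # values[i] = skew after the first i nucleotides
    for base in genome:
        skew += (base == "G") - (base == "C")
        values.append(skew)
    lowest = min(values)
    return [i for i, value in enumerate(values) if value == lowest]


print(min_skew_sketch("CATGGGCATCGGCCATACGCC"))  # [21]
```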

`clump_finding(genome, k, L, t)`

Finds patterns forming clumps in a genome.

Example:

import bioinformatics
print(bioinformatics.clump_finding("CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAG",5,50,4))  # Output: ['CGACA']

`pattern_count(text, pattern)`

Counts the occurrences of a pattern in a text.

Example:

import bioinformatics
print(bioinformatics.pattern_count("cgatatatccatag","ata"))  # Output: 3

`frequent_words(text, k)`

Finds the most frequent k-mers in a text.

Example:

import bioinformatics
print(bioinformatics.frequent_words("actgactcccaccccc",3))  # Output: ['ccc']
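
One straightforward way to find the most frequent k-mers is to count every window with `collections.Counter` and keep those that reach the maximum count. This sketch returns the winners in sorted order, which is an assumption about the output format; `frequent_words_sketch` is a hypothetical name.

```python
from collections import Counter


def frequent_words_sketch(text, k):
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, count in counts.items() if count == best)


print(frequent_words_sketch("actgactcccaccccc", 3))  # ['ccc']
```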

`pattern_count_positions(text, pattern)`

Finds the positions of all occurrences of a pattern in a text.

Example:

import bioinformatics
print(bioinformatics.pattern_count_positions("cgatatatccatag","ata"))  # Output: [2, 4, 10]
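
Occurrences of a pattern may overlap, as the matches at positions 2 and 4 show, so Python's built-in `str.count` (which counts non-overlapping matches) is not enough on its own. A minimal sketch that checks every window, with hypothetical function names and 0-based positions:

```python
def pattern_positions_sketch(text, pattern):
    k = len(pattern)
    # Compare the pattern against every window, so overlaps are included.
    return [i for i in range(len(text) - k + 1) if text[i:i + k] == pattern]


def pattern_count_sketch(text, pattern):
    return len(pattern_positions_sketch(text, pattern))


print(pattern_positions_sketch("cgatatatccatag", "ata"))  # [2, 4, 10]
print(pattern_count_sketch("cgatatatccatag", "ata"))      # 3
```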

`hamming_distances(pattern, sequence)`

Calculates the Hamming distances between a pattern and a sequence.

Example:

import bioinformatics
print(bioinformatics.hamming_distances("cgatatatccatag","ata"))  # Output: [3, 2, 1, 2, 2, 1, 2, 1, 0, 1, 2, 3, 1, 2]
