GitHub - svsuresh/python_scripts: This repository contains miscellaneous bioinformatics python scripts

Python scripts mostly jupyter notebooks

This folder contains Python scripts, mostly as IPython/Jupyter notebooks. Descriptions follow:

hgvs_p_syntax_conversion.ipynb: This script converts HGVS pSyntax from 3 letter code to 1 letter code and vice versa. It asks for the code length used in your file (3 or 1) and then for the file path. Currently, the script accepts only a single-column file. The column should contain pSyntax entries without any header. The output has two columns: the first with the original syntax and the second with the converted syntax. Input can be in any case; output is in sentence case (for example, Ala). Versioned protein IDs are also supported in the input.

Example input for 3 letter syntax:

NP1457:p.ser2ala
p.ala3tyr
NP123.1:p.Tyr3pHe

Output for the example input would be:

NP1457:p.ser2ala NP1457:p.S2A
p.ala3tyr p.A3Y
NP123.1:p.Tyr3pHe NP123.1:p.Y3F
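
The 3 letter to 1 letter conversion shown above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the names `THREE_TO_ONE`, `PATTERN` and `three_to_one` are hypothetical, and the sketch assumes simple substitutions of the form `p.<ref><position><alt>` with an optional `ID:` prefix, limited to the 20 standard amino acids.

```python
import re

# Standard amino acids only; this table is an assumption of the sketch.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Optional "proteinID:" prefix, then p.<ref><position><alt> in any case.
PATTERN = re.compile(
    r"^(?P<prefix>.*:)?[pP]\.(?P<ref>[A-Za-z]{3})(?P<pos>\d+)(?P<alt>[A-Za-z]{3})$"
)


def three_to_one(p_syntax):
    """Convert one 3 letter pSyntax entry to 1 letter code (hypothetical helper)."""
    match = PATTERN.match(p_syntax.strip())
    if match is None:
        raise ValueError(f"Unsupported pSyntax: {p_syntax!r}")
    # capitalize() normalizes e.g. "pHe" or "ser" to "Phe" / "Ser" before lookup.
    ref = THREE_TO_ONE[match["ref"].capitalize()]
    alt = THREE_TO_ONE[match["alt"].capitalize()]
    prefix = match["prefix"] or ""  # the protein ID is kept exactly as given
    return f"{prefix}p.{ref}{match['pos']}{alt}"


for entry in ["NP1457:p.ser2ala", "p.ala3tyr", "NP123.1:p.Tyr3pHe"]:
    print(entry, three_to_one(entry))
```

An unknown residue code raises `KeyError` in this sketch; a real script may prefer to skip or report such lines.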

Example input for single letter syntax:

p.a2G
NP.2123:p.C2p
p.D3e

Output for the example input would be:

p.a2G p.Ala2Gly
NP.2123:p.C2p NP.2123:p.Cys2Pro
p.D3e p.Asp3Glu
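
The reverse direction (1 letter to 3 letter) follows the same idea. Again, this is only a sketch under stated assumptions, not the notebook's code: `ONE_TO_THREE`, `PATTERN` and `one_to_three` are hypothetical names, and only simple substitutions over the 20 standard amino acids are handled.

```python
import re

# Standard amino acids only; this table is an assumption of the sketch.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Optional "proteinID:" prefix, then p.<ref><position><alt> in any case.
PATTERN = re.compile(
    r"^(?P<prefix>.*:)?[pP]\.(?P<ref>[A-Za-z])(?P<pos>\d+)(?P<alt>[A-Za-z])$"
)


def one_to_three(p_syntax):
    """Convert one 1 letter pSyntax entry to 3 letter code (hypothetical helper)."""
    match = PATTERN.match(p_syntax.strip())
    if match is None:
        raise ValueError(f"Unsupported pSyntax: {p_syntax!r}")
    # upper() makes the lookup case-insensitive; table values are sentence case.
    ref = ONE_TO_THREE[match["ref"].upper()]
    alt = ONE_TO_THREE[match["alt"].upper()]
    prefix = match["prefix"] or ""  # the protein ID is kept exactly as given
    return f"{prefix}p.{ref}{match['pos']}{alt}"


for entry in ["p.a2G", "NP.2123:p.C2p", "p.D3e"]:
    print(entry, one_to_three(entry))
```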

psqldf_multiple_df_with_dots_join.ipynb: This script is an example of joining multiple data frames in pandas. Joining data frames with a period (.) in their column names can be tricky in an SQL join, and this example shows how to do it by joining 3 data frames with . in their column names.
extract_motifs: This folder has Python code to extract motifs (from-to) from a fasta file, based on the full name of the sequence and the regions of interest. At this point the script doesn't support partial name search.
bioservices_kegg_python.ipynb: This script uses the bioservices Python library to download KEGG pathway IDs and names for given gene symbols.
ncbi_id_converter_py.ipynb: This script uses biopython libraries to convert PMIDs to PMCIDs.
gbtofasta: This folder contains a Python script for extracting nucleotide and protein sequences from a gb (GenBank) file, including multi-entry GenBank files. Please change the script as you need.
count_a_first_position.py: Takes 3 arguments: a list of sequences, the base to search for, and the position at which to search for it. The script looks for a user-provided single base (argument: base="") at the given position in each sequence in the list. It outputs the base, whether it is present in each sequence (1 present, 0 absent), and the total number of sequences in which the base is present at that position.
append_duplicate_numbers.py: Takes an input fasta file (e.g. test.fa) and an output fasta file (e.g. out.fa). It appends a number to the end of each header depending on the number of times the header is present. For example, if a header is present only once, it will have #1 appended to it. If a header is present 4 times, the duplicated headers will carry numbers from 1 to 4.
sliding_window_16102022.py: Takes an input with one or more fasta sequences. Given a sliding window size and a step size from the user, the script outputs a fasta file with sequences cut according to those parameters, in both forward and reverse directions.
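
As an illustration of the sliding window idea in the last item, here is a minimal sketch. It is not the code in sliding_window_16102022.py. The function names and the header format are hypothetical, and "reverse direction" is assumed here to mean windows taken over the reversed sequence (not the reverse complement); the actual script may define it differently.

```python
def sliding_windows(sequence, window, step):
    """Yield (start, subsequence) for every full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive integers")
    # The range stops at the last start that still fits a full window.
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


def windows_both_directions(name, sequence, window, step):
    """Return (header, subsequence) records for forward and reversed windows.

    Assumption: "reverse" means sliding over sequence[::-1].
    Header format <name>_<fwd|rev>_<start>_<end> uses 1-based coordinates
    relative to the sequence being scanned; it is made up for this sketch.
    """
    records = []
    for direction, seq in (("fwd", sequence), ("rev", sequence[::-1])):
        for start, sub in sliding_windows(seq, window, step):
            header = f"{name}_{direction}_{start + 1}_{start + window}"
            records.append((header, sub))
    return records


# Print the records in fasta layout for a toy sequence.
for header, sub in windows_both_directions("seq1", "ACGTAC", window=4, step=2):
    print(f">{header}\n{sub}")
```

Trailing bases that do not fill a complete window are dropped in this sketch; whether the original script keeps partial windows is not stated in the description.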

Folders and files (65 commits):

extract_motifs
fasta_kmer
fasta_ops
gbtofasta
ry_conversion
sff2fastq
README.md
append_duplicate_numbers.py
bioservices_kegg_python.ipynb
count_a_first_position.py
hgvs_p_syntax_conversion.ipynb
ncbi_id_converter_py.ipynb
psqldf_multiple_df_with_dots_join.ipynb
sliding_window_16102022.py