sdwfrost/biobash

biobash

A collection of bash scripts for bioinformatics.

  • cleannames.sh : Takes an NCBI-formatted FASTA file and generates a new FASTA file with only the accessions as sequence names.
  • countntbl.sh : Generates a table with sequence names and the number of 'N's in the sequence.
  • countseq.sh : Counts sequences in a FASTA file.
  • dedup.sh : Removes exact duplicates from a FASTA file. From a tip by Pierre Lindenbaum (see https://www.biostars.org/p/3003/).
  • degap.sh : Degaps a FASTA file (see https://www.biostars.org/p/302104/).
  • extractacc.sh : Takes an NCBI-formatted FASTA file and generates a text file with the accessions.
  • extractseq.sh : Takes a FASTA file and extracts only the sequences.
  • fas2csv.sh : Converts a FASTA file into a tab-separated file.
  • fas2phylip.sh : Converts FASTA to phylip format, useful for phyml.
  • lenseq.sh : Returns length of all sequences in a FASTA file.
  • linfasta.sh : Converts a FASTA file into linearized sequences (i.e. alternating titles and sequences). Taken from a hint by Frederic Mahe (see http://www.biostars.org/p/17680).
  • longorf.sh : Extracts the longest open reading frame. Requires getorf from EMBOSS.
  • numbersequences.sh : Renames sequences with 'X' followed by a number. Use in conjunction with seqnametable.sh.
  • relabel.sh : Relabels sequences using a stub concatenated with a numeric index.
  • removesmalls.sh : Removes sequences shorter than a given threshold. Taken from a hint by Frederic Mahe (see http://www.biostars.org/p/79202/).
  • seqnametable.sh : Generates a tab-separated file of new names generated by numbersequences.sh and the original name.
  • sortfasta.sh : Sorts the sequences in a FASTA file by decreasing length.
  • startatname.sh : Prints a FASTA file beginning at a given sequence name.
  • stopatname.sh : Prints a FASTA file until (and including) a given sequence name.
  • translate.sh : Uses transeq from EMBOSS to translate sequences, but removes the additional numbering introduced by transeq.
  • trimorf.sh : Trims non-coding regions from the beginning and end of a sequence. Requires getorf from EMBOSS.
  • tsv2fas.sh : Converts a tab-delimited file (name, sequence) with no header to FASTA.
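
Two of the operations described above, counting sequences and linearizing a FASTA file, can be sketched in a few lines of bash. This is a minimal illustration, not the code in countseq.sh or linfasta.sh: the function names `count_seqs` and `linearize` and the demo input are hypothetical. It assumes every record starts with a `>` header line and that sequences may be wrapped over several lines.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: these functions are assumptions for this example,
# not the repository's actual scripts.

# Count sequences: every FASTA record begins with a '>' header line.
count_seqs() {
    grep -c '^>' "$1"
}

# Linearize: join wrapped sequence lines so that each record becomes
# one header line followed by one sequence line.
linearize() {
    awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print seq }' "$1"
}

# Hypothetical demo input: two records, the first wrapped over two lines.
demo=$(mktemp)
printf '>seq1\nACGT\nNNAC\n>seq2\nGGCC\n' > "$demo"

count_seqs "$demo"   # number of records
linearize "$demo"    # alternating headers and single-line sequences
rm -f "$demo"
```

Once a file is linearized, per-sequence operations such as measuring length or filtering by a threshold reduce to simple line-by-line processing, which is why several of the scripts listed above can be short.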
