translating protein from anvi-get-dna-sequences-for-hmm-hits #400
Comments
Hi Jordan, Would an Take care, |
Yeah, I think other users might find it useful. It is probably best to integrate an anvi-get-aa-sequences-for-hmm-hits program into the existing codebase, because my code brings extra dependencies: it requires Ruby and the Ruby gem bio. For now, users can use the following code to translate their anvi-get-dna-sequences-for-hmm-hits output into amino acid (aa) sequences.
#!/usr/bin/env ruby
# Translate each sequence of a FASTA file (first command-line argument)
# and print the result to standard output with the original headers.
require 'bio'

# NCBI translation table 11 (bacterial, archaeal and plant plastid code).
codon_table = Bio::CodonTable[11]

file = Bio::FastaFormat.open(ARGV.shift)
file.each do |entry|
  puts ">#{entry.definition}"
  seq = Bio::Sequence::NA.new(entry.seq)

  # The header is expected to end with the nucleotide length of the gene;
  # a full-length protein has one residue per codon.
  protlen = entry.definition[/\d+$/].to_f / 3

  # Translate all six reading frames and print every frame that spans
  # the whole gene and contains at most one stop codon.
  [1, 2, 3, -1, -2, -3].each do |frame|
    prot = seq.translate(frame, codon_table)
    stops = prot.scan(/\*/).count
    puts prot if stops <= 1 && prot.length == protlen
  end
end
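The length test in the script implicitly limits the output to frames 1 and -1. The sketch below illustrates why, assuming the translation skips the bases before the first codon of a frame and drops any trailing incomplete codon. `translated_length` is a hypothetical helper written for this illustration, not a BioRuby method, and it needs no gems.

```ruby
# Hypothetical helper: number of residues obtained when a sequence of
# nuc_length bases is translated in the given frame (1..3 or -1..-3).
# Assumes leading bases are skipped and a trailing partial codon is dropped.
def translated_length(nuc_length, frame)
  offset = frame.abs - 1      # bases skipped before the first codon
  (nuc_length - offset) / 3   # integer division discards the partial codon
end

nuc_length = 300              # example gene length, divisible by 3
expected = nuc_length / 3.0   # same comparison value as the script uses

# Keep only the frames whose translation spans the whole gene.
full_frames = [1, 2, 3, -1, -2, -3].select do |frame|
  translated_length(nuc_length, frame) == expected
end

p full_frames # => [1, -1]
```

When the length in the header is not a multiple of 3, no frame passes the test under these assumptions, so only the header line is printed for that sequence.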
This is done! In the new version the program will be called |
Most of the time, concatenated gene trees are made from aa sequences.
Here is a quick Ruby script to translate the sequences to aa and keep the same output format.
Disclaimer: I should probably add a check on the protein length, to avoid mistaking a frame that merely starts with M and ends with * for the real protein, but it worked for my purposes. Let me know if there is already a solution to this that I missed.
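A minimal sketch of the kind of check mentioned in the disclaimer could look like the following. `plausible_protein?` and its `require_start` option are hypothetical names introduced for illustration only; they are not part of anvi'o or BioRuby. The start check is off by default on the assumption that genes with alternative start codons (for example GTG or TTG) may not translate to a leading M.

```ruby
# Hypothetical helper: does a translated frame look like the full protein
# encoded by a gene of nuc_length bases?
def plausible_protein?(prot, nuc_length, require_start: false)
  return false unless (nuc_length % 3).zero?         # whole codons only
  return false unless prot.length == nuc_length / 3  # frame spans the gene
  body = prot.chomp('*')                             # allow one terminal stop
  return false if body.include?('*')                 # reject internal stops
  return false if require_start && !prot.start_with?('M')
  true
end

puts plausible_protein?('MKV*', 12)                       # full length, one terminal stop
puts plausible_protein?('M*V*', 12)                       # internal stop codon
puts plausible_protein?('VKLA', 12, require_start: true)  # no leading M
```

A frame that starts with M and ends with * but is shorter than the gene fails the length comparison, which is the overlap the disclaimer describes.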