No output with the actual genome/contigs clusters sequence #2
Hi! Best regards
I had the same question before. Here is a script for it:

# Get the clusters' representative sequences from the FASTA file and the gclust cluster output.
# usage: python3 gclust2fa.py raw.fa gclust.out cluster.fa
# The gclust output looks like this:
'''
>Cluster 0
0 5888230nt, >seq1... *
>Cluster 1
0 4800869nt, >seq2... *
>Cluster 2
0 3906592nt, >seq3... *
1 20nt, >seq4... at -/100.00%
'''
import sys

# input: raw FASTA file
fasta_file = sys.argv[1]
# input: gclust.out
clust_file = sys.argv[2]
# output: FASTA file with the representative sequences
outfa = sys.argv[3]

# refuse to overwrite the input FASTA file
if fasta_file == outfa:
    sys.exit("Input and output FASTA must be different files")

representative_ctgs = set()
i = 0  # number of clusters
with open(clust_file) as f:
    for line in f:
        if line.startswith('>'):
            i += 1
        else:
            temp = line.rstrip().split()
            # the representative of each cluster is marked with a trailing '*'
            if temp and temp[-1] == '*':
                # temp[2] looks like '>seq1...'; drop the trailing dots
                representative_ctgs.add(temp[2].rstrip('.'))
print("Representative number: " + str(i))

outFlag = 0
with open(outfa, 'w') as fout:
    with open(fasta_file) as f:
        for line in f:
            if line.startswith('>'):
                # compare only the sequence ID (first word of the header),
                # assuming gclust.out lists IDs without the description
                outFlag = 1 if line.split()[0] in representative_ctgs else 0
            if outFlag == 1:
                fout.write(line)
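If you also need to know which genomes/contigs ended up in each cluster, not only the representatives, the cluster listing can be parsed into a dictionary first. This is only a rough sketch: it assumes the listing always looks like the sample in the script above (a `>Cluster N` header, then lines like `0 5888230nt, >seq1... *`), and `parse_clusters` and `MEMBER_RE` are names made up for this example, not part of gclust.

```python
import re

# Matches member lines such as "0 5888230nt, >seq1... *"
# or "1 20nt, >seq4... at -/100.00%" (format assumed from the sample above).
MEMBER_RE = re.compile(r"^\d+\s+(\d+)nt,\s+>(\S+?)\.\.\.\s+(.*)$")


def parse_clusters(lines):
    """Return {cluster_name: {"representative": id or None, "members": [ids]}}."""
    clusters = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new cluster header, e.g. ">Cluster 0"
            current = line[1:]
            clusters[current] = {"representative": None, "members": []}
            continue
        m = MEMBER_RE.match(line)
        if m is None or current is None:
            continue  # skip anything that does not look like a member line
        _length, seq_id, status = m.groups()
        clusters[current]["members"].append(seq_id)
        if status == "*":
            # The representative is the member marked with '*'
            clusters[current]["representative"] = seq_id
    return clusters


sample = """\
>Cluster 0
0 5888230nt, >seq1... *
>Cluster 1
0 4800869nt, >seq2... *
>Cluster 2
0 3906592nt, >seq3... *
1 20nt, >seq4... at -/100.00%
""".splitlines()

clusters = parse_clusters(sample)
for name, info in clusters.items():
    print(name, info["representative"], info["members"])
```

For a real run you would pass the open cluster file instead of `sample`, e.g. `parse_clusters(open("gclust.out"))`, and then use the member IDs to pull the sequences out of the FASTA file the same way the script above does for the representatives.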
Dear developers,
Thank you for this useful tool. I followed your instruction manual, but when I run gclust exactly as shown there, no output file with the actual nucleotide sequences of the clusters is produced, only a list of the clusters with the genomes/contigs in each.
The tool would be much better and easier to use if it produced output similar to cd-hit's, including a FASTA file of the cluster representatives.
Thank you and best regards
Vadim