Preserve original contig names when using Polypolish #7
Comments
For those interested, here is a simple workaround to remove the _polypolish suffix:

    ./polypolish assembly.fasta R1.sam R2.sam 2>polypolish.log | \
    awk '{ if ($0 ~ /^>/) { gsub("_polypolish$", ""); print } \
    else { print } }' \
    > polypolish.fasta

This issue can potentially be closed now given that an awk one-liner can accomplish the desired task.
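The same filter can be wrapped in a small shell function so it can be tried on any FASTA stream without running Polypolish first. This is only a sketch: the function name `strip_polypolish_suffix` and the contig name in the usage line are made up for illustration, and it assumes the suffix sits at the very end of the header line, as in the workaround above.

```shell
# Read FASTA on stdin, write FASTA on stdout.
# Only header lines (those starting with ">") are changed: a trailing
# "_polypolish" is removed. Sequence lines pass through untouched.
strip_polypolish_suffix() {
    awk '/^>/ { sub(/_polypolish$/, "") } { print }'
}

# Usage on inline data (hypothetical contig name):
printf '>contig_1_polypolish\nACGT\n' | strip_polypolish_suffix
```

In a real run the function would take the place of the awk stage in the pipeline above.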
Thanks! I've added a new bit to the Polypolish FAQ addressing this case. I'll close the issue now 😄
Thanks!
Further, it appears that Polypolish keeps the original headers, provided they don't contain spaces. Unicycler's headers do, and the data after the spaces is pretty useful, so you could also do
@rrwick Just to add here, do you have any plans to look into the above issue that happens when spaces are used in contig names? I was wondering if this would be an easy fix within Polypolish itself, or perhaps in Unicycler!
Polypolish doesn't currently save any of the sequence description strings (anything after the first whitespace in the FASTA header), but this could potentially be added. I'll reopen the issue and tag it as an enhancement request. Thanks!
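Until descriptions are kept by the tool itself, one possible stopgap is to copy the full header lines back from the original assembly. The sketch below is an assumption-laden illustration, not part of Polypolish: the function name `restore_headers` and the file names are hypothetical, and it assumes the polished contig names are the original names plus a trailing `_polypolish`.

```shell
# restore_headers ORIGINAL.fasta POLISHED.fasta
# Prints the polished sequences with the original header lines
# (name plus description) restored.
restore_headers() {
    awk '
        # First file: remember each full header line, keyed by contig name
        # (the text between ">" and the first whitespace).
        NR == FNR {
            if (/^>/) {
                name = substr($1, 2)
                header[name] = $0
            }
            next
        }
        # Second file, header lines: drop the suffix, then look up the name.
        /^>/ {
            name = substr($1, 2)
            sub(/_polypolish$/, "", name)
            if (name in header) {
                print header[name]
            } else {
                print
            }
            next
        }
        # Second file, sequence lines: pass through unchanged.
        { print }
    ' "$1" "$2"
}
```

It would be called as `restore_headers assembly.fasta polypolish.fasta > polypolish_renamed.fasta`. Headers with no match in the original assembly are left as they are, so nothing is silently lost.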
I have just released Polypolish v0.6.0, which addresses this issue. Contig descriptions are now kept, and
Thanks so much for your work on Polypolish, and congrats on its recent publication!
I have a very minor feature request. When I run Polypolish, I notice that it always appends _polypolish to the ends of contig names in the FastA file.

Adding the _polypolish suffix to sequence names is nice because it helps the user track data provenance. However, it can also be a bit cumbersome, e.g., if the user wants to programmatically compare pre-polishing assembly statistics with analyses done after polishing. Would it be possible to add a flag to Polypolish to preserve the original contig names in the FastA file?

Thanks!
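To illustrate why matching names matter for this kind of comparison, here is a minimal sketch of a per-contig length table keyed by contig name. The function name `contig_lengths` is hypothetical, and the contig names in the usage line are invented. If the names differ between the two files, tables like this cannot be joined without renaming first.

```shell
# Read FASTA on stdin and print "name<TAB>length" for each contig.
# The name is the header text between ">" and the first whitespace.
contig_lengths() {
    awk '
        /^>/ {
            # A new header: emit the previous contig, then reset the counter.
            if (name != "") print name "\t" len
            name = substr($1, 2)
            len = 0
            next
        }
        # Sequence line: add its length to the current contig.
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    '
}

# Usage on inline data (invented contig names):
printf '>a some description\nACGT\nAC\n>b\nGG\n' | contig_lengths
```

Running it on the assembly before and after polishing gives two tables that line up row by row only when the contig names agree.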