Refactoring FASTA work to break contig sizes. #160
union { null, long } sequenceLength = null;
union { null, string } url = null;
array<Base> fragmentSequence = []; // sequence of bases in this fragment
Was there some discussion of moving away from array to string to represent a sequence?
@arahuja, I believe @nealsid suggested it a while back. I can run an experiment in the next few days to check this; as I recall, Neal expected that we would get better compression with strings than with an array of enums. This should be a straightforward change to make and an easy one to evaluate.
I think the advantage you do get from defining your alphabet via an enum is that you are strictly checking for possible errors; whether this is valuable is another question.
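To make the trade-off concrete, here is a minimal sketch of the strict checking an enum alphabet gives you compared with a free-form string. It is written in Python purely for illustration; the `Base` members listed and the `to_bases` helper are assumptions, not the schema's actual definition.

```python
from enum import Enum
from typing import List


class Base(Enum):
    # Hypothetical alphabet; the real Base enum in the schema may list more symbols.
    A = "A"
    C = "C"
    G = "G"
    T = "T"
    N = "N"


def to_bases(sequence: str) -> List[Base]:
    """Convert a string to enum values, failing on any symbol outside the alphabet."""
    bases = []
    for position, symbol in enumerate(sequence.upper()):
        try:
            bases.append(Base(symbol))
        except ValueError:
            # A plain string field would have accepted this symbol silently.
            raise ValueError(
                f"invalid base {symbol!r} at position {position}"
            ) from None
    return bases
```

With a string field the same check would have to be done explicitly at import time; with the enum it falls out of the type.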
@fnothaft Should we use a string here?
One or more automated tests failed
Jenkins, test this please.
One or more automated tests failed
Jenkins, test this please.
All automated tests passed.
Jenkins, test this please.
One or more automated tests failed
Jenkins, retest this please.
All automated tests passed.
Jenkins, test this please.
One or more automated tests failed
Looks like an issue with an implicit conversion... Fixing now.
All automated tests passed.
Thanks, Frank!
This change fixes #109. Specifically, on FASTA import, assemblies that were too long (>800kbp) would cause things to go a bit wonky. To resolve this, we break a single contig up into contig fragments. Additionally, this PR adds a few utility functions for manipulating contigs.
As a note, I've held off on editing the CHANGES.txt/md until there is consensus on which one we are sticking with.
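For readers skimming the thread, the idea of breaking one contig into contig fragments can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: `ContigFragment`, `fragment_contig`, `reassemble`, the default fragment length, and every field name other than the analogues of `sequenceLength` and `fragmentSequence` are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContigFragment:
    contig_name: str          # hypothetical field
    sequence_length: int      # length of the whole contig (cf. sequenceLength)
    fragment_number: int      # hypothetical: 0-based index of this fragment
    fragment_start: int       # hypothetical: 0-based offset within the contig
    number_of_fragments: int  # hypothetical: total fragments for this contig
    fragment_sequence: str    # bases in this fragment (cf. fragmentSequence)


def fragment_contig(
    name: str, sequence: str, max_fragment_length: int = 10_000
) -> List[ContigFragment]:
    """Split one contig into consecutive fragments of bounded length."""
    if max_fragment_length <= 0:
        raise ValueError("max_fragment_length must be positive")
    total = len(sequence)
    starts = range(0, total, max_fragment_length)
    count = len(starts)
    return [
        ContigFragment(
            contig_name=name,
            sequence_length=total,
            fragment_number=index,
            fragment_start=start,
            number_of_fragments=count,
            fragment_sequence=sequence[start:start + max_fragment_length],
        )
        for index, start in enumerate(starts)
    ]


def reassemble(fragments: List[ContigFragment]) -> str:
    """Rebuild the full sequence from fragments, regardless of input order."""
    ordered = sorted(fragments, key=lambda f: f.fragment_number)
    return "".join(f.fragment_sequence for f in ordered)
```

Carrying the fragment index and start offset on every record is one way to keep fragments independently addressable while still allowing the original contig to be reconstructed.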