Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring FASTA work to break contig sizes. #160

Merged
merged 1 commit into from
Mar 10, 2014

Conversation

fnothaft
Copy link
Member

@fnothaft fnothaft commented Mar 3, 2014

This change fixes #109. Specifically, on FASTA import, assemblies that were too long (>800kbp) would cause things to go a bit wonky. To resolve this, we break a single contig up into contig fragments. Additionally, this PR adds a few utility functions for manipulating contigs.

As a note, I've held off on editing the CHANGES.txt/md until there is consensus on which one we are sticking with.

union { null, long } sequenceLength = null;
union { null, string } url = null;
union { null, string } url = null;
array<Base> fragmentSequence = []; // sequence of bases in this fragment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there some discussion of moving away from array to string to represent a sequence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arahuja, I believe @nealsid suggested it a while back. I can run an experiment in the next few days to check this; I believe that Neal was concerned that we would get better compression if we used strings instead of an array of enums. This should be a straightforward change to make and an easy one to evaluate.

I think the advantage you do get from defining your alphabet via an enum is that you are strictly checking for possible errors; whether this is valuable is another question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fnothaft Should we use a string here?

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/190/

@fnothaft
Copy link
Member Author

fnothaft commented Mar 3, 2014

Jenkins, test this please.

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/191/

@fnothaft
Copy link
Member Author

fnothaft commented Mar 3, 2014

Jenkins, test this please.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/192/

@fnothaft fnothaft assigned tdanford and unassigned tdanford Mar 3, 2014
@fnothaft
Copy link
Member Author

fnothaft commented Mar 3, 2014

Jenkins, test this please.

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/194/

@carlyeks
Copy link
Member

carlyeks commented Mar 3, 2014

Jenkins, retest this please.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/195/

@fnothaft fnothaft added this to the 0.7.0 Release milestone Mar 5, 2014
@fnothaft
Copy link
Member Author

Jenkins, test this please.

@fnothaft
Copy link
Member Author

@massie @arahuja I've changed this to strings instead of Array[Base].

@AmplabJenkins
Copy link

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/200/

@fnothaft
Copy link
Member Author

Looks like an issue with an implicit conversion... Fixing now.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/201/

massie added a commit that referenced this pull request Mar 10, 2014
Refactoring FASTA work to break contig sizes.
@massie massie merged commit 931d6fa into bigdatagenomics:master Mar 10, 2014
@massie
Copy link
Member

massie commented Mar 10, 2014

Thanks, Frank!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fasta converter hangs on large files
6 participants