Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding fixed depth prefix trie implementation #268

Merged
merged 1 commit into from
Jun 24, 2014

Conversation

fnothaft
Copy link
Member

This is part of the read error correction work. This adds a prefix trie trait, as well as two classes to implement a fixed key length prefix trie (and tests). Ultimately, this will be used to look up k-mers during a read error correction transformation.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/388/

@tdanford
Copy link
Contributor

Can you rebase this off of master again, Frank?

@fnothaft
Copy link
Member Author

Can you rebase this off of master again, Frank?

Definitely; I'm OOO today but will rebase tonight.

This seems like it's coming awfully close to implementing an immutable Map trait of some kind. Do you know what the delta is, between what's implemented here and what you'd need to implement to actually satisfy that trait's requirements?

It's actually a very large delta. The scala.collection.immutable.Map trait requires supporting:

  • Supporting iteration across the collection (e.g., map, flatMap, foreach)
  • Arbitrary key addition and removal
  • Supporting reduction across the collection

These three features add a lot of methods, and they're not all useful for us.

Is 'models' the appropriate package? My vote would be either for 'utils' or 'algorithms', I'm not sure this qualifies as a data model in the sense I would normally think of it. Not sure though.

Agreed; I wasn't sure of that either. I'm open to whichever.

Also, I'll take a look at your failing tests later today.

@tdanford
Copy link
Contributor

It's actually a very large delta.

Got it; nevermind on that comment then.

@fnothaft
Copy link
Member Author

@tdanford I've intentionally chosen to throw exceptions if you present an incorrect length key, since an invariant of this data structure is that all keys have the same length. If you providing it with a key with an incorrect length, you're violating the invariant and should get an exception.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/14/

@fnothaft
Copy link
Member Author

@tdanford I've just made the changes you suggested (rename, improve docs, move package), and rebased on top of master. I removed the test you added for the find method, and clarified the docs for the method: I explicitly wasn't checking for key length in that method as it is class private and is expected to only be called after error checking has been done.

Also, I squashed down your two test commits into a single commit. I had to rewrite your commit message for it to make sense, so let me know if you'd rather me change the message on 33cc463.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/16/

@carlyeks
Copy link
Member

All of the changes LGTM.
Any reason to not take this as a single commit?

@fnothaft
Copy link
Member Author

@carlyeks I kept it as multiple commits since part of the code came as a PR against my personal repo by @tdanford. I'm glad to squash down further if @tdanford is fine with his commit getting absorbed.

@carlyeks
Copy link
Member

He says he's fine with that.

@fnothaft
Copy link
Member Author

OK; I need to run to a meeting but will rebase afterwards.

@fnothaft
Copy link
Member Author

@carlyeks squashed! Sorry for the delay.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/26/

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/27/

carlyeks added a commit that referenced this pull request Jun 24, 2014
Adding fixed depth prefix trie implementation
@carlyeks carlyeks merged commit cf4d093 into bigdatagenomics:master Jun 24, 2014
@carlyeks
Copy link
Member

Thanks, Frank!

@fnothaft fnothaft deleted the prefix-trie branch July 10, 2014 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants