Mitch Peer Review #3

mrezzoni · 2018-10-27T03:04:56Z

Ana!

First of all, I'm tripping out on how similar our pseudocode is...10/10 for overall readability. Your test examples are incredibly thorough. You seem to have considered many different instances of duplicates with your test examples, which will pay significant dividends in your real code. I like how you explicitly stated that you want samtools to sort by chromosome, as this will make identifying duplicates a lot faster.

Jason also suggested that I implement a set, so I can totally help you with this. You just need to initialize an empty set (as you've done) and add your items of interest with `.add()`. Consider re-initializing the set once you have written your unique reads out and moved on to a new chromosome, to minimize the amount of stuff stored in memory.
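Here's a rough sketch of what I mean by resetting the set per chromosome. It assumes the input is already sorted by chromosome and that each record has been boiled down to a `(chrom, umi, strand, pos)` tuple; the function name `unique_reads` and the tuple layout are made up for illustration, not taken from your pseudocode.

```python
def unique_reads(records):
    """Yield the first record seen for each (umi, strand, pos) key per chromosome.

    Assumes records arrive grouped by chromosome (e.g. after samtools sort).
    """
    seen = set()
    current_chrom = None
    for chrom, umi, strand, pos in records:
        if chrom != current_chrom:
            # New chromosome: drop the old keys so memory stays small.
            seen = set()
            current_chrom = chrom
        key = (umi, strand, pos)
        if key not in seen:
            seen.add(key)
            yield (chrom, umi, strand, pos)


# Hypothetical records, for illustration only.
records = [
    ("1", "AACGCCAT", "+", 100),
    ("1", "AACGCCAT", "+", 100),  # duplicate of the first record
    ("1", "AACGCCAT", "-", 100),  # same UMI and position, other strand
    ("2", "AACGCCAT", "+", 100),  # same key, but on a new chromosome
]
kept = list(unique_reads(records))
```

Only the second record is dropped here; the other three all have a key that hasn't been seen yet on their chromosome.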

Really nice high-level functions. The fact that you created two functions to account for strandedness is awesome, but consider combining adj_strand+_pos and adj_strand-_pos, as one is simply the inverse of the other. Nice job accounting for other CIGAR ops such as I, D, and N. When you call UMI_check(string) to see if the UMI is in dict_known_UMI, wouldn't you want to skip it altogether?
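To show what combining the two strand functions could look like, here's a minimal sketch of a single function that takes the strand as an argument. The name `adjusted_position`, its signature, and the exact coordinate convention (the `- 1` on the minus strand) are my assumptions, not your pseudocode; what matters is that every read gets adjusted the same way. It also ignores hard clipping (H) to stay short.

```python
import re

# Splits a CIGAR string such as "2S10M" into (length, op) pairs.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = {"M", "D", "N", "=", "X"}


def adjusted_position(pos, cigar, strand):
    """Return the soft-clip-corrected 5' position of a read.

    pos is the leftmost mapped position from the SAM record.
    Hard clips are not handled in this sketch.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if strand == "+":
        # Plus strand: only leading soft clipping shifts the 5' start.
        if ops and ops[0][1] == "S":
            return pos - ops[0][0]
        return pos
    # Minus strand: the 5' end is on the right, so walk across everything
    # that consumes reference (M, D, N, =, X) and add trailing soft clipping.
    # Insertions (I) do not consume reference, so they are skipped.
    span = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing_clip = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return pos + span + trailing_clip - 1
```

With this convention, `adjusted_position(100, "2S10M", "+")` gives 98 and `adjusted_position(100, "5M100N5M2S", "-")` gives 211.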

I like how you have already thought about where you want to open your file(s). Make sure you also think about where to put the commands that write to your open files, and don't forget to close the files at the end.
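One way to make sure the files always get closed is to open them in a `with` block and do the writing inside the loop. This is just a sketch: `copy_kept_lines` and the `keep` callback are hypothetical names, and your real duplicate check would go where `keep(line)` is called.

```python
def copy_kept_lines(in_path, out_path, keep):
    """Copy SAM header lines and every line for which keep(line) is true."""
    # Both files are closed automatically when the with block ends,
    # even if an error is raised partway through.
    with open(in_path) as sam_in, open(out_path, "w") as sam_out:
        for line in sam_in:
            # Header lines start with "@" and are always written out.
            if line.startswith("@") or keep(line):
                sam_out.write(line)
```

Writing each kept line as soon as you decide to keep it also means you never have to hold the whole output in memory.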

Nice start. Please let me know if you'd like any elaboration on my feedback.
Mitch
