Ana!
First of all, I'm tripping out on how similar our pseudocode is... 10/10 for overall readability. Your test examples are incredibly thorough: you've covered many different kinds of duplicates, which will pay significant dividends when you write the real code. I also like that you explicitly have samtools sort the file by chromosome (coordinate order), since that puts potential duplicates next to each other and makes finding them a lot faster.
Jason also suggested that I use a set, so I can totally help you with this. You just need to initialize an empty set (as you've done) and add your items of interest with ".add()". Consider re-initializing the set once you've written out the unique reads for a chromosome and moved on to the next one. Reads on different chromosomes can never be duplicates of each other, so this keeps the amount of stuff stored in memory small.
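To make the set idea concrete, here's a minimal Python sketch of clearing the set whenever the chromosome changes. It's not your implementation. It assumes the SAM input is already coordinate-sorted by samtools and that the UMI is the last colon-separated field of QNAME. It also uses the raw POS field as a stand-in for your strand- and soft-clip-adjusted position. `dedupe_sorted_sam` is a hypothetical name.

```python
def dedupe_sorted_sam(lines):
    """Yield SAM lines, dropping reads whose (UMI, strand, position) key was already seen.

    Assumptions (hypothetical, for illustration): input is coordinate-sorted,
    the UMI is the last ':'-separated field of QNAME, and raw POS stands in
    for the adjusted 5' position.
    """
    seen = set()           # duplicate keys for the current chromosome only
    current_chrom = None
    for line in lines:
        if line.startswith("@"):
            yield line     # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[2]  # RNAME
        if chrom != current_chrom:
            # New chromosome: no earlier key can match again, so free the memory.
            seen = set()
            current_chrom = chrom
        umi = fields[0].split(":")[-1]
        strand = "-" if int(fields[1]) & 16 else "+"  # FLAG bit 16 = reverse strand
        key = (umi, strand, int(fields[3]))           # raw POS as a placeholder
        if key not in seen:
            seen.add(key)
            yield line
```

Because the input is sorted, the set only ever holds keys for one chromosome at a time.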
Really nice high-level functions. It's awesome that you wrote two functions to account for strandedness, but consider combining adj_strand+_pos and adj_strand-_pos into one, since one is essentially the inverse of the other. Nice job accounting for other CIGAR ops such as I, D, and N. When you call UMI_check(string) and the UMI turns out not to be in dict_known_UMI, wouldn't you want to skip that read altogether?
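Here's a rough sketch of how the two strand functions could collapse into one. `five_prime_pos` is a hypothetical name. It assumes 1-based SAM POS and the usual convention that only M, D, N, = and X consume reference bases, while I, S, H and P do not.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")  # ops that advance along the reference


def five_prime_pos(pos, cigar, is_reverse):
    """Return the 1-based reference coordinate of a read's 5' end.

    Plus strand: POS shifted left by any leading soft clip.
    Minus strand: POS plus the reference span of the alignment plus any
    trailing soft clip, minus 1 (leading soft clips are irrelevant there).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips never affect position
    if not is_reverse:
        if ops and ops[0][1] == "S":
            return pos - ops[0][0]
        return pos
    span = sum(n for n, op in ops if op in REF_CONSUMING)  # I is skipped, D and N count
    if ops and ops[-1][1] == "S":
        span += ops[-1][0]
    return pos + span - 1
```

In the main loop, a read whose UMI isn't in dict_known_UMI can be dropped with `continue` before this is ever called, so the position math only runs on reads you might keep.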
I like that you've already thought about where you want to open your file(s). Make sure you also think about where to put the command that writes to the open files, and don't forget to close them at the end.
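For the file handling, a `with` block is one way to get the open/write/close placement right in one go. This is a minimal sketch. `write_unique_reads` and its `keep` callback are hypothetical placeholders for wherever your duplicate check ends up.

```python
def write_unique_reads(in_path, out_path, keep):
    """Copy lines from in_path to out_path when keep(line) is True; return the count written.

    Both files are opened once, before the loop, and are closed automatically
    when the with-block exits, even if an exception is raised mid-loop.
    """
    n_written = 0
    with open(in_path) as sam_in, open(out_path, "w") as sam_out:
        for line in sam_in:
            if keep(line):
                sam_out.write(line)  # write inside the loop, one read at a time
                n_written += 1
    return n_written
```

Writing each read as soon as it passes the check means you never have to hold a whole chromosome's worth of output in memory.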
Nice start. Please let me know if you'd like any elaboration on my feedback.
Mitch