Your pseudo-code is very readable. This made it easier for me to determine that it was also logical and sensible. Concerning the beginning of your script, sets are unordered, but quickly searchable. So, to determine if the UMI for a read is known, if you have a set of 96 known UMIs, you can use a conditional such as:
if UMI in set_of_UMIs:
    pass  # do some stuff
But to store chromosome, position, and strandedness, you'll need a dictionary. Sets are unordered, so they cannot be indexed, and they have no "keys" as dictionaries do. Stored as bare values in a set, chromosome number and position would be indistinguishable, since both are simply numbers.
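As a rough sketch of what I mean (all names here are hypothetical, not taken from your script), a dictionary keyed on a (chromosome, position, strand) tuple keeps the three fields distinguishable, while each value is a set of the UMIs already seen at that location:

```python
# Hypothetical names throughout; a sketch, not the script under review.
# Maps (chrom, pos, strand) -> set of UMIs already seen at that location.
seen = {}

def is_duplicate(chrom, pos, strand, umi):
    """Return True if this UMI was already seen at the same chrom/pos/strand."""
    key = (chrom, pos, strand)
    umis_here = seen.setdefault(key, set())
    if umi in umis_here:
        return True
    umis_here.add(umi)
    return False
```

Because the key is a tuple, chromosome "1" at position 100 and chromosome "100" at position 1 end up as separate entries.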
You may also want some statistics variables, such as a counter for the number of duplicates, so you can print an informative output file telling the user how many duplicates were removed, how many reads were low-quality, how many had unknown UMIs, etc.
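One possible way to track these, assuming you are happy to use `collections.Counter` from the standard library (the category names and the tab-separated layout below are only placeholders):

```python
from collections import Counter

def format_summary(stats):
    """Return one 'category<TAB>count' line per category, sorted by name."""
    return "\n".join(f"{name}\t{count}" for name, count in sorted(stats.items()))

# Hypothetical categories; increment them wherever a read is discarded.
stats = Counter()
stats["duplicates_removed"] += 1
stats["unknown_umi"] += 1
stats["duplicates_removed"] += 1
summary = format_summary(stats)
```

The resulting string can then be written to a report file or printed at the end of the run.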
Also, consider the special cases for a reverse-strand read when parsing the CIGAR string. You may need conditionals for Ns, Is, and Ds to correct POS when testing for true duplicates.
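Here is a minimal sketch of one way to do that correction, assuming a 1-based leftmost POS as in the SAM format. The function name and the choice to report the last covered base for reverse reads are my own assumptions; what matters is applying one convention consistently. Note that in this sketch Is are parsed but skipped, since insertions do not consume reference bases, while Ms, Ds, Ns, and soft clipping do shift the position:

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def five_prime_pos(pos, cigar, reverse):
    """Return the strand-aware 5' position of a read (hypothetical helper)."""
    # Hard clips (H) are dropped so soft clips sit at the ends of the list.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not reverse:
        # Forward strand: only leading soft clipping shifts the start.
        if ops and ops[0][1] == "S":
            return pos - ops[0][0]
        return pos
    # Reverse strand: walk to the right end of the alignment.
    end = pos
    for n, op in ops:
        if op in "MDN=X":  # operations that consume reference bases
            end += n
    if ops and ops[-1][1] == "S":  # trailing soft clip extends the 5' end
        end += ops[-1][0]
    return end - 1
```

For example, a reverse-strand read with POS 100 and CIGAR 5M2I3M covers eight reference bases, so its 5' position under this convention is 107.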
Otherwise, very thorough job, and very easy to follow. Well done!