N50 and L50 jargon is confusing #15
Comments
Do these read better? I'm afraid we can only add some explanation before it becomes more confusing. :)
Thanks for looking at this so quickly. I think the second version (N50_len and N50_num) works well - clear and compact. It would be better not to use L50 to refer to the N50 number at all - I think this usage should be avoided, even if it is found elsewhere. Just my opinion though - some context and debate here and here.
Thanks John, let's just discard L50, which causes confusion.
Great, thank you.
I feel like the conclusion of this thread was that you should use N50_num and N50_len (rather than L50), but the implementation was just to remove N50_num altogether. I agree with @johnomics that L50 is confusing and N50_num is more appropriate, but I disagree with its removal entirely. I would recommend putting N50_num back into seqkit stats.
Just checked the code. L50 (N50_num) is computed but hidden. 😄
Hi Shenwei @shenwei356, sorry to jump in here, but I think this thread might be the best place to discuss my request. I guess N50 and L50 are no longer confusing to people, since high-throughput sequencing technologies are so common today (compared to 2017). I completely agree with @RhettRautsaw: I think it is time to bring the Lx stats back to seqkit. This would be very useful for large pangenome projects that use only seqkit to calculate all the stats they need. What do you think? By the way, I really like seqkit! Thank you very much for providing this efficient and versatile tool for the world. Best wishes,
Just added a new column
Wow, what a fast reply, @shenwei356. Thank you very much. I know I am asking too much, but it would be great to also support -L, just like -N, so that we can calculate -L 50,90. What do you think? I really appreciate your work! Best wishes,
Prerequisites
seqkit version
Describe your issue
Thanks for building seqkit, it is an extremely useful tool that I use every day.
seqkit stats -a produces N50 and L50 statistics. These labels are very confusing: 'N50' is the 'N50 length', the read length such that 50% of the bases are in reads of this length or longer. 'L50' is the 'N50 number', the number of reads in this set. The term L50 has no connection with its meaning and in fact suggests that it has to do with a length, which is not true. It would be much better to use the terms 'N50 length' and 'N50 number' (or similar terms) to make the meaning of these statistics clear. I realise other tools use the same jargon, but it is unclear and would be better replaced.
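To make the two definitions concrete, here is a minimal Python sketch of how the 'N50 length' and 'N50 number' could be computed from a list of read lengths. The function name `n50_stats` and its `percent` parameter are hypothetical and chosen for illustration only; this is not seqkit's implementation, and seqkit may handle ties or edge cases differently.

```python
def n50_stats(lengths, percent=50):
    """Return (Nx length, Nx number) for a list of sequence lengths.

    Nx length: the length L such that sequences of length >= L contain
               at least `percent` % of all bases (the usual "N50" when
               percent is 50).
    Nx number: how many sequences are in that set (often called "L50").
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    if not lengths:
        return 0, 0

    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length, count
    # Not reached for valid input: the full set always meets the threshold.
    return 0, 0


if __name__ == "__main__":
    reads = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # 54 bases in total
    n50_len, n50_num = n50_stats(reads)
    print(f"N50_len={n50_len} N50_num={n50_num}")
```

For the example reads, the three longest reads (10 + 9 + 8 = 27 bases) hold exactly half of the 54 bases, so the N50 length is 8 and the N50 number is 3. Passing `percent=90` gives the corresponding N90 pair, which illustrates why one value is a length and the other is a count.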