RSeQC: read_distribution.py bar graph double-counts some categories #1457

Closed
blavetn opened this issue Jun 17, 2021 · 4 comments
Labels
bug: module Bug in a MultiQC module

Comments

@blavetn

blavetn commented Jun 17, 2021

The RSeQC read distribution plot uses the values reported by the tool as they are, but an extra computation step is needed because, according to the RSeQC manual:

Tags assigned to TSS_up_1kb were also assigned to TSS_up_5kb and TSS_up_10kb, tags assigned to TSS_up_5kb were also assigned to TSS_up_10kb.

Therefore:

Total Assigned Tags = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb

So the following values need to be recomputed:

  • new_TES_down_10kb = TES_down_10kb - TES_down_5kb
  • new_TES_down_5kb = TES_down_5kb - TES_down_1kb
  • new_TSS_up_10kb = TSS_up_10kb - TSS_up_5kb
  • new_TSS_up_5kb = TSS_up_5kb - TSS_up_1kb

Only then does the following hold:

Total Assigned Tags = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_1kb + new_TSS_up_5kb + new_TSS_up_10kb + TES_down_1kb + new_TES_down_5kb + new_TES_down_10kb

Otherwise the percentages of all the features can add up to more than 100%!
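
The recomputation described above could be sketched in Python roughly as follows. This is a minimal illustration and not the code used in MultiQC: the function name `to_exclusive_bins` and the toy counts are made up for the example, and only the category names come from the RSeQC output.

```python
def to_exclusive_bins(counts):
    """Return a copy of `counts` in which the nested TSS/TES windows no longer overlap.

    `counts` maps RSeQC group names (e.g. "TSS_up_5kb") to tag counts.
    Hypothetical helper for illustration only.
    """
    fixed = dict(counts)
    for prefix in ("TSS_up", "TES_down"):
        one = counts[f"{prefix}_1kb"]
        five = counts[f"{prefix}_5kb"]
        ten = counts[f"{prefix}_10kb"]
        fixed[f"{prefix}_5kb"] = five - one   # tags in the 1kb-5kb window only
        fixed[f"{prefix}_10kb"] = ten - five  # tags in the 5kb-10kb window only
    return fixed


# Toy numbers (not real data): the 10kb window contains the 5kb window,
# which in turn contains the 1kb window.
toy = {
    "TSS_up_1kb": 10, "TSS_up_5kb": 15, "TSS_up_10kb": 18,
    "TES_down_1kb": 4, "TES_down_5kb": 9, "TES_down_10kb": 20,
}
fixed = to_exclusive_bins(toy)
print(fixed["TSS_up_1kb"], fixed["TSS_up_5kb"], fixed["TSS_up_10kb"])  # 10 5 3
```

After the adjustment, the three TSS bins sum back to the original `TSS_up_10kb` value (10 + 5 + 3 = 18), which is what makes the stacked bar add up correctly.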

@ewels ewels changed the title RSeQC: read distribution need recounting RSeQC: read_distribution.py bar graph double-counts some categories Jun 17, 2021
@ewels ewels added the bug: module Bug in a MultiQC module label Jun 17, 2021
@ewels
Member

ewels commented Jun 17, 2021

Example tool output:

https://github.com/ewels/MultiQC_TestData/blob/master/data/modules/rseqc/read_distribution/RSeQC_rd_P2005_102.out.txt

read_distribution.py 2.6.1
Total Reads                   71859250
Total Tags                    83076623
Total Assigned Tags           79927698
=====================================================================
Group               Total_bases         Tag_count           Tags/Kb             
CDS_Exons           94670410            41691981            440.39            
5'UTR_Exons         7515324             295162              39.27             
3'UTR_Exons         27691393            4991227             180.24            
Introns             1502423972          30914721            20.58             
TSS_up_1kb          33825059            425459              12.58             
TSS_up_5kb          150279744           617325              4.11              
TSS_up_10kb         268263971           785889              2.93              
TES_down_1kb        35625947            302610              8.49              
TES_down_5kb        154052043           994449              6.46              
TES_down_10kb       270854584           1248718             4.61              
=====================================================================

Right, so to rewrite your suggestion in my own words to be sure that I understand it: we basically need to make new variables to use in the bar graph that are:

  • TES_down_5kb-10kb (TES_down_10kb - TES_down_5kb)
  • TES_down_1kb-5kb (TES_down_5kb - TES_down_1kb)
  • TES_down_1kb (unchanged)

(and equivalent for TSS_up).
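
As a sanity check, the example output above can be run through this adjustment. The snippet below is only a sketch under stated assumptions: the helper name `parse_tag_counts` and the parsing approach are illustrative and are not taken from MultiQC or from the eventual fix.

```python
RSEQC_OUTPUT = """\
read_distribution.py 2.6.1
Total Reads                   71859250
Total Tags                    83076623
Total Assigned Tags           79927698
=====================================================================
Group               Total_bases         Tag_count           Tags/Kb
CDS_Exons           94670410            41691981            440.39
5'UTR_Exons         7515324             295162              39.27
3'UTR_Exons         27691393            4991227             180.24
Introns             1502423972          30914721            20.58
TSS_up_1kb          33825059            425459              12.58
TSS_up_5kb          150279744           617325              4.11
TSS_up_10kb         268263971           785889              2.93
TES_down_1kb        35625947            302610              8.49
TES_down_5kb        154052043           994449              6.46
TES_down_10kb       270854584           1248718             4.61
=====================================================================
"""


def parse_tag_counts(text):
    """Extract {group: tag_count} from the table rows (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns, with integers in columns 2 and 3.
        if len(fields) == 4 and fields[1].isdigit() and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


counts = parse_tag_counts(RSEQC_OUTPUT)
naive_total = sum(counts.values())  # what a bar graph of the raw values adds up to

exclusive = dict(counts)
for prefix in ("TSS_up", "TES_down"):
    exclusive[f"{prefix}_5kb"] = counts[f"{prefix}_5kb"] - counts[f"{prefix}_1kb"]
    exclusive[f"{prefix}_10kb"] = counts[f"{prefix}_10kb"] - counts[f"{prefix}_5kb"]
exclusive_total = sum(exclusive.values())

print(naive_total)      # 82267541, more than Total Assigned Tags
print(exclusive_total)  # 79927698, equal to Total Assigned Tags
```

With the raw values the categories sum to 82267541, which is more than the reported 79927698 assigned tags. With the non-overlapping bins the sum matches the reported total exactly.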

Makes sense 👍🏻 Thanks for reporting! Good spot.

Phil

@blavetn
Author

blavetn commented Jun 18, 2021

Exactly!

@ewels
Member

ewels commented Jun 29, 2021

@blavetn - fixed in #1464 by @ErikDanielsson. This will be released in MultiQC v1.11 in the coming days. If you're able to give it a spin in the development version before that to confirm that it looks good that would be great 👍🏻

@ewels
Member

ewels commented Jun 29, 2021

Thanks for letting us know!
