RSeqQC: parse Transcript Integrity Number #737

Closed
drbecavin opened this issue Apr 20, 2018 · 10 comments

@drbecavin

drbecavin commented Apr 20, 2018

Integration of Transcript Integrity Number from RSeqQC

  • Name of tool:
    • Transcript Integrity Number - tin.py
  • Tool description:
    • This program is designed to evaluate RNA integrity at the transcript level. TIN (transcript integrity number) is named by analogy with RIN (RNA integrity number). RIN is the most widely used metric for evaluating RNA integrity at the sample (or transcriptome) level. It is a very useful preventive measure to ensure good RNA quality and robust, reproducible RNA sequencing.
  • Tool homepage:
  • Complete log file output:
  • Log filename pattern:
    • .summary.txt and .tin.xls
  • Most interesting data for General Stats table:
    • TIN(median)
  • Data suitable for MultiQC plot(s):
    • TIN(median)

Thanks for your amazing job on MultiQC. I love this softwareeeeee ! It changed my life !

@ewels ewels changed the title Integration of Transcript Integrity Number from RSeqQC RSeqQC: parse Transcript Integrity Number Apr 20, 2018
@ewels
Member

ewels commented Apr 20, 2018

Hah, thanks for the comment @drbecavin - I should add that quote to the website testimonials 😉

Should be pretty easy to add this. I'll take a look into it when I get a chance.

@drbecavin
Author

drbecavin commented Apr 20, 2018

No problem, I really enjoy the way you manage all these logs, and the quality of the html report created. Thanks!
Maybe I should create another issue for read_quality.py, another tool of RSeQC?

@ewels
Member

ewels commented Apr 20, 2018

Sure 👍 It's good to have multiple issues where possible to break things up.

@EngineerReversed

EngineerReversed commented Sep 22, 2020

Has this been included in MultiQC_report?

@guidohooiveld

guidohooiveld commented Jun 10, 2021

Hi, I also kindly second this request to include the results of tin.py in a MultiQC report.
An example figure (box plot; Figure 1B) showing the data for multiple samples can be found in this paper, but other representations of the results (median TIN score plus SD or IQR of all transcripts in a sample) in a table or graph may be more appropriate.

For completeness, the command used to run the TIN module is shown below, and the results are attached. The txt file *out.summary.txt contains the per-sample summary (i.e. median + SD); the xls file *out.tin.xls (actually also a tab-delimited text file) contains the TIN score per transcript.

[guidoh@localhost P15-1-6h]$ tin.py -i P15-1-6h_Aligned.sortedByCoord.out.bam -r /mnt/files/guido/INDEX/STAR/Housekeeping_TranscriptsHuman2158.bed
@ 2021-06-10 11:44:04: Get BAM file(s) ...
Total 1 BAM file(s):
        P15-1-6h_Aligned.sortedByCoord.out.bam
@ 2021-06-10 11:44:04: Processing P15-1-6h_Aligned.sortedByCoord.out.bam
[guidoh@localhost P15-1-6h]$ 

Thank you for having a look at this!
G

output_tin.py.zip

@ewels ewels added this to the MultiQC v1.11 milestone Jul 2, 2021
@ErikDanielsson ErikDanielsson mentioned this issue Jul 2, 2021
11 tasks
ewels added a commit to MultiQC/test-data that referenced this issue Jul 2, 2021
@ewels
Member

ewels commented Jul 2, 2021

Hi all,

Apologies for the very (very, very) long time it's taken to get this added. @ErikDanielsson has just put together a new RSeQC submodule to support this output in #1481 and it will be part of the v1.11 release any day now.

In the end, I decided that we should keep it simple. It adds two columns to the General Statistics table: the median TIN and the stdev. The latter is hidden by default; it can be shown via the Configure Columns button or at report generation time via a config (see docs).

I hope this is still helpful to you all, despite coming over 3 years late! 😁 Shout if you hit any problems with it.

Many thanks,

Phil

@ewels ewels closed this as completed Jul 2, 2021
@guidohooiveld

guidohooiveld commented Jul 5, 2021

Thanks Phil and Erik for creating the MultiQC RSeQC TIN submodule; much appreciated!

Earlier today I updated MultiQC to the latest development version, and ran it again on a folder containing various QC output files, including TIN.
Et voilà, the 2 columns (of which one is hidden) were indeed added to the General Statistics table. Nice & thanks!

One comment/question, though, regarding the sample names used for the TIN values in the General Statistics table: these are not the same as used for the other RSeQC modules. This makes the table 'less nice' and more difficult to read. See 1st screenshot below.

I think this is due to the fact that within the TIN "summary" file (the txt file *out.summary.txt) the full name of the BAM file is returned (used) by RSeQC (see its copied content below), which is then extracted (parsed) by the MultiQC TIN module, and subsequently used in the General Statistics table.

Therefore: would you have any suggestion to prevent this from happening, so that only the 'base name' is used in the table? Maybe by somehow applying fn_clean_sample_names on the fly?
Note that I am not an expert on how to do this and it may be too naive a thought... but since the 'other' files seem to be correctly recognized and their names cleaned (see 2nd screenshot), this may be feasible.

Thus, in summary: in the General Statistics table the full name present in the TIN summary file (*out.summary.txt) is used (e.g. "P26-1-6h_Aligned.sortedByCoord.out.bam"), whereas only the sample ID (base name) "P26-1-6h" would be preferred.

Content of the TIN summary file (P26-1-6h_Aligned.sortedByCoord.out.summary.txt):

Bam_file	TIN(mean)	TIN(median)	TIN(stdev)
P26-1-6h_Aligned.sortedByCoord.out.bam	53.72327495737302	53.34221052273402	18.530355596890026
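
For readers following along, here is a minimal sketch of how a summary file with this layout could be read in Python. It assumes only the tab-separated header and row shown above; the function name `parse_tin_summary` is hypothetical and this is not the code used by the MultiQC submodule.

```python
import csv
import io


def parse_tin_summary(text):
    """Parse tab-separated TIN summary text into {bam_file: {column: float}}.

    Hypothetical helper for illustration; assumes the first column is
    named "Bam_file" and all remaining columns are numeric.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    results = {}
    for row in reader:
        name = row.pop("Bam_file")
        results[name] = {key: float(value) for key, value in row.items()}
    return results


# The example content is the summary file quoted in this comment.
example = (
    "Bam_file\tTIN(mean)\tTIN(median)\tTIN(stdev)\n"
    "P26-1-6h_Aligned.sortedByCoord.out.bam\t53.72327495737302\t"
    "53.34221052273402\t18.530355596890026\n"
)
parsed = parse_tin_summary(example)
print(parsed["P26-1-6h_Aligned.sortedByCoord.out.bam"]["TIN(median)"])
```

Note that the dictionary key is the full BAM file name taken from the first column, which is the value that ends up as the sample name unless it is cleaned afterwards.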

An example file is present in my previous post in this thread (#737 (comment)).

image

Below is a screenshot of a folder containing, for one sample, the output of STAR as well as RSeQC and Picard. All relevant files are nicely recognized by MultiQC, and their names are properly 'cleaned' when used in the MultiQC report. Hence my (naive) thought above...

image

@ewels
Member

ewels commented Jul 6, 2021

Ah yes, our first v1.11 release bug! You're totally right, we missed passing the sample name through the self.clean_s_name() function (docs). It's a one-line fix, I'll try to get to it later today.
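
As a rough illustration of the kind of cleaning being discussed, the standalone sketch below strips a known suffix from the BAM file name. It does not reproduce MultiQC's actual `clean_s_name()` implementation; the function `strip_known_suffix` and the suffix list are assumptions chosen for this example.

```python
# Hypothetical suffix list, for illustration only; MultiQC's real
# cleaning rules are configurable and are not reproduced here.
SUFFIXES = ("_Aligned.sortedByCoord.out.bam", ".bam")


def strip_known_suffix(name, suffixes=SUFFIXES):
    """Return name without the first matching suffix, or unchanged."""
    for suffix in suffixes:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


print(strip_known_suffix("P26-1-6h_Aligned.sortedByCoord.out.bam"))
```

Longer suffixes are listed first so that the most specific match wins; a name with no matching suffix is returned as-is.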

@ewels
Member

ewels commented Jul 6, 2021

Moved into a dedicated issue: #1484

@ewels
Member

ewels commented Jul 6, 2021

(fixed in v1.12dev)

vladsavelyev pushed a commit to vladsavelyev/MultiQC_TestData that referenced this issue Apr 16, 2022