agat_sp_statistics.pl: outputting whole distribution results in CSV format #507

Jasminefyh · 2024-10-29T12:28:23Z

Dear authors,

Thank you for providing this excellent tool for genome structure statistics. I have been using the command agat_sp_statistics.pl and found it very helpful for generating distribution plots of genome features, such as exons, introns, and CDS regions.

I am wondering if it is possible to obtain the statistical distribution results as a CSV file, including details like Transcript ID, Transcript Sequence Length, Number of Exons, Total Exon Sequence Length, etc. This would greatly help us in performing additional custom analyses and plotting genome features independently.

I appreciate your time and look forward to your response.

Sincerely,
Jasmine
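For illustration, the per-transcript table described above could be sketched directly from a GFF3 file as follows. This is not AGAT functionality; it is a minimal standalone sketch that assumes standard 9-column GFF3 with `ID` and `Parent` attributes, treats "transcript length" as the genomic span of the parent feature, and uses hypothetical function and column names (`transcript_table`, `transcript_id`, and so on).

```python
import csv
import io
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    return dict(pair.split("=", 1) for pair in field.strip().split(";") if "=" in pair)


def transcript_table(gff_lines):
    """Return (transcript_id, span_bp, exon_count, exon_bp) tuples.

    Assumption: exons point to their transcript through the Parent attribute.
    """
    spans = {}                     # feature ID -> genomic span in bp
    exon_count = defaultdict(int)  # transcript ID -> number of exons
    exon_bp = defaultdict(int)     # transcript ID -> summed exon length
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        length = end - start + 1  # GFF3 coordinates are 1-based, inclusive
        if ftype == "exon":
            # An exon may list several parents separated by commas.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exon_count[parent] += 1
                    exon_bp[parent] += length
        elif "ID" in attrs:
            spans[attrs["ID"]] = length
    return [(tid, spans.get(tid, ""), exon_count[tid], exon_bp[tid])
            for tid in sorted(exon_count)]


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["transcript_id", "transcript_length", "exon_count", "exon_length"])
    writer.writerows(rows)
    return buf.getvalue()


# Tiny made-up example; real input would come from open("annotation.gff3").
EXAMPLE_GFF = [
    "chr1\t.\tgene\t1\t1000\t.\t+\t.\tID=gene1",
    "chr1\t.\tmRNA\t1\t1000\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\t.\texon\t1\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\t.\texon\t801\t1000\t.\t+\t.\tID=e2;Parent=tx1",
]
example_rows = transcript_table(EXAMPLE_GFF)
example_csv = to_csv(example_rows)
```

The resulting CSV text can be written to disk and loaded in R or any plotting tool.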

Juke34 · 2024-10-29T13:16:22Z

The statistics can be output in YAML, a format that is easy to parse programmatically. Is that not sufficient for what you want to achieve?

gchevignon · 2024-12-03T16:13:24Z

Hello,
I am not sure the YAML contains the data Jasmine asked for, if I understand correctly. It is related to my own question:
Is it possible to output a table of the raw data the script uses to produce the different plots?
This would allow us to fine-tune the plots directly in R.
I hope this is clear enough.
Best
Germain

Juke34 · 2024-12-03T16:30:38Z

If you take the YAML file, e.g.:

repeat_region:
  isoform: NA
  value:
    90 percentile repeat_region length (bp): na
    Longest repeat_region (bp): 101
    Number of repeat_region: 2
    Number repeat_region overlapping: 0
    Shortest repeat_region (bp): 101
    Total repeat_region length (bp): 202
    mean repeat_region length (bp): 101
    median repeat_region length (bp): 101

it should be easy to get a table programmatically:

title	value
90 percentile repeat_region length (bp)	na
Longest repeat_region (bp)	101
Number of repeat_region	2
Number repeat_region overlapping	0
Shortest repeat_region (bp)	101
Total repeat_region length (bp)	202
mean repeat_region length (bp)	101
median repeat_region length (bp)	101
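A minimal sketch of that conversion, using only the Python standard library. It assumes the YAML has exactly the layout of the snippet above (a feature name at indent 0, a `value:` block at indent 2, statistics at indent 4); the function names are hypothetical, and a real YAML parser such as PyYAML's `yaml.safe_load` would be more robust for anything beyond this flat layout.

```python
import csv
import io

# Sample copied from the YAML snippet above.
SAMPLE = """\
repeat_region:
  isoform: NA
  value:
    90 percentile repeat_region length (bp): na
    Longest repeat_region (bp): 101
    Number of repeat_region: 2
    Number repeat_region overlapping: 0
    Shortest repeat_region (bp): 101
    Total repeat_region length (bp): 202
    mean repeat_region length (bp): 101
    median repeat_region length (bp): 101
"""


def yaml_stats_to_rows(text):
    """Yield (feature, title, value) for each statistic under a 'value:' block.

    Assumption: two-space indentation and the nesting shown in SAMPLE.
    """
    feature = None
    in_value = False
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        stripped = line.strip()
        key, sep, val = stripped.rpartition(": ")
        if not sep:  # a bare "key:" line that opens a nested block
            key, val = stripped.rstrip(":"), ""
        if indent == 0:
            feature, in_value = key, False
        elif indent == 2:
            in_value = (key == "value" and val == "")
        elif in_value:
            yield (feature, key, val)


def rows_to_tsv(rows):
    """Serialise the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature", "title", "value"])
    writer.writerows(rows)
    return buf.getvalue()


rows = list(yaml_stats_to_rows(SAMPLE))
tsv = rows_to_tsv(rows)
```

Compared with the table above, the sketch adds a `feature` column so that several feature types can share one file.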

gchevignon · 2024-12-03T16:59:49Z

Yes, indeed, but this is not what we are asking for.
We want the table that the script uses to produce this type of plot:
mrnaClass_cds.pdf
Those data are not in the YAML file.
Ideally, we would have one table per plot.
Thanks!

Juke34 · 2024-12-03T18:35:01Z

OK, I understand: you do not want the resulting statistics but the raw data. Yes, it is feasible. I will implement it when time allows.

Juke34 added a commit that referenced this issue Dec 4, 2024

fix #507 - add option raw to print raw data in a dedicated folder

e7f0180

Juke34 closed this as completed in d8a3f15 Dec 4, 2024
