agat_sp_statistics.pl: outputting whole distribution results in CSV format #507

Closed
Jasminefyh opened this issue Oct 29, 2024 · 5 comments

Comments

@Jasminefyh

Dear authors,

Thank you for providing this excellent tool for genome structure statistics. I have been using the command agat_sp_statistics.pl and found it very helpful for generating distribution plots of genome features, such as exons, introns, and CDS regions.

I am wondering if it is possible to obtain the statistical distribution results as a CSV file, including details like Transcript ID, Transcript Sequence Length, Number of Exons, Total Exon Sequence Length, etc. This would greatly help us in performing additional custom analyses and plotting genome features independently.

I appreciate your time and look forward to your response.

Sincerely,
Jasmine

@Juke34
Collaborator

Juke34 commented Oct 29, 2024

The statistics can be output in YAML. This format is easy to parse programmatically. Is it not sufficient for what you want to achieve?

@gchevignon

Hello,
I am not sure the YAML contains the data Jasmine asked for. If I understand correctly, her request is related to my own question:
Is it possible to output a table of the raw data that the script uses to produce the different plots?
This would allow us to fine-tune the plots directly in R.
I hope this is clear enough.
Best
Germain

@Juke34
Collaborator

Juke34 commented Dec 3, 2024

If you get the YAML file, e.g.:

repeat_region:
  isoform: NA
  value:
    90 percentile repeat_region length (bp): na
    Longest repeat_region (bp): 101
    Number of repeat_region: 2
    Number repeat_region overlapping: 0
    Shortest repeat_region (bp): 101
    Total repeat_region length (bp): 202
    mean repeat_region length (bp): 101
    median repeat_region length (bp): 101

It should be easy to get a table programmatically:

| title | value |
| --- | --- |
| 90 percentile repeat_region length (bp) | na |
| Longest repeat_region (bp) | 101 |
| Number of repeat_region | 2 |
| Number repeat_region overlapping | 0 |
| Shortest repeat_region (bp) | 101 |
| Total repeat_region length (bp) | 202 |
| mean repeat_region length (bp) | 101 |
| median repeat_region length (bp) | 101 |
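
The YAML-to-table conversion mentioned above could be sketched in Python roughly as follows. This is only an illustration, not part of AGAT: it assumes the file has exactly the three-level layout of the excerpt (feature name, then `isoform`/`value`, then one `title: value` pair per line), and the names `yaml_stats_to_rows` and `rows_to_csv` are hypothetical. It uses only the standard library; a real YAML parser would be more robust on a complete file.

```python
import csv
import io

# Text copied from the YAML excerpt above.
YAML_TEXT = """\
repeat_region:
  isoform: NA
  value:
    90 percentile repeat_region length (bp): na
    Longest repeat_region (bp): 101
    Number of repeat_region: 2
    Number repeat_region overlapping: 0
    Shortest repeat_region (bp): 101
    Total repeat_region length (bp): 202
    mean repeat_region length (bp): 101
    median repeat_region length (bp): 101
"""


def yaml_stats_to_rows(text):
    """Return (feature, title, value) tuples from the layout shown above.

    Assumption: indentation is 0 for the feature, 2 for isoform/value,
    and 4 or more for the individual statistics.
    """
    rows = []
    feature = None
    in_value_block = False
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Split on the last colon so the title is kept whole.
        key, _, value = line.strip().rpartition(":")
        if indent == 0:
            feature = key
            in_value_block = False
        elif indent == 2:
            in_value_block = key == "value"
        elif in_value_block:
            rows.append((feature, key, value.strip()))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["feature", "title", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = yaml_stats_to_rows(YAML_TEXT)
print(rows_to_csv(rows))
```

The resulting CSV can be read directly in R or any spreadsheet tool. Note that it contains only the summary statistics, not the per-feature values behind the plots.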

@gchevignon

Yes indeed, but this is not what we are asking for.
We want the table that the script uses to produce this type of plot:
mrnaClass_cds.pdf
Those data are not in the YAML file.
Ideally we would have one table per plot.
Thanks!

@Juke34
Collaborator

Juke34 commented Dec 3, 2024

OK, I understand: you do not want the resulting statistics but the raw data. Yes, it is feasible. I will implement it when time allows.
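
In the meantime, a per-transcript table of the kind requested in this issue (transcript ID, length, number of exons, total exon length) could be derived directly from a GFF3 file. The sketch below is not AGAT's method and may not match the values AGAT uses for its plots. It assumes standard nine-column GFF3 with `ID` and `Parent` attributes; the input lines are invented for illustration, the function names are hypothetical, and "transcript length" here means the genomic span of the transcript feature.

```python
import csv
import io
from collections import defaultdict

# Hypothetical GFF3 lines, invented for illustration only.
GFF3_TEXT = "\n".join([
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\tsrc\texon\t500\t900\t.\t+\t.\tID=e2;Parent=tx1",
])


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y' into a dict."""
    pairs = (item.split("=", 1) for item in column9.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def per_transcript_table(gff3_text, transcript_types=("mRNA", "transcript")):
    """Return one tuple per transcript: (id, span bp, exon count, exon bp)."""
    span = {}
    exon_count = defaultdict(int)
    exon_bp = defaultdict(int)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        length = end - start + 1  # GFF3 coordinates are 1-based, inclusive
        if ftype in transcript_types and "ID" in attrs:
            span[attrs["ID"]] = length
        elif ftype == "exon":
            # An exon may list several parents separated by commas.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    exon_count[parent] += 1
                    exon_bp[parent] += length
    return [(tid, span[tid], exon_count[tid], exon_bp[tid]) for tid in span]


def table_to_csv(table):
    """Serialize the per-transcript table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["transcript_id", "transcript_span_bp",
                     "exon_count", "total_exon_bp"])
    writer.writerows(table)
    return buffer.getvalue()


table = per_transcript_table(GFF3_TEXT)
print(table_to_csv(table))
```

For a real annotation, the text would be read from the GFF3 file instead of the inline string, and the CSV could then be loaded in R for custom plots.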

3 participants