Unexpected behavior when applying agat_convert_sp_gxf2gxf.pl to uORFs #315

Closed
svigneau opened this issue Dec 12, 2022 · 3 comments

Comments

@svigneau

svigneau commented Dec 12, 2022

When running the command:

docker run -v $(pwd):/mnt quay.io/biocontainers/agat:1.0.0--pl5321hdfd78af_0  agat_convert_sp_gxf2gxf.pl --gff /mnt/saccharomyces_cerevisiae_R64-3-1_20210421_debug.gff -o /mnt/saccharomyces_cerevisiae_R64-3-1_20210421_debug_agat.gff > agat_convert_sp_gxf2gxf_debug.log 2>&1

on the following GFF file extracted from SGD's latest annotation (saccharomyces_cerevisiae_R64-3-1_20210421.gff):

chrVII  SGD     gene    182390  184084  .       +       .       ID=YGL171W;Name=YGL171W;gene=ROK1;Alias=ROK1,RNA-dependent%20ATPase%20ROK1;Ontology_term=GO:0000447,GO:0000472,GO:0000480,GO:0003724,GO:0005730,GO:0008186,GO:0008186,GO:0030490,GO:0030686,GO:0048254,SO:0000704;Note=RNA-dependent%20ATPase%3B%20involved%20in%20pre-rRNA%20processing%20at%20sites%20A0%2C%20A1%2C%20and%20A2%2C%20and%20in%20control%20of%20cell%20cycle%20progression%3B%20contains%20two%20upstream%20open%20reading%20frames%20%28uORFs%29%20in%205'%20untranslated%20region%20which%20regulate%20translation;display=RNA-dependent%20ATPase;dbxref=SGD:S000003139;orf_classification=Verified;curie=SGD:S000003139
chrVII  SGD     uORF    182286  182407  .       +       .       Parent=YGL171W_id002,YGL171W_id001;Name=YGL171W_uORF;orf_classification=Verified
chrVII  SGD     uORF    182291  182329  .       +       .       Parent=YGL171W_id002,YGL171W_id001;Name=YGL171W_uORF;orf_classification=Verified
chrVII  SGD     CDS     182390  184084  .       +       0       Parent=YGL171W_id002,YGL171W_id001;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     mRNA    182352  184232  .       +       .       ID=YGL171W_id002;Name=YGL171W_id002;Parent=YGL171W;transcript_id=SGD:S000289373;dbxref=RefSeq:NM_001181036.1;conditions=GAL
chrVII  SGD     mRNA    182374  184383  .       +       .       ID=YGL171W_id001;Name=YGL171W_id001;Parent=YGL171W;transcript_id=SGD:S000294290;dbxref=RefSeq:NM_001181036.1;conditions=YPD

the following output is produced:

##gff-version 3
chrVII  SGD     gene    182286  184383  .       +       .       ID=YGL171W;Alias=ROK1,RNA-dependent ATPase ROK1;Name=YGL171W;Note=RNA-dependent ATPase%3B involved in pre-rRNA processing at sites A0%2C A1%2C and A2%2C and in control of cell cycle progression%3B contains two upstream open reading frames (uORFs) in 5' untranslated region which regulate translation;Ontology_term=GO:0000447,GO:0000472,GO:0000480,GO:0003724,GO:0005730,GO:0008186,GO:0008186,GO:0030490,GO:0030686,GO:0048254,SO:0000704;curie=SGD:S000003139;dbxref=SGD:S000003139;display=RNA-dependent ATPase;gene=ROK1;orf_classification=Verified
chrVII  SGD     mRNA    182286  184232  .       +       .       ID=YGL171W_id002;Parent=YGL171W;Name=YGL171W_id002;conditions=GAL;dbxref=RefSeq:NM_001181036.1;transcript_id=SGD:S000289373
chrVII  SGD     exon    182352  184232  .       +       .       ID=nbis-exon-2;Parent=YGL171W_id002;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     CDS     182390  184084  .       +       0       ID=cds-1;Parent=YGL171W_id002;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     five_prime_UTR  182352  182389  .       +       .       ID=nbis-five_prime_utr-2;Parent=YGL171W_id002;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     three_prime_UTR 184085  184232  .       +       .       ID=nbis-three_prime_utr-2;Parent=YGL171W_id002;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     uORF    182286  182329  .       +       .       ID=uorf-1;Parent=YGL171W_id002;Name=YGL171W_uORF;orf_classification=Verified
chrVII  SGD     mRNA    182286  184383  .       +       .       ID=YGL171W_id001;Parent=YGL171W;Name=YGL171W_id001;conditions=YPD;dbxref=RefSeq:NM_001181036.1;transcript_id=SGD:S000294290
chrVII  SGD     exon    182374  184383  .       +       .       ID=nbis-exon-1;Parent=YGL171W_id001;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     CDS     182390  184084  .       +       0       ID=cds-1;Parent=YGL171W_id001;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     five_prime_UTR  182374  182389  .       +       .       ID=nbis-five_prime_utr-1;Parent=YGL171W_id001;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     three_prime_UTR 184085  184383  .       +       .       ID=nbis-three_prime_utr-1;Parent=YGL171W_id001;Name=YGL171W_CDS;orf_classification=Verified;protein_id=UniProtKB:P45818
chrVII  SGD     uORF    182286  182329  .       +       .       ID=nbis-uorf-1;Parent=YGL171W_id001;Name=YGL171W_uORF;orf_classification=Verified

The original GFF file contains two uORFs with distinct coordinates (each with two mRNA parents) but, surprisingly, the output file contains two uORFs sharing the same coordinates (each with a different mRNA parent), with the start coordinate taken from one of the original uORFs and the end coordinate taken from the other one.

Do you know what logic leads to this behavior, and whether AGAT can be configured to preserve the original uORF coordinates in cases like this one?
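The discrepancy described above can be checked programmatically. Below is a minimal Python sketch that compares, per parent transcript, the coordinates of one feature type before and after conversion. The function name `level3_coords` and the abbreviated GFF lines are illustrative assumptions, not part of AGAT; the lines are whitespace-separated copies of the uORF records shown above, with attributes shortened.

```python
from collections import defaultdict


def level3_coords(gff_text, feature_type="uORF"):
    """Map each parent ID to the sorted (start, end) pairs of one feature type.

    Hypothetical helper for illustration only. Columns are split on any
    whitespace so that pasted GFF (spaces instead of tabs) also parses.
    """
    coords = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # A feature with several parents counts once for each parent.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                coords[parent].append((int(cols[3]), int(cols[4])))
    return {parent: sorted(spans) for parent, spans in coords.items()}


original = """\
chrVII SGD uORF 182286 182407 . + . Parent=YGL171W_id002,YGL171W_id001;Name=YGL171W_uORF
chrVII SGD uORF 182291 182329 . + . Parent=YGL171W_id002,YGL171W_id001;Name=YGL171W_uORF
"""
converted = """\
chrVII SGD uORF 182286 182329 . + . ID=uorf-1;Parent=YGL171W_id002;Name=YGL171W_uORF
chrVII SGD uORF 182286 182329 . + . ID=nbis-uorf-1;Parent=YGL171W_id001;Name=YGL171W_uORF
"""

before = level3_coords(original)
after = level3_coords(converted)

# Keep only the parents whose uORF coordinates differ after conversion.
changed = {
    parent: (before[parent], after.get(parent, []))
    for parent in before
    if before[parent] != after.get(parent, [])
}
for parent, (old, new) in sorted(changed.items()):
    print(parent, old, "->", new)
```

On these records, both transcripts are reported as changed: each had two uORFs in the input and has a single uORF spanning 182286 to 182329 in the output.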

@Juke34
Collaborator

Juke34 commented Jan 2, 2023

Right. The two features are merged, but the second one is supposed to be removed. In any case, this is a bug, because this type of feature is not supposed to be merged. uORF at this line must be replaced by uorf.

@Juke34
Collaborator

Juke34 commented Feb 17, 2023

Actually, the second one is not supposed to be removed. The only problem is that this type of feature is not supposed to be merged. I will try to implement something either to tell AGAT not to merge any feature, or to let the user decide which feature types to merge.
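To illustrate the second option (choosing which feature types are merged), here is a minimal Python sketch of an overlap merge with a skip list. It is not AGAT's Perl implementation: the function `merge_overlapping`, the `skip_types` parameter, and the case-insensitive type comparison are all assumptions made for the example. It performs a plain union merge, so it does not reproduce the exact coordinates reported in this issue.

```python
def merge_overlapping(features, skip_types=frozenset()):
    """Merge overlapping same-type spans, except for types listed in skip_types.

    features: iterable of (type, start, end) tuples under one parent.
    skip_types: lowercase type names that must never be merged (hypothetical option).
    """
    kept = []       # features passed through untouched
    by_type = {}    # spans to merge, grouped by feature type
    for ftype, start, end in features:
        if ftype.lower() in skip_types:
            kept.append((ftype, start, end))
        else:
            by_type.setdefault(ftype, []).append((start, end))

    merged = []
    for ftype, spans in by_type.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlap: extend the current span.
                cur_end = max(cur_end, end)
            else:
                merged.append((ftype, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((ftype, cur_start, cur_end))

    return sorted(kept + merged, key=lambda f: (f[1], f[2]))


uorfs = [("uORF", 182286, 182407), ("uORF", 182291, 182329)]

merged_all = merge_overlapping(uorfs)
merged_skip = merge_overlapping(uorfs, skip_types={"uorf"})
print(merged_all)
print(merged_skip)
```

With no skip list the two overlapping uORFs collapse into one span; with `uorf` in the skip list both keep their original coordinates.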

@Juke34
Collaborator

Juke34 commented Feb 17, 2023

Actually, you can already turn this behaviour off completely by disabling check_all_level3_locations.
To modify the config, you must first run:

agat config --expose --no-check_all_level3_locations

Juke34 added a commit that referenced this issue Feb 17, 2023
…h feature to skip at the check_all_level3_locations step
@Juke34 Juke34 closed this as completed in b63b1a7 Feb 17, 2023
This issue was closed.