script to detect premature stop codons in CDS #58

Closed
AMMMachado opened this issue Jul 9, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@AMMMachado

Hi Juke,
The AGAT toolkit is amazing.
I'm using a lot of your scripts to correct and add missing features to a GFF3 file of a vertebrate genome.
Do you have a tool or script to correct or remove premature in-frame stop codons from a GFF3 file?

Best Regards

André

@Juke34
Collaborator

Juke34 commented Jul 9, 2020

Hi,
Happy to hear that you find AGAT useful.
I'm not sure I understand what you are looking for. Either you want to modify the underlying sequence, in which case it is an assembly problem, or you want to modify the structural annotation (redefining the exon-intron structure to avoid the stop codon, which requires taking the donor/acceptor splice sites into account), in which case it is a gene prediction problem.

AGAT does not have any script to modify the assembly (yet).
There is a script to redefine the ORFs: agat_sp_fix_longest_ORF.pl. It stops the CDS at the first stop codon encountered, keeping the longest ORF.
I could write a script to flag genes with in-frame stop codons.
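To illustrate what flagging in-frame stop codons involves, here is a minimal Python sketch. It is not AGAT's implementation. It assumes the spliced CDS has already been extracted on the coding strand starting at phase 0, and that the standard genetic code applies (stops TAA, TAG, TGA). The function names `premature_stop_positions` and `flag_genes` are hypothetical.

```python
# Stop codons of the standard genetic code (assumption: translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_positions(cds: str) -> list[int]:
    """Return 0-based offsets of in-frame stop codons that are not the last
    complete codon of the CDS, i.e. premature stops.

    Assumes `cds` is the spliced coding sequence on the coding strand, phase 0.
    """
    seq = cds.upper()
    last_codon_start = (len(seq) // 3 - 1) * 3  # start of the final complete codon
    positions = []
    for i in range(0, len(seq) - 2, 3):  # walk complete codons in frame 0
        if seq[i:i + 3] in STOP_CODONS and i != last_codon_start:
            positions.append(i)
    return positions


def flag_genes(cds_by_gene: dict[str, str]) -> dict[str, list[int]]:
    """Hypothetical helper: map gene ID -> premature stop offsets,
    keeping only genes with at least one premature stop."""
    flagged = {}
    for gene_id, cds in cds_by_gene.items():
        hits = premature_stop_positions(cds)
        if hits:
            flagged[gene_id] = hits
    return flagged


print(flag_genes({"geneA": "ATGAAATAA", "geneB": "ATGTGAAAATAA"}))
# {'geneB': [3]}
```

A real tool would also have to read the CDS features from the GFF3 together with the genome FASTA, and handle the reverse strand, non-zero phases and alternative genetic codes; the sketch leaves all of that out.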

@AMMMachado
Author

AMMMachado commented Jul 9, 2020

Hi,
The assembly itself is not the target; the GFF3 file is. A script to flag genes with in-frame stop codons would be a tremendous help for some projects. I tried to understand agat_sp_fix_longest_ORF.pl, but I am a bit confused about the models. How does the script fix gene models, and how does it detect putative pseudogenes?

Andre

@Juke34
Collaborator

Juke34 commented Jul 16, 2020

The script extracts the CDS sequence and predicts the longest ORF. It then compares the prediction with the original ORF and classifies it into one of the following cases:
i) no overlap;
ii) overlap, but in a different frame;
iii) overlap in the same frame, but longer;
iv) overlap in the same frame, but shorter.
In case i), the original gene model is split into two gene models. Case iv) is a bit special: it can be the result of an error in the original annotation, it can be a stop-codon read-through (agat_sp_fix_longest_ORF.pl does not deal with that), or it can be a pseudogene (shorter than expected; most likely the original annotation comes from an annotation lift-over). I think I have commented out everything related to pseudogenes, because pseudogene detection was just an experimental feature.
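The two steps described above (predict the longest ORF, then classify it against the original ORF into cases i) to iv)) can be sketched in Python as follows. This is not the code of agat_sp_fix_longest_ORF.pl: the function names, the requirement that an ORF start with ATG, the forward-strand-only scan, and the use of one shared 0-based, half-open transcript coordinate system for both ORFs are all assumptions made for illustration.

```python
# Stop codons of the standard genetic code (assumption: translation table 1).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end), 0-based and half-open, of the longest ORF that
    begins with ATG and ends with a stop codon (stop included), scanning the
    three forward frames only. Returns (0, 0) if no such ORF exists."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # offset of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Close the ORF at this stop; keep it if it is the longest so far.
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def classify(original: tuple[int, int], predicted: tuple[int, int]) -> str:
    """Place a predicted ORF into one of the four cases from the thread.
    Both ORFs are (start, end) offsets in the same transcript coordinates."""
    o_start, o_end = original
    p_start, p_end = predicted
    if p_end <= o_start or p_start >= o_end:
        return "i) no overlap"
    if (p_start - o_start) % 3 != 0:
        return "ii) overlap, different frame"
    o_len, p_len = o_end - o_start, p_end - p_start
    if p_len > o_len:
        return "iii) same frame, longer"
    if p_len < o_len:
        return "iv) same frame, shorter"
    return "same frame, same length"


seq = "ATGAAATAGCCCGGGTAA"  # premature TAG at offset 6
orf = longest_orf(seq)       # (0, 9)
print(classify((0, len(seq)), orf))  # iv) same frame, shorter
```

In the example, the premature TAG cuts the longest ORF down to 9 nt while the annotated CDS spans 18 nt, so the prediction falls into case iv), the case the thread associates with annotation errors, read-through, or pseudogenes.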

@Juke34 Juke34 changed the title Permature stop codons Premature stop codons Jul 17, 2020
@Juke34 Juke34 added the enhancement New feature or request label Sep 2, 2020
@Juke34 Juke34 changed the title Premature stop codons script to detect premature stop codons in CDS Sep 2, 2020
@Juke34 Juke34 closed this as completed in a9a9ce8 Sep 15, 2020

2 participants