agat_sp_manage_features.pl includes empty interpro output #147
I also have a separate problem, though it is still related to parsing of the attributes column. I noticed database references are added as "Dbxref:"; is this distinct from the "db_xref:" that GenBank uses, following INSDC standards? This is another thing that is easy for me to fix with a simple string substitution (or with _manage_attributes.pl ;) ).
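Hi, we can definitely fix the problem and remove the - from the output. Yes, true: we originally use Dbxref to be compliant with the GFF3 specification and genome browsers like Webapollo.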
Ah, ok. This output looked very close to the INSDC standard
but slightly different, and that makes sense.
I'm curious, during the EMBLmyGFF3 conversion, do you also move the value
of the uniprot_id= tag into the db_xref list?
NCBI now has a table2asn_GFF tool, so the GAG tool you reference in
EMBLmyGFF3 will, thankfully, soon no longer be necessary. I've been testing
GFFs from AGAT directly through that NCBI tool:
https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/
On Thu, Jul 1, 2021, 12:31 AM Jacques Dainat wrote:
Hi, we can definitely fix the problem and remove the - from the
output.
Yes, true: we originally use Dbxref to be compliant with the GFF3
specification and genome browsers like Webapollo.
INSDC uses the db_xref tag instead, but it is exactly the same thing, except
that INSDC accepts only information from specific databases in this
attribute, while GFF3 does not care.
When submitting to an INSDC database archive we use the ENA gate, and use the
EMBLmyGFF3 <https://github.com/NBISweden/EMBLmyGFF3> tool to prepare the
required EMBL file. During the conversion we translate some attributes to
match the terms INSDC expects (see here
<https://github.com/NBISweden/EMBLmyGFF3/blob/master/EMBLmyGFF3/modules/translation_gff_attribute_to_embl_qualifier.json>);
for example, Dbxref is translated into db_xref.
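To make the key translation described above concrete, here is a minimal Python sketch. It is not EMBLmyGFF3's actual code: the function name translate_attributes and the one-entry mapping are hypothetical, and the mapping is a flattened stand-in for the JSON file linked above. It only illustrates renaming a GFF3 attribute key such as Dbxref to an INSDC-style qualifier such as db_xref.

```python
# Hypothetical, flattened stand-in for the real JSON mapping file:
# GFF3 attribute key -> INSDC qualifier name.
ATTRIBUTE_TO_QUALIFIER = {
    "Dbxref": "db_xref",
}


def translate_attributes(column9, mapping=ATTRIBUTE_TO_QUALIFIER):
    """Parse a GFF3 attributes column and rename the keys found in mapping.

    Returns a list of (qualifier, value) pairs. A multi-valued attribute
    (comma-separated in GFF3) yields one pair per value.
    """
    pairs = []
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        qualifier = mapping.get(key, key)  # unmapped keys pass through
        for item in value.split(","):
            pairs.append((qualifier, item))
    return pairs
```

Keys absent from the mapping are passed through unchanged in this sketch; the real tool may treat them differently.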
|
Not by default, but everything is possible within EMBLmyGFF3 ^^ You just need to tune the proper "mapping file"; in this case it will be the translation_gff_attribute_to_embl_qualifier.json file.
|
Ok, great! Thanks for answering my questions, this has clarified a lot for
me
On Thu, Jul 1, 2021, 6:40 AM Jacques Dainat wrote:
I'm curious, during the EMBLmyGFF3 conversion, do you also move the value
of the uniprot_id= tag into the db_xref list?
Not by default, but everything is possible within EMBLmyGFF3 ^^ You just
need to tune the proper "mapping file"; in this case it will be the
translation_gff_attribute_to_embl_qualifier.json file, which you can access
by running EMBLmyGFF3 --expose_translations, and then add the following
information:
"uniprot_id": {
"source description": "uniprot database cross reference.",
"target": "db_xref",
"dev comment": "Nothing special to say here"
},
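If you would rather script that edit than do it by hand, the entry above could be added with a few lines of Python. This is only a sketch under assumptions: it assumes the exposed file is a single JSON object keyed by GFF attribute name, as the snippet above suggests, and the helper names add_translation and update_mapping_file are made up for illustration.

```python
import json


def add_translation(mapping, gff_tag, target, description="", comment=""):
    """Return a copy of mapping with an entry sending gff_tag to target.

    The three field names mirror the entry quoted above.
    """
    updated = dict(mapping)
    updated[gff_tag] = {
        "source description": description,
        "target": target,
        "dev comment": comment,
    }
    return updated


def update_mapping_file(path, gff_tag, target, description="", comment=""):
    """Load the JSON mapping at path, add one entry, and write it back.

    Assumption: the file is one JSON object keyed by GFF attribute name.
    """
    with open(path, encoding="utf-8") as handle:
        mapping = json.load(handle)
    mapping = add_translation(mapping, gff_tag, target, description, comment)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle, indent=2)
        handle.write("\n")
```

For example, update_mapping_file("translation_gff_attribute_to_embl_qualifier.json", "uniprot_id", "db_xref", "uniprot database cross reference.") would add the entry shown above, provided the exposed file sits in the current directory.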
|
I noticed that when an InterPro domain is not found for an InterProScan hit, it is still added to the Dbxref list as '-'. I don't think this invalidates the GFF, and it is easy enough to 'sed' out, but I wanted to report it anyway. Hits to CDD seem to be a common culprit.
example ipr output
corresponding gff output (entry following Gene3D hit)
Edit: To remove this from the output I used
sed -i -E -e 's/InterPro:-,|,InterPro:-//g' my.gff
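The sed one-liner above only matches InterPro:- when it sits next to a comma, so a Dbxref whose sole entry is InterPro:- would be left untouched. A rough Python equivalent that also covers that case might look like the sketch below. It assumes the attribute key in column 9 is Dbxref and that the placeholder is literally InterPro:-, and drop_empty_interpro is a made-up helper name, not part of AGAT.

```python
def drop_empty_interpro(line, placeholder="InterPro:-"):
    """Remove the placeholder from the Dbxref attribute of one GFF3 line.

    Comment lines and lines without nine tab-separated columns are
    returned unchanged. If the placeholder was the only Dbxref value,
    the whole Dbxref attribute is dropped.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return line
    kept = []
    for field in columns[8].split(";"):
        key, sep, value = field.partition("=")
        if key == "Dbxref":
            values = [v for v in value.split(",") if v != placeholder]
            if not values:
                continue  # nothing left, so drop the attribute entirely
            field = key + sep + ",".join(values)
        kept.append(field)
    # GFF3 uses "." for an empty column.
    columns[8] = ";".join(kept) or "."
    return "\t".join(columns) + newline
```

It could be applied line by line while copying my.gff to a new file, which also avoids editing the original in place.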