Make parent term for transcription factor activity #13588

Closed
hattrill opened this issue Jun 4, 2017 · 22 comments

@hattrill

hattrill commented Jun 4, 2017

Could you make a parent term "transcription factor activity"?

children:
GO:0001071 nucleic acid binding transcription factor activity
GO:0000989 transcription factor activity, transcription factor binding

(I know that there is probably some historical baggage here, but it would save us a lot of pain as our join solution for the FB ribbon cell is flaky).
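
To illustrate the difference between joining two branches and querying one grouping term, here is a minimal sketch on a toy graph. Only GO:0001071 and GO:0000989 come from this request; `TF_PARENT`, the `EXAMPLE:`/`OTHER:` term IDs, the gene names, and the helper functions are hypothetical placeholders, not real GO content or any existing tool's API.

```python
# Toy parent -> children edges. "TF_PARENT" stands in for the requested
# grouping term; the EXAMPLE:* IDs are made-up descendants.
CHILDREN = {
    "TF_PARENT": {"GO:0001071", "GO:0000989"},
    "GO:0001071": {"EXAMPLE:1"},
    "GO:0000989": {"EXAMPLE:2"},
}

# Hypothetical gene -> directly annotated terms.
ANNOTATIONS = {
    "geneA": {"EXAMPLE:1"},
    "geneB": {"EXAMPLE:2"},
    "geneC": {"GO:0000989"},
    "geneD": {"OTHER:1"},
}


def descendants_or_self(term, children=CHILDREN):
    """Return the term plus everything reachable below it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def genes_under(roots, annotations=ANNOTATIONS, children=CHILDREN):
    """Genes annotated to any of the roots or their descendants."""
    terms = set()
    for root in roots:
        terms |= descendants_or_self(root, children)
    return {gene for gene, annotated in annotations.items() if annotated & terms}


# The "join" needs both roots listed explicitly; the grouping term needs one.
joined = genes_under(["GO:0001071", "GO:0000989"])
grouped = genes_under(["TF_PARENT"])
```

In this toy data both queries return the same genes; the practical difference is that the join has to be maintained by hand whenever a new sibling branch appears.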

@ukemi
Contributor

ukemi commented Jun 9, 2017

I'm not sure how this is handled in the MF refactoring project, but the reason for not including it in the ontology traditionally is that it would be defined as a generic function that regulates transcription. An annotation would be no more meaningful than an annotation to the process.

@thomaspd @krchristie @dosumis

@hattrill
Author

But what is the activity that is described by these terms? Just protein binding, just nucleic acid binding? These terms are not linked to these MFs. There is a vague activity: "...to modulate transcription", which seems processy, but would seem to be at the core of the definition.

From a user perspective, I think that transcription factor activity is useful and it would seem helpful/expected to have a grouping term.


nucleic acid binding transcription factor activity
Interacting selectively and non-covalently with a DNA or RNA sequence in order to modulate transcription. The transcription factor may or may not also interact selectively with a protein or macromolecular complex.

transcription factor activity, protein binding
Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules), in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well.

@krchristie
Contributor

Part of the background for deleting the original "transcription regulator activity" term from the MF branch is that, when I went to a transcription-specific meeting in 2010 with a poster about the transcription overhaul, I talked to some researchers who found it confusing to remember whether they needed to do their enrichment using MF terms or BP terms. There didn't seem to be any difference in meaning between the MF term "transcription regulator activity" and the BP term "regulation of transcription" (now named "regulation of nucleic acid-templated transcription").

Looking into it, David H and I agreed that there did not seem to be any way to define the MF term "transcription regulator activity" such that it could be distinguished from the BP term. The initial setup of GO indicates that MF, BP, and CC are supposed to be orthogonal, i.e. non-overlapping, so it's a problem to have a term in both MF and BP that means exactly the same thing.

I don't know how the MF refactoring is planning to handle this either, but I still don't see how to define the grouping term in MF that you are requesting in a way that makes it distinct from "regulation of transcription" in BP.

@hattrill
Author

Was it the name "transcription regulator activity" that was confusing, rather than distinguishing between regulating transcription and the (not-very-definable) quality of possessing transcription factor activity? After all, not all things that regulate transcription are "transcription factors". The people I've discussed it with find it more confusing to have two terms with near-identical definitions unlinked in the ontology.
Can @thomaspd resolve this with MF re-factoring?

@ValWood
Contributor

ValWood commented Jun 14, 2017

I really wish we could call the main DNA binding sequence-specific transcription factor term

sequence specific RNA polymerase II transcription factor activity

instead of:

transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding

which confuses me every day (promoter or proximal region can be a SO extension on the concurrent DNA binding term).

In fact I still don't see the need for 2 terms in different branches (the transcription factor activity and the binding term),

a single binding term

MF DNA binding transcription factor activity with extension "SO term describing region"
involved in (part_of) BP regulation of transcription from RNA polymerase II promoter.

Job done.

The problem is the word "factor". It's a term used by biologists historically when they didn't know what the actual molecular function was (splicing factor, translation factor), so they are really often "processes of function unknown".

But I agree that different types of TF in the MF ontology would benefit from a common grouping term. It would make the correct term more findable by drill down during curation.

@dosumis
Contributor

dosumis commented Jun 14, 2017

For the refactoring:

Here's a (quite old) ticket on renaming - which further simplifies the naming scheme based on a suggestion from Paul T:
geneontology/molecular_function_refactoring#5

Here's the design pattern/template ticket:

geneontology/molecular_function_refactoring#23 (comment)

In LEGO, if the factor is also known to bind to another TF or to RNA polII as part of regulating transcription, this can also be represented. It would be straightforward to add classes that capture this.

@pgaudet
Contributor

pgaudet commented Oct 9, 2017

(See below for a newer version of this term)

How about this as a parent:

+[Term]
+id: GO:0140110
+name: transcription regulator activity
+namespace: molecular_function
+def: "A molecular function that controls the rate, timing and/or magnitude of transcription of genetic information. The function of transcriptional regulators is to modulate gene expression at the transcription step so that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism." [GOC:pg, Wikipedia:Transcription_factor]
+subset: gocheck_do_not_annotate
+is_a: GO:0003674 ! molecular_function
+created_by: pg
+creation_date: 2017-10-18T07:05:44Z

Thanks in advance for your input.

Pascale

@dosumis
Contributor

dosumis commented Oct 9, 2017

Needs something about directness of regulation.

For BP - 'regulation of transcription' covers the activity of signal transduction pathways that regulate transcription. But we need something more direct for MF. Also - isn't all of this regulation of transcription initiation?

@dosumis
Contributor

dosumis commented Oct 9, 2017

#14318

@ValWood
Contributor

ValWood commented Oct 9, 2017

> isn't all of this regulation of transcription initiation?

not necessarily, a transcription regulator can regulate elongation or termination

This bit
"by binding to a specific DNA sequence or to other regulatory protein factors."
could be tighter.

Probably needs to be clear that the gene product needs to bind RNA polymerase (directly or indirectly through another transcription regulator)? Maybe this does not apply to termination factors so an additional clause might be required for this?

@krchristie
Contributor

Seems kind of odd that you guys are trying to put "transcription regulator activity" back into the ontology at the same time that the term "transmembrane transporter activity involved in import into cell" is being proposed for obsoletion because:

From: go-discuss go-discuss-bounces@lists.stanford.edu
Date: Monday, October 9, 2017 at 8:19 AM
Subject: [go-discuss] Proposal to obsolete 'GO:0098663 transmembrane transporter activity involved in import into cell'

Dear all,

The proposal has been made to obsolete 'GO:0098663 transmembrane transporter activity involved in import into cell'. The reason for obsoletion is that this term contains information that belongs in the Process ontology ("import into cell"). [snip]

@ukemi and I spent a long time thinking about "transcription regulator activity" as a MF term. The fact that you cannot define it in terms of function really highlights that this does not represent A function, but rather involvement in a process. @ValWood is correct that a constraint to try to say that a "transcription factor" must bind the RNA polymerase either directly or indirectly is not sufficient, as bacterial antiterminators such as NusA bind the RNA of the nascent transcript.

It seems really inconsistent with our founding principles that the three aspects, MF, BP, and CC, should be orthogonal, but we are putting back a term that cannot be defined in a way that distinguishes it from a biological process term.

@ValWood
Contributor

ValWood commented Oct 10, 2017

Hi Karen,

I think the "transporter involved in process" is a slightly different issue.
The MF terms which are "Function involved in process" are clearly unsustainable, and transport direction is represented in the process branch.

For transcription regulators, a broad grouping term will be helpful for curators to locate the "transcription factor activity" functions. There is clearly a problem in locating the terms, evident from annotation inconsistencies. The terms for sequence-specific transcription factors are buried deep in the graph, in such a way that even experienced curators cannot locate the correct terms, and our users cannot find them.

However, I would be very happy to go the same route with TFs: MF "DNA binding transcription factor" involved in BP "transcription from RNA polymerase II promoter". I don't believe the separate branches for "DNA or protein binding" and "transcription factor activity" help the curators, or users of GO.

In this scenario we would only have a small number of transcription factor MF terms which represent the major classes they are naturally grouped into by biologists. I think that is what Pascale and Paul are trying to achieve, in which case I absolutely support this change too.

@mah11
Collaborator

mah11 commented Oct 10, 2017

fwiw, I agree with Karen.

@ValWood
Contributor

ValWood commented Oct 10, 2017

But the whole of the branch under:
https://www.ebi.ac.uk/QuickGO/term/GO:0001071 in the MF ontology
represent processes not functions......

The grouping is only a way to make it easier for curators to locate them. Really most of these "transcription factor activity" terms should not be in the MF ontology at all. What we have is deeply unsatisfactory.

Personally, I would be happy for most of these to go and only use the ones under
https://www.ebi.ac.uk/QuickGO/term/GO:0000976
...most people do not find these "DNA binding" counterparts of the "MF transcription factor activity" terms when annotating.
It might not be ontologically pure to add such a grouping term, but it's a pragmatic move to promote annotation consistency, and deal with a far larger problem.

The alternative is to obsolete the "transcription factor" branch, which is only related to 'regulation of transcription'. I'd be happy for this to happen too. However, there is clearly a large need in the community to identify "DNA-associated" transcription factors.

@dosumis
Contributor

dosumis commented Oct 10, 2017

> But the whole of the branch under:
> https://www.ebi.ac.uk/QuickGO/term/GO:0001071 in the MF ontology
> represent processes not functions......
>
> The grouping is only a way to make it easier for curators to locate them. Really most of these "transcription factor activity" terms should not be in the MF ontology at all. What we have is deeply unsatisfactory.

I really don't understand this at all. I think GO should aim to represent the 'functions' of gene products and complexes as biologists understand them, using terms that they recognise. In some cases this requires defining MFs at least partially in terms of some biological process context. dbTF and signalling receptor activity are examples of this. What do we gain by taking such a narrow view of molecular function that we don't allow for this? For the record though, I don't think that defining molecular function as a process that a single protein or complex can carry out precludes that some MFs require a particular process context.

Pascale's proposed TF activity grouping term is not redundant with BP 'regulation of transcription'. Here's her proposed definition:

"Controls the rate of transcription of genetic information from DNA to messenger RNA, by binding to a specific DNA sequence or to other regulatory protein factors."

This is much narrower than the BP term:

[image]

The BP term covers any upstream process that regulates transcription, including signal transduction pathways. This is not something you can make a curation rule against, as it comes from the definition of regulates as a transitive relation. Whether you like it or not, it will creep back in with inference from perfectly legal and reasonable GO-CAM models.

We could define a BP for direct regulation of transcription that just covers this. But I think it's just easier to use a directly_regulates relationship to a BP term to define the general MF. This then provides a simple general pattern for the whole branch.
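
The difference between a transitive 'regulates' and a direct relation can be illustrated with a toy graph. This is only a sketch under stated assumptions: the node labels and edges are invented for illustration (they are not GO terms or a real pathway), and `regulates` / `directly_regulates` here are plain Python helpers, not the ontology relations themselves.

```python
# Hypothetical "directly regulates" edges: regulator -> things it acts on.
DIRECT_EDGES = {
    "receptor activity": {"kinase activity"},
    "kinase activity": {"DNA-binding TF activity"},
    "DNA-binding TF activity": {"transcription"},
}


def directly_regulates(source, target, edges=DIRECT_EDGES):
    """True only when there is a single edge from source to target."""
    return target in edges.get(source, ())


def regulates(source, target, edges=DIRECT_EDGES):
    """True when target is reachable from source through one or more edges.

    This mimics treating 'regulates' as transitive: anything upstream
    of a regulator is inferred to regulate the same target.
    """
    seen = set()
    stack = list(edges.get(source, ()))
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return False


# Upstream signalling components regulate transcription by inference...
upstream_regulates = regulates("receptor activity", "transcription")
# ...but only the last step in the chain regulates it directly.
upstream_direct = directly_regulates("receptor activity", "transcription")
tf_direct = directly_regulates("DNA-binding TF activity", "transcription")
```

In this toy example the transitive query picks up the receptor and kinase as regulators of transcription, while the direct query keeps only the factor at the end of the chain, which is the narrowing the general MF definition would rely on.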

A more general discussion and details of proposed TF refactoring is here:

https://docs.google.com/document/d/11BH6PsdH6u0hgkS_KYhYlAHEhdufg_FPduiFwsGNA04/edit#
(Doc based on general work on MF refactoring patterns to represent compound functions + discussions with Paul and Astrid).

A partial implementation is in this branch #14318 - but may be at least partially superseded by Paul and Pascale's attempt to simplify the branch (I agree that some of the more complex terms should probably go - or at least have simpler names). Note that the proposed name changes are separable from the formalisation (and can be practically separated because implementation is via TSV and script).

@ValWood
Contributor

ValWood commented Oct 11, 2017

I am referring specifically to this term, which is only about the regulation of transcription part of the function:
https://www.ebi.ac.uk/QuickGO/term/GO:0001071
This term currently needs to be used in conjunction with a term which more clearly represents the "molecular function" of transcription factors. If you had been curating transcription factors you would understand what I mean.

Currently we need to provide users with specific instructions to find DNA binding TFs

"All sequence-specific DNA-binding transcription factors should be annotated to at least two GO Molecular Function terms, either directly or by transitivity (i.e. annotated to a more specific "descendant" term linked to one of these terms):

GO:0000976 transcription regulatory region sequence-specific DNA binding (view in QuickGO or AmiGO)
GO:0003700 sequence-specific DNA binding transcription factor activity (view in QuickGO or AmiGO)
"

and periodically check that both branches are annotated appropriately (it isn't simple because you need to make different checks to cover pol I, II and III).
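
A periodic check of this kind could be sketched roughly as below. This is a minimal illustration, not an existing pipeline: GO:0000976 and GO:0003700 are the two terms named in the instructions above, while the `EXAMPLE:` descendant IDs, the gene names, the `IS_A` edges, and the function names are hypothetical. It also ignores the separate pol I, II and III checks mentioned above.

```python
# The two branches every sequence-specific DNA-binding TF should reach.
REQUIRED = {"GO:0000976", "GO:0003700"}

# Toy child -> parents edges; the EXAMPLE:* IDs are made-up descendants.
IS_A = {
    "EXAMPLE:1": {"GO:0000976"},
    "EXAMPLE:2": {"GO:0003700"},
}

# Hypothetical gene -> directly annotated MF terms.
ANNOTATIONS = {
    "geneA": {"EXAMPLE:1", "EXAMPLE:2"},  # covers both branches by transitivity
    "geneB": {"GO:0003700"},              # missing the DNA binding branch
}


def ancestors_or_self(term, is_a=IS_A):
    """Return the term plus every term reachable by following is_a upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def missing_required(annotated_terms, required=REQUIRED, is_a=IS_A):
    """Required terms not reached directly or via a descendant annotation."""
    covered = set()
    for term in annotated_terms:
        covered |= ancestors_or_self(term, is_a)
    return required - covered


def report(annotations_by_gene):
    """Map each incompletely annotated gene to the branches it lacks."""
    result = {}
    for gene, terms in annotations_by_gene.items():
        missing = missing_required(terms)
        if missing:
            result[gene] = sorted(missing)
    return result


incomplete = report(ANNOTATIONS)
```

With a single MF term per flavour of TF linked to the appropriate BP, a check like this would collapse to testing one branch rather than two.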

I still think a single MF term for each flavour of TF + a connection to the appropriate BP for transcription would be sufficient.
I'm not arguing that there should not be terms for TFs in the MF ontology.

I think it would also be less confusing if the grouping term in the MF ontology was "transcription factor activity" rather than "transcription regulator activity", with Pascale's proposed definition.

@ValWood
Contributor

ValWood commented Oct 11, 2017

I can see that the comment I made yesterday might be confusing. I think that the redundancy will become clearer once the terms are collected together under a "transcription factor" term.

@dosumis
Contributor

dosumis commented Oct 11, 2017

I understand the redundancy issue. Please see the Google doc on the TF refactoring proposal linked above. If I understand correctly, the current structure is an attempt to bake into the ontology the strict GO rules about evidence and the limited means we have of recording it (one evidence code and paper per statement). The move to GO-CAM makes hybrid terms (DNA binding + some specific protein binding target) more viable, if you can come up with good names for the terms. Making it easier to add multiple evidence to classical GO annotations would help too.

@pgaudet
Contributor

pgaudet commented Oct 11, 2017

Hi all,

I understand the concerns - however the annotation inconsistencies are so high in the area that we really need to do something to address it.

We'll send a proposal shortly.

Pascale

@pgaudet
Contributor

pgaudet commented Oct 19, 2017

First part of the proposal: the transcription regulator terms will all be grouped under a general term, 'transcription regulator activity'

+id: GO:0140110
+name: transcription regulator activity
+namespace: molecular_function
+def: "A molecular function that controls the rate, timing and/or magnitude of transcription of genetic information. The function of transcriptional regulators is to modulate gene expression at the transcription step so that they are expressed in the right cell at the right time and in the right amount throughout the life of the cell and the organism." [GOC:pg, Wikipedia:Transcription_factor]
+is_a: GO:0003674 ! molecular_function
+created_by: pg
+creation_date: 2017-10-18T07:05:44Z

@dosumis
Contributor

dosumis commented Oct 19, 2017

This sounds too broad. Strictly, this could apply to MAPK or a TGF-beta receptor (they are both molecular functions and both (indirectly) regulate transcription). The earlier def seems better, as it specified a mechanism:

"by binding to a specific DNA sequence or to other regulatory protein factors."

although to be sufficiently tight this should mention what 'regulatory protein factors' count.

The other way to do this is by distinguishing direct regulation of transcription from indirect.

@pgaudet
Contributor

pgaudet commented Oct 19, 2017

Hi @dosumis
The bit about "other regulatory protein factors" is a bit vague as well.

How about: "by binding to a specific regulatory sequence or the transcriptional machinery." ?
(we also need to include factors that bind to the RNA).

(In any case this is a 'do not annotate' term, so what it represents should also be captured by its children).
Thanks, Pascale

pgaudet added a commit that referenced this issue Jan 15, 2018
added GOC:txnOH-2018 cross-reference for #13588

7 participants