Patterns for axiomatisation of transcription factor activities #23

Closed
dosumis opened this issue Nov 8, 2016 · 14 comments

Comments

@dosumis
Contributor

dosumis commented Nov 8, 2016

From @dosumis on October 28, 2015 13:46

From @dosumis on August 28, 2015 9:56

We currently have patterns like this:

  • molecular_function that ('has part' some 'nucleic acid binding') and ('part of' some 'regulation of transcription, DNA-templated')
  • molecular_function and ('part of' some 'regulation of nucleic acid-templated transcription') and ('has part' some 'RNA polymerase core enzyme binding')

But these patterns are not safe. It is not necessarily the case that being part of a regulatory process entails being a regulator of the regulated process. This pattern probably arose from implementation of the general MF part_of BP pattern. In this case, it would be better to directly assert MF regulation of BP. But which relation to use?

Perhaps directly activates:

Current def: "p directly activates q if and only if p is immediately upstream of q and p is the realization of a function to increase the rate or activity of q."

But see notes from the 2015-07-23 eds meeting on defining directly positively regulates.

CC @cmungall @ukemi

Copied from original issue: geneontology/go-ontology#12033

Copied from original issue: geneontology/design_patterns#2

@dosumis
Contributor Author

dosumis commented Nov 8, 2016

From @cmungall on August 28, 2015 16:16

I think 'directly activates' is correct here, even if we have a more general direct regulation parent that is not restricted to activities

@dosumis
Contributor Author

dosumis commented Nov 8, 2016

From @ukemi on August 28, 2015 17:5

It fits the definition of directly activates.

@thomaspd

The main step to take is a top-level restructuring. It removes the top-level distinction between protein binding and DNA binding, as the current protein binding class includes both DNA binding TFs and non-DNA-binding "cofactors". The main distinction is between DNA binding (TF) and non-DNA binding (T co-F), followed by effect (activators/coactivators vs repressors/corepressors). I've kept any older classes whose removal would affect more than a few existing annotations, just to prevent any issues with annotations.
[screenshot: 2017-02-20 at 11:25:46 am]
[screenshot: 2017-02-20 at 11:26:07 am]

top level class should be transcription regulator activity
is_a binding that directly regulates transcription

transcription factor (GO:0003700):
is_a (or has_part?) sequence-specific DNA binding AND directly regulates transcription

transcription cofactor is_a (or has_part?) protein binding, AND NOT sequence-specific DNA binding, that directly regulates transcription

transcription activator is a transcription factor that directly positively regulates transcription

transcription coactivator is a transcription cofactor…
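
As a rough sketch of the two-level split proposed above (DNA binding first, then direction of effect), the following Python function maps an activity onto the proposed class labels. The function name, the `Effect` enum and the repressor/corepressor label strings are illustrative assumptions, not GO identifiers or an agreed naming scheme.

```python
from enum import Enum
from typing import Optional


class Effect(Enum):
    """Direction of the direct regulatory effect on transcription."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


def classify_regulator(binds_sequence_specific_dna: bool,
                       effect: Optional[Effect] = None) -> str:
    """Return a proposed class label (hypothetical strings, for illustration).

    First split: sequence-specific DNA binding (TF) vs not (cofactor).
    Second split: positive vs negative effect, when it is known.
    """
    if binds_sequence_specific_dna:
        base, pos, neg = ("transcription factor",
                          "transcription activator",
                          "transcription repressor")
    else:
        base, pos, neg = ("transcription cofactor",
                          "transcription coactivator",
                          "transcription corepressor")
    if effect is Effect.POSITIVE:
        return pos
    if effect is Effect.NEGATIVE:
        return neg
    # Effect unknown: fall back to the more general class.
    return base
```

When the effect is unknown the function returns the more general label, which mirrors annotating to a parent class when detail is missing.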

@dosumis
Contributor Author

dosumis commented Feb 28, 2017

The ontology currently uses this pattern:

molecular_function that ('has part' some 'nucleic acid binding') and ('part of' some 'regulation of transcription, DNA-templated')

But in Noctua/LEGO curators are using directly_activates rather than part of:

image

http://noctua.berkeleybop.org/editor/graph/gomodel:583f430000000041

directly_positively_regulates may be justified here (based on direct interaction between the TF and other parts of transcriptional apparatus).

We need to decide on one of these two patterns (or reconcile the two with reasoning if that is possible).

@dosumis
Contributor Author

dosumis commented May 11, 2017

How to formalise this?

[image]

Differentia:

1. Regulatory effect on transcription - record via link to BP. Two possible patterns:

(a) part_of some 'regulation of transcription, DNA templated' # (use +ve/-ve R terms for transcriptional activator/repressor terms)

OR

(b) directly_regulates some 'transcription, DNA templated' # (use directly_(positively/negatively)_regulates edges for transcriptional activator/repressor terms)

One of the aims of MF design patterns for compound functions such as this one is to maximise useful causal inference chains in LEGO. In this respect, pattern (b) is better: it doesn't obscure regulation of the whole process behind a part_of link. However, ideally we'd still get inferred annotation to the relevant regulation of transcription BP term. Possible solutions:

  • Add intermediate MF terms (e.g. ‘transcription_regulator_activity’ EquivalentTo: molecular_function that directly_regulates some ‘transcription’; SubClassOf: part_of some ‘regulation of transcription’*)
  • GCIs (e.g. molecular_function that directly_regulates some transcription SubClassOf part_of some ‘regulation of transcription’*), etc.

This is a general issue - covered in #49
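
The GCI option can be illustrated with a minimal Python sketch. It is not an OWL reasoner: it only handles existential restrictions written as `(relation, filler)` pairs, and the function name and the positive/negative label templates are assumptions made for illustration.

```python
# Maps a direct-regulation relation to a template for the BP term that the
# activity should be inferred to be part_of. The positive/negative templates
# are assumed by analogy with the "+ve/-ve" terms mentioned in pattern (a).
REGULATION_OF = {
    "directly_regulates": "regulation of {}",
    "directly_positively_regulates": "positive regulation of {}",
    "directly_negatively_regulates": "negative regulation of {}",
}


def apply_regulation_gci(restrictions):
    """Add the part_of restriction that the GCI would entail.

    restrictions: set of (relation, filler) pairs on an MF class, each read
    as 'relation some filler'. Returns a new set containing the original
    pairs plus any inferred ('part_of', 'regulation of ...') pairs.
    """
    inferred = set(restrictions)
    for relation, filler in restrictions:
        template = REGULATION_OF.get(relation)
        if template is not None:
            inferred.add(("part_of", template.format(filler)))
    return inferred


tf_activity = {
    ("has_part", "nucleic acid binding"),
    ("directly_regulates", "transcription, DNA templated"),
}
with_inferences = apply_regulation_gci(tf_activity)
```

Under these assumptions the asserted axiom keeps the direct causal edge of pattern (b), while the part_of link of pattern (a) is recovered by inference instead of being asserted.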

2. RNA polymerase type:

formalise via transcription type.

image

3. DNA target bound: has_necessary_component* {transcription regulation region DNA binding}

* see #25 (comment)

Some cleanup or target terms needed:

image

image

These can be defined using logical defs that use SO terms as differentia. The promoter_element hierarchy might cover what's needed.

See also geneontology/go-ontology#13002

4. direct regulation e.g. by binding ligand or metal ion binding

  • Use has_positive_regulatory_component* some 'X binding'

* see #25 (comment)

@dosumis
Contributor Author

dosumis commented May 24, 2017

Draft pattern here:

https://github.com/geneontology/molecular_function_refactoring/blob/master/patterns/transcription_factor_DNA_binding.yaml

Still need to sort out naming.

Some notes:

  • High level genus possibly a little confusing to editors/curators. Consider something more specific?
  • Would be good to be able to specify relation as var, but this is currently missing from DOSDP spec (a deliberate choice, but with hindsight seems unwise)
  • With the new, simplified pattern for compound MFs, instance graphs are starting to look like simple copies of the class level pattern without quantifiers. It should be possible to derive these automatically - reserving instance graphs for more complex cases.
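
A minimal sketch of how an instance graph might be derived from a class-level pattern by dropping the quantifiers is shown below. The function name, the blank-node style identifiers and the example restrictions are assumptions for illustration; this is not DOSDP tooling.

```python
from itertools import count


def derive_instance_graph(defined_class, restrictions):
    """Turn a class-level pattern into an instance graph.

    defined_class: label of the compound MF class.
    restrictions: list of (relation, filler_class) pairs, each read as
                  'relation some filler_class' at the class level.
    Returns (types, edges): a dict of node id -> class label, and a list of
    (subject, relation, object) triples between node ids.
    """
    ids = count(1)
    root = f"_:n{next(ids)}"          # individual standing for the activity
    types = {root: defined_class}
    edges = []
    for relation, filler in restrictions:
        node = f"_:n{next(ids)}"      # one fresh individual per restriction
        types[node] = filler
        edges.append((root, relation, node))
    return types, edges


types, edges = derive_instance_graph(
    "transcription factor activity",
    [
        ("has_part", "sequence-specific DNA binding"),
        ("directly_regulates", "transcription, DNA templated"),
    ],
)
```

Each existential restriction becomes one edge to a fresh individual, so only patterns that need extra structure (shared individuals, for example) would still require a hand-written instance graph.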

@dosumis
Contributor Author

dosumis commented May 24, 2017

CC @thomaspd

@dosumis
Contributor Author

dosumis commented May 26, 2017

Paul:

If we have a general 'necessary part of':

If transcription of gene X requires the activity of TFs A, B and C, we could say each activity is a 'necessary part of' transcription of gene X.

By analogy to necessary_component_of, this would be a subproperty of a causal relation, so using it would not entail loss of the causal chain.
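
The point about preserving the causal chain can be sketched in Python as a lookup over a small property hierarchy. The parent name `causal_relation` is a placeholder that does not appear in the thread, and the hierarchy itself is an assumption for illustration, not the actual relation ontology.

```python
# Hypothetical property hierarchy: child relation -> parent relation.
# "causal_relation" is a placeholder for whatever causal parent is chosen.
SUBPROPERTY_OF = {
    "necessary_component_of": "causal_relation",
    "necessary_part_of": "causal_relation",
    "directly_regulates": "causal_relation",
}


def is_causal(relation):
    """True if the relation is, or descends from, the causal parent."""
    while relation is not None:
        if relation == "causal_relation":
            return True
        relation = SUBPROPERTY_OF.get(relation)
    return False


def chain_is_causal(relations):
    """A path keeps its causal reading only if every edge on it is causal."""
    return all(is_causal(r) for r in relations)
```

In this sketch a path through plain `part_of` breaks the chain, while the same path through `necessary_part_of` does not.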

@dosumis
Contributor Author

dosumis commented Jul 14, 2017

Notes on name and definition changes.

The original refactoring/implementation of detailed TF terms in the GO relied on assumptions about curation that may no longer apply in the new era of GO-CAM curation. Classic GO curation is very granular, with small amounts of evidence - often single experiments (?) - being used for annotation. It is rare for a single experiment to show that a transcription factor acts to regulate transcription via DNA binding and via protein binding. To cope with this, two branches were added to the GO - one covering activities that (directly) regulate transcription via protein binding and another covering activities that directly regulate transcription via DNA binding. A broad interpretation of the phrase 'transcription factor' - covering cofactors and DNA binding transcription factors - allowed both branches to include this phrase in their labels. As DNA binding transcription factors also bind proteins as part of their regulation of transcription*, annotation of these activities relied on co-annotation with appropriate terms from each branch (although a small number of terms appear under both branches).

With GO-CAM modeling, we can much more easily combine different pieces of evidence to build a model of gene product activity, so these considerations no longer apply. In GO-CAM, and as part of ongoing work on refactoring molecular function, we aim to model the compound nature of molecular functions as far as possible. We therefore need new design patterns for (DNA binding) transcription factor activity that allow us to capture its compound nature (DNA binding and protein binding components and their relationship to regulation of transcription).

The original refactoring deliberately omitted a general term for transcription regulator activity. There were two reasons for this:

  1. A very restricted view of what terms count as molecular functions: that they require a clear specification of mechanism**. This seems unwarranted, as we have large numbers of MF terms to which this does not apply including molecular_function itself and all the regulator activity terms. This approach prevents us from using one of the great strengths of ontologies - that when we don't know details we can annotate to a more general class. This approach clashes with the preferred definition of molecular function used in the molecular function refactoring that the current work is part of: A process that can be carried out by a single gene product or complex.
  2. A concern that any 'transcription regulator activity' class would be redundant with the biological process: regulation of transcription. This concern seems unwarranted. The biological process term encompasses processes that are far upstream of transcription factor activity and processes that encompass multiple molecular functions. Signal transduction pathways that regulate transcription are an example of both.

This refactoring:

  1. Uses a tighter definition of transcription factor limiting it to activities that regulate transcription by binding DNA.
  2. Adapts an existing term to make a general transcription regulator class
  3. Reflects the compound nature of (DNA binding) TF activity

* I would be very interested to hear of any known exceptions to this.
** This is my understanding.

New proposed labels & textual definitions

(Proposed name changes are also discussed in #5 and in Paul's comment upthread). Template-based textual definitions for TFs using the names of component activities are proving hard to specify, so a freer approach is taken here.


transcription factor activity, protein binding: "Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules), in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well."

-->

transcription regulator activity: "Direct regulation of DNA-templated transcription via selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex."
comment: This term is a general class that encompasses (DNA binding) transcription factors as well as cofactors.

Questions:

  • Should we add 'direct' to the name to more clearly distinguish from upstream regulators?
  • Anticipated objection: we don’t actually have TI complex in GO. Perhaps 'basal transcription machinery' would be a better term?

transcription cofactor activity: "Interacting selectively and non-covalently with a regulatory transcription factor and also with the basal transcription machinery in order to modulate transcription. Cofactors generally do not bind the template nucleic acid, but rather mediate protein-protein interactions between regulatory transcription factors and the basal transcription machinery."

-->

transcription cofactor activity: "Interacting selectively and non-covalently with a regulatory transcription factor and also with the basal transcription machinery in order to modulate transcription. Cofactor activity does not involve nucleic acid binding, but rather mediates protein-protein interactions between regulatory transcription factors and the basal transcription machinery."
is_a: transcription regulator activity


transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

-->

transcription factor activity: "Direct regulation of DNA-templated transcription via sequence-specific DNA binding and selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex."
is_a: direct transcription regulator activity

Notes:

  • The new label uses the term 'transcription factor' more restrictively than the original term, covering only DNA binding factors, but this matches a large proportion (almost certainly the majority) of community usage.

  • Unlike in the original branch, this mandates some protein interaction. I've yet to come across a case of a TF that doesn't bind

Questions:

  • Is this clause about interaction with the TI complex too strong? Anticipated objection is that we don’t actually have TI complex in GO. Perhaps 'basal transcription machinery' would be a better term?

RNA polymerase II transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription by RNA polymerase II. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

-->

RNA polymerase II transcription factor activity: "Direct regulation of transcription from an RNA polymerase II promoter via sequence-specific DNA binding and selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex."
is_a: transcription factor activity


transcription factor activity, sequence-specific DNA binding transcription factor recruiting:
"The function of binding to a specific DNA sequence and recruiting another transcription factor to the DNA in order to modulate transcription. The recruited factor may bind DNA directly, or may be colocalized via protein-protein interactions."

-->

transcription factor activity, transcription regulator recruiting: "Direct regulation of DNA-templated transcription via sequence-specific DNA binding and recruitment of a transcription regulator (transcription factor or cofactor) via direct, non-covalent interaction with the regulator. Recruitment here means that the activity in question is required to bring the transcription regulator to the transcription initiation complex or associated proteins."
is_a: transcription factor activity

Questions:

  • Is the definition of recruitment OK here?

transcription factor activity, sequence-specific DNA binding transcription factor recruiting: "The function of binding to a specific DNA sequence and recruiting another transcription factor to the DNA in order to modulate transcription. The recruited factor may bind DNA directly, or may be colocalized via protein-protein interactions."

-->

transcription factor activity, transcription factor recruiting: "Direct regulation of DNA-templated transcription via sequence-specific DNA binding and binding of another transcription factor, leading to its recruitment to, and binding of, a DNA regulatory region."
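
The is_a links proposed in this comment can be sketched as a simple parent map in Python. The map and the `ancestors` helper are illustrative only. One assumption is made loudly: the parent given for 'transcription factor activity' ('direct transcription regulator activity') is treated here as the same class as 'transcription regulator activity', since whether to add 'direct' to the name is still an open question above.

```python
# Proposed is_a links, child -> parent (labels taken from the definitions above).
# Assumption: 'direct transcription regulator activity' and
# 'transcription regulator activity' refer to the same class.
PARENT = {
    "transcription cofactor activity": "transcription regulator activity",
    "transcription factor activity": "transcription regulator activity",
    "RNA polymerase II transcription factor activity":
        "transcription factor activity",
    "transcription factor activity, transcription regulator recruiting":
        "transcription factor activity",
}


def ancestors(term):
    """Return the is_a ancestors of a term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


pol2_ancestors = ancestors("RNA polymerase II transcription factor activity")
```

With this hierarchy, an annotation to the RNA polymerase II term propagates to both the DNA binding TF class and the general regulator class.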


CC @astridla, @RLovering, @thomaspd - Comments please.

@dosumis
Contributor Author

dosumis commented Jul 14, 2017

Notes on problematic terms

RNA polymerase II transcription factor activity, sequence-specific transcription regulatory region DNA binding: "Interacting selectively and non-covalently with a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA by RNA polymerase II and recruiting another transcription factor to the DNA in order to modulate transcription by RNAP II."

  1. Needs name to more clearly distinguish from:
    RNA polymerase II transcription factor activity, sequence-specific DNA binding: "Interacting selectively and non-covalently with a specific DNA sequence in order to modulate transcription by RNA polymerase II. The transcription factor may or may not also interact selectively with a protein or macromolecular complex."

  2. Why does it live here:
    [image]
    but not under 'RNA polymerase II transcription factor activity, sequence-specific DNA binding'?

@astridla

astridla commented Jul 14, 2017 via email

@astridla

astridla commented Jul 14, 2017 via email

@dosumis
Contributor Author

dosumis commented Jul 17, 2017

Notes on discussion with Astrid:

Tie this branch down to: "direct regulation of transcription initiation"

  1. This = 'regulation of transcription', because 'regulation of transcription initiation' is_a 'regulation of transcription' (or should be). It would also be better to rename the general classes to '(direct) transcription initiation regulator activity' - if this is not too wordy.

  2. Directness: "... via selective, non-covalent interaction with elements of the transcription initiation complex or associated proteins. Associated proteins include any protein capable of interacting, directly or indirectly with the transcription initiation complex."

  3. What is recruiting?
    Use Astrid's figure (TBA).

@pgaudet
Contributor

pgaudet commented Mar 1, 2019

This issue was moved to geneontology/go-ontology#16970

@pgaudet pgaudet closed this as completed Mar 1, 2019