Fix composite-metazoan pipeline #2588

Closed
shawntanzk opened this issue Aug 8, 2022 · 12 comments

@shawntanzk
Collaborator

shawntanzk commented Aug 8, 2022

  1. Look at “Recent changes with CL terms in composite-metazoan.owl?” (#2584): how did the CL relationships go missing? (Is the CL variant imported in this process the wrong one, or is CL not imported at all?)
  2. Multiple labels for one term #1466
  3. ?
@shawntanzk
Collaborator Author

shawntanzk commented Oct 17, 2022

The pipeline that builds the composite ontologies depends on owltools features that do not exist in ROBOT. Therefore, if we keep this pipeline, we cannot get rid of owltools (unless we change the way we build composite ontologies).

  • might try to port the features to ROBOT

Two features are needed; the most important one is:
merge species ontologies -> removes the taxon-specific classes and replaces them with their equivalents (e.g. neutral structure and part of taxon) - (avoids ragged lettuce)

@gouttegd
Collaborator

FYI, owltools is used in two steps of the pipeline that builds the composite-*.owl products.

The first step is to build $TMPDIR/merged-composite-*.owl. This is a pretty straightforward merging step where owltools could probably be directly replaced with robot merge without any trouble.

The second step is to build $TMPDIR/unreasoned-composite-*.owl. That step uses two features of owltools that, AFAIK, have no equivalents in ROBOT or any of our other tools.

The first feature is --merge-species-ontology. It exploits the “bridging axioms” in the uberon-bridge-to-*.owl and cl-bridge-to-*.owl bridge files to generate a “composite” ontology in which the taxon-specific classes are replaced by equivalent class expressions, in a process the owltools developers call unfolding (for more details, see code and comments in owltools’ owltools.mooncat.SpeciesMergeUtil class).

The second feature is --merge-equivalence-sets. It merges classes that are inferred to be equivalent when ontologies are merged, and uses a scoring system to decide which classes are kept and which are dropped (see owltools’ owltools.mooncat.EquivalenceSetMergeUtil class).

For example, suppose owltools --merge-equivalence-sets is called with -s UBERON 10 -s CL 9 while merging Uberon and CL. If a Uberon class is found to be equivalent to a CL class, only the Uberon class is kept in the resulting merged ontology, because Uberon was given a higher score (10) than CL (9).
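
The scoring rule can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not the code in EquivalenceSetMergeUtil: the function name, the placeholder class IDs, and the default score of 0 for unknown prefixes are all assumptions made for illustration.

```python
def merge_equivalence_set(equivalent_classes, prefix_scores):
    """Pick the class to keep from a set of classes inferred to be equivalent.

    Returns (kept, dropped). The class whose ID prefix has the highest score
    is kept; prefixes without a score default to 0 (an assumption of this sketch).
    """
    def score(class_id):
        prefix = class_id.split(":", 1)[0]
        return prefix_scores.get(prefix, 0)

    # Sort first so that ties are broken deterministically (alphabetically).
    candidates = sorted(equivalent_classes)
    kept = max(candidates, key=score)
    dropped = [c for c in candidates if c != kept]
    return kept, dropped


# Mirrors "-s UBERON 10 -s CL 9"; the two class IDs are placeholders.
scores = {"UBERON": 10, "CL": 9}
kept, dropped = merge_equivalence_set({"UBERON:example", "CL:example"}, scores)
print("kept:", kept, "dropped:", dropped)
```

Swapping the two scores would make the CL class the one that survives.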

@gouttegd
Collaborator

To illustrate the effects of --merge-species-ontology:
[Image: SpeciesMerge]

On the left is a merge of Uberon with FBbt, before calling owltools --merge-species-ontology. There are two ovary classes: one from Uberon and one from FBbt, the latter being a subclass of the former thanks to the bridging axiom in uberon-bridge-to-fbbt.owl (which says that FBbt ovary is a Uberon ovary that is part of the Dmel taxon). All other Drosophila-specific terms are below the FBbt class.

On the right, the same merged ontology after calling owltools --merge-species-ontology. Note that there is now only one ovary class (the Uberon one), and that all Drosophila-specific terms have now been “reattached” to that taxon-neutral class.
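
As a rough illustration of that before/after picture, here is a toy Python model of the reattachment. It is a deliberately simplified sketch, not what SpeciesMergeUtil actually does: the hierarchy is reduced to plain subclass links, the bridging axiom is reduced to a simple mapping, and all class names are placeholders.

```python
def unfold(parents, bridges):
    """Toy version of the "unfolding" step.

    parents: dict mapping each class to the set of its direct superclasses.
    bridges: dict mapping a taxon-specific class to its taxon-neutral
             counterpart (a stand-in for the bridging axioms).
    Taxon-specific classes that have a bridge are dropped, and anything
    that pointed to them is reattached to the taxon-neutral class.
    """
    unfolded = {}
    for cls, supers in parents.items():
        if cls in bridges:
            continue  # the taxon-specific class itself disappears
        unfolded[cls] = {bridges.get(p, p) for p in supers}
    return unfolded


# Before: two ovary classes, with a Drosophila-specific term below the FBbt one.
before = {
    "uberon:ovary": set(),
    "fbbt:ovary": {"uberon:ovary"},
    "fbbt:some_dmel_term": {"fbbt:ovary"},
}
bridges = {"fbbt:ovary": "uberon:ovary"}

after = unfold(before, bridges)
print(after)
```

After the call, only the Uberon ovary remains and the Drosophila-specific term hangs directly below it.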

@gouttegd
Collaborator

From the Uberon call of October 24th, 2022: The whole point of the composite-* products is precisely to have the kind of “unfolded” structure that owltools --merge-species-ontology generates, so this should not be changed.

-> Keep the owltools-dependent pipeline for now, and start thinking about whether the --merge-species-ontology feature should be ported/re-implemented elsewhere.

@aschroed

Is there anywhere that a recent version of composite-metazoan.owl is available for download and use?

@anitacaron
Collaborator

Hi @aschroed, the composite-metazoan.owl is now available in the release assets.

@anitacaron anitacaron added the tech label Jan 9, 2023
@gouttegd
Collaborator

gouttegd commented Feb 6, 2023

I propose to use this ticket to centralise all discussions about the overhauling of the composite-metazoan (hereafter CM) pipeline.

Here’s a list of all the issues affecting this product. I strongly suspect most if not all are directly caused by the complexity of the CM pipeline (and the lack of love shown to that pipeline over the last few years), and therefore most if not all of them could be fixed by streamlining the pipeline (one can always hope, at least).

Regarding #1952, the unsats may be caused by the actual contents of the merged ontologies and as such may not be fixed by a new pipeline, but a cleaner, faster pipeline may make testing for unsats easier and help detect and fix them in the future.

@gouttegd
Collaborator

gouttegd commented Feb 6, 2023

Here’s a quick overview of the current CM pipeline.

Step 1: $TMPDIR/merged-composite-metazoan.owl

This first step is merely a big merge of a bunch of files:

  • $TMPDIR/ext-weak.owl (mostly Uberon itself, minus some disjointness axioms and equivalent-to-nothing axioms);
  • all the cross-ontology bridges (e.g. uberon-bridge-to-fbbt.owl, etc.);
  • imports/local-*.owl (where * = fbbt, wbbt, zfa, etc.)
    • these files are generated from the corresponding mirrors, in what seems to be a manual step that is not part of any pipeline (they were last updated in 2021);
    • according to a comment in the Makefile, a local-something file is a version of the something mirror “with all the wrangling needed to massage it into the right form”;
    • whether that “wrangling” is still needed nowadays is unclear (a later comment mentions that “many ontologies have fixed their legacy properties”, hinting that this step may no longer be needed);
    • some local-* files are generated by dedicated custom rules (again, whether they are still really needed or not is unclear);
  • $TMPDIR/allen-*.obo (a bunch of files generated from JSON files downloaded from api.brain-map.org)
  • uberon.owl (this seems redundant with ext-weak.owl to me);
  • all the bridges/collected-*.owl files
    • those are small files that merely bind together several ontologies via imports declaration;
    • for example, collected-mouse.owl binds uberon and emapa by importing uberon.owl, emapa.owl, and the corresponding bridges uberon-bridge-to-emapa.owl and cl-bridge-to-emapa.owl.

Step 2: $TMPDIR/stripped-composite-metazoan.owl

This file is built from the previous one (merged-composite-metazoan.owl) through a single ROBOT-based step to remove some unsatisfiable classes that are listed in the unsats.txt file (last updated in 2021).

Step 3: $TMPDIR/unreasoned-composite-metazoan.owl

This file is produced from the previous one by an OWLTOOLS-based reasoning step. That step is critically dependent on OWLTOOLS, as it requires two commands that, AFAIK, have no equivalents in other tools (e.g. ROBOT):

  • --merge-species-ontology, which performs “unfolding” as described in OWLTOOLS’s source code;
  • --merge-equivalence-sets, which merges equivalent classes with control of which ontology “wins” when a class in one ontology is equivalent to a class in another ontology (see OWLTOOLS’s source code).

Step 4: composite-metazoan.owl

Lastly, the final composite-metazoan.owl product is generated from unreasoned-composite-metazoan.owl by what seems to be a fairly standard ROBOT-based reasoning step.

@github-actions

github-actions bot commented Aug 6, 2023

This issue has not seen any activity in the past 6 months; it will be closed automatically one year from now if no action is taken.

@github-actions github-actions bot added the Stale label Aug 6, 2023
@gouttegd gouttegd removed the Stale label Aug 6, 2023
@gouttegd
Collaborator

For reference and discussion, here are some of the changes I plan to introduce to fix/overhaul the composite pipeline.

Make sure CL is merged in every composite

Currently, CL is not explicitly merged in the composite ontologies – it is only merged through Uberon’s CL import. As a result, composite-metazoan only contains about 50% of CL’s terms. Several projects are using composite-metazoan to annotate single-cell RNA-sequencing data, so it should include all of CL. All the CL bridges with taxon-specific ontologies are already generated in Uberon, so excluding CL from composite-metazoan makes little sense.

Use the Uberon.Makefile as sole source of truth for the composition of the collected ontologies

With the current pipeline, the various components that make up a composite ontology (including composite-metazoan) are decided in at least three different places:

  1. in the src/ontology/bridge/collected-NAME.owl file, which contains import declarations for each component, e.g.:
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/collected-vertebrate.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-fbbt.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-wbbt.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-fbdv.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/uberon-bridge-to-wbls.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/cl-bridge-to-fbbt.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/bridge/cl-bridge-to-wbbt.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/uberon/ssso-merged-uberon.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/fbbt.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/fbdv.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/wbbt.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/wbls.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/ceph.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/cteno.owl"/>
    <owl:imports rdf:resource="http://purl.obolibrary.org/obo/poro.owl"/>
  2. in the src/ontology/catalog-v001.xml, which redirects the ontology IRIs to local resources, e.g.:
      <uri name="http://purl.obolibrary.org/obo/uberon/ssso-merged-uberon.owl" uri="tmp/developmental-stage-ontologies/src/ssso-merged.obo"/>
      <uri name="http://purl.obolibrary.org/obo/uberon/bridge/collected-vertebrate.owl" uri="bridge/collected-vertebrate.owl"/>
      <uri name="http://purl.obolibrary.org/obo/uberon/bridge/collected-tetrapod.owl" uri="bridge/collected-tetrapod.owl"/>
      <uri name="http://purl.obolibrary.org/obo/uberon/bridge/collected-teleost.owl" uri="bridge/collected-teleost.owl"/>
      <uri name="http://purl.obolibrary.org/obo/uberon/bridge/collected-amniote.owl" uri="bridge/collected-amniote.owl"/>
      <uri name="http://purl.obolibrary.org/obo/uberon/bridge/collected-mammal.owl" uri="bridge/collected-mammal.owl"/>
  3. in the src/ontology/uberon.Makefile, which does amusing things like this:
owltools --catalog-xml catalog-v001.xml \
         --map-ontology-iri $URIBASE/uberon.owl $TMPDIR/ext-weak.owl \
         --map-ontology-iri $URIBASE/fma.owl $COMPONENTSDIR/null.owl \
         --map-ontology-iri $URIBASE/uberon/bridge/uberon-bridge-to-fma.owl $COMPONENTSDIR/null.owl

All of this makes it needlessly complicated to figure out what exactly goes in each composite product.
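
To make the three layers concrete, here is a small Python sketch that resolves a few import IRIs through a catalog and then through command-line style overrides. It is only an illustration: the catalog fragment and import list are reduced samples, the value given to URIBASE is assumed, and the rule that --map-ontology-iri overrides take precedence over the catalog is an assumption of this sketch rather than documented owltools behaviour.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
URIBASE = "http://purl.obolibrary.org/obo"  # assumed value of $URIBASE

# Reduced sample of a catalog file (one entry only).
CATALOG_XML = """
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://purl.obolibrary.org/obo/uberon/bridge/collected-vertebrate.owl"
       uri="bridge/collected-vertebrate.owl"/>
</catalog>
"""


def load_catalog(xml_text):
    """Return a dict mapping ontology IRIs to local locations."""
    root = ET.fromstring(xml_text.strip())
    return {e.get("name"): e.get("uri") for e in root.iter(CATALOG_NS + "uri")}


def resolve(imports, catalog, overrides):
    """Resolve each imported IRI: overrides first, then catalog, else the IRI itself."""
    return {iri: overrides.get(iri, catalog.get(iri, iri)) for iri in imports}


# Layer 1: import declarations (illustrative subset).
imports = [
    URIBASE + "/uberon/bridge/collected-vertebrate.owl",
    URIBASE + "/uberon/bridge/uberon-bridge-to-fbbt.owl",
    URIBASE + "/fma.owl",
]
# Layer 2: the catalog. Layer 3: Makefile-level overrides.
catalog = load_catalog(CATALOG_XML)
overrides = {URIBASE + "/fma.owl": "$COMPONENTSDIR/null.owl"}

resolved = resolve(imports, catalog, overrides)
for iri, location in resolved.items():
    print(iri, "->", location)
```

Even in this reduced form, working out what a given import ends up pointing to requires reading three separate sources.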

Therefore, I plan to get rid of both the import declarations in the src/ontology/bridge/collected-*.owl files and the use of the catalog. All the components that make up a given composite ontology will be listed in uberon.Makefile as prerequisites of that composite ontology, and will be “manually” merged together by robot merge. This will make for a somewhat long section in the Makefile, but at least everything will be in the same place.
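
The idea of a single explicit list can be sketched as follows, in Python rather than Make for brevity. The component paths and the output path are hypothetical placeholders, not the actual prerequisites of any composite product; only the robot merge --input/--output options are real.

```python
import shlex


def robot_merge_command(inputs, output):
    """Build a 'robot merge' command line from an explicit list of components."""
    cmd = ["robot", "merge"]
    for path in inputs:
        cmd += ["--input", path]
    cmd += ["--output", output]
    return cmd


# Hypothetical component list: the single place where the contents of the
# product are decided.
COMPONENTS = [
    "uberon.owl",
    "bridge/uberon-bridge-to-fbbt.owl",
    "bridge/cl-bridge-to-fbbt.owl",
    "mirror/fbbt.owl",
]
OUTPUT = "tmp/collected-example.owl"  # hypothetical output path

cmd = robot_merge_command(COMPONENTS, OUTPUT)
print(shlex.join(cmd))
```

Answering “what goes into this product?” then only requires reading one list.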

Publish collected-metazoan (at least) as a release artefact

collected-metazoan is an intermediate product towards composite-metazoan. It’s basically composite-metazoan before any reasoning operation is performed. I think it’s a useful product in its own right, especially for annotation purposes, so it should be published as a release artefact.

Commit copies of all external resources, and only refresh them explicitly

Several components of composite-metazoan are automatically downloaded every time the pipeline is run. I want to generalise the approach already used in the bridge pipeline, in which all external resources are committed to the repository as local copies, and only refreshed when 1) we run the pipeline under IMP=true (which we do not as part of the QC or release process) or 2) we explicitly request a refresh (make refresh-external-resources).
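
The refresh policy boils down to a simple condition, sketched here in Python. The function name and the way the IMP variable is read from the environment are assumptions made for illustration; the real logic lives in the Makefile.

```python
def should_refresh(env, explicit_refresh=False):
    """Decide whether committed copies of external resources are re-downloaded.

    env: mapping of environment variables (e.g. os.environ).
    explicit_refresh: True when a refresh was explicitly requested
                      (the equivalent of 'make refresh-external-resources').
    """
    imp_enabled = env.get("IMP", "false").lower() == "true"
    return imp_enabled or explicit_refresh


# QC and release runs: IMP is not true, so the committed copies are used as-is.
print(should_refresh({"IMP": "false"}))
# Running under IMP=true refreshes the resources.
print(should_refresh({"IMP": "true"}))
# An explicit request refreshes them regardless of IMP.
print(should_refresh({}, explicit_refresh=True))
```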

@gouttegd
Collaborator

gouttegd commented Nov 29, 2023

I also plan to deprecate the following composite- products:

  • composite-mouse-plus-stage: this relies on EMAP, which is deprecated and whose bridge has already been retired;
  • composite-vhog: VHOG is deprecated and its bridge has already been retired;
  • composite-teleost: this is in effect the same product as composite-zebrafish;
  • composite-fma: we have been explicitly excluding FMA from composite-metazoan.owl for a while, so I doubt there is much interest in this one specifically.

There are also two other products that are “larger” than composite-metazoan (i.e. they are made of composite-metazoan plus some additional ontologies):

  • composite-opisthokont (= collected-metazoan + FAO);
  • composite-eukaryote (= collected-opisthokont + PO + DDANAT)

The mere fact that we have only ever talked about composite-metazoan and not those two, not to mention the fact that they have never been published as proper release artefacts, strongly suggests that there is little interest in them, so I plan to retire them as well.

@gouttegd
Collaborator

gouttegd commented Dec 1, 2023

Fixed by #3129

@gouttegd gouttegd closed this as completed Dec 1, 2023