New composite pipeline #3129

Merged: gouttegd merged 20 commits into master from new-composite-pipeline on Dec 1, 2023

gouttegd
Collaborator

This is an overhaul of the pipeline that creates the composite-* products, most importantly the composite-metazoan product.

  • Any use of owltools is replaced by robot, using a Uberon-specific plugin where necessary. This allows the pipeline to work with some OWL constructs, introduced in CL and FBbt over the past two years, that OWLTools was not able to handle.
  • The src/ontology/bridge/collected-*.owl files, which were used to define the contents of each composite product (through import declarations), are no longer used and are removed. Instead, each product is solely defined by prerequisites listed in uberon.Makefile.
  • Some composite products are retired (see this comment in Fix composite-metazoan pipeline  #2588 for which products are retired and why).
  • The immediate precursor to composite-metazoan, collected-metazoan, is turned into an official release product.
  • All the foreign ontologies that end up in one or several of the composite products are treated as “local imports” and are committed to the repository. This was already the case for most of them; this PR generalises this behaviour to all of them.

Our local copy of CEPH contains a disjointUnionOf axiom on
UBERON:0001062. That axiom causes problems when we merge CEPH with other
ssAOs to create composite-metazoan --- there is a reason why we remove
all disjointness axioms from Uberon when creating composite-metazoan:
many ssAOs cannot cope with such disjointness axioms.

Bracket the entire LOCAL IMPORTS section with a test on IMP=true, to
skip the entire section when we run under IMP=false without having to
pollute each rule with a shell conditional.

Do not systematically refresh the Allen ontologies (DBA, HDBA, etc.)
whenever we are building composite-metazoan. Instead, treat them as the
"local imports": keep a (committed) copy in the imports directory, and
only use these local copies wherever we need them. Refreshing the local
copies requires running under IMP=true.

We apply to CL, SSSO (the species-specific stages ontology), and ZFA the
same logic as already applied to the Allen ontologies: we place them
into the imports directory under a name prefixed with "local-", commit
them here, and use only the local copies wherever needed.

The custom rules to download mirrors (those that are not managed by the
ODK) erroneously and systematically consider that any freshly downloaded
mirror is different from any previously available mirror, because the
previously available mirror is overwritten in the process.

This originates from a known bug in the ODK which has since been fixed.
We apply the same fix here.
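
A minimal Python sketch of the compare-before-overwrite logic described above, assuming the fix amounts to downloading into a temporary file and replacing the mirror only when the contents differ. The actual rules live in the Makefile; the function name and file names here are hypothetical.

```python
import tempfile
from pathlib import Path


def update_mirror(downloaded: Path, mirror: Path) -> bool:
    """Replace `mirror` with `downloaded` only if their contents differ.

    Returns True if the mirror was created or updated, and False if it was
    left untouched (in which case its modification time does not change,
    so Make does not consider dependent targets out of date).
    """
    new_bytes = downloaded.read_bytes()
    if mirror.exists() and mirror.read_bytes() == new_bytes:
        return False  # identical content: keep the existing mirror as-is
    mirror.write_bytes(new_bytes)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fresh = Path(tmp, "download.owl")        # hypothetical file names
        mirror = Path(tmp, "mirror.owl")
        fresh.write_text("Ontology(<http://example.org/demo>)")
        print("first refresh updated mirror:", update_mirror(fresh, mirror))
        print("second refresh updated mirror:", update_mirror(fresh, mirror))
```

The point of the design is that the freshly downloaded file never lands directly on top of the existing mirror, so the comparison is always made against the previous copy.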
We are currently using the "species-specific stages ontology" (SSSO)
by cloning its GitHub repository. There is no reason to treat it
differently from any other foreign ontology though, so we make a mirror
out of it to start treating it more "normally".

The "local-" imports (those used to build the composite-* products) do
not actually need much special treatment, certainly not as much as they
used to need.

Here, the only treatment we systematically apply to all the local imports is
to strip axioms that are about terms outside of the foreign ontology's
namespace (in effect, making the import a "base"). A handful of
ontologies still need some extra care to translate old-style properties
into their modern RO equivalents. We do that using `robot rename`
instead of `owltools --rename-entity`.

We do *not* strip disjointness axioms at this step. This will be done
later, at the beginning of the composite pipeline.
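
To illustrate what turning an import into a "base" means, here is a toy Python sketch. It assumes that "axioms about terms outside the namespace" means axioms whose subject IRI does not start with the ontology's base IRI; the pipeline itself does this with ROBOT, not with code like this. The `Axiom` type, the function name, and the term identifiers are placeholders.

```python
from typing import List, NamedTuple


class Axiom(NamedTuple):
    subject: str  # IRI of the term the axiom is about
    text: str     # human-readable rendering, for illustration only


def strip_external(axioms: List[Axiom], base_iri: str) -> List[Axiom]:
    """Keep only the axioms whose subject lies in the given namespace."""
    return [ax for ax in axioms if ax.subject.startswith(base_iri)]


if __name__ == "__main__":
    zfa = "http://purl.obolibrary.org/obo/ZFA_"
    axioms = [
        # Placeholder identifiers, not real terms.
        Axiom(zfa + "0000001", "ZFA term SubClassOf something"),
        Axiom("http://purl.obolibrary.org/obo/BFO_0000050", "annotation on a foreign property"),
    ]
    for ax in strip_external(axioms, zfa):
        print(ax.subject, "|", ax.text)
```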
Our use of EMAPA in the composite pipeline requires three different
treatments:

a) replacing old-style properties emapa#part_of, emapa#starts_at, and
emapa#ends_at by their RO equivalents;
b) replacing TS_?? stage identifiers by the equivalent terms in MmusDv;
c) merging with MmusDv.

Steps b) and c) used to be performed at the mirroring step, by
converting the mirror to OBO format and then hacking the OBO version.

Here, we perform step b) when creating the "local-emapa.owl", and we
remove step c), as we will merge MmusDv explicitly when creating
composite-metazoan instead of bundling it with EMAPA. This removes all
need for hacking the EMAPA mirror.
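
Steps a) and b) are both renaming operations driven by a table of old and new identifiers. The Python sketch below shows that idea only. It assumes a two-column, tab-separated mapping with a header row (old IRI, new IRI); the real layout of the mapping file in this PR is not shown here, and the IRIs used are placeholders.

```python
import csv
import io
from typing import Dict, List


def load_mappings(tsv_text: str) -> Dict[str, str]:
    """Read 'old IRI <TAB> new IRI' rows, skipping the assumed header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # assumption: the first row is a header
    mappings = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        mappings[row[0]] = row[1]
    return mappings


def rename_entities(iris: List[str], mappings: Dict[str, str]) -> List[str]:
    """Replace every IRI that has a mapping; leave the others untouched."""
    return [mappings.get(iri, iri) for iri in iris]


if __name__ == "__main__":
    # Placeholder IRIs: neither the stage nor its replacement is a real term.
    table = (
        "Old IRI\tNew IRI\n"
        "http://example.org/emapa#TS_xx\thttp://example.org/MmusDv_xx\n"
    )
    mappings = load_mappings(table)
    print(rename_entities(
        ["http://example.org/emapa#TS_xx", "http://example.org/EMAPA_yy"],
        mappings,
    ))
```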
Most local imports used to be stored in functional syntax, so we make
sure they remain in that format. It is slightly easier to read than
RDF/XML (OK, that's debatable) and makes for smaller files.

We treat MmusDv and HsapDv as we have already treated the combined
ontology of life stages ("SSSO"): by mirroring them from their GitHub
repository and making a "local import" out of the mirror, instead of
cloning the entire upstream repo in TMPDIR.

Add a new Make target called "all_local_imports". Similarly to the ODK
standard "all_imports", it allows refreshing all the non-ODK-managed
imports in one step.

That target is called by the 'refresh-external-resources' target, so
that local imports can be refreshed at the same time as the normal
imports and the bridges.

EMAPA has a few more old-style properties that need to be replaced with
their RO counterparts.

Change the composite pipeline completely to:

a) use ROBOT only (with a Uberon-specific plugin), not OWLTools;
b) have all the components that make up a composite product listed
directly in the Makefile, as direct prerequisites for the product.

For each composite- product, add a -hdr file containing basic
annotations (title and description) to be used as the first ontology to
be merged when assembling the product. This has the effect of:

a) ensuring the composite product has a descriptive title and
description that match its actual contents;
b) removing all the Uberon-specific top-level ontology annotations.

The collected-metazoan.owl ontology is useful in its own right, because
it still contains all the original terms from the various
species-specific ontologies, before they are "collapsed" into their
taxon-neutral Uberon equivalents by the composite pipeline.

So we make it into an officially released product by:
* stripping disjointness axioms from it (so that the ontology is at
  least nominally consistent);
* annotating it with a version IRI as for all other released products.

The bridge/collected-%.owl files are no longer used to build the
collected ontologies, since they are entirely defined within the
Makefile. They no longer serve any purpose and can be removed. Same for
the catalog file in the bridge directory, which was here solely so that
the import declarations in the collected files could be resolved.

We no longer need to clone the developmental stages ontology repository.

Now that collected-metazoan is a release artefact, it can end up in the
top-level directory, where it should be ignored by Git.
Contributor

@matentzn matentzn left a comment

Fantastic work. I have left a few comments, and none of them are of any real consequence. If you feel confident after reading them, I will approve!

Resolved review comments on:
  • src/ontology/catalog-v001.xml (2 threads)
  • src/ontology/imports/map-mouse-stages.tsv (1 thread)
  • src/ontology/uberon.Makefile (5 threads)
Contributor

@matentzn matentzn left a comment

Feel free to ignore the two remaining comments! Excellent work, and thanks for your patience in responding :P

@gouttegd
Collaborator Author

gouttegd commented Dec 1, 2023

Note that there are still things that can be improved, including:

  • The “merge species” step leaves behind a bunch of class declaration axioms for the taxon-specific classes that have been “merged” into their taxon-neutral equivalent. That is, when (for example) EMAPA:17443 is replaced by UBERON:00005695 and (BFO:0000050 some NCBITaxon:10090), we still have a class declaration axiom for EMAPA:17443:
<!-- http://purl.obolibrary.org/obo/EMAPA_17443 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/EMAPA_17443">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
  • General class axioms are not translated. If we have, say, a GCA that says
FBbt:X and (BFO:0000050 some FBbt:Y) SubClassOf BFO:0000050 some FBbt:Z

and upon merging FBbt:X is merged into UBERON:X and (BFO:0000050 some NCBITaxon:7227), the GCA will still refer to FBbt:X.

  • We have a bunch of FBgn pseudo-terms from the FlyBase ontologies, which in my opinion don’t belong to composite-metazoan.
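
For the first item in the list above, a rough way to spot such leftover declarations in an RDF/XML file is sketched below in Python. It assumes a leftover class is a top-level owl:Class element whose only content, if any, is a subClassOf owl:Thing statement; classes described elsewhere in the file (for example through separate rdf:Description elements) would need extra handling. The function name and the second class in the sample are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
    <owl:Class rdf:about="http://purl.obolibrary.org/obo/EMAPA_17443">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    <owl:Class rdf:about="http://example.org/placeholder_class">
        <rdfs:label>placeholder class with real content</rdfs:label>
    </owl:Class>
</rdf:RDF>
"""


def find_bare_classes(rdfxml: str) -> List[str]:
    """Return IRIs of top-level classes that carry no real content."""
    root = ET.fromstring(rdfxml)
    bare = []
    # Only look at direct children of rdf:RDF, not nested class expressions.
    for cls in root.findall(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:
            continue  # anonymous class, not a declaration
        only_thing = all(
            child.tag == f"{{{RDFS}}}subClassOf"
            and child.get(f"{{{RDF}}}resource") == f"{OWL}Thing"
            for child in cls
        )
        if only_thing:
            bare.append(iri)
    return bare


if __name__ == "__main__":
    for iri in find_bare_classes(SAMPLE):
        print("leftover declaration:", iri)
```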

Those problems are not new – they already existed in previous versions of composite-metazoan. They should be easier to fix now, but I’d rather do that as a separate PR (possibly several separate PRs). My aim for this PR was strictly to replace the existing pipeline by one that is easier to maintain, but that is otherwise equivalent (including in its bugs! :P ).

@gouttegd gouttegd merged commit 3be70c6 into master Dec 1, 2023
1 check passed
@gouttegd gouttegd deleted the new-composite-pipeline branch December 1, 2023 17:44
@anitacaron
Collaborator

Just to let you know, I've read all the comments. I'll create an issue to remove CARO from the pipeline.
