New composite pipeline #3129
Our local copy of CEPH contains a disjointUnionOf axiom on UBERON:0001062. That axiom causes problems when we merge CEPH with other species-specific anatomy ontologies (ssAOs) to create composite-metazoan. There is a reason why we remove all disjointness axioms from Uberon when creating composite-metazoan: many ssAOs cannot cope with such disjointness axioms.
Bracket the entire LOCAL IMPORTS section with a test on IMP=true, to skip the entire section when we run under IMP=false without having to pollute each rule with a shell conditional.
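The bracketing described above could look roughly like the following sketch. The variable defaults and the rule inside the block are illustrative assumptions, not copied from uberon.Makefile; only the `ifeq`/`endif` structure is the point.

```makefile
# Sketch only: the defaults and the rule below are illustrative assumptions.
IMP       ?= true
ROBOT     ?= robot
MIRRORDIR ?= mirror
IMPORTDIR ?= imports

ifeq ($(strip $(IMP)),true)
# ---- LOCAL IMPORTS: these rules exist only when IMP=true ----
# No rule in this block needs its own shell test on $(IMP).
$(IMPORTDIR)/local-%.owl: $(MIRRORDIR)/%.owl
	$(ROBOT) convert --input $< --format ofn --output $@
endif
```

Under IMP=false, Make sees no rule for the `local-` files and simply uses the committed copies as they are.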
Do not systematically refresh the Allen ontologies (DBA, HDBA, etc.) whenever we are building composite-metazoan. Instead, treat them as "local imports": keep a (committed) copy in the imports directory, and only use these local copies wherever we need them. Refreshing the local copies requires running under IMP=true.
We apply to CL, SSSO (the species-specific stages ontology), and ZFA the same logic as already applied to the Allen ontologies: we place them into the imports directory under a name prefixed with "local-", commit them here, and use only the local copies wherever needed.
The custom rules that download mirrors (those that are not managed by the ODK) erroneously always consider a freshly downloaded mirror to be different from the previously available one, because the previous copy is overwritten in the process. This originates from a known bug in the ODK which has since been fixed. We apply the same fix here.
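A minimal sketch of that kind of fix, assuming the usual approach of downloading to a scratch file and replacing the mirror only when the contents differ. The directory names and the use of ZFA as the example are assumptions made for illustration.

```makefile
# Sketch only: TMPDIR, MIRRORDIR and the choice of ZFA are illustrative.
TMPDIR    ?= tmp
MIRRORDIR ?= mirror

# Always download, but into a scratch file, never over the existing mirror.
.PHONY: mirror-zfa
mirror-zfa: | $(TMPDIR)
	curl -L http://purl.obolibrary.org/obo/zfa.owl --retry 4 -o $(TMPDIR)/mirror-zfa.owl

# Replace the mirror (and thus change its timestamp) only if the content changed.
$(MIRRORDIR)/zfa.owl: mirror-zfa | $(MIRRORDIR)
	if cmp -s $(TMPDIR)/mirror-zfa.owl $@ ; then \
		echo "Mirror identical, ignoring." ; \
	else \
		echo "Mirror changed, updating." && cp $(TMPDIR)/mirror-zfa.owl $@ ; \
	fi

$(TMPDIR) $(MIRRORDIR):
	mkdir -p $@
```

Because the old mirror is still on disk when `cmp` runs, an unchanged download leaves the file's timestamp alone and nothing downstream is rebuilt.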
We are currently using the "species-specific stages ontology" (SSSO) by cloning its GitHub repository. There is no reason to treat it differently from any other foreign ontology though, so we make a mirror out of it to start treating it more "normally".
The "local-" imports (those used to build the composite-* products) do not actually need much special treatment, certainly not as much as they used to need. Here, the only treatment we apply to all the local imports is to strip axioms that are about terms outside of the foreign ontology's namespace (in effect, making the import a "base"). A handful of ontologies still need some extra care to translate old-style properties into their modern RO equivalents. We do that using `robot rename` instead of `owltools --rename-entity`. We do *not* strip disjointness axioms at this step. This will be done later, at the beginning of the composite pipeline.
Our use of EMAPA in the composite pipeline requires three different treatments: a) replacing the old-style properties emapa#part_of, emapa#starts_at, and emapa#ends_at by their RO equivalents; b) replacing TS_?? stage identifiers by the equivalent terms in MmusDv; c) merging with MmusDv. Steps b) and c) used to be performed at the mirroring step, by converting the mirror to OBO format and then hacking the OBO version. Here, we perform step b) when creating the "local-emapa.owl", and we remove step c): MmusDv will be merged explicitly when creating composite-metazoan, instead of being bundled with EMAPA. This removes all need for hacking the EMAPA mirror.
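As a rough illustration of the local-import treatment described above, applied to EMAPA: the sketch below renames entities from a mappings table and then keeps only the axioms about terms in the EMAPA namespace. The file `emapa-renames.tsv` is hypothetical (a two-column table of old and new IRIs), the variable defaults are assumptions, and the order of the two steps is one possible choice rather than the PR's actual rule.

```makefile
# Sketch only: emapa-renames.tsv is a hypothetical mappings table
# (old IRI, new IRI); the variable defaults are assumptions.
ROBOT     ?= robot
MIRRORDIR ?= mirror
IMPORTDIR ?= imports

$(IMPORTDIR)/local-emapa.owl: $(MIRRORDIR)/emapa.owl emapa-renames.tsv
	$(ROBOT) rename --input $< --mappings emapa-renames.tsv \
	         remove --base-iri http://purl.obolibrary.org/obo/EMAPA_ \
	                --axioms external \
	         convert --format ofn --output $@
```

The final `convert --format ofn` writes the result in OWL functional syntax.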
Most local imports used to be stored in functional syntax, so we make sure they remain in that format. It is slightly easier to read than RDF/XML (OK, that's debatable) and makes for smaller files.
We treat MmusDv and HsapDv as we have already treated the combined ontology of life stages ("SSSO"): by mirroring them from their GitHub repositories and making a "local import" out of each mirror, instead of cloning the entire upstream repo in TMPDIR.
Add a new Make target called "all_local_imports". Similarly to the ODK standard "all_imports", it allows refreshing all the non-ODK-managed imports in one step. That target is called by the 'refresh-external-resources' target, so that local imports can be refreshed at the same time as the normal imports and the bridges.
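One way such a target could be wired up is sketched below. The list of ontologies is illustrative and incomplete, the recipe of `refresh-external-resources` is an assumption, and the refresh of the bridges is left out.

```makefile
# Sketch only: the list is illustrative; bridges are omitted on purpose.
IMPORTDIR ?= imports

LOCAL_IMPORTS      = cl ssso zfa
LOCAL_IMPORT_FILES = $(foreach ont,$(LOCAL_IMPORTS),$(IMPORTDIR)/local-$(ont).owl)

.PHONY: all_local_imports
all_local_imports: $(LOCAL_IMPORT_FILES)

# Refreshing requires IMP=true, so it is forced on the sub-make.
.PHONY: refresh-external-resources
refresh-external-resources:
	$(MAKE) IMP=true all_imports all_local_imports
```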
EMAPA has a few more old-style properties that need to be replaced with their RO counterparts.
Change the composite pipeline completely to: a) use ROBOT only (with a Uberon-specific plugin), not OWLTools; b) have all the components that make up a composite product listed directly in the Makefile, as direct prerequisites for the product.
For each composite- product, add a -hdr file containing basic annotations (title and description) to be used as the first ontology to be merged when assembling the product. This has the effect of: a) ensuring the composite product has a title and description that match its actual contents; b) removing all the Uberon-specific top-level ontology annotations.
The collected-metazoan.owl ontology is useful in its own right, because it still contains all the original terms from the various species-specific ontologies, before they are "collapsed" into their taxon-neutral Uberon equivalents by the composite pipeline. So we make it into an officially released product by:
* stripping disjointness axioms from it (so that the ontology is at least nominally consistent);
* annotating it with a version IRI, as for all other released products.
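The sketch below shows how a product defined entirely by its Makefile prerequisites might be merged, stripped of disjointness axioms, and given a version IRI. The list of components, the version IRI pattern, and the variable defaults are assumptions; it also assumes that ROBOT's `disjoint` axiom selector covers the axioms of interest.

```makefile
# Sketch only: the component list is illustrative and incomplete, and the
# version IRI pattern is an assumption.
ROBOT     ?= robot
IMPORTDIR ?= imports
ONTBASE   ?= http://purl.obolibrary.org/obo/uberon
VERSION   ?= $(shell date +%Y-%m-%d)

COLLECTED_METAZOAN_PARTS = uberon.owl \
                           $(IMPORTDIR)/local-zfa.owl \
                           $(IMPORTDIR)/local-emapa.owl

# Every component is a direct prerequisite; no import declarations are needed.
collected-metazoan.owl: $(COLLECTED_METAZOAN_PARTS)
	$(ROBOT) merge $(foreach part,$^,--input $(part)) \
	         remove --axioms disjoint \
	         annotate --version-iri $(ONTBASE)/releases/$(VERSION)/$@ \
	                  --output $@
```

Listing the components as prerequisites means Make rebuilds the product whenever any of them changes.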
The bridge/collected-%.owl files are no longer used to build the collected ontologies, since those are now entirely defined within the Makefile. They no longer serve any purpose and can be removed. Same for the catalog file in the bridge directory, which was there solely so that the import declarations in the collected files could be resolved.
We no longer need to clone the developmental stages ontology repository.
Now that collected-metazoan is a release artefact, it can end up in the top-level directory, where it should be ignored by Git.
Fantastic work. I have left a few comments, and none of them are of any real consequence. If you feel confident after reading them, I will approve!
Feel free to ignore the two remaining comments! Excellent work, and thanks for your patience responding :P
Note that there are still things that can be improved, including:
<!-- http://purl.obolibrary.org/obo/EMAPA_17443 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/EMAPA_17443">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>
and upon merging
Those problems are not new – they already existed in previous versions of
Just to let you know, I've read all the comments. I'll create an issue to remove CARO from the pipeline.
This is an overhaul of the pipeline that creates the `composite-*` products, most importantly the `composite-metazoan` product.
* `owltools` is replaced by `robot`, when necessary using a Uberon-specific plugin. This allows the pipeline to work with some OWL constructs introduced in CL and FBbt over the past two years that OWLTools was not able to handle.
* The `src/ontology/bridge/collected-*.owl` files, which were used to define the contents of each composite product (through import declarations), are not used anymore and are removed. Instead, each product is solely defined by the prerequisites listed in `uberon.Makefile`.
* The precursor of `composite-metazoan`, `collected-metazoan`, is turned into an official release product.
* The ontologies used to build the `composite` products are treated as “local imports” and are committed to the repository. This was already the case for most of them; this PR generalises this behaviour to all of them.