Views controlled vocabularies proposal #245

Closed
baskaufs opened this issue Sep 18, 2022 · 41 comments

@baskaufs
Contributor

baskaufs commented Sep 18, 2022

The Views Controlled Vocabularies Task Group would like to formally submit for ratification two new controlled vocabularies. One controlled vocabulary is intended for use with ac:subjectPart and ac:subjectPartLiteral and the other is intended for use with ac:subjectOrientation and ac:subjectOrientationLiteral.

Since this is a coordinated addition to Audubon Core, the Task Group has followed the procedures for developing vocabulary enhancements as outlined in Section 4 of the TDWG Vocabulary Maintenance Specification (VMS). In particular, the group began its work by creating candidate and final requirements based on use cases submitted by the community. These final requirements are the Feature Report described in Section 4.2.1 of the VMS. With the generous assistance of field testers, the Task Group has published an Implementation Experience Report (see Section 4.2.2 of the VMS) in Biodiversity Information Science and Standards (BISS) at http://doi.org/10.3897/biss.7.94188. As noted in Section 4.2.3 of the VMS, the purpose of these two reports is to facilitate review by the Maintenance Group when deciding whether to advance the proposal to public comment, and as a source of information for the community during the public review.

Documents to become part of Audubon Core

subjectPart controlled vocabulary - contains the normative definitions of the subjectPart concepts that form the vocabulary and the controlled value strings to be used as literal values.

subjectOrientation controlled vocabulary - contains the normative definitions of the subjectOrientation concepts that form the vocabulary and the controlled value strings to be used as literal values.

Ancillary documents (supporting documents not part of the standard)

List of controlled value strings for ac:subjectPartLiteral organized by organism group - intended to guide human users in selecting subjectPart values appropriate for particular organism groups.

JSON-LD serialization of SKOS Collections of subjectPart concepts by organism group - machine-readable metadata intended to be consumed by clients for the purpose of generating pick lists or validating concepts for particular organism groups.

JSON-LD serialization of the SKOS concept scheme for subjectPart - machine-readable metadata to provide multilingual labels, multilingual definitions, controlled value strings, and links to external ontologies for subjectPart concepts.

Metadata for subjectPart terms in tabular form (CSV)

List of controlled value strings for ac:subjectOrientation organized by subject part - intended to guide human users in selecting subjectOrientation values appropriate for particular subject parts.

JSON-LD serialization of SKOS Collections of subjectOrientation concepts by subject part - machine-readable metadata intended to be consumed by clients for the purpose of generating pick lists or validating concepts for particular subject parts.

JSON-LD serialization of the SKOS concept scheme for subjectOrientation - machine-readable metadata to provide multilingual labels, multilingual definitions, controlled value strings, and links to external ontologies for subjectOrientation concepts.

Metadata for subjectOrientation terms in tabular form (CSV)
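As a rough illustration of how a client might consume the SKOS Collections files to build a pick list or validate a value for an organism group, here is a minimal Python sketch. It assumes the JSON-LD is in a simple compacted form with a `skos:` prefix and an `@graph` array; the collection and concept IRIs are placeholders under `example.org`, not the published identifiers, and a real client would likely use a proper JSON-LD processor rather than reading the keys literally.

```python
import json

# Hypothetical compacted JSON-LD. The IRIs are placeholders, not values
# copied from the published collection files.
COLLECTIONS_JSONLD = """
{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@graph": [
    {
      "@id": "http://example.org/collection/woodyPlants",
      "@type": "skos:Collection",
      "skos:member": [
        {"@id": "http://example.org/part/leaf"},
        {"@id": "http://example.org/part/twig"}
      ]
    },
    {
      "@id": "http://example.org/collection/vertebrates",
      "@type": "skos:Collection",
      "skos:member": {"@id": "http://example.org/part/head"}
    }
  ]
}
"""


def members_by_collection(doc):
    """Map each skos:Collection IRI to the list of its member concept IRIs."""
    result = {}
    for node in doc.get("@graph", []):
        if node.get("@type") != "skos:Collection":
            continue
        members = node.get("skos:member", [])
        # Compacted JSON-LD may emit a single member as an object, not a list.
        if isinstance(members, dict):
            members = [members]
        result[node["@id"]] = [member["@id"] for member in members]
    return result


def is_valid_for_group(collections, collection_iri, concept_iri):
    """Return True if the concept is a member of the given collection."""
    return concept_iri in collections.get(collection_iri, [])


collections = members_by_collection(json.loads(COLLECTIONS_JSONLD))
# The member list of one collection can be used directly as a pick list.
pick_list = collections["http://example.org/collection/woodyPlants"]
```

The same `collections` mapping serves both purposes: its values populate a drop-down, and `is_valid_for_group` checks a submitted concept against the chosen group.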

Reference documents

User guide - provides examples (with screenshots) of implementation of the controlled vocabularies in several ways.

Submitted use cases

Final requirements (feature report)

Implementation experience report

@edwbaker
Member

edwbaker commented Oct 31, 2022

At the AudubonCore Maintenance Group meeting on 28/10/2022 it was agreed that this proposal would progress to the public comment period. The public comment period runs from today (31/10/2022) until at least 29/11/2022.

@stanblum
Member

stanblum commented Nov 1, 2022

Given that the announcement wasn't published on the TDWG site until 01 Nov, let's leave comments open until 01 Dec.

@tmcelrath

This is a good proposal and I support it.

@afuchs1

afuchs1 commented Nov 1, 2022

Supported.
@baskaufs If there are suggestions for additional terms how should these be submitted?

@baskaufs
Contributor Author

baskaufs commented Nov 1, 2022

@afuchs1 If there are suggestions related to the existing organism groups, I suppose you could put them here. There was a desire to develop terms for other organism groups, but there wasn't sufficient discussion and testing for those other groups to get added in this submission. So that work was archived here with the idea that they could be developed and added in the future. So if it's that kind of change you are suggesting, I'd recommend trying to find some others who are experts on the same organism group and then test the suggested terms on some images from that group to see if they are usable. That's what we did for the groups that are currently included. Does that make sense?

@afuchs1

afuchs1 commented Nov 2, 2022

Thanks @baskaufs we also use 'root' and 'bud' in describing subject parts. The other terms we use are appropriate to discussion about bryophytes.

@ianengelbrecht

We developed something like this for a project imaging vertebrate and vertebrate fossil type specimens. You're welcome to take a look and use anything from this document if it might be useful: https://docs.google.com/document/d/1oY22dWdeJONPA7Bebqt1jd_7GrLsczcadcmncnsxL0E/edit?usp=sharing

@edwbaker
Member

edwbaker commented Nov 8, 2022

The initial comment period has been extended to 7th December 2022 to give 30 days from the announcement on the TDWG mailing list.

@baskaufs
Contributor Author

> Thanks @baskaufs we also use 'root' and 'bud' in describing subject parts. The other terms we use are appropriate to discussion about bryophytes.

@afuchs1 It seems like "root" would be a logical addition to the vocabulary if it is something that is commonly photographed. In my experience photographing live plants, I don't commonly use it because I don't usually disturb the plants, but I can see for herbarium specimens it could be useful if the roots were photographed in detail or if a region of interest was demarcated for them.

I think that "buds" would be a potential addition as well. My only question there is whether one would designate them apart from "twig". See for example the "Twig/buds" column of this page.

Some of the photos, like this one (image: Quercus coccinea buds), are clearly zoomed into the buds, but others, like this one (image: Quercus imbricaria twig), feature the buds prominently while also showing the overall orientation of petioles and lenticels on the twig. So in the latter case, I would probably say it was a photo of the twig.

So the question in my mind is whether a user trying to characterize what's in an image would be able to choose between the two subject parts (twig or bud) when describing an image. Similarly, if one were searching for images of twigs or buds, would one end up missing a lot of images tagged with the category not chosen?

From the standpoint of regions of interest, it would be good to have buds as a separate category, since one could designate where the buds were located within a larger scale image.

I suppose one option would be to have buds and to say that it had a broader category of twig. That would allow one to search for the broader category (twig) and also get narrower categories (bud), or to search for only bud and get only the cases where buds are featured most prominently. But I'm not sure it's actually true to call "twig" a broader category than bud.

Thoughts?

@baskaufs
Contributor Author

> We developed something like this for a project imaging vertebrate and vertebrate fossil type specimens. You're welcome to take a look and use anything from this document if it might be useful: https://docs.google.com/document/d/1oY22dWdeJONPA7Bebqt1jd_7GrLsczcadcmncnsxL0E/edit?usp=sharing

@ianengelbrecht This is really useful. I wish we had seen this when we were working out the terms.

With the exception of vertebrae and postcranium, I think we have covered most of the parts. (I'm not an expert in this area, so I don't really know what "postcranium" is. I presume it's part of a cranium? Or is it a posterior view of a cranium?) We had some discussion in our meetings about skins and whole skeletons and I think we decided that those would fall into the "whole organism" category. I'd have to go back through the meeting notes to refresh my memory, though.

The views categories also seem to mostly correspond to what we have, although you have some additional categories. We had discussed terms like "medial", but concluded that they didn't actually describe orientations, but rather locations within a part (see #236 for more). Perhaps the difference here is that your categories are more generic "views" rather than strictly orientations.

It would be interesting to know how often each of your categories is used. Is there any way to pull that information? That might be useful for deciding whether there are one or more terms that you use frequently that we don't have covered.

@ianengelbrecht

Thanks @baskaufs, glad it provides something useful. Postcranium is basically anything from the neck downwards. We developed those terms to cover imaging of vertebrate fossil material also, and we sometimes have various nonspecific chunks of skeletal material that we photograph together to save time.

In terms of how often each term gets used, the section lower down in the document about view per taxon gives some sense - we photograph all of these for each specimen. We're only doing vertebrate types at present. For 'living' taxa this is pretty standard, we're mostly sure to have a whole animal and get all the views we want, but for fossils it gets more complex and we photograph what we can. We try to get as many standard views as we can for the vertebrate fossils, but sometimes end up reverting to 'postcranium'.

I haven't actually done any image tagging using these terms yet, but I'm expecting the first completed batch of images in the next few weeks and will get on it then. I can provide more information then.

Just for interest I wrote a script for tagging herbarium specimen images using tags from a csv file (such as a collection database extract). I'll probably edit this slightly when it comes time to adding views, parts, etc to the vertebrate images: https://github.com/NSCF/image-tagger-python

@DavidFichtmueller

DavidFichtmueller commented Nov 23, 2022

I too support the ratification of the two vocabularies.

I have however two suggestions regarding non-normative parts of the two vocabularies:

Examples

Both tables for the metadata of the terms in the csv format have the column example. In general for controlled vocabularies such a column is not really applicable, however in the context of these two vocabularies in particular, there might be another way of using that column by providing example images that would use the particular subjectPart or subjectOrientation terms, just like @baskaufs used in the comment above about "twig/bud". Adding such example images could be done at any point, as the example values can be declared as "non-normative" in the section 1.1 of the respective documents.

Aliases

The Implementation Experience Report mentions that the terms "adaxial side" and "abaxial side" were changed to "upper side" and "lower side" to make the two terms more broadly applicable and to avoid confusion between these two very similar-looking terms. This, however, would be a good use case for adding aliases or synonyms for particular terms, both in the human-readable form and the machine-readable form. For humans this would allow the terms to be found by their synonyms when doing a simple document search (for the case of adaxial/abaxial it would not help much, as those two terms are already mentioned in the usage notes, but for other cases of aliases this might be different). In the machine-readable document they would be expressed using skos:altLabel, which would allow systems that offer machine-guided entry of the terms to pick the correct one even when the user types in the synonym, similar to the UI behavior of Wikidata/Wikibase when adding an item by its alias.
(image: AC_Feedback_wikidata_input, an example of what such an input would look like in a Wikidata-like system)
Again, adding the aliases could be done independently of the current ratification, as their values can be declared as "non-normative" in section 1.1 of the respective documents.
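To make the guided-entry idea concrete, here is a minimal Python sketch of alias-aware lookup. The concept records and their aliases are invented for illustration only (they are not proposed altLabel values), and the field names `prefLabel` and `altLabels` simply mirror the SKOS properties rather than any published file layout.

```python
# Hypothetical concept records; the aliases are illustrative, not proposed values.
CONCEPTS = [
    {"prefLabel": "flower", "altLabels": ["blossom", "bloom"]},
    {"prefLabel": "whole organism", "altLabels": ["entire organism"]},
]


def build_index(concepts):
    """Map every label (preferred or alternative), case-folded, to its preferred label."""
    index = {}
    for concept in concepts:
        for label in [concept["prefLabel"], *concept["altLabels"]]:
            index[label.casefold()] = concept["prefLabel"]
    return index


def suggest(typed, index):
    """Return the preferred labels whose preferred or alternative label starts with the typed text."""
    needle = typed.strip().casefold()
    if not needle:
        return []
    return sorted({pref for label, pref in index.items() if label.startswith(needle)})


index = build_index(CONCEPTS)
suggestions = suggest("blo", index)  # matches the aliases, resolves to "flower"
```

Whatever the user types, the value that gets stored is always the preferred controlled value string, so aliases help with entry without adding new values to the data.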

@baskaufs
Contributor Author

@DavidFichtmueller Thanks for your thoughtful comments. I think your examples idea is a good one and I think it should be relatively easy to implement. Based on previous work, I have sets of exemplar images that could be used for nearly all of the plant subjectPart terms. It probably would not be hard to collect them for the other groups.

Also, your idea about documenting synonyms is a good one. I think the way to make the synonyms available would be to use the same mechanism as for the non-English translations: include them in the machine-readable JSON-LD as values for skos:altLabel. That would be meaningful from a semantic point of view and, as you say, it follows the pattern used by Wikibase (which also represents the aliases as skos:altLabel). I would need to think about whether there would be a systematic way to include them in the human-readable metadata. As you say, they could just be put in the comments, but perhaps we should consider adding skos:altLabel as a field in the basic term metadata. I need to think about it some more, but I can't think of any reason why that would be bad and perhaps it might become a useful precedent for all TDWG controlled vocabularies. It seems like this could come up elsewhere.

@baskaufs
Contributor Author

@ianengelbrecht Thanks for the clarification. When your guidelines are finished and stored in a stable place, I think it would be useful for Audubon Core to link to them as a reference. Also, as a separate issue from this one, I'm interested to hear how it goes inserting the tags into the EXIF using your script. I've heard of people doing that, but it was not clear to me how useful it was in the end. So it would be good to get an assessment from you after you've been doing it for a while.

@nielsklazenga
Member

I support @DavidFichtmueller 's suggestion to include aliases in general, but adaxial side and abaxial side are not synonyms of upper side and lower side. They are different terms, used for things that are generally vertical rather than horizontal. Mapping (from a different vocabulary, which has the terms adaxial side and abaxial side) would be more appropriate than aliasing. So, for this particular example, best to keep it just in the comments and not tie it down by adding skos:altLabels.

@baskaufs
Contributor Author

baskaufs commented Dec 7, 2022

I was thinking about @afuchs1's suggestion to add root as a subjectPart for plants. I support that idea, however after thinking about it, I realized that there are many underground plant parts that technically aren't roots. So I'm wondering if it would be better to create several terms for the variety of underground plant parts, and then create undergroundStructure as a broader category. Thus if some non-technical person wanted to indicate that an image or ROI depicted some underground structure but wasn't sure what kind it was, they could use the broader term. If they knew the narrower technical term, they could use that instead. I'm thinking of this:

narrower terms:

  • root
  • corm
  • tuber
  • rhizome
  • bulb
  • stolon

broader term for all of those:

  • undergroundStructure
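A minimal Python sketch of how a search could use such a broader/narrower arrangement, assuming the term list exactly as proposed in this comment. The image records are invented for illustration, and the `BROADER` mapping stands in for skos:broader links that a real client would read from the vocabulary metadata.

```python
# Each narrower term points to its broader term, as proposed in this comment.
BROADER = {
    "root": "undergroundStructure",
    "corm": "undergroundStructure",
    "tuber": "undergroundStructure",
    "rhizome": "undergroundStructure",
    "bulb": "undergroundStructure",
    "stolon": "undergroundStructure",
}

# Hypothetical image records: (file name, subjectPart value).
IMAGES = [
    ("img1.jpg", "root"),
    ("img2.jpg", "undergroundStructure"),
    ("img3.jpg", "leaf"),
    ("img4.jpg", "bulb"),
]


def with_narrower(term, broader):
    """Return the term plus every term that is directly or indirectly narrower."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def search(images, term, broader):
    """Find images tagged with the term or with any of its narrower terms."""
    wanted = with_narrower(term, broader)
    return [name for name, part in images if part in wanted]


broad_hits = search(IMAGES, "undergroundStructure", BROADER)
narrow_hits = search(IMAGES, "root", BROADER)
```

Searching for the broader term returns images tagged at either level, while searching for a narrower term returns only images tagged with that specific part.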

Thoughts?

@baskaufs
Contributor Author

baskaufs commented Dec 7, 2022

Since the public comment period for the views controlled vocabularies proposal is ending today, I would like to solicit thoughts about how we might handle @afuchs1's suggestion about adding "bud" as a subjectPart. I feel like that is a good idea, since one might want to designate the location of buds as regions of interest within a larger photo of a plant. The question I raised in this comment was whether as a practical matter a user could judge whether an image was of a twig or a bud. I am doubtful about the solution that I proposed (saying that twig was a broader concept than bud) because buds could also be on herbaceous plants and underground structures like tubers. So without objection, I think we should just add bud as a subjectPart suitable for herbaceous and woody plants (and potentially other plant groups if they were worked out).

@baskaufs
Contributor Author

baskaufs commented Dec 7, 2022

In this comment I mentioned that the main subjectPart concepts that were in @ianengelbrecht's imaging guidelines that we were missing were vertebrae and postcranium. When we were working out the vocabularies, we opted not to designate a separate collection of subjectParts for skeletons, since most of them corresponded to the general parts of vertebrates that we'd already defined. Vertebrae are an exception as they aren't a morphological feature that's generally visible in an external view of a vertebrate. So it seems legitimate to add it, although I would feel better about doing that if we had some testing of the terms in general to see how they work with skeletons. So I think we should defer adding it for this round. I suppose postcranium could be used as a broader term for all vertebrate parts posterior to the head, but again I would like to see how this works out in practice with some implementation testing prior to adding it.

@baskaufs
Contributor Author

baskaufs commented Dec 7, 2022

With respect to @DavidFichtmueller's suggestion to supply aliases as skos:altLabel values, this idea seems to have general support. As he noted, these would not be normative, and I think they probably should also be outside of the standard so that they can be handled in a more agile manner, similar to how we handle non-English translations and the SKOS collections for these vocabularies. Given @nielsklazenga's comments that he doesn't think ad-/abaxial are synonyms of upper/lower, I guess we will just leave things the way they are for now in the terms metadata.

@baskaufs
Contributor Author

baskaufs commented Dec 7, 2022

@DavidFichtmueller noted that there was an unused examples field in the metadata tables for these vocabularies and suggested that it be used to link to example images. I don't see any reason not to do that as long as we can identify images that have stable URLs. I believe that the Darwin Core Maintenance Group handles example changes as "minor editorial errata" and just changes them without instituting the change process. Is that right @tucotuco ? I suppose we also could handle them as Darwin Core handles non-normative changes that technically don't require the full change process: create an issue and hold public comment on them as a courtesy and to make people more aware of the change.

@nielsklazenga
Member

@baskaufs, just 👍 -d the extra terms you proposed. Just noting that a stolon is not an underground but an above-ground structure, so should not be part of the broader term.

@baskaufs
Contributor Author

baskaufs commented Dec 8, 2022

Thanks, @nielsklazenga! Always good to have some real botanists around to help amateurs like me! :-) I think it would be best to just drop stolon, then, since "stem" could probably just be used.

@nielsklazenga
Member

Yes, I imagine if a use case for 'stolon' comes up it is added easily enough.

@edwbaker
Member

edwbaker commented Dec 8, 2022

I am closing the public comment period, with thanks to everyone who has made thoughtful and useful comments.

The next step is for the Views Controlled Vocabulary Task Group to consider the comments and present a revised proposal to the Maintenance Group.

@edwbaker added the "next meeting agenda" label (Issues to be discussed at next MG meeting) on Dec 8, 2022

@baskaufs
Contributor Author

Hey @nielsklazenga and @afuchs1, can you please take a look at the last rows of this edited subjectPart table and give me an opinion or suggestions on the definitions for the new terms I added (lines 35 and below). For all but undergroundStructure I linked to the Plant Ontology IRI and did a sort of short paraphrase of the PO definitions, which are sometimes overly long and incomprehensible. Since undergroundStructure is a non-specific parent term for all of the ones after it, there isn't really a PO term for it, so I left the IRI blank and just made up a definition.

@afuchs1

afuchs1 commented Dec 23, 2022

@baskaufs I have confirmed with our image curators that 'bud' in the context we use it refers to the 'flower bud' which is a sub-category https://ontobee.org/ontology/PO?iri=http://purl.obolibrary.org/obo/PO_0000056 of http://purl.obolibrary.org/obo/PO_0000055.
The other vocabularies look good.

@nielsklazenga
Member

nielsklazenga commented Dec 23, 2022

I agree with @afuchs1 that PO_0000056 will be the term that is used most often and also what people first think of when they hear 'bud' (at least here in Australia where most trees are evergreen), but let's call it 'flower bud' then to avoid any confusion.

If there is not a firm use case for it yet, I suggest leaving undergroundStructure out. It is an awkward grouping (which is why you will not find it in PO) and makes me think more of phone lines and water pipes than anything plant-related.

@baskaufs
Contributor Author

Hmmm. Well, in my experience in North America, people also photograph vegetative buds to facilitate winter twig identification. So should we include both flower bud and vegetative bud? Or do we just assume that someone photographing a twig bud would categorize it as "twig"?

@baskaufs
Contributor Author

@nielsklazenga The reason for including undergroundStructure is to make it possible for non-technical people to label "roots" when they can't tell the difference between the technical underground structures (rhizome, bulb, etc.). I feel that if we leave it out, then we probably should leave out the technical underground structure terms as well, since we haven't done any user testing with them. This initially seemed like a simple case, but is now turning out to be more complicated. In other cases where complicated stuff came up and wasn't tested (fungi, ferns, etc.) we opted not to add them to the vocabularies until there was more work done to ensure that they were actually usable.

@baskaufs
Contributor Author

OK, I need to wrap this up. There does not seem to be sufficient support or testing to add the underground structures. So they will have to be added in some future round after user testing.

With respect to "bud", there are clearly use cases for both vegetative and reproductive buds as parts, so I don't think it makes sense to define it narrowly as only one or the other. I'm going to leave it at http://purl.obolibrary.org/obo/PO_0000055, whose definition will serve for either. If it gets used often enough, we can create two narrower terms for the two kinds of buds and link them to "bud" as a broader category. I know that in at least one case (Amorphophallus titanum) it isn't possible to know which kind of bud it is until the shoot develops, so the broader term should probably exist anyway.

@JCGiron

JCGiron commented Jan 25, 2023

Hi @baskaufs,
I'm at an event with folks working on ontologies and image protocols from the informatics point of view. We have been discussing the issue of Regions of Interest, how to extract them from or delimit them in an image, and how to annotate them with terms from anatomy ontologies. Part of this boils down to documenting those annotations as metadata, but then metadata are still far from standardized for images.
Some questions regarding the AC standard that I don't know how to answer:

  • Is there a reason to generate TDWG identifiers rather than using existing ontology identifiers for terms?
  • Is there a somewhat formal ontology of TDWG/AC terms, or interest in creating one?

This issue (regions of interest and annotations to them) might be a point to tackle by the Imageomics team as they move forward with their protocols.
I'm tagging @wdahdul, @pmabee, @susan1637, @xbahadirx, and @seltmann, who have been involved in these discussions.

@baskaufs
Contributor Author

Hi @JCGiron. All good questions and I'm not sure I can answer them. But I'll try.

As far as your first question is concerned, I would say that the TDWG terms were minted for the subjectPart controlled vocabularies rather than using existing ontology terms for several reasons:

  • the TDWG controlled vocabularies follow the existing design patterns within TDWG for controlled vocabularies, i.e. treating the terms as SKOS concepts and assigning a controlled value string that people can use in spreadsheets or other text-based systems in lieu of using the term IRI (see Section 4.5.4 of the Standards Documentation Specification for details). As you know, we've tried to map these subjectPart terms to well-known ontologies, but there are some semantic reasons for not declaring them to be equivalent, and the ontologies don't really provide the controlled value strings that most developers want (vs. requiring use of IRIs).
  • the ontologies are generally much larger and way more complicated (i.e. subclass/broader relationships) than the TDWG controlled vocabularies. We wanted to keep the lists for a particular organismal group small enough that it could be used to populate a drop-down pick list, or be scanned as a list by a human who wanted to type something in a spreadsheet. If we said, "just pick terms from the Plant Ontology", for example, there would be an overwhelming number of options -- so many that it would hardly qualify as a controlled vocabulary. Some of the problems with trying to use ontology terms as controlled values have been discussed previously in dwc#40 (New term - environmentalMaterial) and dwc#38 (New term - biome).
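To illustrate the controlled-value-string pattern described in the first bullet, here is a minimal Python sketch that accepts either a controlled value string or a term IRI from a spreadsheet cell and normalizes it to the IRI. The CSV column names and the `example.org` IRIs are assumptions made for this example; they are not taken from the published term metadata tables.

```python
import csv
import io

# Hypothetical extract of a term metadata table. Column names and IRIs are
# placeholders, not the published ones.
TERMS_CSV = """controlled_value_string,term_iri
leaf,http://example.org/part/p0001
twig,http://example.org/part/p0002
"""


def load_lookup(csv_text):
    """Build a mapping from controlled value string to term IRI."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["controlled_value_string"]: row["term_iri"] for row in reader}


def to_iri(value, lookup):
    """Normalize a cell value (controlled string or IRI) to the term IRI."""
    if value in lookup:
        return lookup[value]
    if value in lookup.values():
        return value
    raise ValueError(f"not a controlled value: {value!r}")


lookup = load_lookup(TERMS_CSV)
leaf_iri = to_iri("leaf", lookup)
```

Matching is deliberately exact, so a free-text variant such as "Leaf" is rejected rather than silently accepted.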

With regard to the second question, there is not a formal ontology of AC terms and I don't think anyone has suggested working on one recently. When AC was first adopted, I think there was an assumption that there would be some more formal RDF representation of the main vocabulary. Bob Morris, who was the main author of the standard, had put in some work on this, but it was never finished, I think at least in part because of lack of demand. I think his work is on the old Google Code site somewhere, but I don't think it made the transition to GitHub. Since that time, Bob has passed away and TDWG has settled into a pattern where terms are defined with minimal semantics (slang: "bag of terms" approach) and their RDF representations are very lightweight. (You can get a dump of metadata in Turtle using a URL in this pattern http://rs.tdwg.org/dump/audubon.ttl, see this for details.) But it's not what anybody creating ontologies with Protege would be expecting. The current expectation within TDWG is that semantic layers (like ontologies) would be added on top of the basic "bag of terms" layer as a "vocabulary enhancement". This approach is described in Section 4.4.2.2 of the Standards Documentation Specification. Thus, if there were demand for it, the community could create an ontology to describe the semantic relationships among terms and add it as a layer on top of the bag of terms. So far, no one has proposed doing that for Audiovisual Core. The idea has been floated a number of times for Darwin Core, but hasn't yet been done formally within the TDWG Process framework.

With respect to the technical details of extracting and delimiting ROIs in an image, that remains to be worked out. Right now, the ROI recipes document is all we have. For spreadsheets and tables, it assumes the approach suggested by the Audubon Core Structure document, which honestly is pretty out of date. For machine-readable data, it suggests an approach that is consistent with the JSON-LD W3C Recommendation. This makes the data Linked Data-ready while serializing the data in a form (JSON) that's more familiar to developers. This approach was influenced by what's going on in the Cultural Heritage part of the museum community, which has rallied around JSON-LD and the IIIF standard for image presentation. IIIF is being actively investigated for use in the Natural History Museum community, so it's my hope that we can make AC ROI data translatable to IIIF features like Annotations, which would allow particular ROIs to be highlighted in a viewer.
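As a rough idea of what "translatable to Annotations" could look like, here is a minimal Python sketch that turns a rectangular ROI into a W3C Web Annotation with a media-fragment selector. It assumes the ROI is expressed with Audubon Core's fractional terms (written here as ac:xFrac, ac:yFrac, ac:widthFrac, and ac:heightFrac; check the term list for the exact names), that the pixel dimensions of the image are known, and that the image and body IRIs shown are placeholders. It is not taken from the ROI recipes document, and a IIIF viewer would normally expect the target to be a canvas rather than a bare image.

```python
import json


def roi_to_annotation(image_iri, roi, width_px, height_px, body_iri):
    """Convert a fractional rectangular ROI into a Web Annotation dictionary."""
    # Fractions are relative to the full image, measured from the top-left corner.
    x = round(roi["ac:xFrac"] * width_px)
    y = round(roi["ac:yFrac"] * height_px)
    w = round(roi["ac:widthFrac"] * width_px)
    h = round(roi["ac:heightFrac"] * height_px)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        "body": body_iri,  # e.g. the IRI of a subjectPart concept
        "target": {
            "source": image_iri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Hypothetical ROI covering a region of a 4000 x 3000 pixel image.
roi = {"ac:xFrac": 0.25, "ac:yFrac": 0.1, "ac:widthFrac": 0.5, "ac:heightFrac": 0.2}
annotation = roi_to_annotation(
    "http://example.org/image/123.jpg", roi, 4000, 3000,
    "http://example.org/part/p0001",
)
annotation_json = json.dumps(annotation, indent=2)
```

Keeping the stored ROI fractional and converting to pixels only at export time means the same record works for any derivative size of the image.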

Unfortunately, AC is probably somewhat late to the game and people who are doing machine learning already have their own ad hoc systems for handling ROIs. When we were working on the ROI proposal, I spoke with some of the technical people working with Tanya Berger-Wolf (involved in the Imageomics project) about whether they were interested in a standard way of designating ROIs. They said that they might have been if there were one when they started, but at that point they already had a functioning system in place. If however, there is a new effort to share trait data so that it can be aggregated, then it may be time to work out the technical details of how that should be done using the standard AC ROI terms. That would be great if it could be worked out using a use-case driven program with implementation testing. That isn't something that AC can do in isolation, but I'm sure that the AC Maintenance Group would love to help facilitate this if there is interest from developers.

Sorry for the long answer, but these are somewhat complicated issues that haven't fully been worked out. I would love to talk more about this -- perhaps in a new issue in the tracker if its subject can be stated in a meaningful way. This one will hopefully be closed soon if the proposal is ratified.

@JCGiron

JCGiron commented Jan 27, 2023

Thank you for your detailed answer, @baskaufs!
One of the conclusions after this week's TraitFest is that we need to communicate better across fields and communities: biologists/technicians who generate and use images, ontology managers/curators who create and define terms, informaticians/software engineers using Machine Learning approaches, and people generating and managing standards. We are all working toward the same overall goal of being able to detect and annotate traits in images and document what's in them, but we are walking very different paths, with different expertise and background information, and we would all benefit from better integration and understanding across fields.
Regarding formalized ontologies for TDWG, I think it makes sense to have those, so that it is easier to understand and visualize what's there and how the terms are related. For Biological Ontologies (which I would argue TDWG as Biodiversity Information Standards should be a part of), the OBO Foundry is making an effort to align anatomical ontologies and establish pipelines for better integration and interoperability. I think it makes sense to connect these two worlds, even though they work at very different levels of granularity in their data, with very distinct purposes. Both the standards and ontologies are under the umbrella of W3C, so there is at least some common ground. Bringing @matentzn and @cthoyt into this conversation since they work closely with OBO ontologies.
Lots to discuss and lots of work. Would this be the place to start the conversation about TDWG ontologies?

@baskaufs
Contributor Author

I'm not sure that the Infrastructure issue tracker would be the best. I don't think many people follow it. If you want to discuss it broadly across TDWG, I'd suggest the Technical Architecture Group tracker: https://github.com/tdwg/tag/issues . If you are specifically interested in Audiovisual (Audubon) Core, I'd put it in the AC tracker: https://github.com/tdwg/ac/issues

Before proposing formal ontology building, there are several things you should probably refer to first.

  1. The Biological Collections Ontology (BCO) is an ongoing effort that uses the OBO Foundry approach and involves many TDWG people (although it is not an officially sanctioned TDWG effort). It is primarily focused on Darwin Core. I don't think it has gotten into the AC realm. https://bioportal.bioontology.org/ontologies/BCO Ramona Walls was heavily involved with this, but I'm not finding her GitHub username to tag here. @tucotuco probably knows it and has also been involved.
  2. Many years ago, there was a failed effort to create a "TDWG Ontology". So you may find this experience coloring the views of TDWG "old-timers" about ontology building. As a starting point, I would recommend reading Sections 2.4 and 2.6.1 of the following document for perspective. http://www.gbif.org/resource/80862 The ontologies have been preserved in GitHub at https://github.com/tdwg/ontology although they aren't used much and are not being maintained. If you have more questions about this, I suppose I'm in about as good of a position to answer them as anyone.
  3. The TDWG ABCD group has an ontology that is under development and not actually (yet) part of the ABCD standard. It includes Multimedia Objects, but I don't think that part of the ontology has been developed extensively. @DavidFichtmueller would be a contact for more information about this.
  4. I mentioned Bob Morris' RDF version of Audubon Core. I searched the Google Code archive and found references to it in the Issue tracker (e.g. https://code.google.com/archive/p/auduboncore/issues/18). However, the links are broken. So I'm not sure whether a copy of it exists anywhere. It probably should have gotten transferred to the AC GitHub site, but didn't. That's probably my fault. :-(

I think it's correct to say that the AC Maintenance Group would be keen to support work towards standardizing sharing information about detection and annotation of traits. So it would be great to put that on the agenda of a future meeting. Pinging @edwbaker about that.

@cthoyt

cthoyt commented Jan 29, 2023

... Both the standards and ontologies are under the umbrella of W3C, so there is at least some common ground. Bringing @matentzn and @cthoyt into this conversation since they work closely with OBO ontologies. Lots to discuss and lots of work. Would this be the place to start the conversation about TDWG ontologies?

hi all, this is an awfully dense and technical discussion. I'm not sure exactly what you would want from me. If you've got a specific question that doesn't require me to read and catch up on this, I'd be happy to help. I'm going to unsubscribe from this thread for now. Please feel free to ping me again in a new specific discussion on GitHub, shoot me an email at cthoyt@gmail.com, or on the obo foundry slack if you want me to take a look again.

FYI, I had to click a few times through this repository to figure out what TDWG is, so maybe you can also reconsider the amount of jargon in this thread, which is a barrier to entry for other people joining the discussion.

That all being said, if there's any way you think that the Bioregistry (code), a registry of biomedical and life science vocabularies, can fit into your initiative, I'd be even more keen to participate!

@matentzn

I like the discussion here! There are a lot of interesting things to comment on, but one detail keeps sticking out to me across projects like the one you are on:

If we said, "just pick terms from the Plant Ontology", for example, there would be an overwhelming number of options

This seems to be a huge dealbreaker for many projects. However, it is not really the right level of abstraction. Your use cases involve data models (hopefully semantic in some way). Slots in these data models (for example, "anatomical entity") involve constraints. So it should be possible to say something like "this anatomy field in this data model should be populated from the 'Uberon:limb' branch", and have the system you use for constraint management (e.g. LinkML) do the rest.
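A minimal sketch of what such a branch constraint could look like, independent of any particular tool. The term identifiers (`EX:limb` and so on), the tiny parent map, and the function names are hypothetical placeholders for illustration; they are not real Uberon IRIs and this is not how LinkML itself is implemented.

```python
# Hypothetical mini-hierarchy: each term maps to its parent term.
# The "EX:" identifiers are placeholders, not real Uberon IRIs.
PARENT = {
    "EX:wing": "EX:forelimb",
    "EX:forelimb": "EX:limb",
    "EX:hindlimb": "EX:limb",
    "EX:limb": "EX:appendage",
    "EX:eye": "EX:sense_organ",
}


def in_branch(term, root):
    """Return True if `term` is `root` itself or a descendant of `root`."""
    seen = set()
    while term is not None and term not in seen:
        if term == root:
            return True
        seen.add(term)  # guard against accidental cycles in the map
        term = PARENT.get(term)
    return False


def validate_record(record, slot="anatomy", root="EX:limb"):
    """Check that the value in `slot` comes from the branch under `root`."""
    return in_branch(record[slot], root)


print(validate_record({"anatomy": "EX:wing"}))  # value under the limb branch
print(validate_record({"anatomy": "EX:eye"}))   # value outside the limb branch
```

The point of the sketch is that users never face the whole ontology: the data model names one branch root per slot, and only terms under that root are accepted.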

If you coin your own IRIs for every single use case, you have an easier time internally (you can just import what you really care about, and you are not beholden to the sometimes idiosyncratic way external ontologies label or define their terms), but you create a polynomial problem for outside users: for every new semantic space introduced (semantic space as in identifier space covering some domain), they have to deal with another set of mappings, resulting in a lot of pairwise mappings. Furthermore, a chance is lost to curate metadata, like synonyms etc., in a centralised place that benefits the whole world!
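To put rough numbers on the mapping burden: assuming n independent identifier spaces that cover the same domain, mapping each one to every other takes n(n-1)/2 mapping sets, whereas mapping each one to a single shared ontology takes n. The function names below are made up for this illustration, and the "shared hub" comparison is an assumption about how the alternative would be organised.

```python
def pairwise_mapping_sets(n):
    """Mapping sets needed if every identifier space is mapped to every other."""
    return n * (n - 1) // 2


def hub_mapping_sets(n):
    """Mapping sets needed if every identifier space maps to one shared ontology."""
    return n


for n in (3, 10, 50):
    print(n, pairwise_mapping_sets(n), hub_mapping_sets(n))
```

Reusing the external identifiers directly would remove even the hub mappings, at the cost of the extra curation effort discussed in this comment.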

In any case, I understand how hard your problem is, and in the end, you should do what your budget permits! Re-using existing ontology IDs internally will make your data much easier to integrate (more FAIR!), but will make your own curation process twice as expensive.

@baskaufs
Contributor Author

FYI, I had to click a few times through this repository to figure out what TDWG is, so maybe you can also reconsider the amount of jargon in this thread, which is a barrier to entry for other people joining the discussion.

This discussion is actually not really happening in the right place. This issue was specifically put in place to track the progress of a task group's proposal for ratification. As such, it includes technical details related to the ratification process. The discussion about the use of ontologies within TDWG is really a new issue that is only tangentially related to the proposal. I would suggest moving it to a new issue specifically dealing with this question.

@baskaufs
Contributor Author

The consensus expressed by the Audiovisual Core Maintenance Group as of 14 April 2023 was to recommend that this proposal be sent to the Executive Committee for ratification.

@baskaufs baskaufs removed the next meeting agenda Issues to be discussed at next MG meeting label Apr 19, 2023
@baskaufs
Contributor Author

Updated proposal text to replace draft Implementation Experience Report with the published version.

@baskaufs
Contributor Author

Updated proposal text to replace link to implementer instructions with the actual user guide link.

@baskaufs
Contributor Author

baskaufs commented May 9, 2023

Proposal ratified by the Executive Committee on 2023-04-26. The List of Terms documents http://rs.tdwg.org/ac/doc/orient/ and http://rs.tdwg.org/ac/doc/part/ are now live and term dereferencing was enabled in this release.

@baskaufs baskaufs closed this as completed May 9, 2023