Views controlled vocabularies proposal #245
Comments
At the AudubonCore Maintenance Group meeting on 28/10/2022 it was agreed that this proposal would progress to the public comment period. The public comment period runs from today (31/10/2022) until at least 29/11/2022. |
Given that the announcement wasn't published on the TDWG site until 01 Nov, let's leave comments open until 01 Dec. |
This is a good proposal and I support it. |
Supported. |
@afuchs1 If there are suggestions related to the existing organism groups, I suppose you could put them here. There was a desire to develop terms for other organism groups, but there wasn't sufficient discussion and testing for those other groups to get added in this submission. So that work was archived here with the idea that they could be developed and added in the future. So if it's that kind of change you are suggesting, I'd recommend trying to find some others who are experts on the same organism group and then test the suggested terms on some images from that group to see if they are usable. That's what we did for the groups that are currently included. Does that make sense? |
Thanks @baskaufs we also use 'root' and 'bud' in describing subject parts. The other terms we use are appropriate to discussion about bryophytes. |
We developed something like this for a project imaging vertebrate and vertebrate fossil type specimens. You're welcome to take a look and use anything from this document if it might be useful: https://docs.google.com/document/d/1oY22dWdeJONPA7Bebqt1jd_7GrLsczcadcmncnsxL0E/edit?usp=sharing |
The initial comment period has been extended to 7th December 2022 to give 30 days from the announcement on the TDWG mailing list. |
@afuchs1 It seems like "root" would be a logical addition to the vocabulary if it is something that is commonly photographed. In my experience photographing live plants, I don't commonly use it because I don't usually disturb the plants, but I can see for herbarium specimens it could be useful if the roots were photographed in detail or if a region of interest was demarcated for them. I think that "buds" would be a potential addition as well. My only question there is whether one would designate them apart from "twig". See for example the "Twig/buds" column of this page.
So the question in my mind is whether a user trying to characterize what's in an image would be able to choose between the two subject parts (twig or bud) when describing an image. Similarly, if one were searching for images of twigs or buds, would one end up missing a lot of images of the other category not chosen. From the standpoint of regions of interest, it would be good to have buds as a separate category, since one could designate where the buds were located within a larger scale image. I suppose one option would be to have buds and to say that it had a broader category of twig. That would allow one to search for the broader category (twig) and also get narrower categories (bud), or to search for only bud and get only the cases where buds are featured most prominently. But I'm not sure it's actually true to call "twig" a broader category than bud. Thoughts? |
@ianengelbrecht This is really useful. I wish we had seen this when we were working out the terms. With the exception of vertebrae and postcranium, I think we have covered most of the parts. (I'm not an expert in this area, so I don't really know what "postcranium" is. I presume it's part of a cranium? Or is it a posterior view of a cranium?) We had some discussion in our meetings about skins and whole skeletons and I think we decided that those would fall into the "whole organism" category. I'd have to go back through the meeting notes to refresh my memory, though. The views categories also seem to mostly correspond to what we have, although you have some additional categories. We had discussed terms like "medial", but concluded that they didn't actually describe orientations, but rather locations within a part (see #236 for more). Perhaps the difference here is that your categories are more generic "views" rather than strictly orientations. It would be interesting to know how often each of your categories is used. Is there any way to pull that information? That might be useful for deciding whether there are one or more terms that you use frequently that we don't have covered. |
Thanks @baskaufs, glad it provides something useful. Postcranium is basically anything from the neck downwards. We developed those terms to cover imaging of vertebrate fossil material also, and we sometimes have various nonspecific chunks of skeletal material that we photograph together to save time. In terms of how often each term gets used, the section lower down in the document about view per taxon gives some sense - we photograph all of these for each specimen. We're only doing vertebrate types at present. For 'living' taxa this is pretty standard, we're mostly sure to have a whole animal and get all the views we want, but for fossils it gets more complex and we photograph what we can. We try to get as many standard views as we can for the vertebrate fossils, but sometimes end up reverting to 'postcranium'. I haven't actually done any image tagging using these terms yet, but I'm expecting the first completed batch of images in the next few weeks and will get on it then. I can provide more information then. Just for interest I wrote a script for tagging herbarium specimen images using tags from a csv file (such as a collection database extract). I'll probably edit this slightly when it comes time to adding views, parts, etc to the vertebrate images: https://github.com/NSCF/image-tagger-python |
I too support the ratification of the two vocabularies. I have, however, two suggestions regarding non-normative parts of the two vocabularies:
Examples
Both tables for the metadata of the terms in CSV format have the column example. In general such a column is not really applicable to controlled vocabularies. In the context of these two vocabularies in particular, however, the column could be used to provide example images that illustrate the particular subjectPart or subjectOrientation terms, just as @baskaufs did in the comment above about "twig/bud". Such example images could be added at any point, as the example values can be declared "non-normative" in section 1.1 of the respective documents.
Aliases
The Implementation Experience Report mentions that the terms "adaxial side" and "abaxial side" were changed to "upper side" and "lower side", to make the two terms more broadly applicable and to avoid confusing two very similar-looking terms. This would, however, be a good use case for adding aliases or synonyms for particular terms, both in the human-readable form and in the machine-readable form. For humans this would allow the terms to be found by their synonyms when doing a simple document search (for adaxial/abaxial it would not help much, as those two terms are already mentioned in the usage notes, but for other aliases this might be different). In the machine-readable document they would be expressed using skos:altLabel, which would allow systems that offer machine-guided entry of terms to pick the correct one even when the user types in the synonym, similar to the UI behavior of Wikidata/Wikibase when adding an item by its alias. |
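To make the alias suggestion concrete, here is a minimal sketch of how a data-entry client might resolve a typed synonym to a concept through skos:altLabel. Everything specific in it is an assumption for illustration: the `example.org` IRIs, the JSON-LD layout, and the helper names `build_label_index` and `resolve` are hypothetical and are not taken from the actual Audubon Core files.

```python
import json

# Hypothetical JSON-LD fragment. The IRIs and the layout are illustrative
# assumptions, not the real Audubon Core serialization.
CONCEPTS_JSONLD = """
{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@graph": [
    {
      "@id": "http://example.org/concept/upperSide",
      "@type": "skos:Concept",
      "skos:prefLabel": {"@value": "upper side", "@language": "en"},
      "skos:altLabel": [{"@value": "adaxial side", "@language": "en"}]
    },
    {
      "@id": "http://example.org/concept/lowerSide",
      "@type": "skos:Concept",
      "skos:prefLabel": {"@value": "lower side", "@language": "en"},
      "skos:altLabel": {"@value": "abaxial side", "@language": "en"}
    }
  ]
}
"""


def _as_list(value):
    """JSON-LD allows a single object or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def build_label_index(doc):
    """Map every preferred and alternative label (case-folded) to its concept IRI."""
    index = {}
    for concept in doc.get("@graph", []):
        labels = _as_list(concept.get("skos:prefLabel")) + _as_list(concept.get("skos:altLabel"))
        for label in labels:
            index[label["@value"].casefold()] = concept["@id"]
    return index


def resolve(user_input, index):
    """Return the concept IRI for a typed label or alias, or None if unknown."""
    return index.get(user_input.strip().casefold())


INDEX = build_label_index(json.loads(CONCEPTS_JSONLD))
print(resolve("Adaxial side", INDEX))
```

With an index like this, a user who types the synonym is still steered to the concept whose preferred label is "upper side", which is the Wikidata-style behavior described above.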
@DavidFichtmueller Thanks for your thoughtful comments. I think your examples idea is a good one and I think it should be relatively easy to implement. Based on previous work, I have sets of exemplar images that could be used for nearly all of the plant subjectPart terms. It probably would not be hard to collect them for the other groups. Also, your idea about documenting synonyms is a good one. I think the way to make the synonyms available would be to use the same mechanism as for the non-English translations: include them in the machine-readable JSON-LD as values for |
@ianengelbrecht Thanks for the clarification. When your guidelines are finished and stored in a stable place, I think it would be useful for Audubon Core to link to them as a reference. Also, as a separate issue from this one, I'm interested to hear how it goes inserting the tags into the EXIF using your script. I've heard of people doing that, but it was not clear to me how useful it was in the end. So it would be good to get an assessment from you after you've been doing it for a while. |
I support @DavidFichtmueller 's suggestion to include aliases in general, but |
I was thinking about @afuchs1's suggestion to add narrower terms:
broader term for all of those:
Thoughts? |
Since the public comment period for the views controlled vocabularies proposal is ending today, I would like to solicit thoughts about how we might handle @afuchs1's suggestion about adding "bud" as a subjectPart. I feel like that is a good idea, since one might want to designate the location of buds as regions of interest within a larger photo of a plant. The question I raised in this comment was whether as a practical matter a user could judge whether an image was of a twig or a bud. I am doubtful about the solution that I proposed (saying that twig was a broader concept than bud) because buds could also be on herbaceous plants and underground structures like tubers. So without objection, I think we should just add bud as a subjectPart suitable for herbaceous and woody plants (and potentially other plant groups if they were worked out). |
In this comment I mentioned that the main subjectPart concepts that were in @ianengelbrecht's imaging guidelines that we were missing were vertebrae and postcranium. When we were working out the vocabularies, we opted not to designate a separate collection of subjectParts for skeletons, since most of them corresponded to the general parts of vertebrates that we'd already defined. Vertebrae are an exception as they aren't a morphological feature that's generally visible in an external view of a vertebrate. So it seems legitimate to add it, although I would feel better about doing that if we had some testing of the terms in general to see how they work with skeletons. So I think we should defer adding it for this round. I suppose postcranium could be used as a broader term for all vertebrate parts posterior to the head, but again I would like to see how this works out in practice with some implementation testing prior to adding it. |
With respect to @DavidFichtmueller's suggestion to supply aliases as |
@DavidFichtmueller noted that there was an unused examples field in the metadata tables for these vocabularies and suggested that it be used to link to example images. I don't see any reason not to do that as long as we can identify images that have stable URLs. I believe that the Darwin Core Maintenance Group handles example changes as "minor editorial errata" and just changes them without instituting the change process. Is that right @tucotuco ? I suppose we also could handle them as Darwin Core handles non-normative changes that technically don't require the full change process: create an issue and hold public comment on them as a courtesy and to make people more aware of the change. |
@baskaufs, just 👍 -d the extra terms you proposed. Just noting that a stolon is not an underground but an above-ground structure, so should not be part of the broader term. |
Thanks, @nielsklazenga ! Always good to have some real botanists around to help amateurs like me! :-) I think it would be best to just drop stolon, then since "stem" could probably just be used. |
Yes, I imagine if a use case for 'stolon' comes up it is added easily enough. |
I am closing the public comment period, with thanks to everyone who has made thoughtful and useful comments. The next step is for the Views Controlled Vocabulary Task Group to consider the comments and present a revised proposal to the Maintenance Group. |
Hey @nielsklazenga and @afuchs1, can you please take a look at the last rows of this edited subjectPart table and give me an opinion or suggestions on the definitions for the new terms I added (lines 35 and below). For all but |
@baskaufs I have confirmed with our image curators that 'bud' in the context we use it refers to the 'flower bud' which is a sub-category https://ontobee.org/ontology/PO?iri=http://purl.obolibrary.org/obo/PO_0000056 of http://purl.obolibrary.org/obo/PO_0000055. |
I agree with @afuchs1 that PO_0000056 will be the term that is used most often and also what people first think of when they hear 'bud' (at least here in Australia where most trees are evergreen), but let's call it 'flower bud' then to avoid any confusion. If there is not a firm use case for it yet, I suggest leaving |
Hmmm. Well, in my experience in North America, people also photograph vegetative buds to facilitate winter twig identification. So should we include both flower bud and vegetative bud? Or do we just assume that someone photographing a twig bud would categorize it as "twig"? |
@nielsklazenga The reason for including undergroundStructure is to make it possible for non-technical people to label "roots" when they can't tell the difference between the technical underground structures (rhizome, bulb, etc.). I feel that if we leave it out, then we probably should leave out the technical underground structure terms as well, since we haven't done any user testing with them. This initially seemed like a simple case, but is now turning out to be more complicated. In other cases where complicated stuff came up and wasn't tested (fungi, ferns, etc.) we opted not to add them to the vocabularies until there was more work done to ensure that they were actually usable. |
OK, I need to wrap this up. There does not seem to be sufficient support or testing to add the underground structures. So they will have to be added in some future round after user testing. With respect to "bud", there are clearly use cases for both vegetative and reproductive buds as parts, so I don't think it makes sense to define it narrowly as only one or the other. I'm going to leave it at http://purl.obolibrary.org/obo/PO_0000055, whose definition will serve for either. If it gets used often enough, we can create two narrower terms for the two kinds of buds and link them to "bud" as a broader category. I know that in at least one case (Amorphophallus titanum) it isn't possible to know which kind of bud it is until the shoot develops, so the broader term should probably exist anyway. |
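The broader/narrower idea discussed here can be sketched as a small search routine. This is only an illustration under stated assumptions: "flower bud" and "vegetative bud" are hypothetical narrower terms that do not exist in the vocabulary, and the image names and the `BROADER` mapping are made up for the example.

```python
from collections import defaultdict

# Hypothetical narrower terms (child -> broader parent). Only "bud" is being
# added to the vocabulary; the two narrower terms are assumptions.
BROADER = {"flower bud": "bud", "vegetative bud": "bud"}

# Made-up image records: file name -> subjectPart value.
IMAGES = {
    "img1.jpg": "bud",
    "img2.jpg": "flower bud",
    "img3.jpg": "vegetative bud",
    "img4.jpg": "twig",
}


def narrower_closure(term, broader):
    """Return the term plus every term that is transitively narrower than it."""
    children = defaultdict(set)
    for child, parent in broader.items():
        children[parent].add(child)
    result, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def search(term, images, broader, include_narrower=True):
    """Find images tagged with the term, optionally including narrower terms."""
    wanted = narrower_closure(term, broader) if include_narrower else {term}
    return sorted(name for name, part in images.items() if part in wanted)


print(search("bud", IMAGES, BROADER))         # bud plus both narrower kinds
print(search("flower bud", IMAGES, BROADER))  # only the narrower term
```

Searching on the broader term returns images tagged with either kind of bud, while searching on a narrower term returns only those images, which is the behavior that linking narrower terms to "bud" would allow.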
Hi @baskaufs,
|
Hi @JCGiron. All good questions and I'm not sure I can answer them. But I'll try. As far as your first question is concerned, I would say that the TDWG terms were minted for the
With regard to the second question, there is not a formal ontology of AC terms and I don't think anyone has suggested working on one recently. When AC was first adopted, I think there was an assumption that there would be some more formal RDF representation of the main vocabulary. Bob Morris, who was the main author of the standard, had put in some work on this, but it was never finished, I think at least in part because of lack of demand. I think his work is on the old Google Code site somewhere, but I don't think it made the transition to GitHub. Since that time, Bob has passed away and TDWG has settled into a pattern where terms are defined with minimal semantics (slang: "bag of terms" approach) and their RDF representations are very lightweight. (You can get a dump of metadata in Turtle using a URL in this pattern http://rs.tdwg.org/dump/audubon.ttl, see this for details.) But it's not what anybody creating ontologies with Protege would be expecting. The current expectation within TDWG is that semantic layers (like ontologies) would be added on top of the basic "bag of terms" layer as a "vocabulary enhancement". This approach is described in Section 4.4.2.2 of the Standards Documentation Specification. Thus, if there were demand for it, the community could create an ontology to describe the semantic relationships among terms and add it as a layer on top of the bag of terms. So far, no one has proposed doing that for Audiovisual Core. The idea has been floated a number of times for Darwin Core, but hasn't yet been done formally within the TDWG Process framework. With respect to the technical details of extracting and delimiting them in an image, that remains to be worked out. Right now, the ROI recipes document is all we have. For spreadsheets and tables, it assumes the approach suggested by the Audubon Core Structure document, which honestly is pretty out of date.
For machine readable data, it suggests an approach that is consistent with the JSON-LD W3C Recommendation. This makes the data Linked Data-ready while serializing the data in a form (JSON) that's more familiar to developers. This approach was influenced by what's going on in the Cultural Heritage part of the museum community, which has rallied around JSON-LD and the IIIF standard for image presentation. IIIF is being actively investigated for use in the Natural History Museum community, so it's my hope that we can make AC ROI data translatable to IIIF features like Annotations, which would allow particular ROIs to be highlighted in a viewer. Unfortunately, AC is probably somewhat late to the game and people who are doing machine learning already have their own ad hoc systems for handling ROIs. When we were working on the ROI proposal, I spoke with some of the technical people working with Tanya Berger-Wolf (involved in the Imageomics project) about whether they were interested in a standard way of designating ROIs. They said that they might have been if there were one when they started, but at that point they already had a functioning system in place. If however, there is a new effort to share trait data so that it can be aggregated, then it may be time to work out the technical details of how that should be done using the standard AC ROI terms. That would be great if it could be worked out using a use-case driven program with implementation testing. That isn't something that AC can do in isolation, but I'm sure that the AC Maintenance Group would love to help facilitate this if there is interest from developers. Sorry for the long answer, but these are somewhat complicated issues that haven't fully been worked out. I would love to talk more about this -- perhaps in a new issue in the tracker if its subject can be stated in a meaningful way. This one will hopefully be closed soon if the proposal is ratified. |
Thank you for your detailed answer, @baskaufs! |
I'm not sure that the Infrastructure issue tracker would be the best. I don't think many people follow it. If you want to discuss it broadly across TDWG, I'd suggest the Technical Architecture Group tracker: https://github.com/tdwg/tag/issues . If you are specifically interested in Audiovisual (Audubon) Core, I'd put it in the AC tracker: https://github.com/tdwg/ac/issues Before proposing formal ontology building, there are several things you should probably refer to first.
I think it's correct to say that the AC Maintenance Group would be keen to support work towards standardizing sharing information about detection and annotation of traits. So it would be great to put that on the agenda of a future meeting. Pinging @edwbaker about that. |
Hi all, this is an awfully dense and technical discussion. I'm not sure exactly what you would want from me. If you've got a specific question that doesn't require me to read and catch up on this, I'd be happy to help. I'm going to unsubscribe from this thread for now. Please feel free to ping me again in a new specific discussion on GitHub, shoot me an email at cthoyt@gmail.com, or reach me on the OBO Foundry Slack if you want me to take a look again. FYI, I had to click a few times through this repository to figure out what TDWG is, so maybe you can also reconsider the amount of jargon in this thread, which is a barrier to entry for other people joining the discussion. That all being said, if there's any way you think that the Bioregistry (code), a registry of biomedical and life science vocabularies, can fit into your initiative, I'd be even more keen to participate! |
I like the discussion here! There are a lot of interesting things to comment on, but one detail keeps sticking out to me across projects like the one you are on:
This seems to be a huge dealbreaker for many projects. However, it is not really the right level of abstraction. Your use cases involve data models (hopefully semantic in some way). Slots in these data models, such as "anatomical entity", involve constraints. So it should be possible to say something like: "this anatomy field in this data model should be populated by the "Uberon:limb" branch", and have the system you use for constraint management do the rest (e.g. LinkML). If you coin your own IRIs for every single use case, you have an easier time internally (you can just import what you really care about, and you are not beholden to the sometimes idiosyncratic way external ontologies label or define their terms), but you create a polynomial problem for outside users. For every new semantic space introduced (semantic space as in identifier space covering some domain), they will have to deal with another set of mappings, resulting in a lot of pairwise mappings. Furthermore, a chance is lost to curate metadata, like synonyms etc., in a centralised place that benefits the whole world! In any case, I understand how hard your problem is, and in the end, you should do what your budget permits! Re-using existing ontology IDs internally will make your data much easier to integrate (more FAIR!), but your own curation process twice as expensive. |
This discussion is actually not really happening in the right place. This issue was specifically put in place to track the progress of a task group's proposal for ratification. As such, it includes technical details related to the ratification process. The discussion about the use of ontologies within TDWG is really a new issue that is only tangentially related to the proposal. I would suggest moving it to a new issue specifically dealing with this question. |
The consensus expressed by the Audiovisual Core Maintenance Group as of 14 April 2023 was to recommend that this proposal be sent to the Executive Committee for ratification. |
Updated proposal text to replace draft Implementation Experience Report with the published version. |
Updated proposal text to replace link to implementer instructions with the actual user guide link. |
Proposal ratified by the Executive Committee on 2023-04-26. The List of Terms documents http://rs.tdwg.org/ac/doc/orient/ and http://rs.tdwg.org/ac/doc/part/ are now live and term dereferencing was enabled in this release. |
The Views Controlled Vocabularies Task Group would like to formally submit for ratification two new controlled vocabularies. One controlled vocabulary is intended for use with ac:subjectPart and ac:subjectPartLiteral and the other is intended for use with ac:subjectOrientation and ac:subjectOrientationLiteral.
Since this is a coordinated addition to Audubon Core, the Task Group has followed the procedures for developing vocabulary enhancements as outlined in Section 4 of the TDWG Vocabulary Maintenance Specification (VMS). In particular, the group began its work by creating candidate and final requirements based on use cases submitted by the community. These final requirements are the Feature Report described in Section 4.2.1 of the VMS. With the generous assistance of field testers, the Task Group has published an Implementation Experience Report (see Section 4.2.2 of the VMS) in Biodiversity Information Science and Standards (BISS) at http://doi.org/10.3897/biss.7.94188. As noted in Section 4.2.3 of the VMS, the purpose of these two reports is to facilitate review by the Maintenance Group when deciding whether to advance the proposal to public comment, and as a source of information for the community during the public review.
Documents to become part of Audubon Core
subjectPart controlled vocabulary - contains the normative definitions of the subjectPart concepts that form the vocabulary and the controlled value strings to be used as literal values.
subjectOrientation controlled vocabulary - contains the normative definitions of the subjectOrientation concepts that form the vocabulary and the controlled value strings to be used as literal values.
Ancillary documents (supporting documents not part of the standard)
List of controlled value strings for ac:subjectPartLiteral organized by organism group - intended to guide human users in selecting subjectPart values appropriate for particular organism groups.
JSON-LD serialization of SKOS Collections of subjectPart concepts by organism group - machine-readable metadata intended to be consumed by clients for the purpose of generating pick lists or validating concepts for particular organism groups.
JSON-LD serialization of the SKOS concept scheme for subjectPart - machine-readable metadata to provide multilingual labels, multilingual definitions, controlled value strings, and links to external ontologies for subjectPart concepts.
Metadata for subjectPart terms in tabular form (CSV)
List of controlled value strings for ac:subjectOrientation organized by subject part - intended to guide human users in selecting subjectOrientation values appropriate for particular subject parts.
JSON-LD serialization of SKOS Collections of subjectOrientation concepts by subject part - machine-readable metadata intended to be consumed by clients for the purpose of generating pick lists or validating concepts for particular subject parts.
JSON-LD serialization of the SKOS concept scheme for subjectOrientation - machine-readable metadata to provide multilingual labels, multilingual definitions, controlled value strings, and links to external ontologies for subjectOrientation concepts.
Metadata for subjectOrientation terms in tabular form (CSV)
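As a rough illustration of how a client might consume the SKOS Collections listed above to generate pick lists and validate values, here is a minimal sketch. The JSON-LD layout, the `example.org` IRIs, the group names, and the group memberships are all assumptions made for the example; the real ancillary files may be structured differently.

```python
import json

# Hypothetical, simplified JSON-LD. IRIs, group names, and memberships are
# illustrative assumptions, not the published Audubon Core collections.
DOC = json.loads("""
{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@graph": [
    {"@id": "http://example.org/part/twig", "@type": "skos:Concept", "skos:prefLabel": "twig"},
    {"@id": "http://example.org/part/bud", "@type": "skos:Concept", "skos:prefLabel": "bud"},
    {"@id": "http://example.org/part/stem", "@type": "skos:Concept", "skos:prefLabel": "stem"},
    {"@id": "http://example.org/part/whole", "@type": "skos:Concept", "skos:prefLabel": "whole organism"},
    {
      "@id": "http://example.org/collection/woody",
      "@type": "skos:Collection",
      "skos:prefLabel": "woody plants",
      "skos:member": [
        {"@id": "http://example.org/part/twig"},
        {"@id": "http://example.org/part/bud"},
        {"@id": "http://example.org/part/whole"}
      ]
    },
    {
      "@id": "http://example.org/collection/herbaceous",
      "@type": "skos:Collection",
      "skos:prefLabel": "herbaceous plants",
      "skos:member": [
        {"@id": "http://example.org/part/stem"},
        {"@id": "http://example.org/part/bud"},
        {"@id": "http://example.org/part/whole"}
      ]
    }
  ]
}
""")


def pick_lists(doc):
    """Build {group label: sorted list of member concept labels}."""
    graph = doc["@graph"]
    labels = {
        node["@id"]: node["skos:prefLabel"]
        for node in graph
        if node.get("@type") == "skos:Concept"
    }
    lists = {}
    for node in graph:
        if node.get("@type") == "skos:Collection":
            members = (labels[member["@id"]] for member in node["skos:member"])
            lists[node["skos:prefLabel"]] = sorted(members)
    return lists


def is_valid(value, group, lists):
    """Check whether a literal value is allowed for the given group."""
    return value in lists.get(group, [])


LISTS = pick_lists(DOC)
print(LISTS["woody plants"])
```

A data-entry form could populate its drop-down from `LISTS[group]` and reject values for which `is_valid` returns False.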
Reference documents
User guide - provides examples (with screenshots) of implementation of the controlled vocabularies in several ways.
Submitted use cases
Final requirements (feature report)
Implementation experience report