
add term relationshipOfResourceID to Resource Relation extension #186

Closed
jhpoelen opened this issue Aug 16, 2018 · 48 comments

Comments

@jhpoelen

jhpoelen commented Aug 16, 2018

Suggest to add relationshipOfResourceID to Resource Relation extension to help link human readable resource relationship types (e.g., "parasite of") to a definition (e.g., http://purl.obolibrary.org/obo/RO_0002444).
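To illustrate the proposal, here is a minimal sketch of one Resource Relationship row that carries both the human-readable label and the proposed identifier, written out as a tab-separated table. The resource and relationship identifiers are hypothetical placeholders (urn:example:...); only the "parasite of" label and the RO IRI come from the text above, and relationshipOfResourceID is the term proposed in this issue.

```python
import csv
import io

# Column order for the illustrative table; relationshipOfResourceID is the
# term proposed in this issue, the others are existing Darwin Core terms.
FIELDS = [
    "resourceRelationshipID",
    "resourceID",
    "relationshipOfResource",
    "relationshipOfResourceID",
    "relatedResourceID",
]

# Hypothetical record: the urn:example identifiers are placeholders.
row = {
    "resourceRelationshipID": "urn:example:rr:1",
    "resourceID": "urn:example:occurrence:rust-1",
    "relationshipOfResource": "parasite of",
    "relationshipOfResourceID": "http://purl.obolibrary.org/obo/RO_0002444",
    "relatedResourceID": "urn:example:occurrence:host-1",
}


def to_tsv(records, fieldnames):
    """Serialize records as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(to_tsv([row], FIELDS))
```

The label stays readable for people, while the identifier column gives software an unambiguous key for the relationship type.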

For discussion see trias-project/uredinales-belgium-checklist#8 (comment) .

Discussion involved @peterdesmet @baskaufs @qgroom @LienReyserhove .

@jhpoelen
Author

Hi @tucotuco, thanks for adding the label. Can I just make a pull request to the DwC spec to get this accepted? It seems like a pretty intuitive addition. Same for #187.

@qgroom
Member

qgroom commented Sep 8, 2019

Can I push for this to be implemented?
I can't see it being controversial.
@peterdesmet @tucotuco @timrobertson100 @mdoering @pzermoglio @baskaufs

@peterdesmet peterdesmet added this to the TDWG 2019 milestone Sep 9, 2019
@peterdesmet
Member

I've added this to the newly created milestone TDWG 2019 to group things we should tackle then.

@jhpoelen
Author

@peterdesmet Awesome! Suggest to also consider related #187 .

@jhpoelen
Author

jhpoelen commented Sep 3, 2020

Hey y'all TDWG-ers @peterdesmet @tucotuco @qgroom -

As part of some review on newly proposed dwc terms, I was wondering: is there any additional red tape that needs following to get the term relationshipOfResourceID added to the Resource Relation extension?

See also https://twitter.com/GlobalBiotic/status/1301032135713853442 .

@tucotuco
Member

tucotuco commented Sep 4, 2020

Hi @jhpoelen,

This request involves the creation of a new term, thus, in accordance with the Vocabulary Maintenance Standard (VMS) Section 3 it requires a full review, a public commentary, ratification if warranted, and then implementation.

We are at the stage in the process where we have to demonstrate that the three main requirements for moving forward a change request are met. From the VMS Section 3.1:

Because the primary purpose of TDWG vocabularies is to facilitate data sharing, it is necessary to show that multiple parties will benefit from the change. As such, it is a minimum requirement that two independent entities indicate that they desire the change (the demand requirement). Additionally, it is required that there is a consensus within the community that the proposed change will accomplish the desired outcome (the efficacy requirement), and that making the change will not adversely affect the interoperability of existing implementations that depend on the stability of the vocabulary (the stability requirement).

Before the proposal goes to public review, the demand requirement has to be demonstrated. It is the burden of those proposing the change to make this case, and it has not been made explicit so far. To facilitate assessment, it is also essential that the complete proposal be made explicit. Have a look at the Guidelines for contributing and the following issues as examples of how to do that:

#237
#246

With the two prerequisites described above satisfied, the proposal can move forward. The Darwin Core Maintenance Interest Group has its open annual review meeting on 2020-09-23T14:30+00:00, during which we will attempt to make progress on as many open issues as possible. The more mature a justification is, the further along we can move a proposal.

@jhpoelen
Author

jhpoelen commented Sep 4, 2020

@tucotuco Thanks for your comment.

This change request was already marked "Process - Ready for public comment" by @peterdesmet over a year ago. It was submitted two years ago and has since been commented on and discussed by many folks in the referenced threads.

I can understand the need for red tape to add a term, but I am a bit surprised to have waited a year to learn that I have to fill in some kind of form.

Also, this proposal is very straightforward, as it simply adds an identifier related to an existing term "relationshipOfResource", as @qgroom mentioned, so I am a bit surprised that it can't be fast-tracked.

Curious to hear your thoughts on this. I'd be happy to follow the red tape if absolutely required, but I am also wary of trying to kill a mouse with an elephant gun.

@qgroom
Member

qgroom commented Sep 4, 2020

We used the Resource Relation extension in the TrIAS project to publish host-parasite data to GBIF. https://www.gbif.org/species/143610775/verbatim
The addition of the relationshipOfResourceID would be an improvement to the machine readability of datasets like this.
Currently, we are writing a project proposal where we envisage publishing considerably more interaction data to GBIF, with the intention that it can be harvested by GloBI and others.
Use of the Resource Relation extension and relationshipOfResourceID would allow datasets such as iNaturalist to publish their interaction data. Perhaps @loarie would comment.

To some extent this is a chicken-and-egg problem: until it is easy to share interaction data with GBIF, people are going to find alternative non-standard solutions.

@loarie

loarie commented Sep 4, 2020

iNat's current model for interactions is very primitive. An observation (e.g. of American bullfrog) has an association which describes the interaction (e.g. eating) and the identification of the other taxon (e.g. California newt).

We've long wanted to change it so that interactions connect two observations (e.g. Observation 1 of American bullfrog and Observation 2 of California newt are connected via an interaction of type 'eating'). Among other things, this will allow more sophistication in how the other species (e.g. California newt) is identified. But we haven't done this.

The other issue here is the taxonomy of the interactions. On iNaturalist we have 'observation fields' which are curated by the community and thus contain lots of duplicate or semi-overlapping fields (e.g. count, number of individual, abundance, etc.). We also have 'annotations' which are configured not by the community but by site admins. Currently these include sex, life stage, dead/alive for animals and flowering phenology for plants.

All the interaction data is currently stored in observation fields with lots of duplicate and semi-overlapping interaction terminology. We've resisted transferring these over to annotations until there's more clarity about what the taxonomy for interactions should be. I know there have been some debates about what constitutes pollinating as opposed to visiting a flower but not necessarily pollinating etc.

@jhpoelen
Author

jhpoelen commented Sep 4, 2020

Thanks @tucotuco @loarie @qgroom for taking the time to comment and share your thoughts.

As @loarie mentioned, iNaturalist has a way to annotate interactions via iNaturalist observation fields (e.g., https://www.inaturalist.org/observation_fields/879 and a related observation https://www.inaturalist.org/observations/2309983).

These interactions associate an observation (e.g., https://www.inaturalist.org/observations/2309983) via an observation field (e.g., "eaten by" https://www.inaturalist.org/observation_fields/879) with an iNaturalist taxon (e.g., northern sea otter (Enhydra lutris kenyoni) https://www.inaturalist.org/taxa/133061). As far as I can tell, these fields are quite popular: at least 127k of these annotations (as seen on 2020-09-04) have been added to research-grade iNaturalist observations and indexed by GloBI.

I agree with @loarie that many improvements can be made (e.g., introducing observation<>observation relations, curating interaction claims, curating or adopting a biotic interaction type taxonomy).

However, the existing iNaturalist approach already works pretty well, because their observation fields are controlled and have identifiers. Because iNaturalist keeps a controlled list of interaction terms with identifiers (e.g., "https://www.inaturalist.org/observation_fields/879") in addition to a short descriptive name (e.g., "Eaten by"), GloBI was able to unambiguously map the appropriate interaction terms into an interaction term taxonomy provided by the OBO Relation Ontology (e.g., "eaten by" http://purl.obolibrary.org/obo/RO_0002471). Without these identifiers in both the iNaturalist and OBO RO schemes, the mapping from one into the other would have been much harder to maintain.

So, only after adding the proposed term relationshipOfResourceID alongside the existing term relationshipOfResource in the Resource Relation extension can projects like iNaturalist and TrIAS unambiguously point to, and document, the relationship types they use in the Resource Relation extension.

Note also, that the Field Museum and other EMu (a collection management system) users like the Smithsonian are planning to adopt the Resource Relation extension to document their specimen<>specimen associations.

To make a long story short:
(a) adding the relationshipOfResourceID is both necessary and (conceptually) easy.
(b) Also, there's a long list of projects and institutions that support this.
(c) Finally, existing integrations have shown that having identifiers for interaction types is crucial for data exchange.

PS An extensive map that GloBI uses to translate iNaturalist observation fields to OBO Relation Ontology terms can be found at https://github.com/globalbioticinteractions/inaturalist/blob/main/interaction_types.csv .
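The benefit of identifier-keyed mapping can be sketched as follows. This is a minimal illustration, not GloBI's code: the one-entry table is a hypothetical excerpt that reuses only the "Eaten by" pairing quoted above, the structure of the real interaction_types.csv is not reproduced, and the two sample records are invented to show that differently written labels resolve to the same relation.

```python
from typing import Dict, Optional

# Hypothetical one-entry excerpt of an identifier-keyed lookup table.
# The pairing is the one quoted above; GloBI's actual CSV columns are not shown.
INAT_FIELD_TO_RO: Dict[str, str] = {
    "https://www.inaturalist.org/observation_fields/879":
        "http://purl.obolibrary.org/obo/RO_0002471",  # "Eaten by" -> "eaten by"
}


def resolve_relation(field_iri: str) -> Optional[str]:
    """Return the RO relation IRI for an observation field IRI, or None if unmapped.

    The lookup is keyed on the field's identifier, so it does not depend on
    how the human-readable label happens to be spelled or capitalized.
    """
    return INAT_FIELD_TO_RO.get(field_iri.strip())


# Two hypothetical records with differently written labels but the same field IRI.
records = [
    {"field": "https://www.inaturalist.org/observation_fields/879", "label": "Eaten by"},
    {"field": "https://www.inaturalist.org/observation_fields/879", "label": "eaten by"},
]
for rec in records:
    print(rec["label"], "->", resolve_relation(rec["field"]))
```

Keying on labels instead would require normalizing every spelling variant; keying on identifiers makes the mapping a plain lookup.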

PS2 Note that the Resource Relation extension would allow for relating occurrence ids to well-defined taxon records, in addition to relating occurrence ids to other occurrence ids.

@jhpoelen
Author

jhpoelen commented Sep 4, 2020

fyi @magpiedin @seltmann @cmungall

@qgroom
Member

qgroom commented Sep 5, 2020

GBIF recently opened their Guide to publishing sequence-derived data for public peer review. It doesn't seem to mention linking eDNA-derived observations to the observation of the host organism that the sample may have been harvested from. I've raised an issue on this: gbif/doc-publishing-dna-derived-data#55.
This seems like another community who would find the Resource Relation extension and relationshipOfResourceID useful.

@dschigel @abissett would you like to comment on, or support the proposal to add relationshipOfResourceID?

@dschigel

dschigel commented Sep 6, 2020

Hi Quentin, same- and different-trophic / interaction-level co-occurrences can to some extent be handled by the Event core (in this example host organisms are missing, because the "host" is soil: https://www.gbif.org/dataset/3b8c5ed8-b6c2-4264-ac52-a9d772d69e9f), but many interaction attributes, both registered / observed and interpreted, would be lost. I have a few ideas here, but sadly this group is missing from the virtual TDWG 2020 programme https://www.tdwg.org/community/interaction. For lots of eDNA data, the host and other co-occurring species would be extremely important. I need to refresh my "this is how much you can do with DwC today" knowledge when it comes to interactions before I can try to answer your question.

@qgroom
Member

qgroom commented Sep 6, 2020

but sadly this group is missing from virtual TDWG 2020 programme https://www.tdwg.org/community/interaction.

Yes, this is a pity as I know a number of people who would have been interested in a meeting. If we can put an agenda together we could always convene an ad hoc meeting.

@baskaufs

After a bike ride and some time to think, I'll criticize my own suggestion from seven years ago (and possibly also display my ignorance of the Web Annotation Recommendation). The "annotator" in the example http://guid.mvz.org/agents/James_L_Patton did not create the relationship http://guid.mvz.org/relationshipType/motherOf. The annotator created the assertion that the relationship existed. So there probably is a good way to use the Web Annotation model here, but to link the person with the assertion of the relationship, rather than linking the annotator directly to the relationship, as I suggested in 2013.

It seems like we have a number of use cases:

  1. Darwin Core ResourceRelationship relationships.
  2. Taxon Names and Concepts assertions of synonymy, etc.
  3. Some other kinds of assertions like specimen A is a duplicate of specimen B.

These might all be satisfied by a common design pattern, somehow involving the Web Annotations model, to document that someone asserted that a relationship existed. That bumps this question to a broader level than Darwin Core (TAG?).

I don't think that what I've brought up here should derail the proposals for creating the new terms for "spreadsheet" use of ResourceRelationship terms. But I think before we propose a Darwin Core solution for the issue of expressing this as Linked Data (or some other way of modeling non-flat relationships in a model), there should be a TDWG-wide examination of design patterns that might work for this. Perhaps that is what the Annotations group is working on. I haven't been tracking the activities of that group, but plan to attend the meeting on Friday to find out.

Ping @chicoreus (Annotations) @nielsklazenga (TNC)

@baskaufs

Ping also @dshorthouse since the attribution group is also using/looking at the Web Annotations model

@nielsklazenga
Member

@baskaufs Not sure I understand. What would the target of the annotation be?

@azaroth42

azaroth42 commented Sep 24, 2020

Hi all! Brief intro: Rob Sanderson, co-chair for Annotations and JSON-LD in the W3C, co-editor for IIIF, co-chair & editor for linked.art, and (newly) director for cultural heritage metadata at Yale University, which includes the Peabody NHM. The Peabody has implemented Darwin Core, which is new to me and in which I am keenly interested. 👋

I feel that annotations are a reasonable way to manage broad relationships, such as "identifies", "classifies" or "describes". This is the intent of the motivation in the Annotation spec. We were cautious in the work not to go too far into reinventing RDF, named graphs, or reification of relationships, as those are all existing technologies that we already have in our toolbox. Thus the specification doesn't directly document the possible pattern:

{
  "type": "Annotation",
  "source": "uri-of-parent-specimen",
  "motivation": "tdwg-vocab:motherOf",
  "target": "uri-of-child-specimen",
  "creator": "uri-of-annotator-asserting-relationship"
}

This was felt to be possible but not the best practice to be promoting.
However it is only one step away from this pattern for classifying a specimen as a particular controlled term:

{
  "type": "Annotation",
  "source": "uri-of-species",
  "motivation": "classifying",
  "target": "uri-of-specimen",
  "creator": "uri-of-assigner"
}

Which was seen as perfectly okay.

In CIDOC-CRM there is an explicit activity for the assertion of relationships: an AttributeAssignment. This is what we use in the Art Museum domain for this sort of thing. The advantage is that the term for the relationship can be very vague or localized -- it is moved out of the ontology space and into the vocabulary space. This has a social advantage: We believe we know how to manage and use vocabularies, but ontologies are sometimes seen as complex and unwieldy. In reality, they are just the dorsal and ventral sides of the same specimen. (Did I do that translation correctly from numismatics?)

If there is a semantically clear relationship (e.g. motherOf), in a linked data environment, then the right modeling is a simple triple (X motherOf Y). If there's a need to associate further data with that assertion (meta-metadata) then the right answer is not so clear, and different communities have gone in different directions, typically using the tools that they have already - be that named graphs, property graphs, or various flavors of reification with some local ontology. If there isn't a good local ontology pattern to use, then best practice would be to adopt an existing one ... and the annotation pattern allows for interoperability with other systems, as well as paving the way for integration with the Annotation group, which I hope is using the model [as Bob and Paul Morris were both instrumental in the pre-W3C work for it!]

@chicoreus

@azaroth42 excellent. How about encapsulating the domain-specific information in the body of the annotation? Here is an example that strays away from resource relations toward the determination of scientific names for resources (which could be identifications or classifications in the anno:Motivation context). A similar construct could be used to assert a resource relationship within the body using Darwin Core terms.

{
  "type": "Annotation",
  "motivation": "classifying",
  "target": "uri of target occurrence",
  "body": {
    "type": "dwc:Identification",
    "dwc:scientificName": "Murex pecten Lightfoot, 1786",
    "dwc:identifiedBy": "E. Vokes",
    "dwc:dateIdentified": "1965"
  },
  "creator": "https://orcid.org/0000-0002-3673-444X",
  "created": "2020-09-24"
}

@dshorthouse

@chicoreus This is interesting, but confuses me perhaps because I've lost sight of what are the implications (if any) for E. Vokes. Does this mean, "Paul Morris classified an occurrence today as having a determination of Murex pecten Lightfoot, 1786 made by E. Vokes in 1965"? Are you really classifying in this sense?

@chicoreus

@dshorthouse that could be a transcription of a paper label from 1965 as an annotation document in 2020, something that would make more sense if the target of the annotation was a set of botanical duplicates, and the annotation was asserting that the label on one member of the set applied to other members of the set, or the motivation of classifying might be wrong, and the motivation should be linking rather than classifying. That's somewhat distracting, but somewhat right on line with my question, about what information is best placed as metadata about the annotation, and what information is best placed as the payload of the annotation in a body - the difference in dates highlighting that question.

@chicoreus

A different, more generalized example that gets away from the uncertainty about motivation:

{
  "type": "Annotation",
  "motivation": "classifying",
  "target": "uri of target occurrence",
  "body": {
    "type": "dwc:Identification",
    "dwc:scientificName": "Lutraria angustior Philippi, 1844",
    "dwc:identifiedBy": "Gonzalo Giribet",
    "dwc:dateIdentified": "2020-09-24"
  },
  "creator": "https://orcid.org/0000-0002-5467-8429",
  "created": "2020-09-24"
}

@chicoreus

We should try to phrase an example of an assertion that a resource relation exists as an annotation though. More germane to the issue at hand.
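One possible phrasing, sketched in Python so it can be checked, is an annotation whose body is a dwc:ResourceRelationship. This is a hypothetical illustration, not an agreed pattern: the choice of the W3C "linking" motivation, the inline dwc prefix declaration in the JSON-LD context, and all urn:example identifiers are assumptions; dwc:relationshipOfResourceID is the term proposed in this issue.

```python
import json


def relationship_annotation(subject_iri, relation_label, relation_iri,
                            object_iri, creator_iri, created):
    """Build a hypothetical annotation asserting that a resource relationship exists.

    The annotation's creator and date record who asserted the relationship and
    when; the body carries the relationship itself as Darwin Core terms.
    """
    return {
        "@context": [
            "http://www.w3.org/ns/anno.jsonld",
            {"dwc": "http://rs.tdwg.org/dwc/terms/"},  # assumed prefix declaration
        ],
        "type": "Annotation",
        "motivation": "linking",  # assumption; "classifying" is another candidate
        "target": subject_iri,
        "body": {
            "type": "dwc:ResourceRelationship",
            "dwc:resourceID": subject_iri,
            "dwc:relationshipOfResource": relation_label,
            "dwc:relationshipOfResourceID": relation_iri,  # proposed term
            "dwc:relatedResourceID": object_iri,
        },
        "creator": creator_iri,
        "created": created,
    }


anno = relationship_annotation(
    "urn:example:occurrence:rust-1",           # hypothetical parasite occurrence
    "parasite of",
    "http://purl.obolibrary.org/obo/RO_0002444",
    "urn:example:occurrence:host-1",           # hypothetical host occurrence
    "urn:example:agent:annotator",             # hypothetical annotator
    "2020-09-24",
)
print(json.dumps(anno, indent=2))
```

This keeps the assertion metadata (creator, created) on the annotation and the domain payload in the body, which is the split discussed above.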

@azaroth42

Yes, additional data can be added into the body of the annotation like this. However, the further down that path one goes, the further away from existing systems one gets. There are many systems that will manage the simple case today, but they'll ignore the additional properties on the body. As a modeling construct, it's fine (in my opinion). As a domain-level interoperability mechanism, it's okay, as it's not ambiguous and the semantics of the annotation model are maintained. As a general practice across domains and across systems unaware of Darwin Core, on the other hand, the extra fields will probably not survive.

If domain-neutral system support is a requirement, then one solution discussed was to treat the body as an opaque document, like the human-readable text of a comment-style annotation, except that it contains a serialization of the entity instead of prose.
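A minimal sketch of that opaque-body idea, assuming a Web Annotation TextualBody whose value is a JSON serialization of the Darwin Core entity and whose format is "application/json". The identification values are copied from the Lutraria example earlier in the thread; the target IRI is a hypothetical placeholder, and the choice of JSON as the serialization is an assumption.

```python
import json

# Entity to embed; values copied from the Lutraria identification example above.
identification = {
    "type": "dwc:Identification",
    "dwc:scientificName": "Lutraria angustior Philippi, 1844",
    "dwc:identifiedBy": "Gonzalo Giribet",
    "dwc:dateIdentified": "2020-09-24",
}


def opaque_body_annotation(target_iri, entity, creator_iri):
    """Wrap an entity as a serialized string inside a TextualBody.

    Generic annotation software only sees a body with a string value and a
    media type; Darwin Core-aware consumers can parse the value back into fields.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "classifying",
        "target": target_iri,
        "body": {
            "type": "TextualBody",
            "format": "application/json",
            "value": json.dumps(entity, sort_keys=True),
        },
        "creator": creator_iri,
    }


anno = opaque_body_annotation(
    "urn:example:occurrence:1",  # hypothetical target occurrence
    identification,
    "https://orcid.org/0000-0002-5467-8429",
)
restored = json.loads(anno["body"]["value"])
print(restored["dwc:scientificName"])
```

The trade-off is that the Darwin Core fields survive round trips through domain-neutral systems, but those systems cannot query or index them.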

@baskaufs

Just trying to catch up on this. What we would like, in the end, is to have a way to programmatically "translate" a resourceRelationship documented in a spreadsheet into an RDF graph of a consistent shape that could be reliably queried.

As @azaroth42 said, the obvious way to assert a relationship would be a single triple where the predicate indicates the relationship. But there are two problems with that. One is that the resource relationship allows any user or community to "mint" their own relationship types without having to turn them into a standardized property. The other is that the simple triple model does not allow provenance information to be associated with the triple ("who made the assertion?"). The resourceRelationship "spreadsheet" pattern is essentially reification with the ability to assign an identifier to the relationship assertion ("pseudo-triple") and then say things about it, like who made the assertion and when. I suppose one could do a classic RDF reification to generate an IRI for the triple that corresponds to the resourceRelationship "pseudo-triple", then use the IRI of the rdf:Statement instance as the target of the annotation.

But I think RDF reification never caught on and it would require there to be a predicate IRI to use as the value of rdf:predicate.

Getting way out of my comfort zone on this...

@azaroth42

For someone out of their comfort zone, you have it exactly correct :)

@jhpoelen
Author

jhpoelen commented Sep 24, 2020

For all of those listening and commenting: I much enjoy the inspired discussions around all things relations and annotations. However, I'd like to point out that the original topic of this issue is to add an (optional) term to the existing resource relation extension. Is there a way to split the more general conversation on how to annotate/relate things from this "add term" request? I'd be interested in joining that conversation while the proposed term relationshipOfResourceID is being added to the existing Resource Relationship extension.

@chicoreus

@jhpoelen good plan - anyone should feel free to create an issue in the Annotations Interest Group space to continue the more general discussion https://github.com/tdwg/annotations/issues

@baskaufs

baskaufs commented Oct 6, 2020

Just to recap why we got off track, there was a suggestion:

"dwc:relationshipOfResourceID" might rather be identified/minted as "dwciri:relationshipOfResource"

The explanation of why we didn't think dwciri:relationshipOfResource should be minted was what got us off the focal topic.

Having had some time to think about this more, I think that the existence of the proposed term dwc:relationshipOfResourceID would actually facilitate RDFizing by reification, which was the topic of the side conversation. If there were a set of rules for translating the resourceRelationship data from a spreadsheet into RDF, it might be something like this:

the value of dwc:resourceID -> object of triple containing rdf:subject
the value of the proposed dwc:relationshipOfResourceID -> object of triple containing rdf:predicate
the value of dwc:relatedResourceID -> object of triple containing rdf:object

assuming that the IDs (for at least the values used with rdf:subject and rdf:predicate) were IRIs. So I guess what I'm saying is that I think the round-about discussion actually supports the proposal since it may give us a way to complete the task that the RDF guide dodged: figuring out how to recommend that people "convert" resourceRelationship data from spreadsheets into RDF.
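Those three rules can be sketched as a small conversion function that emits N-Triples. This is one possible reading, not a TDWG recommendation: it assumes the row's resourceRelationshipID is an IRI that can name the rdf:Statement (so provenance can later be attached to it), and that resourceID, relationshipOfResourceID, and relatedResourceID are all IRIs. The urn:example values are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]


def reify_relationship(row: Dict[str, str]) -> List[Triple]:
    """Turn one ResourceRelationship row into an rdf:Statement description.

    Assumes resourceRelationshipID, resourceID, relationshipOfResourceID and
    relatedResourceID all hold IRIs.
    """
    stmt = row["resourceRelationshipID"]  # assumed to name the assertion itself
    return [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", row["resourceID"]),
        (stmt, RDF + "predicate", row["relationshipOfResourceID"]),
        (stmt, RDF + "object", row["relatedResourceID"]),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical spreadsheet row; only the RO IRI comes from this thread.
row = {
    "resourceRelationshipID": "urn:example:rr:1",
    "resourceID": "urn:example:occurrence:rust-1",
    "relationshipOfResourceID": "http://purl.obolibrary.org/obo/RO_0002444",
    "relatedResourceID": "urn:example:occurrence:host-1",
}
print(to_ntriples(reify_relationship(row)))
```

Without a relationshipOfResourceID column, the rdf:predicate rule has no IRI to use, which is the gap the proposed term fills.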

@jhpoelen
Author

jhpoelen commented Oct 6, 2020

@baskaufs Thanks for sharing your insights. Like you, I can see many exciting possibilities open up after introducing the proposed dwc:relationshipOfResourceID. Having this predicate IRI to connect the two resources would make the translation from rdf-land to spreadsheet-land a little easier*, and would provide a way to go beyond the star schema.

As far as adding the term to the resource relationship extension:

Are we there yet? If not, what else is needed to add the term?

* I could argue that RDF-land is just a simplified version of spreadsheet-land in which only three (or four with the graph namespace) columns are allowed.

@debpaul

debpaul commented Apr 8, 2021

@tucotuco as @jhpoelen writes:

Are we there yet? If not, what else is needed to add the term?

a) Is there consensus now?

@tucotuco Have the demand and efficacy requirements been met?
b) Please advise so the Exec can help to speed this up. Many thanks!

@tucotuco
Member

Now that the term addition issue (#283) is in progress, closing this issue.

@jhpoelen
Author

jhpoelen commented Nov 3, 2023

Apologies for the cross-posting, just want to make sure that contributors to this issue get updated on the outcomes of the use of Resource Relationship extension.

Nov 3, 2023
Field Museum and iNaturalist Adopt Darwin Core Resource Relationship Standard to Share Species Interaction Records
The Field Museum in Chicago and iNaturalist capture detailed records on how species interact. They both showed their capacity to innovate by using the recently improved Darwin Core Resource Relationship extensions to publish their interaction records. By using this standards-based approach, they facilitate access to the valuable biodiversity knowledge they keep, and provide examples for others to follow.
