New term - environmentalMaterial #40

Open
tucotuco opened this issue Nov 13, 2014 · 14 comments

@tucotuco
Member

tucotuco commented Nov 13, 2014

  • Submitter: John Wieczorek on behalf of the May 2013 GBIF hackathon-workshop on Darwin Core and sample data
  • Justification (why is this term necessary?): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424
  • Proponents (at least two independent parties who need this term): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424.

Proposed attributes of the new term:

  • Term name (in lowerCamelCase): environmentalMaterial
  • Class (e.g. Location, Taxon): Event
  • Definition of the term: The medium or part of the medium of an environmental system.
  • Usage comments (recommendations regarding content, etc.): Recommended best practice is to use a controlled vocabulary such as the set of subclasses of the environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) of the Environment Ontology (ENVO). Values are to represent media as being composed primarily of the named entity, rather than restricted entirely to that entity. For example, "envo:liquid water" is to be understood as "environmental material composed primarily of some chebi:water" in liquid form.
  • Examples: envo:soil, envo:sediment, envo:saline water
  • Refines (identifier of the broader term this term refines, if applicable):
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable):
  • ABCD 2.06 (XPATH of the equivalent term in ABCD, if applicable): not in ABCD

Original first comment:

Was https://code.google.com/p/darwincore/issues/detail?id=191

Reported by gtuco.btuco, Sep 25, 2013

==New Term Recommendation==

Submitter: John Wieczorek on behalf of the May 2013 GBIF hackathon-workshop on Darwin Core and sample data

Justification: see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424

Term Name: environmental material
Identifier: http://purl.obolibrary.org/obo/ENVO_00010483
Namespace: http://purl.obolibrary.org/obo/
Label: Environmental Material
Definition: Material in or on which organisms may live.
Comment: Examples: "scum", "http://purl.obolibrary.org/obo/ENVO_00003930". For discussion see https://code.google.com/p/darwincore/wiki/Event (there will be no further documentation here until the term is ratified)
Type of Term: http://www.w3.org/2000/01/rdf-schema#Class
Refines:
Status: proposed
Date Issued: 2013-09-25
Date Modified: 2013-09-25
Has Domain:
Has Range:
Refines:
Version: http://purl.obolibrary.org/obo/ENVO_00010483
Replaces:
IsReplaceBy:
Class: http://rs.tdwg.org/dwc/terms/Event
ABCD 2.0.6: not in ABCD (someone please confirm or deny this)

Sep 26, 2013 #1 gtuco.btuco
Based on initial discussions on tdwg-content, modified the proposal to make a new DwC property term that recommends the ENVO class as the range, as follows:

Term Name: environmentalMaterial
Identifier: http://rs.tdwg.org/dwc/terms/environmentalMaterial
Namespace: http://rs.tdwg.org/dwc/terms/
Label: Environmental Material
Definition: Material in or on which organisms may live. Recommended best practice is to use a controlled vocabulary such as defined by the environmental feature class of the Environment Ontology (ENVO).
Comment: Examples: "scum",
"http://purl.obolibrary.org/obo/ENVO_00003930". For discussion see https://code.google.com/p/darwincore/wiki/Event (there will be no further documentation here until the term is ratified)
Type of Term: http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
Refines:
Status: proposed
Date Issued: 2013-09-26
Date Modified: 2013-09-26
Has Domain:
Has Range:
Refines:
Version: environmentalMaterial-2013-09-26
Replaces:
IsReplaceBy:
Class: http://rs.tdwg.org/dwc/terms/Event
ABCD 2.0.6: not in ABCD (someone please confirm or deny this)

@tucotuco
Member Author

tucotuco commented Mar 4, 2015

See also Issue #37, Issue #38, and Issue #39.

@tucotuco
Member Author

Opened public discussion on tdwg-content (http://lists.tdwg.org/pipermail/tdwg-content/2015-March/003507.html).

@tucotuco
Member Author

tucotuco commented Sep 9, 2020

This proposal already passed through public review in 2015 without objections; however, it is not clear that demand has been demonstrated.

@tucotuco
Member Author

tucotuco commented Sep 9, 2020

There was a public review of this and related proposals in 2015 in which there were observations that the proposal as presented does not make sense. The ENVO classes cannot be Darwin Core properties. Instead, new properties would have to be minted for Darwin Core with the recommendation to have the range of values come from ENVO classes.
In any case, there is no evidence in the discussion history of demand for these terms. If anyone wants to move this proposal forward, please provide a new term definition addressing the property/class issue and provide evidence of sufficient demand.

@tucotuco
Member Author

tucotuco commented Sep 9, 2020

I was in error to note that there was a need for a demonstration of demand. The proposal was a direct result of an international workshop. Also, the revised term proposal has already been made. I will follow up with an updated comment showing just the proposal.

@tucotuco
Member Author

tucotuco commented Sep 9, 2020

The definitive term change proposal under consideration is at the beginning of the first comment in this issue.

Updated term change request:

  • Submitter: John Wieczorek on behalf of the May 2013 GBIF hackathon-workshop on Darwin Core and sample data
  • Justification (why is this term necessary?): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424
  • Proponents (at least two independent parties who need this term): see "Meeting Report: GBIF hackathon-workshop on Darwin Core and sample data (22-24 May 2013)" at http://www.gbif.org/orc/?doc_id=5424.

Proposed attributes of the new term:

  • Term name (in lowerCamelCase): environmentalMaterial
  • Class (e.g. Location, Taxon): Event
  • Definition of the term: The medium or part of the medium of an environmental system.
  • Usage comments (recommendations regarding content, etc.): Values are to represent media as being composed primarily of the named entity, rather than restricted to that entity. For example, "ENVO:water" is to be understood as "environmental material composed primarily of some CHEBI:water". Recommended best practice is to use a controlled vocabulary such as the set of subclasses of the environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) of the Environment Ontology (ENVO).
  • Examples: envo:soil, envo:sediment, envo:saline water
  • Refines (identifier of the broader term this term refines, if applicable):
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term, if applicable):
  • ABCD 2.06 (XPATH of the equivalent term in ABCD, if applicable): not in ABCD

@tucotuco
Member Author

@pbuttigieg Would you be willing to pre-assess this proposal? It has been a long time in the making. Does it still make sense as proposed?

@tucotuco tucotuco changed the title environmentalMaterial New term - environmentalMaterial Apr 19, 2021
@baskaufs

This term should have a dwciri: analog. Here is what I believe would be appropriate metadata for dwciri:environmentalMaterial:

  • Definition of the term: The medium or part of the medium of an environmental system.
  • Usage comments (recommendations regarding content, etc.): Values are to represent media as being composed primarily of the named entity, rather than restricted to that entity. For example, "ENVO:water" is to be understood as "environmental material composed primarily of some CHEBI:water". Recommended best practice is to use an IRI from a controlled vocabulary such as the set of subclasses of the environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) of the Environment Ontology (ENVO).
  • Examples: http://purl.obolibrary.org/obo/ENVO_00001998, http://purl.obolibrary.org/obo/ENVO_00002007, http://purl.obolibrary.org/obo/ENVO_00002010

I actually have some questions about the usage comments here and in the proposal text, but I will put that in a different comment box.

@baskaufs

I have some real questions about the usage comments and examples.

  1. In the usage comments, you use ENVO:water with "ENVO" capitalized, while in the examples you use envo:soil with "envo" in lower case. The use of case should be consistent.

  2. The "controlled vocabulary" examples are fraught with problems. Based on the form of the examples, here is what I think you are actually saying: "take the string envo: (which looks like a namespace abbreviation, but isn't actually defined anywhere) and concatenate it to the English label that is currently being used for the class term in the ENVO ontology." In the TDWG universe, we routinely use compact URIs, or "CURIEs" to abbreviate a term IRI and as a shorthand. For example we use dwc:country as an abbreviation for http://rs.tdwg.org/dwc/terms/country. That works because dwc: is a well-known namespace abbreviation for http://rs.tdwg.org/dwc/terms/ and since TDWG uses non-opaque local identifiers like country in its IRIs, people can pretty much "read" the CURIE and know what it means.

But for better or worse, OBO ontologies use opaque local identifiers. I'm not sure what the consensus namespace abbreviation is for ENVO. I suppose it might be envo: = http://purl.obolibrary.org/obo/ENVO_ or maybe ENVO: = http://purl.obolibrary.org/obo/ENVO_. But if that is the case, then the CURIE for soil would be envo:00001998, not envo:soil. As far as I know, envo:soil isn't anything real, other than a guess at a namespace abbreviation appended to a label.
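The mismatch can be made concrete with a minimal sketch of CURIE expansion, which is nothing more than concatenating a namespace IRI and a local identifier. The prefix map below is an assumption: the `envo` entry is only the guessed abbreviation discussed above, not a published one, and `expand_curie` is a hypothetical helper.

```python
# Assumed prefix map. The "envo" entry is a guess at a namespace
# abbreviation, not something defined by TDWG or ENVO.
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "envo": "http://purl.obolibrary.org/obo/ENVO_",
}


def expand_curie(curie: str) -> str:
    """Expand prefix:local into a full IRI by plain concatenation."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand_curie("dwc:country"))    # a real Darwin Core term IRI
print(expand_curie("envo:00001998"))  # the ENVO IRI given above for soil
print(expand_curie("envo:soil"))      # ends in ENVO_soil, which is not the soil IRI
```

Under this assumed mapping, only the opaque form `envo:00001998` expands to the term IRI for soil; `envo:soil` expands to a string that merely looks like one.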

This problem is illustrated with the ENVO:water example. As far as I can tell, "water" in ENVO is http://purl.obolibrary.org/obo/ENVO_00002006. But if you actually go to the page for the term: http://purl.obolibrary.org/obo/ENVO_00002006, you see that the label used there is actually "liquid water". So following the pattern in the examples, the "controlled value" should be envo:liquid water or maybe ENVO:liquid water, but not ENVO:water. If you put "water" into the search box, you get this as the second result:

http://purl.obolibrary.org/obo/ENVO_00002006 (ENVO):
*  water in Ontobee: ENVO
*  liquid water in Ontobee: ENVO

That implies that ENVO accepts "water" and "liquid water" as alternate labels. Can people just pick which one they like better to use as the "controlled" value?

The problem is that ENVO is an ontology and not actually a controlled vocabulary, and in trying to use it as one, we are conflating IRIs (in the form of CURIEs), labels, and controlled value strings, which are all actually different from each other.

It seems to me that it would make more sense if we want people to use ENVO terms as controlled values to just have them use the English label as shown on the term page. That would make the values in the example be: liquid water, soil, sediment, and saline water without any pseudo-namespaces. Of course that is a problem if people mix in other ontologies besides ENVO and use non-unique labels. A non-ambiguous solution would be to use dwciri:environmentalMaterial with a full IRI value from ENVO, but that would be opaque and I suppose people would not like that.

Another alternative, which in my opinion would probably be the best, would be to just go ahead and make a real controlled vocabulary that specifies the required controlled value strings. The definitions could still be linked to ENVO. For an example, see the draft controlled vocabulary for subjectPart that we are completing in Audubon Core. In that controlled vocabulary, we link each controlled vocabulary term to an ontology definition from OBO, but explicitly specify the controlled value string to be used, following the convention of camelCase with no spaces. This would not be that hard to implement if you really want people to use those ENVO subclasses -- it would just be a matter of setting up a table similar to the example I provided.
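As a rough illustration of that table-based approach, the sketch below derives camelCase controlled value strings from English labels and pairs each with the ENVO IRI that carries its definition. The helper name `to_controlled_value`, the exact casing rule, and the two-row table are assumptions made for illustration; they are not the Audubon Core implementation.

```python
import re


def to_controlled_value(label: str) -> str:
    """Turn an English label into a camelCase string with no spaces."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no usable characters")
    head, *tail = words
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in tail)


# Hypothetical table: controlled value string -> ENVO IRI holding the definition.
# Only the two label/IRI pairs confirmed in this thread are used.
VOCABULARY = {
    to_controlled_value(label): iri
    for label, iri in [
        ("liquid water", "http://purl.obolibrary.org/obo/ENVO_00002006"),
        ("soil", "http://purl.obolibrary.org/obo/ENVO_00001998"),
    ]
}

for value, iri in VOCABULARY.items():
    print(value, iri)
```

The point of fixing the strings in a table is that data providers look up `liquidWater` rather than deciding for themselves between "water" and "liquid water".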

But the currently proposed design pattern is just asking for people to effectively be guessing or making up their own "controlled" values.

@baskaufs

I have just spent some additional time investigating the possibilities of auto-generating controlled value strings from labels using data acquired straight from Ontobee using a SPARQL query. You can run my test at the Ontobee endpoint.

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?subclass ?label
WHERE {
  # Root: the ENVO environmental material class
  BIND(<http://purl.obolibrary.org/obo/ENVO_00010483> AS ?rootIri)
  # Direct subclasses and their labels
  ?subclass rdfs:subClassOf ?rootIri .
  ?subclass rdfs:label ?label .
  # Keep only terms whose IRI contains "ENVO"
  FILTER(CONTAINS(STR(?subclass), "ENVO"))
  # Exclude deprecated classes
  MINUS { ?subclass owl:deprecated "true"^^xsd:boolean . }
}

This query gets the IRIs and labels for direct subclasses of the environmental material class. One should be able to extend this to all child subclasses using the property path operator * like this:

?subclass rdfs:subClassOf* ?rootIri.

but doing so results in this error: "Exceeded 1000000000 bytes in transitive temp memory." That seems pretty weird to me, since there is a finite number of subclasses and it shouldn't be that hard to just get them all with their labels. Perhaps someone better at SPARQL can figure out what the problem is. If I have time, I may just try to download the whole ontology and load it into a local SPARQL endpoint to see if it works better.
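One possible workaround, untested against Ontobee, would be to keep the direct-subclass query and walk the hierarchy level by level on the client side. The sketch below assumes a hypothetical callable `direct_subclasses` that would wrap one SPARQL request per class; here it is replaced by a toy dictionary so the traversal logic can be run on its own.

```python
from collections import deque


def all_subclasses(root, direct_subclasses):
    """Breadth-first walk; direct_subclasses(iri) returns one level of children."""
    seen = {root}
    queue = deque([root])
    found = []
    while queue:
        current = queue.popleft()
        for child in direct_subclasses(current):
            if child not in seen:  # guards against cycles and shared children
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Hypothetical toy hierarchy standing in for per-level query results.
# It deliberately contains a shared child ("c") and a cycle (c -> a).
TOY = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["a"]}
print(all_subclasses("root", lambda iri: TOY.get(iri, [])))
```

Unlike the `rdfs:subClassOf*` path, which also matches the root itself, this sketch returns only the descendants. The `seen` set ensures each class is visited once, at the cost of one request per class.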

I did learn a few useful things from this exercise. One is that there is some inconsistency in how the labels are expressed. Some are plain literals, some are language-tagged (@en) literals, and some are literals datatyped as xsd:string with duplication across these three categories. So some de-duplicating would be required after getting all of the labels.
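A minimal sketch of that de-duplication step, assuming the results arrive in the standard SPARQL JSON results layout, where each binding carries a `value` plus an optional `xml:lang` or `datatype` key. The three rows are hypothetical and only mimic the three label forms described above.

```python
def dedupe_labels(bindings):
    """Collapse labels that differ only in language tag or datatype."""
    seen = set()
    unique = []
    for row in bindings:
        # Compare on the IRI and the lexical form only, ignoring tag and datatype.
        key = (row["subclass"]["value"], row["label"]["value"])
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical rows: one class whose label appears as a plain literal,
# a language-tagged literal, and an xsd:string-typed literal.
IRI = "http://purl.obolibrary.org/obo/ENVO_00002006"
rows = [
    {"subclass": {"type": "uri", "value": IRI},
     "label": {"type": "literal", "value": "liquid water"}},
    {"subclass": {"type": "uri", "value": IRI},
     "label": {"type": "literal", "value": "liquid water", "xml:lang": "en"}},
    {"subclass": {"type": "uri", "value": IRI},
     "label": {"type": "literal", "value": "liquid water",
               "datatype": "http://www.w3.org/2001/XMLSchema#string"}},
]
print(dedupe_labels(rows))
```

This only removes exact lexical duplicates. A class that genuinely has two different labels, such as "water" and "liquid water", would still yield two rows.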

The other thing is that there are just many, many values here, including many obscure things like "bacon curing brine", "flue gas desulfurization material", and "congelation ice in a fresh water body". That means that the problem of proliferation of label variants will be particularly acute in this case if we depend on people constructing their own controlled values from label strings.

@tucotuco
Member Author

@baskaufs Thanks for the thorough investigation of the proposal. It is interesting to see the issues that arise from what seems like a natural extension of capabilities by invoking an ontology as a source for a controlled vocabulary. It sounds like the route of defining a controlled vocabulary coupled to ontology definitions is the sensible way to go, but I wouldn't suggest going that far in this proposal. No one has requested it and the work would be tremendous. We have a couple of alternatives. One is to abandon the proposal, especially since there hasn't been any expressed interest since the 2013 meeting that generated it. Another alternative is to modify the proposal to be less prescriptive about the vocabulary to use, specifically, "Recommended best practice is to use a controlled vocabulary. Values are to represent the environmental material as being composed primarily of the named entity, rather than restricted entirely to that entity. For example, 'liquid water' is to be understood as 'environmental material composed primarily of water in liquid form'."

@baskaufs

@tucotuco I think that the mechanism I suggested for creating controlled values is viable -- see the suggestion I made for values for the dwc:biome proposal. However, in this case, it seems to me that the real issue is that there are just so many subclasses of the environmental material class that it is not reasonable to suggest that they could be used to create a manageable controlled vocabulary. I would suggest shelving this proposal until its proponents suggest a viable mechanism for managing a controlled vocabulary for the property. If nobody can successfully do that, I would say this proposal should be considered unimplementable.

@tucotuco tucotuco removed this from the The Rush of the April Fools milestone Apr 30, 2021
@tucotuco tucotuco added the Controversial The solution for the issue has not reached a consensus. label Apr 30, 2021
@tucotuco
Member Author

The Darwin Core Maintenance Group feels that this proposal has not reached a sufficient state of maturity and recommends that a Task Group be formed to develop solutions to the issues raised.

@tucotuco
Member Author

This issue has been taken up by the Realm and Biome Task Group led by convener @CecSve. Charter not yet published.

3 participants