Primary Deliverable - MaterialSample definition #2

Closed
Jegelewicz opened this issue Aug 19, 2021 · 60 comments

Comments

@Jegelewicz
Collaborator

Jegelewicz commented Aug 19, 2021

Current Definition

http://rs.tdwg.org/dwc/terms/MaterialSample

A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.

Please suggest changes/improvements in this issue.

See also https://github.com/tdwg/material-sample/blob/main/primary_deliverable/MaterialSample.markdown

See also MaterialSample terms Google Sheet

@tucotuco
Member

Additional related commentary in #3 (comment).

@Jegelewicz
Collaborator Author

From #3 (comment)

OK, so what is a MaterialSample then? I am much more fuzzy about this. It seems that the two necessary conditions are being a material thing (e.g. images don't qualify), and being sampled from something. There is no assumption that it is derived from an organism as air or water samples free of organisms could be material samples. I guess that it has something similar to the accession component that I used to define specimens, although I'm not sure about that. If the material is not destructively sampled, the DwC definition implies that it should be preserved, although I'm unsure that is the case for every material sample, e.g. ones that may be thrown out after measurements or documentation is complete. There are also samples that were obviously sampled for the purpose of being destroyed - in my mind that is a difference from specimens since I don't think specimens are generally intended to be destroyed intentionally. So a material sample can be derived from an organism, but doesn't have to be. A material sample can be a specimen, but doesn't have to be. A specimen does not have to be a material sample -- clearly the Bicentennial Oak was never the result of a sampling event. A material sample might be preserved but doesn't have to be. Honestly, the definition of MaterialSample is so fluid that it is hard for me to see why it is useful to assert that something is an instance of it.

@dr-shorthair

Also see #3 (comment):

  • a sample is representative of a larger thing. It is the result of an act of sampling (see https://www.w3.org/TR/vocab-ssn/#Sampling). Usually the intention is that it will be used for observations
  • a specimen is an accessioned thing. Usually the intention is that it is available for observations.

A sample is not necessarily a material thing; social science samples often are not.
A sample might not be accessioned (particularly if it will be destroyed as part of some analytical process).

I think specimens are always material things.

@Jegelewicz
Collaborator Author

#3 (comment)

Instances of MaterialSample are aggregates of physical material that are extracted ("collected") from the natural environment, and held in the custody of humans. Following the suggestion of @baskaufs that instances of a class should be defined by shared properties, these are physical items that may be preserved or destroyed, curated or accessioned, borrowed and loaned, subsampled or aggregated to yield new instances of MaterialSample, and otherwise cared for and/or maintained in some way by humans.

@deepreef

Thanks for bringing that here! (And sorry for not thinking to do so myself). Just to be clear, though, I was not intending to propose a formal definition; but rather I tried to capture my own thinking of what a MaterialSample is, in "plain English"(ish).

@Jegelewicz
Collaborator Author

But it is good!

@smrgeoinfo

A MaterialSample is an object separated from the material world, intended to be representative of some sampled feature.

Samples are typically collected with the intention of making measurements/observations on the sample that will characterize the sampled feature.
A sample might undergo some curation process and become a specimen (as well as a sample).
A MaterialSample might be an aggregation of material unified by containment in some container, e.g. rock powder in a bag, water in a bottle, blood in a syringe.
A MaterialSample might be a self-connected object like a leaf from a tree, or a piece of rock from an outcrop.
MaterialSamples can be derived from other material samples, e.g. the legs from a grasshopper, or zircon crystals from a rock sample.
The sampled feature can be hierarchical; e.g. a material sample might be a leaf from a particular oak tree (organism) considered the sampled feature, or the sampled feature might be considered the taxon to which that individual oak tree belongs.

@dr-shorthair

A sample might undergo some curation process and become a specimen (as well as a sample).

Maybe: "A sample might undergo some curation and accession process and become a specimen (as well as a sample)."

@RogerBurkhalter

I recall a previous comment about accession and asked our Registrar about the legal meaning, which is, in part, legal ownership. Some samples and specimens we can never own, e.g. fossils and archaeological remains from US Federal lands (other countries have similar laws), but we do reposit them. I suggest "accession or reposit".

@Jegelewicz
Collaborator Author

@deepreef

deepreef commented Oct 7, 2021

Wouldn't the controlled vocabulary examples listed for environmentalMaterial also be among the controlled vocabulary examples for materialSampleType?

Or have I misunderstood the purpose & function of environmentalMaterial?

@Jegelewicz
Collaborator Author

Wouldn't the controlled vocabulary examples listed for environmentalMaterial also be among the controlled vocabulary examples for materialSampleType?

I would think so.

@Jegelewicz
Collaborator Author

Jegelewicz commented Oct 14, 2021

If we are going to really flesh out a "Material" class in Darwin Core, the first step should be defining the class. We have MaterialSample to begin with, but I think we have agreed that the definition is not working for everyone. While some seemed opposed to it, I think the broadest possible definition for a Darwin Core "Material" class would be the Dublin Core PhysicalObject:

Term Name:  PhysicalObject

Label description
URI: http://purl.org/dc/dcmitype/PhysicalObject
Label: Physical Object
Definition: An inanimate, three-dimensional object or substance.
Comment: Note that digital representations of, or surrogates for, these objects should use Image, Text or one of the other types.
Type of Term: Class
Member Of: http://purl.org/dc/terms/DCMIType
Version: http://dublincore.org/usage/terms/history/#PhysicalObject-003

For me, this also removes the problems of human and machine observations (images, etc) from our discourse. The next question for me is are we only thinking about "curated" objects in Darwin Core? If that is true, then perhaps the best definition for MaterialSample might be:

All or portions of physical objects (as defined in Dublin Core) that are extracted ("collected") from the natural environment, and held in the custody of humans.
Modified from #2 (comment)

The problem I see in this definition has to do with LivingSpecimen, which may not really be "extracted" from the natural environment. So how about

All or portions of physical objects (as defined in Dublin Core) that may or may not be extracted ("collected") from the natural environment but are managed or curated by humans.

@dr-shorthair

Is 'inanimate' a problem?

@Jegelewicz
Collaborator Author

Is 'inanimate' a problem?

I would think so - a gorilla in the zoo, a tree in the botanic garden. Thanks for pointing that out! So now what.....I need to have a weekend!

@deepreef

We already have a "Material" class in DwC (MaterialSample), so I assume you're not proposing we change the term itself, but just provide a better definition - correct?
The term itself seems fine to me:
"Material" refers to matter (physical).
"Sample" implies that it's the subset of all physical things that humans capture or care for in some way.

I think it's beyond the scope of DwC to be defining terms that apply to literally everything that is a physical object (atoms? galaxies?). I think what we're interested in is the subset of physical objects that we humans handle or maintain or process in some way. I think dc:PhysicalObject could be indicated as the superclass of dwc:MaterialSample, and that classification could be part of the definition for the latter.

As for wording, I would favor something like:

Any physical object (as defined in Dublin Core), or discrete portion of a physical object, or aggregate set of physical objects, that is/are collected, processed, analyzed, managed, or curated by humans.

This encompasses objects, their derivatives, and aggregates, and also avoids potential ambiguities about "natural environment" (which might get a bit squirrelly if we want to accommodate other kinds of objects, like geological samples or cultural artefacts). We can probably remove some of the verbs (e.g., eliminate "analyzed", as it may be implied by "processed"?)

@dr-shorthair

dr-shorthair commented Oct 17, 2021

"Sample" implies that it's the subset of all physical things that humans capture or care for in some way.

I don't find that very helpful. It is also not very consistent with the various bits of discussion above.

  • things that are captured and cared-for are specimens
  • things that are subsets, or are in some other way representative, of some other identifiable discrete thing are samples.

The sentence quoted conflates these concerns in a rather confusing way.

I believe the concern here is to recognize that, if 'sample' and 'specimen' are both roles, and are somewhat independent of each other, then we need to identify the parent class of 'material things', some of which are also samples, some of which are specimens, and some of which are both.

http://purl.org/dc/dcmitype/PhysicalObject would be fine, except for the 'inanimate' qualifier :-(

@tombaker do you know why dctype:PhysicalObject must be 'inanimate' ?

@dr-shorthair

I've raised an issue about 'inanimate' over on the DCMI issue tracker.

@deepreef

deepreef commented Oct 18, 2021

I don't find that very helpful. It is also not very consistent with the various bits of discussion above.

  • things that are captured and cared-for are specimens
  • things that are subsets, or are in some other way representative, of some other identifiable discrete thing are samples.
    The sentence quoted conflates these concerns in a rather confusing way.

Fair enough -- that sentence was written hastily -- which is why I was a bit more careful in the wording of the definition text:

Any physical object (as defined in Dublin Core), or discrete portion of a physical object, or aggregate set of physical objects, that is/are collected, processed, analyzed, managed, or curated by humans.

So... replace "humans capture or care for in some way" with "collected, processed, analyzed, managed, or curated by humans". Not sure if that is any better, though.

I believe the concern here is to recognize that, if 'sample' and 'specimen' are both roles, and are somewhat independent of each other, then we need to identify the parent class of 'material things', some of which are also samples, some of which are specimens, and some of which are both.

I think we get way too hung up on the semantics of "specimen" (as a noun?) and "sample" (as a verb? noun?). Both of these terms have different meanings to different people, and different definitions in different contexts. Of the two (specimen and sample), my sense is that "sample" probably carries less misinterpretation-potential baggage. But maybe that's just me?

In any case, the good news is that we don't need to define "specimen", and we don't need to define "sample", because neither of those terms, by themselves, is a DwC term. What we do need to do is define MaterialSample as a term. If either "Material" or "Sample" as part of that term are so misleading and problematic that they create excessive confusion, then perhaps we need to come up with a new term. Personally, I think the costs of establishing a new term are greater than the costs of potential misinterpretation of pre-conceived notions of what "Material" or "Sample" somehow implies, so my preference, still, is to keep the term "MaterialSample".

So... I agree... the phrase "humans capture or care for in some way" was unhelpful. But I'm curious: what do folks think of the actual wording I proposed for the definition of MaterialSample (above)?

http://purl.org/dc/dcmitype/PhysicalObject would be fine, except for the 'inanimate' qualifier :-(

I agree that "inanimate" is problematic, but I think a bigger problem is the scope. I do not think that dwc:MaterialSample should adopt a definition that defines the scope as broadly as dc:PhysicalObject. I do see instances of dwc:MaterialSample as representing a subset (subclass) of dc:PhysicalObject, but I don't see the two concepts as congruent. Why? because not all instances of dc:PhysicalObject are "collected, processed, analyzed, managed, or curated by humans"; and my sense is that we would like to confine dwc:MaterialSample to that more limited scope of physical things.

@dr-shorthair

dr-shorthair commented Oct 18, 2021

Actually I think the verb 'to sample' is pretty clear, and helpful. My concern is exactly that your definition slides immediately over into the curation and handling aspect, which I understood to be associated with specimens, but not with all samples. That is confusing.

If the 'inanimate' qualifier could be removed from the Dublin Core class, then

dwc:MaterialSample rdfs:subClassOf dctype:PhysicalObject . 

We could also perhaps see an additional class

dwc:AccessionedThing rdfs:subClassOf dctype:PhysicalObject . 

to support the collections folk more explicitly.
And then some individuals might be both -

my:Individual987 a dwc:MaterialSample , dwc:AccessionedThing . 

and implicitly also a dctype:PhysicalObject of course.
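
The "implicitly also" step above is ordinary rdfs:subClassOf inference. A minimal Python sketch of it, assuming the triples are held as plain dictionaries rather than in a real RDF store; `inferred_types` is a hypothetical helper, and `dwc:AccessionedThing` is only a class proposed in this thread, not an existing Darwin Core term:

```python
# Subclass assertions mirroring the Turtle above (prefixed names kept as plain strings).
# dwc:AccessionedThing is only proposed in this thread; it is not a ratified term.
SUBCLASS_OF = {
    "dwc:MaterialSample": {"dctype:PhysicalObject"},
    "dwc:AccessionedThing": {"dctype:PhysicalObject"},
}

# Type assertions for the example individual.
ASSERTED_TYPES = {
    "my:Individual987": {"dwc:MaterialSample", "dwc:AccessionedThing"},
}


def inferred_types(individual):
    """Return the asserted types plus every superclass reachable via subClassOf."""
    seen = set()
    stack = list(ASSERTED_TYPES.get(individual, ()))
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


print(sorted(inferred_types("my:Individual987")))
```

Note that the inference only goes through if the two subclass statements can actually be asserted, which is why the 'inanimate' qualifier on the Dublin Core class matters.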

@deepreef

Ok, yes -- that sounds right to me. What are some examples of AccessionedThing that are not also instances of MaterialSample? If there are none, then wouldn't this additional class be:

dwc:AccessionedThing rdfs:subClassOf dwc:MaterialSample

?

@deepreef

Our specimens, and tissue samples and DNA extracts, etc. are kinds of material-sample. I do NOT support deprecating or subsuming those subclasses into material-sample. I think every kind of biodiversity specimen is a kind of material-sample.

So I guess the way I see it is that things like LivingSpecimen, PreservedSpecimen, FossilSpecimen, "EnvironmentalSample", "TissueSample", etc. are better framed as entries in a controlled vocabulary, as values for something like a materialSampleType property, rather than subclasses with their own specific/unique properties and relationships. I don't know enough about LOD/Semantics to understand the implications of treating them as values in a controlled vocabulary for a property as opposed to subclasses of MaterialSample, so I may be wrong about this. But just to be clear, I didn't mean that the terms had no value; I just meant that they should be represented as values in a controlled vocabulary, instead of distinct classes in DwC.
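
To make the "values in a controlled vocabulary" option concrete, here is a rough Python sketch of a single MaterialSample record type with a type field, instead of one class per kind of sample. The field names and the `parse_record` helper are hypothetical, and the vocabulary values are just the ones named in the paragraph above, not an official list:

```python
from dataclasses import dataclass
from enum import Enum


class MaterialSampleType(Enum):
    # Illustrative values taken from the comment above; not an official vocabulary.
    LIVING_SPECIMEN = "LivingSpecimen"
    PRESERVED_SPECIMEN = "PreservedSpecimen"
    FOSSIL_SPECIMEN = "FossilSpecimen"
    ENVIRONMENTAL_SAMPLE = "EnvironmentalSample"
    TISSUE_SAMPLE = "TissueSample"


@dataclass(frozen=True)
class MaterialSample:
    material_sample_id: str
    material_sample_type: MaterialSampleType


def parse_record(material_sample_id, type_label):
    """Build a record; Enum lookup raises ValueError for labels outside the vocabulary."""
    return MaterialSample(material_sample_id, MaterialSampleType(type_label))


print(parse_record("ms-1", "TissueSample"))
```

Under this design every kind of sample shares the same properties and differs only in the value of the type field; the subclass approach would instead let each kind carry its own properties.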

I don't think I agree with Rich's earlier assertion that no material sample is an organism; that they are disjoint sets.

I think that depends on the meaning of "is an" in the quoted text above -- and it also underscores my long-standing uncertainty about the boundary between MaterialSample and Organism. And to be clear, I think an individual (e.g., a living tiger in a zoo) can simultaneously have MaterialSample properties and Organism properties -- so in that sense, the Tiger is both a MaterialSample and an Organism. But my point is that the properties of an Organism and the properties of a MaterialSample are non-overlapping. Whether or not that means the two classes are "disjoint sets", or something else, is a question that exceeds my understanding of the terminology of this space.

The way I understand it, the properties that apply to the Tiger as an instance of Organism are properties that are true at all moments of the existence of the Organism -- from conception until death or disintegration. These are not related to the physical being of the tiger, because the physical being changes dramatically over the course of its life. So things like taxonomic identity and gene sequences and blood type and other things like that are properties of the Organism instance. Stuff that applies to the physical manifestation of the Tiger, like condition reports, or its participation in transactions with other zoos, etc. seem, to me, to represent properties of the Tiger as an instance of MaterialSample.

A living organism can be collected for use in scientific study and thus meet the critical criterion of material-sample.

Absolutely! Which is why I think LivingSpecimen should be included among the controlled vocabulary values for materialSampleType.

Does being dead make something NOT an organism?

That's a key part of the question I've been asking for a long time now (spoiler alert: I don't have a good answer). I would say that the Organism does not exist until either a sperm fertilizes an egg, or an asexual organism splits into two, or whatever reproduction mode applies. But does that mean that, once created, the Organism continues to exist into all eternity from that moment forward? I don't think so. After the last molecule that had comprised the physical being of the Organism at the time of its death has completely dissociated, I don't think we would continue to think of that set of dissociated molecules as still being the "organism". So eventually it ceases to be. But I'm not sure when that cessation of being an Organism happens. I would say certainly not before it dies, and certainly not after it completely disintegrates -- so I would say that an Organism stops being an Organism at some point between those two points in time.

If a fish is a kind of organism, and I tell you this thing is a dead fish, isn't it (still) a kind of dead organism?

Sure (maybe?) But if that same fish is eaten by a shark, and some of its molecules are absorbed into the shark's body through digestion, and other molecules are excreted over time -- would you still call that dissociated set of molecules scattered over miles of reef and ocean water to collectively still be a fish? I'm guessing not. So... somewhere between the point at which it stopped living, and the point at which its molecules are dissociated and dispersed, I would say it stopped being an Organism.

I could wax on about this for hours, but I think that wouldn't be helpful for the task at hand. The core task is to come up with a definition for MaterialSample (or its replacement term) that works for the needs of the TDWG community (and beyond). Part of that definition should help define the boundary between instances of the MaterialSample class and the Organism class.

I think @baskaufs has suggested (and I agree), that a more practical way to arrive at these definitions and distinctions is by figuring out which properties go with which class, and from those respective sets of properties the boundaries of the classes should emerge. I have a pretty clear idea which properties I would assign to each of these two classes, but I've already consumed too much bandwidth on this discussion, and I need to get some sleep before TDWG starts again (1am Hawaii time... ouch). So I'll end it here for now.

@stanblum
Member

stanblum commented Oct 20, 2021

Thanks for those clarifications, Rich. I think we agree. Not all organisms become material-samples, and not all (biodiversity) material-samples are (whole) organisms, so the one-to-one correspondence that can exist in some cases is not a class-subclass relationship. I also want to argue that organism and (biodiversity) material-samples should be recognized as distinct things because our samples infer the existence (or former existence) of organisms and their properties. Samples tell us about organisms, and by inference populations and taxa.

@deepreef

Thanks, @stanblum - yes, we definitely agree! I apologize that my endless ramblings don't always capture my points clearly.

But I would like to focus on this a bit more:

Not all organisms become material-samples, and not all (biodiversity) material-samples are (whole) organisms, so the one-to-one correspondence that can exist in some cases is not a class-subclass relationship. I also want to argue that organism and (biodiversity) material-samples should be recognized as distinct things because our samples infer the existence (or former existence) of organisms and their properties. Samples tell us about organisms, and by inference populations and taxa.

... because this gets to the heart of not only the definition of MaterialSample and the boundary between that class and Organism, but also helps clarify the nature of the relationship between instances of these two classes.

First of all, I should explain that in our implementation "Organism" is itself a subclass of something we call "Individual". The latter is broader in scope and includes all manner of non-biological things. So for us, the relationship between MaterialSample and "Individual" is maintained for both biological (biodiversity) and non-biological stuff (part of my preference for maintaining the definition of MaterialSample broad to allow non-biological things).

But even if we focus only on the biological/biodiversity subsets of these two classes [Organism and MaterialSample(biodiversity)], the issue is the same: what is the semantic nature of this many-to-many relationship between Organism and MaterialSample?

Again, I don't have a clear answer, but I think we should explore this as a way to refine the definition of dwc:MaterialSample.

At the heart of this is your point that "...our samples infer the existence (or former existence) of organisms and their properties. Samples tell us about organisms, and by inference populations and taxa."

I think there is some consensus that instances of MaterialSample have hierarchical relationships with other MaterialSample instances (hence the proposed new term, parentMaterialSampleID). For example, starting with a "whole organism" (e.g., a dead fish) that is curated in a museum collection, we have one MaterialSample instance representing the physical entity of that whole fish, which is preserved in some particular way. Then we may have one or more tissue samples removed from the fish, which is/are preserved in some other particular way. Then we may combine that fish with several others identified to the same taxon and collected through the same Occurrence/Event into a single "lot". This yields something like this:

| MSID | parentMaterialSampleID | materialSampleType | Comment |
|---|---|---|---|
| 1 | - | lot | Aggregate set of three fish specimens sharing the same taxon and collecting event occurrence instance |
| 2 | 1 | whole organism | First of the three fish in the lot |
| 3 | 1 | whole organism | Second of the three fish in the lot |
| 4 | 1 | whole organism | Third of the three fish in the lot |
| 5 | 2 | tissue sample | A tissue sample extracted from the "First" of the three fish in the lot |
| 6 | 2 | tissue sample | Another tissue sample extracted from the "First" of the three fish in the lot |

[Side note: I'm imagining that the example values above for materialSampleType are subtypes of PreservedSpecimen.]

Separately, we'd track each of the Organisms comprising the lot of specimens:

| OID | Comment |
|---|---|
| 7 | Organism instance of the "First" fish |
| 8 | Organism instance of the "Second" fish |
| 9 | Organism instance of the "Third" fish |

There are three examples of one-to-one correspondence between Organism and MaterialSample, that could be represented like this:

| OID | MSID |
|---|---|
| 7 | 2 |
| 8 | 3 |
| 9 | 4 |

Perhaps that's all we need in this example, because we can infer/derive the relationships between instances of MaterialSample and Organism for MSID 1, 5 & 6 through their respective parentMaterialSampleID relationships. But this only works if you actually have the WholeOrganism instances, which may not always be the case. So there may need to be one-to-many Organism-to-MaterialSample relationships:

| OID | MSID |
|---|---|
| 7 | 2 |
| 7 | 5 |
| 7 | 6 |

Similarly, there may need to be many-to-one Organism-to-MaterialSample relationships:

| OID | MSID |
|---|---|
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
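
As a rough illustration of how the one-to-many and many-to-one rows could be derived rather than stored, here is a Python sketch that starts from only the three one-to-one links and the parentMaterialSampleID column. The dictionaries and the `organisms_for` helper are hypothetical; the IDs are the ones from the example tables:

```python
# materialSampleID -> parentMaterialSampleID (None for the lot), from the example above.
PARENT = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}

# The three direct one-to-one links: materialSampleID -> organism ID.
ORGANISM_OF = {2: 7, 3: 8, 4: 9}


def children(msid):
    """Material samples whose parent is msid."""
    return [child for child, parent in PARENT.items() if parent == msid]


def organisms_for(msid):
    """Organisms a sample represents: its own or an ancestor's link, else its descendants' links."""
    # Walk up: a tissue sample inherits the organism of its whole-organism ancestor.
    current = msid
    while current is not None:
        if current in ORGANISM_OF:
            return {ORGANISM_OF[current]}
        current = PARENT[current]
    # Walk down: a lot represents every organism linked to one of its descendants.
    found = set()
    stack = children(msid)
    while stack:
        node = stack.pop()
        if node in ORGANISM_OF:
            found.add(ORGANISM_OF[node])
        else:
            stack.extend(children(node))
    return found


for msid in sorted(PARENT):
    print(msid, sorted(organisms_for(msid)))
```

This only works when the whole-organism instances are actually recorded; when they are not, the explicit one-to-many and many-to-one rows are still needed.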

I'm not trying to divine an implementation data model; rather I'm trying to get at the nature of the relationships both among instances of MaterialSample (via parentMaterialSampleID) and between instances of MaterialSample and instances of Organism. In other words, what are the predicates? And how many do we need (both within MaterialSample via parentMaterialSampleID, and between MaterialSample and Organism)? If we can get a handle on this, I think it will help clarify the boundaries between the two classes, and by extension, the definitions of both terms.

@Jegelewicz
Collaborator Author

Side note: I'm imagining that the example values above for materialSampleType are subtypes of PreservedSpecimen.

Calling "whole organism" a subtype of PreservedSpecimen seems pretty darn confusing!

@stanblum
Member

Back on Oct 17, 2021 I mentioned that I think "Material Sample" entered the DwC discourse through the BioCollections Ontology (BCO). A change in BCO I wasn't aware of until yesterday is that BCO has now deprecated the "Material Sample" class (made it an obsolete class), and has instead adopted a term/class from a larger ontology, the Ontology for Biomedical Investigations (OBI)(!):

obi:specimen: A material entity that has the specimen role.

This combines several of the notions we've been discussing: a material entity that is the result of a material sampling process and has been taken (collected and understood) to represent some larger entity (thing, population, community) in further study or analysis.

Also deprecated in BCO were the subclasses of Material Sample, including: preserved-, living-, and fossil-specimen.

I thought it was noteworthy that having taken materialSample from BCO to create a superclass for all the different kinds of things we manage in the biocollections community, the DwC is now (still) using "material sample," while the BCO is now using "obi:specimen." Should we follow? Would it be appropriate for us to 1) incorporate the obi:specimen term in DwC, or 2) mint our own specimen term, dwc:specimen, and paraphrase their definition while including a "crosslink", like:

dwc:specimen : is "same-as" or "comparable-to" : obi:specimen

The argument for the second option being that DwC is currently a bag of terms and doesn't support reasoning, which OBI (an OWL ontology) does. In other words, they aren't the same kinds of standards, so incorporating an OBI term in DwC isn't the right thing to do. The better practice might be just to reference obi:specimen in some appropriate way. I'll defer to others with more experience.

Or, given that we also want to include environmental samples in DwC (for metagenomic analysis), should we just retain the term "material sample", because most people wouldn't think of an environmental sample as a "specimen."

@smrgeoinfo

Looks like OBI still has 'material sample', defined as 'A material entity that has the material sample role', which is a subclass of specimen, 'A material entity that has the specimen role'. I don't see anything about deprecation (Last uploaded: January 10, 2022). You'd be hard pressed to distinguish specimen from material sample given their definitions, so I can see why they'd get rid of one of them.

@deepreef

Thanks, @stanblum!

Does OBI define the scope of “specimen role”? And what other kinds of material entities (in the sense of OBI) are outside that scope?

I’m not in favor of changing dwc:MaterialSample to dwc:Specimen if they have essentially the same definition (for reasons articulated by @baskaufs at an earlier zoom meeting).

@baskaufs

I want to add a bit of historical perspective on the relationship between dwc:MaterialSample and OBI. Adding MaterialSample was sort of a "test case" for aligning Darwin Core terms with terms outside of TDWG, particularly terms in formal ontologies. Discussion of the proposal was extensive -- for those interested, it is archived in the tdwg-content listserv archive between 2013 April (term proposed) and 2013 October (term ratified) and most particularly in 2013 May.

In the end, the adopted class was defined to be a subclass of http://purl.obolibrary.org/obo/OBI_0100051, which I believe at the time had the label "material sample", but whose label has now been changed to "specimen". Declaring a TDWG term by its relationship to a non-TDWG term other than those in Dublin Core was a new thing to TDWG. Eventually, the decision was made and codified in Section 4.4.2.2 of the Vocabulary Maintenance Specification (SDS) that assertions that generate machine-computable entailments should not be included in the core metadata about a term, but rather in an "extension term list" layered on top of the basic "bag of terms" layer.

As a result, the subclass declaration for dwc:MaterialSample was stripped out of the defining metadata for the term and there was no effort to assert it in any official extension term list. Because of the SDS guidelines, the subclass property was dropped from the metadata history table, so one actually can't discover that it was ever there unless you read the old tdwg-content emails (principally this one). But this history should inform our understanding of what has happened in the past to lead us to our current circumstance.

There are two important issues that are raised by Stan's comment. The first is the importance of differentiating between term labels and the terms themselves. There is no such thing as obi:specimen; OBI uses opaque numeric identifiers. As I noted, I'm pretty sure that the label of obi:0100051 has changed from "material sample" to "specimen" since 2013 (one would need to dig through the OBI history to find out for sure and the demise of Google Code doesn't help in investigating the email thread). That does not mean the term itself has changed. To know that, we would need to compare the definition in the past with the definition today. That is the danger of conflating labels with "terms" or their immutable IRI identifiers. Changing a label does not change a term.
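
A small Python sketch of the label-versus-term distinction, assuming a hypothetical `Term` record keyed by IRI. The label history shown is the one recalled in the paragraph above and has not been checked against OBI's own records:

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    iri: str                                    # stable identifier for the term
    labels: list = field(default_factory=list)  # label history, oldest first

    @property
    def label(self):
        """Current human-readable label (the most recent one)."""
        return self.labels[-1]


def same_term(a, b):
    """Terms are compared by IRI only; labels play no part."""
    return a.iri == b.iri


IRI = "http://purl.obolibrary.org/obo/OBI_0100051"
# Label history as recalled in the comment above, not verified against OBI.
as_of_2013 = Term(IRI, ["material sample"])
as_of_now = Term(IRI, ["material sample", "specimen"])

print(as_of_2013.label, "->", as_of_now.label, "| same term:", same_term(as_of_2013, as_of_now))
```

Comparing by IRI says nothing about whether the definition has drifted; that still requires comparing the definition text at the two points in time.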

The second issue, which is currently very relevant, is the mechanism by which we make connections between TDWG terms and terms defined outside of TDWG. This has been a topic of discussion for years, without resolution. Some suggestions, like using SKOS relationship terms such as skos:exactMatch, are problematic because they come with undesirable entailments. The Audubon Core Maintenance Group has thrown down the gauntlet and suggested a solution to this problem, which you can read about in this proposal, which is now under public review. In the proposal, we were very transparent about the fact that this is a precedent-setting proposal. I've talked about it with John Wieczorek, and if the proposal goes through, the Darwin Core Maintenance Group will probably follow the precedent and use sawsdlrdf:modelReference in circumstances where we want to create a machine-followable link to a term outside of TDWG without generating machine-computable entailments. That would be the case in several proposals where OBO ontology terms have been suggested as values for controlled vocabularies. If you don't like this solution, then you'd better comment on the proposal in the next three and a half weeks or it's going to be a fait accompli. If you don't like it, explain why you don't like it and propose a better solution.

@Jegelewicz
Collaborator Author

Term change

  • Submitter: TDWG MaterialSample Task Group
  • Efficacy Justification (why is this change necessary?): The current definition includes information that belongs in usage comments or examples and also relies upon an action (sampling) which the Task Group feels is unnecessary. We wanted to provide a simple definition that could be used by many disciplines.
  • Demand Justification (if the change is semantic in nature, name at least two organizations that independently need this term):
  • Stability Justification (what concerns are there that this might affect existing implementations?):
  • Implications for dwciri: namespace (does this change affect a dwciri term version)?: Yes

Current Term definition: https://dwc.tdwg.org/list/#dwc_MaterialSample

Proposed attributes of the new term version (Please put actual changes to be implemented in bold and strikethrough):

  • Term name (in lowerCamelCase for properties, UpperCamelCase for classes): MaterialSample
  • Organized in Class (e.g., Occurrence, Event, Location, Taxon): MaterialSample
  • Definition of the term (normative): ~~A physical result of a sampling (or subsampling) event. In biological collections, the material sample is typically collected, and either preserved or destructively processed.~~ **A physical object that represents a physical entity of interest in whole or in part.**
  • Usage comments (recommendations regarding content, etc., not normative): In biological collections, the material sample is typically collected, and either preserved or destructively processed. Material samples can preserve their identity even while gaining and losing material parts. See also https://en.wikipedia.org/wiki/Sample_(material)
  • Examples (not normative): A whole organism preserved in a collection. A part of an organism isolated for some purpose. A soil sample. A marine microbial sample. the undetached leg of an animal, an aggregate of animals, an animal, soil, water, a museum collection object, a vial of tissue
  • Refines (identifier of the broader term this term refines; normative):
  • Replaces (identifier of the existing term that would be deprecated and replaced by this term; normative):
  • ABCD 2.06 (XPATH of the equivalent term in ABCD or EFG; not normative): DataSets/DataSet/Units/Unit

@deepreef

Sorry I missed the last session on this. One question and one comment:

Question: In the proposed new definition, is there a difference between "physical object" and "physical entity"?

Comment: The proposed examples seem a little animal-centric -- maybe a plant and a bacterial example would be good to add? Also, maybe a better/more intuitive example of an "undetached" instance of MS would be a fossil aggregate represented by a single physical rock with multiple embedded organisms.

Also... I hadn't considered the "undetached" potential within a single organism in MaterialSample examples. I had certainly considered examples of multiple organisms represented as a single collective object (the aforementioned fossil; hermit crab+shell+anemone; etc.). But I'd not considered the possibility of branding undetached subcomponents of the same individual Organism as distinct MS instances. I guess that means that any given MS instance of a single organism could have near-infinite potential child instances, without any disarticulation action happening to the whole. I don't have a problem with this, but the non-normative documentation should probably explain this a bit more, with an explanation that MS instances are minted when there is an informatic need to do so, and also including examples where there is an informatic need to track undetached subcomponents of an object (e.g., the undetached leg of a dog).

@cboelling
Member

cboelling commented Apr 1, 2022

A physical object that represents a physical entity of interest in whole or in part.

I understand that "representing" in the proposed definition is meant to convey that an object that is the subject of collecting or observing (e.g., a goose, a swarm of geese, a fossil-bearing rock small enough to be lifted, a small twig from a tree) is often used, after it has been collected or observed, to drive inferences about a larger whole that it is part of or relates to in a specific way (the Swedish population of geese, the mountain range the rock originates from, the entirety of shrubs belonging to the same species).

I think the use of two different terms ("physical object" - "physical entity") can be defended to reflect that a subject of collection or observation will as a matter of necessity be spatially more confined than physical entities in general (the latter including, for example, all geese in the world, the taiga, Earth's atmosphere).

It would be good though, to clarify this in the documentation around the definition.

Apart from this I agree with @deepreef's conclusions about instances of dwc:MaterialSample containing other instances of dwc:MaterialSample as proper parts and the need to explain that and add other examples. Note that composition can also go the other way: each of a dinosaur skeleton's bones in a collection can be an instance of dwc:MaterialSample as can be the set of bones, even if they aren't physically associated in the collection. If that set of bones is incomplete, also the set of bones that potentially make up the whole skeleton could, IMO, be minted as another instance of dwc:MaterialSample satisfying certain informatic needs.

cboelling added a commit that referenced this issue Apr 1, 2022
Add the draft definition and non-normative documentation for dwc:MaterialSample as discussed in #2.
@deepreef

deepreef commented Apr 1, 2022

I think the use of two different terms ("physical object" - "physical entity") can be defended to reflect that a subject of collection or observation will as a matter of necessity be spatially more confined than physical entities in general (the latter including, for example, all geese in the world, the taiga, Earth's atmosphere).

OK, thanks! That makes sense to me. But it wasn't immediately obvious (to me, at least) from the wording of the definition. I think the wording of the definition can stay as it is, as long as the non-normative explanatory comments help folks understand the implications of the distinction (physical object vs. physical entity) -- as you suggest.

Note that composition can also go the other way: each of a dinosaur skeleton's bones in a collection can be an instance of dwc:MaterialSample as can be the set of bones, even if they aren't physically associated in the collection.

Yes! Definitely. I think it's clear that aggregates of multiple disconnected MS items can be collectively bundled into a single umbrella/parent MS instance. Whether or not those individual component items came from the same instance of Organism, or multiple Organism instances, shouldn't make any difference.

If that set of bones is incomplete, also the set of bones that potentially make up the whole skeleton could, IMO, be minted as another instance of dwc:MaterialSample satisfying certain informatic needs.

I guess so... but this almost sounds like advocacy for accepting hypothetical/inferred physical objects within scope. I don't immediately have anything against that, but I worry that it might be flirting with the edges of the MS scope a bit. I'm thinking of the type specimen for Nessiteras rhombopteryx. But in that case, the physical object is not hypothetical -- it's just that it might be an Organism, and it might (probably) be a rock.

@RogerBurkhalter

In October 2020, we had a meeting of the Paleo "Happy Hour" on the topic of clusters/fossils on a slab and otherwise instances of what I refer to as "loanable objects" (cannot loan one without loaning all objects in or on a "container"). The list we came up with may be useful. We sub-divided the list into Natural Accumulations and Artificial or Anthropogenic Accumulations:

Natural Accumulations:
Fossil bearing rock rich in abundance

  • Death assemblage
  • Reefs
  • Transport/deposition, taphonomic
  • Condensed bed (slowed sediment or carbonate rate, i.e. more time in a thin horizon)

Coquina or bonebed
Coal balls
Amber
Parts and counterparts
Articulated vertebrate remains
Epibionts
Fortuitous preservation (i.e. a benthic taxon preserved on a sessile taxon, just by chance, or a "last meal" preserved)

Artificial or Anthropogenic Accumulations:
Palynomorph slides (strew samples)
Diatom slides (strew samples)
SEM stubs
- Single taxon
  - Single Locality
  - Multiple Localities
- Multiple taxa
  - Single Locality
  - Multiple Localities
Microfossil cavity slides or gridded cavity
- Single taxon
  - Single Locality
  - Multiple Localities
- Multiple taxa
  - Single Locality
  - Multiple Localities
Coal Ball Peels/Microfossil thin sections
- Single Locality
- Multiple Localities

The antithesis of these examples for anthropogenic modification is serial thin sections or peels of an individual fossil, which would fall under the current definition of MaterialSample. What we discussed briefly are examples of display fossils that are composites of several individuals, usually vertebrate fossils, sometimes invertebrate or plant fossils. The point is that a wide variety of natural and anthropogenic objects are possible. This list was assembled from the Invertebrate Paleontology and Paleobotany collections at one museum.

@RogerBurkhalter

I left these comments last night after two long weeks of interviewing Curator candidates for the collection I am CM for. I left these as examples of paleontology collection objects and the difficulties of fitting our collections into existing CMS and DwC models. Paleontology objects are rarely "as found"; they routinely require some effort at initial preparation to further expose fossilized biological objects and then some means to preserve not only the object but the relationship between it and other associated objects found within the original sample (i.e. dwc:associatedOrganisms). However, consider for example a palynology sample. A quantity of rock is collected from a locality and transported to a lab. The rock is broken into smaller pieces and placed in a jar for reserve, and a subset is separated, further reduced to a coffee-ground consistency, and stored. A subset of the ground sample is then placed in an acid-resistant beaker and processed with HF for a day or more, washed, and centrifuged. The processed "residue" is then stored in a vial, and a subset of the now processed residue is further cleaned and pipetted onto a microscope slide cover slip, dried, and flipped onto a microscope slide. When the slide is examined, you can finally see the objects collected, along with hundreds to thousands of other co-occurring objects (dwc:associatedOrganisms). What is the MaterialSample: the original collected rock, the ground residue, the acid prepared residue, the microscope slide itself, or the pollen grain on the slide (that has a Linnean name and coordinates from an England Finder)?

The original collected sample may have other fossil forms embedded within that would be destroyed by HF processing. Subsets may be processed by other acids (Formic, dilute HCl, Acetic, etc.) or reduced and hand-picked under a binocular microscope to produce Conodonts, Foraminifera, Calcareous Nannofossils, or megafossils, etc. Derivative samples may also be processed for non-biological data such as isotopes of strontium, carbon, oxygen, boron, etc. and/or radiometric dating. As such, I would see the original collected sample as the parentMaterialSample to maintain at least some relationship to all of the derivative biological and non-biological entities, with the processed samples as (what?) and the (identified and named) biological object as the MaterialSample. It seems a lot of processed derivative samples either carry the same designation as the original parentMaterialSample, or are absorbed into dwc:preparations (which does not really fit, as they are not a preservation method), making the link between the parent and child somewhat muddy. This is also very true of commercial CMS and why I use a database I created to keep these relationships and results discoverable. Mapping to DwC has always been a challenge (need more coffee).

Sorry for the long comments.

@tucotuco
Member

tucotuco commented Apr 2, 2022

@RogerBurkhalter This kind of detailed use case is immensely useful. It highlights both the value of the concept of a parentMaterialSample (see tdwg/dwc#344) and its limitations. By limitations, I mean, "What does it mean to be the parent?" I suspect we need a much richer way to relate materials, with something at the level of a ResourceRelationship where the nature of the relationship can be specified. In the Diversifying the GBIF Data Model work, the model anticipates the relationships "part of" and "derived from" as well as a separate mechanism to establish membership in a material group that was developed for the OBIS Community Measurement use case, but that would also work for other purposes.
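A rough sketch of what "relationships where the nature of the relationship can be specified" might look like for the palynology chain described above. The identifiers, the choice of which step counts as "part of" versus "derived from", and the sources helper are all assumptions made for illustration; none of this is taken from the GBIF model documents.

```python
from collections import defaultdict

# (subject, relationship, object) statements; every identifier is hypothetical.
relationships = [
    ("rock-reserve", "part of", "rock"),
    ("ground-sample", "derived from", "rock"),
    ("residue", "derived from", "ground-sample"),
    ("slide", "derived from", "residue"),
    ("pollen-grain", "part of", "slide"),
]


def sources(material, rels):
    """Follow relationships upward and list every material this one traces back to."""
    by_subject = defaultdict(list)
    for subject, _relationship, obj in rels:
        by_subject[subject].append(obj)

    seen = []
    stack = [material]
    while stack:
        for obj in by_subject[stack.pop()]:
            if obj not in seen:
                seen.append(obj)
                stack.append(obj)
    return seen


pollen_sources = sources("pollen-grain", relationships)
```

Because each statement names its relationship, the single "parent" question splits into separate ones (what is this part of, what was it derived from) that can be answered independently.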

@deepreef

deepreef commented Apr 2, 2022

@RogerBurkhalter :

What is the MaterialSample, the original collected rock, the ground residue, the acid prepared residue, the microscope slide itself, or the pollen grain on the slide (that has a Linnean name and coordinates from an England finder)?

My answer: The MaterialSamples (plural) are: whatever units of physical material(s) warrant identification and associated metadata from an informatics perspective.

In other words, the decision of whether to mint a new materialSampleID value (=establish a MaterialSample instance) should be driven by a specific need to track information related to a particular unit of physical material.

Some use cases in my (non-fossil) world:
Whole fish is collected, fin-clip is removed and preserved separately (for DNA analysis), scale falls off body and is lost:

  1. Whole specimen (voucher)
  2. Tissue sample (parent=1)

In this case, I would not bother assigning a separate MS instance to the fish before its fin clip was removed (or scale lost); and I would not bother assigning a separate MS instance to the lost scale, because I have no informatic need to track either of those separately from the two MS instances I do mint.

Whole bird is collected, put in freezer, and accessioned/catalogued. Later, the skin is removed and prepared dry, the internal organs are preserved in alcohol, the skeleton is processed with the aid of dermestid beetles. Later still a subsection of tissue is removed from the preserved organs for DNA analysis.

  1. Whole organism as frozen/accessioned/catalogued
  2. Skin (parent = 1)
  3. Internal organs in alcohol (parent = 1)
  4. Skeleton (parent = 1)
  5. Tissue for DNA (parent = 3)

In this case, I have an informatic need to track the whole organism prior to dissociation of parts (object that is accessioned and catalogued), so I do assign an MS instance to this. I do not bother assigning MS instances to the blood and other tissue that ended up in the waste basket, nor the tissue consumed & digested by the dermestid beetles, because I don't have an informatic need to track them. Because the tissue sample was subsequently removed from the alcohol-preserved organs, I treat it as a child of that MS (3), rather than a direct child of the whole (1). That way, the curation history of the tissue sample is more precisely/completely represented in the chain of preservation processes (e.g., in case the internal organs were first fixed in formalin, so I would then know that the derived tissue sample is not fit for purpose for DNA sequencing).

These are pretty straightforward examples in my mind. Another straightforward example is if a feather is plucked from the skin and used for some purpose/preparation/whatever, in which case I would mint:
6) Feather (parent = 2).
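The bird example can be sketched as a small parent-linked structure. This is only an illustration: the class, its field names, and the lineage helper are assumptions, and the string IDs simply reuse the list numbers above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MaterialSample:
    material_sample_id: str
    description: str
    parent_id: Optional[str] = None  # None for the top-level (whole) sample


samples: Dict[str, MaterialSample] = {
    s.material_sample_id: s
    for s in [
        MaterialSample("1", "Whole organism as frozen/accessioned/catalogued"),
        MaterialSample("2", "Skin", parent_id="1"),
        MaterialSample("3", "Internal organs in alcohol", parent_id="1"),
        MaterialSample("4", "Skeleton", parent_id="1"),
        MaterialSample("5", "Tissue for DNA", parent_id="3"),
        MaterialSample("6", "Feather", parent_id="2"),
    ]
}


def lineage(sample_id: str, index: Dict[str, MaterialSample]) -> List[str]:
    """Return the IDs of all ancestors, nearest parent first."""
    chain = []
    current = index[sample_id]
    while current.parent_id is not None:
        current = index[current.parent_id]
        chain.append(current.material_sample_id)
    return chain


tissue_lineage = lineage("5", samples)
```

Walking the chain for the tissue sample passes through the alcohol-preserved organs before reaching the whole bird, which is where a formalin step would be recorded if there had been one.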

But here's where it gets interesting, relative to the earlier discussion on "undetached" MS children. Suppose I photograph just the wing of the mounted skin. Would I have a need/desire to mint a new MS instance for the wing, even though it is still physically part of the whole skin? That way, I could make the subject of the image the wing alone, rather than the whole skin preparation. But is that really good practice? I honestly dunno.

In any case, I think the same basic logic ("Do I have an informatic need to track properties or relationships of a particular aggregate/unit of physical material?") would apply in the example you gave for which bits get distinct instances of MS.

@deepreef

deepreef commented Apr 2, 2022

@tucotuco :

I suspect we need a much richer way to relate materials, with something at the level of a ResourceRelationship where the nature of the relationship can be specified.

I've come around to applying that logic to all relationships within DwC. In other words, whenever there is an xxxID term/property within a DwC Class, I'm leaning towards representing those values not as direct properties of the root instance, but as instances of ResourceRelationship.

I think of this as a "semi-serialized" approach. That is, literal values are treated as direct properties of DwC class instances (e.g., property "fields" to the class "tables"), but all "foreign key" property values are captured as an "octuple store" (eight terms organized in dwc:ResourceRelationship).

I have no idea whether this quasi-hybrid relational model/serialized model is practical or sensical, but it feels like a potentially practical middle-ground between the two different ways of representing data (i.e., tables & fields vs. triple-store).
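As a thought experiment, the "semi-serialized" idea might look like the sketch below. The eight field names follow the dwc:ResourceRelationship terms as I understand the current term list (worth checking against it). The split_record helper, the way relationship IDs are built, and the sample record are assumptions for illustration, and parentMaterialSampleID is the proposed term from tdwg/dwc#344, not necessarily an adopted one.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ResourceRelationship:
    """The eight terms of the 'octuple'."""
    resourceRelationshipID: str
    resourceID: str
    relationshipOfResourceID: Optional[str]
    relatedResourceID: str
    relationshipOfResource: str
    relationshipAccordingTo: Optional[str] = None
    relationshipEstablishedDate: Optional[str] = None
    relationshipRemarks: Optional[str] = None


def split_record(record, id_key):
    """Keep literal values as direct properties; move other xxxID values into relationships."""
    literals = {}
    relationships = []
    for key, value in record.items():
        if key != id_key and key.endswith("ID"):
            relationships.append(
                ResourceRelationship(
                    # Hypothetical ID scheme: "<record id>/<term name>".
                    resourceRelationshipID=f"{record[id_key]}/{key}",
                    resourceID=record[id_key],
                    relationshipOfResourceID=None,
                    relatedResourceID=value,
                    relationshipOfResource=key,
                )
            )
        else:
            literals[key] = value
    return literals, relationships


# Hypothetical record for the tissue sample in the bird example.
record = {
    "materialSampleID": "MS-5",
    "preparations": "tissue in ethanol",
    "parentMaterialSampleID": "MS-3",
    "organismID": "ORG-1",
}

literals, relationships = split_record(record, id_key="materialSampleID")
```

The record keeps its own identifier and literal values as "fields", while each foreign key becomes one relationship row that could later carry who asserted it and when.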

@Jegelewicz
Collaborator Author

Attendance at the 2022 working session included a lot of people who are not members of the Task Group and their primary concern was with the baggage that might be associated with "sample".

Mathias Dillen: This definition includes a physical photograph and a physical drawing of an organism or fossil, right?
Carlos Martínez: I wouldn't consider those as part of a material sample, e.g., I don't sample a photograph, I take a soil sample in a bag
Carlos Martínez: photographs and illustrations are "works" in the sense of the ZooCode
Carlos Martínez: Samples come from sampling, e.g., collecting material things during a field trip / sampling event. A digital photograph is not a material sample. I think that we are confusing "collection objects" with material samples in the sense used by field biologists. I am a biologist and calling a photograph a material sample is counterintuitive (read: wrong) to me.
Deborah Paul: A herbarium sheet sometimes only has an image on it.
Carlos Martínez: A herbarium sheet is not a material sample, the plant on it is. If there is no plant and there is just a picture, there is no material sample on the sheet and the sheet is just a collection object.
Carlos Martínez: Things that are included in a material sample: the three main materials upon which scientific names of animals are based: 1) specimens, 2) fossils that are substitutions (replacements, impressions, moulds and casts) for the actual remains of animals, and 3) the fossilized work of animals (ichnofossils).

It was clear to me that people were looking for something that could encompass any physical material whether it was a "sample" or not if we hope to allow collections to use DarwinCore to share their objects. There was also discussion about the use of the term sample when associated with human remains. As I have first-hand experience attempting to remove "specimen" from everything in a CMS, I completely understand the concern. Notes from the session include this: “Sample” is problematic, consider “catalogueRecord”, “object”, “entity”, “unit”

Mariel Campbell: If we change the term to Material Object or Entity, and define it as a physical object, then an herbarium sheet with a physical photograph in it is indeed a Material Object. It can be barcoded and loaned.
John Wieczorek: “Entity” might not be so easy to understand. But then maybe it will force people to read the definition. ;-)
Mariel Campbell: It also avoids calling a part or representation of a human or named animal as an "object" which is an issue

It was also discussed that a "material" class should start with a "High-level distinction between material and information artefact" as this would mesh with the LatimerCore baseTypeOfCollection

So - should we really be starting with the class MaterialEntity? Would this be equivalent to the Dublin Core PhysicalResource?

Term PhysicalResource
URI http://purl.org/dc/terms/PhysicalResource
Label Physical Resource
Definition A material thing.
Type of Term Class

I know this feels like a step backward.

Steve Baskauf: I feel like we are plowing ground that was plowed a year ago in this group...

BUT as LatimerCore is currently in expert review and they cover a lot of things that cross over into material, I think we need to think deeply about this.

@Jegelewicz
Collaborator Author

Changes as suggested in #37 added to review package - https://github.com/tdwg/material-sample/blob/main/review%20package/MaterialSample.md

@Jegelewicz
Collaborator Author

Term Change submitted - tdwg/dwc#451

@Jegelewicz
Collaborator Author

change complete - tdwg/dwc#451
