dwc:occurrenceID
in the context of dwc:catalogNumber
review
#21
One issue that I see here is that for most museums - the "catalog number" describes BOTH the "occurrence" AND the object(s) or organism in the collection related to it. So museums are going to need to up their game in creating unique IDs for occurrence, material sample, and so on.... |
I think that this is a key point for the issues tackled in this task group and in particular also for #11 on linking the type of artefact used to infer an occurrence to a pointer for that occurrence. I am tempted to think that given the current definitions of the relevant terms in DwC that a dataset that uses the "catalog number" to describe BOTH the "occurrence" AND the object(s) or organism in the collection related to it is not DwC compliant and cannot be treated as such by recipients of that data. |
Recall, as is already covered in our discussions, that an important contributing reason why museums describe their specimens as occurrences is that dwc:occurrenceID is required when publishing specimens in GBIF ;-) |
THANK YOU!!! This is something I've been making noise about for years (and why I'm so excited about this Task Group!) In summary: Catalog Numbers are assigned to physical things in collections (i.e., I think the recent activity surrounding Back to your point, years ago we added The logical consequence of this is that the DwC terms In other words, getting the community on board with disentangling |
@deepreef I know that many of our neontological collections at my museum use the so-called "Darwin Core triplet" (institutionCode+collectionCode+catalogNumber) as the occurrenceID, and it will be very difficult to get them to change to a machine-generated UUID or PID. The one collection I am CM for, Invertebrate Paleontology, has UUIDs where recommended; it's not that hard. Getting smaller collections on board, especially those with limited money or people, will be a major task. |
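To make the two identifier styles in this comment concrete, here is a minimal sketch. The `:` separator and both function names are assumptions for illustration (practice varies between collections); the example values are borrowed from the Arctos GUID mentioned elsewhere in this thread.

```python
import uuid


def darwin_core_triplet(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Build a triplet-style identifier; the ':' separator is an assumption."""
    return f"{institution_code}:{collection_code}:{catalog_number}"


def mint_opaque_id() -> str:
    """Return a random (version 4) UUID as an opaque, machine-generated identifier."""
    return str(uuid.uuid4())


triplet_id = darwin_core_triplet("DMNS", "Mamm", "12344")
opaque_id = mint_opaque_id()
print(triplet_id)  # changes whenever any of the three parts changes
print(opaque_id)   # carries no meaning, so it never needs to change
```

The triplet is readable but tied to the catalog number, which is why it forces a 1:1 relationship between catalogNumber and occurrenceID; the UUID has no such coupling.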
@RogerBurkhalter: yeah, that's exactly what I did. For each specimen table that was the source of records for Occurrence instances (which already had an occurrenceID field with an auto-generated UUID), I simply added a second field to the same table for materialSampleID with an auto-generated UUID. As long as I know internally that the occurrenceID represents the "specimen at collecting event" (actually Indeed, I imagine most collections are at the mercy of their respective CMS and how it manages data and translates/exports it to DwC. But if we can achieve some sort of clarity and stability on the definitions of these various DwC classes (especially |
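The "second field on the same table" approach described here might look roughly like the sketch below. The row layout and the function name `add_material_sample_ids` are hypothetical; only the two term names come from the comment.

```python
import uuid


def add_material_sample_ids(rows):
    """Return copies of the rows, adding a fresh materialSampleID where none exists.

    Existing occurrenceID values are left untouched, so each row ends up with
    two independent identifiers: one for the occurrence, one for the physical sample.
    """
    updated = []
    for row in rows:
        new_row = dict(row)  # copy, so the caller's data is not modified
        if "materialSampleID" not in new_row:
            new_row["materialSampleID"] = str(uuid.uuid4())
        updated.append(new_row)
    return updated


# Hypothetical specimen rows that already carry an occurrenceID.
specimens = [
    {"occurrenceID": str(uuid.uuid4()), "catalogNumber": "1001"},
    {"occurrenceID": str(uuid.uuid4()), "catalogNumber": "1002"},
]
specimens_with_samples = add_material_sample_ids(specimens)
```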
I think this may be less true than you think, especially as physical specimens have been subsampled and shared for molecular study. See ArctosDB/arctos#4032 (comment) |
Agreed -- "vast" was an overstatement; but I bet if we looked at GBIF or iDigBio data, we'd still get a 1:1 correspondence between Catalog Number and occurrenceID in the majority of cases. But my larger point was that we can't rely on that -- even if exceptions are a minority, we need to accommodate them more robustly than we have been historically using DwC. |
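The 1:1 claim is something one could check on an actual export. A minimal sketch, assuming records are plain dicts with `occurrenceID` and `catalogNumber` keys; the sample values and the function name `cardinality_exceptions` are made up for illustration.

```python
from collections import defaultdict


def cardinality_exceptions(records):
    """Find departures from a 1:1 catalogNumber/occurrenceID correspondence.

    Returns two dicts: occurrenceIDs linked to several catalogNumbers, and
    catalogNumbers linked to several occurrenceIDs.
    """
    occ_to_cat = defaultdict(set)
    cat_to_occ = defaultdict(set)
    for rec in records:
        occ_to_cat[rec["occurrenceID"]].add(rec["catalogNumber"])
        cat_to_occ[rec["catalogNumber"]].add(rec["occurrenceID"])
    many_catalog_numbers = {o: sorted(c) for o, c in occ_to_cat.items() if len(c) > 1}
    many_occurrences = {c: sorted(o) for c, o in cat_to_occ.items() if len(o) > 1}
    return many_catalog_numbers, many_occurrences


# Hypothetical records: occ-2 is represented by two catalogued items.
records = [
    {"occurrenceID": "occ-1", "catalogNumber": "A100"},
    {"occurrenceID": "occ-2", "catalogNumber": "A101"},
    {"occurrenceID": "occ-2", "catalogNumber": "A102"},
]
many_catalog_numbers, many_occurrences = cardinality_exceptions(records)
print(many_catalog_numbers)
```

Even if the exceptions turn out to be a minority, a report like this shows which records a model has to accommodate.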
At least ALL museum collections using the Darwin-Core-Triplet (see also Guralnick et al 2014) approach to build their occurrenceIDs (as is STILL today recommended in the Darwin Core definition for occurrenceID!!!) would by design have a 1:1 cardinality of catalogNumber to occurrenceID!
When generating a sub-sample from a museum specimen (or any MaterialSample), the Darwin-Core-Triplet as occurrenceID would be less of a problem if only the occurrenceID identifier-string was maintained unchanged as the occurrenceID also for the sub-sample (and not generated anew from a new catalogNumber).
Because MaterialSample (approx 2013-03-28) and materialSampleID (approx 2013-05-25) are relatively recent additions to Darwin Core, most museum collections would likely not have any materialSampleIDs assigned to their specimens (yet)? I see an important mission of the MaterialSample task group to (finally) build the foundation for museum collections to start implementing MaterialSample and materialSampleID - and to demand such implementations from their collection management systems. |
as the data manager of a combined herbarium, living collection and seed bank collection we have many use cases where an occurrence (when and where material was collected) has many catalogue numbers (how it is physically represented in the collection). |
Indeed! I think that those of us connected with data management systems that do incorporate |
Do you already assign |
Here is a recent example with material from the same occurrence (collected in June 2021) deposited in the vascular plant herbarium (CMS = MUSIT) and in the DNA tissue bank (CMS = Corema) at the museum in Oslo. When publishing the DNA bank in GBIF a few years ago we quickly became aware of the restrictive requirement for distinct occurrenceID in each dataset - in practice blocking us from publishing derived tissue samples with the "correct" occurrenceID (because multiple tissue samples are often extracted from the same material sample/specimen). The tissue sample specimen is thus (unfortunately) published with the occurrenceID mapped to the assigned materialSampleID and the (correct) occurrenceID is instead published as relatedResourceID. (An organismID was minted as well for linking, but the vascular plant herbarium CMS did not support this term).
|
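The workaround described for the Oslo DNA bank can be sketched as two rows per tissue sample: a core record whose occurrenceID is really the materialSampleID, plus a relationship row that points back to the collecting occurrence. The function name, the identifier values and the `relationshipOfResource` wording are all assumptions for illustration; `resourceID`, `relatedResourceID` and `relationshipOfResource` are existing Darwin Core terms.

```python
def publish_tissue_sample(material_sample_id: str, collecting_occurrence_id: str):
    """Return (core_row, relationship_row) for one tissue sample.

    The core row reuses the materialSampleID as occurrenceID so that it is
    unique within the dataset; the "correct" occurrenceID travels separately
    as relatedResourceID.
    """
    core_row = {
        "occurrenceID": material_sample_id,
        "materialSampleID": material_sample_id,
    }
    relationship_row = {
        "resourceID": material_sample_id,
        "relatedResourceID": collecting_occurrence_id,
        "relationshipOfResource": "sampled from occurrence",  # hypothetical wording
    }
    return core_row, relationship_row


# Two tissue samples extracted from material of the same (hypothetical) occurrence.
rows = [publish_tissue_sample(sample_id, "occ-2021-06-0001")
        for sample_id in ("tissue-0001", "tissue-0002")]
for core_row, relationship_row in rows:
    print(core_row["occurrenceID"], "->", relationship_row["relatedResourceID"])
```

Both samples get distinct occurrenceID values in the dataset while still resolving to the same collecting occurrence, which is the compromise the comment describes.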
Here is an example of what happens with Arctos data and how we could (now) pass a MaterialSampleID. https://arctos.database.museum/guid/DMNS:Mamm:12344 If you look at DMNS:Mamm:12344, this is how it would work:
Each of these exercises makes me realize how differently we are all approaching this and how I need to work with the Arctos community to get data in the appropriate places.... |
We don't have separate materialSampleIDs. Everything is treated as a catalogued item within a collecting occurrence (currently we allocate an internal sequential number to the occurrence, as there are no real-world candidates, and deliver it to DwC by appending the institutionCode), and each item has a unique catalogNumber by virtue of a suffix added for the different physical items across all 3 collections, e.g. CANB897925.1 herbarium sheet; CANB 897925.9 seed packet; CANB 897925.6 cutting (now dead). A DNA sample collected at the time of collection is also given a catalogNumber. The institutionCode and accNo are not necessarily unique within an occurrence, as we have combined separate institutional collections over time and used different accession numbering schemes, but they all link to the same occurrenceID. |
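As a rough illustration of this suffix scheme, the sketch below splits a catalogNumber into its parts and groups items under a shared occurrenceID. The regular expression, the function name and the occurrenceID value are assumptions; only the three catalogNumber examples come from the comment. Grouping is done by occurrenceID, not by accession number, because the comment notes that the accession number is not necessarily unique within an occurrence.

```python
import re
from collections import defaultdict

# Letters, optional whitespace, accession digits, a dot, then the item suffix.
CATALOG_RE = re.compile(r"^([A-Z]+)\s*(\d+)\.(\d+)$")


def parse_catalog_number(value: str):
    """Split e.g. 'CANB 897925.9' into (institution, accession, item suffix)."""
    match = CATALOG_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised catalogNumber: {value!r}")
    return match.group(1), match.group(2), match.group(3)


# Hypothetical export rows: three physical items from one collecting occurrence.
items = [
    {"occurrenceID": "CANB-occ-000123", "catalogNumber": "CANB897925.1", "kind": "herbarium sheet"},
    {"occurrenceID": "CANB-occ-000123", "catalogNumber": "CANB 897925.9", "kind": "seed packet"},
    {"occurrenceID": "CANB-occ-000123", "catalogNumber": "CANB 897925.6", "kind": "cutting"},
]

items_by_occurrence = defaultdict(list)
for item in items:
    items_by_occurrence[item["occurrenceID"]].append(parse_catalog_number(item["catalogNumber"]))
print(dict(items_by_occurrence))
```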
@afuchs1 I don't think you went off track at all! I think that is the essence of what MaterialSample should be about - "this"! |
Agreed! I think this gets to the heart of what we're trying to address in this Task Group. I'm still wrestling with the boundary between Diver encounters a rare fish on the reef, and gets in-situ video of it. The fish is collected and brought to the surface alive, and transported half-way around the world (still alive). It is then photographed again (alive) in an aquarium. Some years later it dies and becomes a specimen at a Museum, where it is photographed again before preservation. Several tissue samples are taken, and the remaining specimen is preserved in alcohol. Over time, the specimen is moved from one shelf to another, or put on display, or loaned, or whatever. There's a lot to unpack there, and while it may seem like a bit of an edge case, it's not that sharp of an edge, and whatever we come up with ought to be able to accommodate this kind of use case. I'm still (mostly) confident that an instance of However, I also recognize that Ultimately, we want to be able to attach media items to both OK, now I'm the one who has gone way off track! Sorry about that! I know the above is probably way too abstract and conceptual for what we're trying to accomplish with |
Following your line of thought - what is the thing/class taking part in an Event ... in an Occurrence. How do we model environment or ecosystem or nature types or geology? (which are not appropriately modeled as Organism). Would these only be properties of a Apropos - Is an Occurrence with occurrenceStatus = absent then actually an Occurrence at all? (However, I am jumping outside the topic of this thread here) |
These are the kinds of questions that keep me up at night contemplating. I guess one fundamental thing we ought to pin down is: is the "Sample" of We deal with non-organism stuff the same way we deal with organism stuff, in that we treat "Organism" as a subclass of "Individual" (other subclasses could be things like "vehicle", "sunset", "habitat", etc.) They're all fundamentally abstract, and many (but not all) of them have material manifestations. So instances of I'm reluctant to apply any of these things directly to
Yeah, that's another one that keeps me up at night. I worry this may lead down one of those very distracting philosophical paths that would be appropriate in some context, but not this one. On the other hand, I think some of these fundamentals are important to allow us to nail down the scope & definition of |
The key property of a |
What is a MaterialSample? (PreservedSpecimen + FossilSpecimen + LivingSpecimen + tissue samples & environment samples => MaterialSample) Could a material In my use case, thinking of a "nature type" (which could also be lifeless) evaluated to be designated for active conservation by national nature protection legislation. (Would we at all want or care to enable Darwin Core to describe the monitoring and conservation of and ecological research on such things?). (Sorry for staying outside of the thread's main topic) |
From my perspective the only time you have an occurrence is when you have an organism (or some part of an organism that can be identified, e.g. DNA) in its natural environment. Therefore the fish photographed in the aquarium is not an occurrence, nor is it one when it dies and goes to a museum, or when tissue samples are taken. Those are all events, sure, but they are not occurrences. I don't have trouble with the idea that occurrenceStatus = absent is still an occurrence: you went and looked for an organism using methods that would usually find it if it was there, and didn't see it. I think you all are getting too philosophical here. Researchers use these zeros in their analyses all the time. It's not like Darwin Core invented this out of thin air. I was just on a call yesterday for seagrass monitoring where they want to make sure to include when a species of seagrass occurs in one plot at a field site but is absent from another plot because it has importance in the analyses they do. |
What is an Occurrence? If the If a LivingSpecimen can be BOTH "MaterialSample" and "Organism" at the same time (??) then the places and times where it CAN be described as "Occurrence"s would be broader than the original "collecting" event when it was sampled from "wild" nature (sensu in situ)?! Thus, even a tiger in a Zoo is a valid Occurrence? For a cultivated crop resulting from crop breeding and conserved as a LivingSpecimen there is no "wild natural environment at all" -- so would we then agree that the "natural environment" is in the agricultural field? |
I think that |
Way behind on this conversation, but I can say with confidence that @tucotuco confirmed years ago that occurrences do not have to be restricted to natural occurrences. It's buried somewhere in the tdwg-content email archives. |
I can confirm that there was never a restriction on Occurrences being "natural". The purpose of establishmentMeans is to distinguish between cases and now has a lovely recommended standard vocabulary (https://dwc.tdwg.org/em/).
…On Fri, Nov 12, 2021 at 12:28 PM Steve Baskauf ***@***.***> wrote:
Way behind on this conversation, but I can say with confidence that
@tucotuco <https://github.com/tucotuco> confirmed years ago that
occurrences do not have to be restricted to natural occurrences. It's
buried somewhere in the tdwg-content email archives.
|
@tucotuco from Rich's example above (fish in the water -> aquarium -> museum -> tissue sample) can you tell me which of those are occurrences and therefore get an |
Occurrence: "An existence of an Organism (sensu http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular time." A record of the time and place the video of the fish (Organism) was taken is a good candidate for a distinct Occurrence. A record of the time and place the fish was accessioned in the museum could be an Occurrence, but not one that anyone in our community has expressed publicly as an interesting one from the perspective of science, rather, it is interesting from the perspective of collection management. Similarly with the specimen as it moves around. |
Wouldn't they all be occurrences? However, I don't think that establishmentMeans does what is necessary here. None of the terms in the controlled vocabulary accurately describe any of the occurrences besides the fish in the water. The paleo people have discussed the use of in situ/ex situ as a method for getting to "natural" or not. |
I think many of the terms used in Occurrence are legacy terms and definitions from museum collections. I do not agree with changing catalogNumber to occurrenceNumber; we have occurrenceID to handle that. A catalogNumber implies a physical object that has been cataloged in a museum or repository, whereas machine and human observations are not (or have not been) routinely cataloged or numbered by a collection. Yes, when I run across an observation in a field notebook or measured section log of an occurrence that has no corresponding collected objects, I record it as a humanObservation, and the CMS gives it an occurrenceID, but I use a UUID that does not resemble my institution's museum catalog number. Of the hundreds of thousands of images (analog photographs and digital images) we have, not all are of objects collected, nor are all of the objects in the images reposited at our institution. I have not even begun to work on these machine observations, and will not until we have a DAM and students/volunteers to scan negatives and prints. +1 @tucotuco |
The value of
I'm not comfortable with this collapsing of "Occurrence" into "Evidence of an Occurrence". To me, the
I included the "absence" clause in brackets because I'm not sure there is universal consensus that absences are in scope for The point is, a definition of this sort would accommodate recording both absences of individuals at an
I agree with @RogerBurkhalter on this. We generate values of The way |
I think we might have an idealized idea of And then we have how I tend to think that our community might have painted itself into a corner and that maybe accepting that dwc:Occurrence has become predominantly used as the Evidence of might be the least bad way out (... and MAYBE instead consider minting a new class "OrganismOccurrence"). In my mind, the semantics of an "
I also think that we are in agreement of the value of "Occurrence" - and that we agree (as is the motivation for this task group) that we need ANOTHER concept to describe objects in collections (PreservedSpecimen, MaterialSample, ...). It is the latter need I intended to express by "the occurrences of things is a rather poor proxy for describing the things themselves". |
I think the real corner we painted ourselves in, years ago, was the (mis)interpretation that specimen=Occurrence. This group and the class
Agreed!
Ah! OK, understood. In that case, we agree -- and I think the convergence on |
Is not
just a subset of the larger (in scope) misconception that Occurrence = any evidence of an organism-occurrence? (as in effect treating specimens only as such evidence) |
I see Occurrence data as one of the main paths forward in paleontology, especially human observations in the form of measured section notes as a primary information source for new finds and new studies. Often a researcher has a bias towards collecting a particular taxon, for example Devonian gastropods. While documenting the section, they happen upon an occurrence of ostracods. The gastropod researcher may have no interest in those ostracods, but notes the occurrence. Later, when another researcher is seeking Devonian ostracods for research, having that occurrence documented and findable is a major plus. There are literally tens of thousands of measured sections documented in detail, published and unpublished, in museum collections and other repositories (like the USGS) that have hundreds of thousands of human observations of similar occurrences. These are very important and under-documented resources that could certainly influence future study. |
I guess you could look at it that way, but the history more or less boils down to:
So, yeah, "specimens"/MaterialSample were certainly part of the "Evidence" conversation, but the conflation of Specimen=Occurrence predates that by quite a bit. However, I guess it is fair to say that "Specimen=Occurrence" is something of a subset of Occurrence=Evidence -- and this is also supported by cases where the same In any case... my original point is that we should not re-define "Occurrence" as being the Evidence (as here). In other words, the specimen and the image and the field notebook are not the Occurrence -- the Occurrence was the presence of the organism at the place and time. |
@tucotuco that may be, but clearly we don't have a common idea of what "organism" means.
So I think a clarification is needed, because until we can disentangle dwc:Organism from dwc:MaterialSample, I don't think we can move on.
|
Depending upon our clarification for "organism", the second wrench in the works that I think we need to address is how are dwc:Organism and dwc:LivingSpecimen different? |
Would something along the lines of ... be useful:
|
On Sun, Nov 14, 2021 at 6:54 AM Teresa Mayfield-Meyer < ***@***.***> wrote:
Depending upon our clarification for "organism", the second wrench in the
works that I think we need to address is how are dwc:Organism and
dwc:LivingSpecimen different?
A living specimen is, of course, an organism. I think the key distinction between the two concepts is that LivingSpecimen is a kind of MaterialSample, whereas the DwC Organism class is intended to represent an organism that is inferred to exist or to have existed (past tense). The critical role for the Organism class is that the concept and in particular the property `dwc:OrganismID` ties together multiple occurrence records that derive from the same organism. Other properties derive from the Organism class (most importantly what taxon the organism represents), but in our "shorthand" practice they are commonly recorded as properties of something that has a 1:1 relationship with organism, i.e., the Occurrence or the whole-animal MaterialSample.
|
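The role of organismID described above, tying together several occurrence records of the same organism, can be sketched as a simple grouping. The record values are hypothetical and loosely follow the fish example from earlier in the thread; `occurrenceID`, `organismID` and `eventDate` are existing Darwin Core terms, while the function name is made up.

```python
from collections import defaultdict


def occurrences_by_organism(occurrences):
    """Group occurrenceIDs under their organismID, ordered by eventDate.

    Assumes eventDate is an ISO 8601 date string, so plain string sorting
    gives chronological order.
    """
    grouped = defaultdict(list)
    for occ in sorted(occurrences, key=lambda o: o["eventDate"]):
        grouped[occ["organismID"]].append(occ["occurrenceID"])
    return dict(grouped)


# Hypothetical records: one fish recorded twice, plus an unrelated organism.
occurrences = [
    {"occurrenceID": "occ-aquarium-photo", "organismID": "fish-1", "eventDate": "2015-03-02"},
    {"occurrenceID": "occ-reef-video", "organismID": "fish-1", "eventDate": "2015-01-10"},
    {"occurrenceID": "occ-other", "organismID": "fish-2", "eventDate": "2016-07-21"},
]
history = occurrences_by_organism(occurrences)
print(history)
```

Properties such as the taxon would then hang off the organism once, instead of being repeated on every occurrence or sample record.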
I agree, and was going to make similar points in the as-yet-unwritten "Chapter 6" of my unsolicited dissertation. To me, the two core properties of a
These need to be fleshed out more (as I had intended within my concluding "Chapter 7"), and I'm still getting my head around whether I agree that the "sample" necessarily requires it to be some subset of a larger thing and/or whether the verb part of "sample" is definitive.
I would consider that "a" critical role; not "the" critical role. Certainly it was the original critical role (sensu the old
... which I would summarize these as:
... are actually more directly relevant to this Task Group, and in our modern thinking of representing DwC as more than just a bag of terms loosely organized into Classes. |
|
@deepreef wrote:
Maybe I've started your Chapter 6? In any case, I've been working on one myself, which I posted on the Wiki home page. I think we can start working towards definitions, and analyzing scenarios (not really full use cases) to come up with recommendations about how the resulting records should be published and interpreted. I've been struggling a little with formatting and how to represent the critical concepts and data structures. So please make edits or add new representations if what I've done is unclear. |
By which I mean
|
Yeah, I get that. But here's why I'm still wrestling with it: So if I collect a specimen of a bird and put it in a collection, what bigger thing is it a representative of? A flock? A population? A species? A vector of a disease? Ok, let's say one of those works, and it doesn't matter which. What, then, is an example of something physical that is not a representative of something bigger? I mean, if it's made of matter, then isn't it ultimately a representative of the universe? I guess my question is: what are some examples of physical things that would not fulfill this third criterion? If it doesn't help us understand what is not in scope, then what purpose does it serve in the definition? EDIT OK, maybe when you say "is a sample of something", you mean the same thing that I mean when I say "It is under the direct control and care of humans"? That is, "it was obtained by an act of sampling" means the same thing as "it was taken into custody by humans". If those two cancel each other out, then that leaves the criterion that it must be a subset of something larger. To which I refer back to the rest of this post above. |
@stanblum: Thanks for the link to the Wiki page! Maybe I should have captured my "Dissertation" in that sort of template, rather than a series of Issue posts? I can reformat accordingly. |
I think the wiki formatting tools are too limited and too hard to edit. I think we should switch over to GoogleDocs. Do we have a folder already? |
Q. Why do you collect and manage a sample? I think we are doing science, right? |
@deepreef -- examples of 'physical things that would not fulfill this third criterion' (not samples): They are just things-- yes they can be categorized, but there is no intention of using them to learn anything about the world, they are just attractive or useful. I assume the bird that is collected and preserved to put in a museum is not just a decoration-- there is some intention to learn something about the world from it... |
Probably, but I don't think our definitions should hinge on intent. I think these things should focus on capturing facts, regardless of whether we want to do science with the information, or just look at pretty dead bugs. I mean, pretty dead bugs in a non-scientist's personal/private collection still function as evidence of occurrence -- assuming the data are accurate.
If you recorded the kind of rocks they were, and where they came from, then wouldn't that still be potentially useful information? How is it different for rocks in your yard vs. scientific specimens that are lost or destroyed after they are collected. In both cases, the Occurrence data are still valuable, and during the period of time when the samples (noun) were in possession /control of a Human, I would still consider them to be candidates for instances of As for the wine glasses and laundry detergent, these are out of the TDWG scope (non-biological), but I wouldn't automatically rule them out of scope for non-biological data nerds. Imagine I was a collector of rare wine glasses and found a dead insect in one of them. From my perspective, the insect would be worthless, but to an Entomologist, it might represent a new geographic record. I realize I'm stretching things here, but I guess my point is that motivation/intent should probably not be among the criteria for defining the scope of Occurrences and MaterialSamples. What should matter is that someone took the time to record and document the information, and to share the information -- whatever their motivation was. |
Agree! I think there are a lot of things currently recorded in museum catalogs that were collected because they were pretty or unique without any intent to study them. That doesn't make them less valuable or unable to be used as MaterialSamples now especially if there is some data to go with them (but for some forms of study, data isn't even that important). |
On the contrary - I suggest that motivation/intent is central here. We do science. We deliberately design a sampling and observational program, in order to describe the world in a systematic way. This is not random. |
Natural existence versus human intention: maybe the compromise is to acknowledge that nearly infinite organism-space-time intersections have existed in nature, from the origin of life to now, but we can't/don't document them all. They enter our world of "stuff we care about and document as data" when we "sample" them or observe them. They cross the threshold into our information space. Acknowledging the similarity between biodiversity specimens and other material samples lets us "play nice" with the rest of the Open Biological and Biomedical Ontologies (OBO) world. I don't think accepting that subclassing scheme imposes a cost or an impediment. While I don't know what the logical implied benefits might be (thinking ontologies and reasoning), it seems worth it. |
It's also probably uncontroversial that our specimens/samples enable us to discover and document the characteristics of the biological systems they were drawn from. The systems represented don't have to be declared at the time of collection. The systems represented can be determined later from the documentation of context. |
@stanblum : Agreed!
I see where you're coming from, and it reminds me of a debate I had a while back with an esteemed anthropologist. His point was that you need to design science projects (and data models for capturing results) around hypotheses, so you need to know in advance why you're gathering the data, so your sampling design (and data model) allows you to properly test your hypothesis. I agreed, but countered that the mark of a good data model is that it allows you to answer questions you never even thought to ask when you were gathering the data. I think both of these are in play here. I suspect that the vast majority of specimens in Museums (fodder for MaterialSample) were captured/killed/preserved with scientific intent. But when I record an observation of a fish on a reef using my video camera, I may have no idea at the time that it represents a depth record or a geographic range extension. So my intent in recording the video doesn't change the scientific value of the Occurrence record that it documents. This is true even if I am taking video of another diver, and the fish just happens to swim into frame. This is why I think intent (at the time of documenting an Occurrence record) isn't a prerequisite to capturing useful information. Obviously, if I killed the fish and put it in a Museum as an instance of |
closing for focus on MaterialSample and properties |
Originally posted by @tucotuco in #6 (comment)
From the current definitions of dwc:Occurrence and dwc:occurrenceID I understand that a dwc:occurrenceID is intended as an identifier for a dwc:Occurrence (the being present of one or more individuals of taxon X (somewhere) within geographical location L (at some time) during time interval T), which is, while less tangible than a preserved specimen, a real thing and exists independently of what biodiversity researchers think or do.
The concept described in the quote is different and on the level of assertions (i.e. what a human agent thinks about an occurrence for a given X, L, T), including assertions of absence, i.e. that a given occurrence, specified by X, L, T, did not occur.
I would just like to make sure that I understand the existing terms correctly, or to learn whether there are new requirements for them. I created this as a new issue to keep the discussion in the original issue #6 close to its core topic.