materialSampleID #20

Jegelewicz · 2021-10-14T14:52:25Z

Current Definition

https://dwc.tdwg.org/terms/#dwc:materialSampleID

An identifier for the MaterialSample (as opposed to a particular digital record of the material sample). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique.

Comments

Recommended best practice is to use a persistent, globally unique identifier.

Please suggest changes/improvements in this issue.

dagendresen · 2021-10-14T19:29:56Z

I would prefer removing "In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique." from the definition.

Jegelewicz · 2021-10-14T19:37:36Z

HMMMM, I think @dagendresen has a point. It feels like we are encouraging not-so-good practices right in the definition.

albenson-usgs · 2021-10-14T19:59:46Z

I disagree with @dagendresen and @Jegelewicz. After many issues with creating globally unique occurrenceIDs I have come to believe that it's better to create a unique ID using information in the dataset itself as opposed to creating a completely arbitrary GUID. My reason for this is that a unique ID can be recreated when needed (e.g. dataset updates) whereas randomly generated GUIDs cannot be recreated. They must be stored somewhere and that is currently not happening. Until we have some kind of DOI-like system for creating occurrenceIDs, materialSampleIDs, measurementIDs, and all the other IDs I think it's actually a good practice to create a globally unique identifier from a combination of identifiers in the record.
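To make the trade-off concrete, here is a minimal sketch that is not part of the original comment. It assumes a record carrying institutionCode, collectionCode, and catalogNumber, and uses Python's `uuid.uuid5` with a hypothetical namespace URL as one possible way to derive a reproducible ID. The field choice, the namespace, and the sample values are illustrative assumptions only, not a recommendation from the Darwin Core standard.

```python
import uuid

# Hypothetical namespace for illustration; any fixed namespace UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/material-samples")

def derived_id(record: dict) -> str:
    """Build a reproducible ID from identifiers already present in the record."""
    fields = ("institutionCode", "collectionCode", "catalogNumber")
    key = ":".join(record[f] for f in fields)
    return str(uuid.uuid5(NAMESPACE, key))

def random_id() -> str:
    """A random UUID: unique, but it cannot be regenerated from the data."""
    return str(uuid.uuid4())

# Made-up record values, used only to exercise the functions.
record = {"institutionCode": "XYZ", "collectionCode": "Fish", "catalogNumber": "12345"}
```

Calling `derived_id(record)` twice gives the same string, so the ID survives a dataset rebuild without being stored anywhere; `random_id()` gives a new value on every call and therefore has to be persisted alongside the record.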

tucotuco · 2021-10-14T20:33:32Z

I hear you, @albenson-usgs. My reason for not changing is more pragmatic: if we change this one, we should change every DwC ID term that uses the same pattern of definition. I'm not opposed to that. I think it would actually be cleaner to put that part in the usage comments.

albenson-usgs · 2021-10-14T20:38:02Z

Sorry @tucotuco that my comment was not clear. I agree with you that it should not be changed. I think it's good advice actually.

Although if we are talking about taking it out of the definition and moving it to the usage comments instead that would be fine with me. I am not in agreement that it should be removed completely from anywhere in the term.

dagendresen · 2021-10-14T20:42:21Z

I would also be opposed to moving a similar text to the usage comments. I think composite identifiers are really bad advice!

stanblum · 2021-10-15T03:14:07Z

As a point of clarification: What do you think the prevailing attitude would be now towards pushing a distinction between the physical-object (<= material-sample <= preserved-specimen) and its digital catalog-record (=>information artifact)? The ID on the physical thing, our traditional catalog (aka accession# in botany), ties the specimen to the information record in the catalog (and is often reported in Material Examined of scientific publications); whereas the ID of the digital record containing data about the specimen can be made to comply with modern requirements (persistent, resolvable, globally unique). I added this distinction to the second figure I posted on a wiki page. Is this still viewed as an unnecessary complication, or is there growing compliance in creating and using GUIDs for the digital record?

dagendresen · 2021-10-15T05:58:44Z

@stanblum If I read your question correctly, I do not think we should publicly publish identifiers for the database record (information artifact). I believe most people working in the collections would think of the specimen IDs as identifying the physical specimen, and yes, persistently identifying the database record of the specimen would be a confusing complication.

I think that materialSampleID identifies the physical material sample.

Note that I think the Digital Extended Specimen concept on the other hand is useful. (As in a Digital Specimen concept that anyone in the world can contribute to describing).

dagendresen · 2021-10-15T07:32:41Z

Regarding pseudo-identifiers composed of pieces of data, or even worse of pieces of other identifiers, maybe we need something along materialSampleCode, similar to the institutionCode and collectionCode and catalogNumber (!) (could we simply use catalogNumber?). I would maybe rather suggest recommending to leave materialSampleID blank if the data publisher has no appropriate persistent identifier to put here ;-)

jbstatgen · 2021-10-15T10:14:45Z

@stanblum #20 (comment)

To me, your proposal makes sense, and your diagram is intuitively clear and helpful for visualizing the overall concept.

@dagendresen : my hope is that using both IDs and gaining experiences with them will make them intuitively understandable and matter-of-fact'ly acceptable. This process towards understanding and acceptance worked in the journal sector. There,

- the DOI corresponds to the ID of the Digital Specimen, i.e. the "digital catalog-record (=>information artifact)". It is a persistent, globally unique, etc. ID (a PID).
- the local catalog (and/or accession) number/ID of the physical specimen corresponds to the <JournalName> <volume>(<issue>): <pages> (<year>) notation. This is a local ID, which in addition is a compound of other IDs.

Both IDs are (more or less) human- and machine-readable and -actionable.

Twenty years ago, when DOIs were first introduced, I personally had no use for them and thought them a bit superfluous, a tech thing. However, after two decades of practical experience, I, and I would guess other users too, now seem to understand the advantages of both IDs and work with both routinely. For example, to find similar papers bundled in e.g. a special issue, the old-fashioned "hardcopy" ID still is a good starting point. On the other hand, it is simply nifty to be able to click on a DOI in a publication's reference list. Yet, my impression (from my personal use) is that DOIs, i.e. PIDs, are used more and more, while the old notation might be phased out over the next decades.

deepreef · 2021-10-15T11:15:50Z

> As a point of clarification: What do you think the prevailing attitude would be now towards pushing a distinction between the physical-object (<= material-sample <= preserved-specimen) and its digital catalog-record (=>information artifact)? The ID on the physical thing, our traditional catalog (aka accession# in botany), ties the specimen to the information record in the catalog (and is often reported in Material Examined of scientific publications); whereas the ID of the digital record containing data about the specimen can be made to comply with modern requirements (persistent, resolvable, globally unique). I added this distinction to the second figure I posted on a wiki page. Is this still viewed as an unnecessary complication, or is there growing compliance in creating and using GUIDs for the digital record?

This is a complicated issue. I see a distinction between "computer-generated identifier assigned to represent the conceptual object" (e.g., a UUID), and "ID of the digital record containing data about the specimen". It's a subtle, but important, distinction.

For example, I have a field that automatically assigns a UUID to every instance of MaterialSample in our databases. Although the UUID was "born" digitally, and does indeed uniquely represent the digital record for the physical object, the intention of the identifier is that it represents the physical thing, not the digital record for the thing. So, when I generate an IPT dataset that is shared with GBIF, and the contents of that record are captured in the GBIF aggregated dataset, the UUID is retained. If the identifier were for the digital record, then it should not be transmitted to GBIF, because the record in GBIF is actually a different digital record, and so would need a different identifier to represent that different digital record.

Indeed, I think that our community only very rarely assigns or uses unique identifiers for digital records, and when they do, it should be extremely explicit (e.g., "this is the identifier assigned to the Bishop Museum database record for this specimen, and this is the identifier assigned to the GBIF database record for the same specimen, [etc.]")

Yes, catalog numbers and accession numbers and sheet numbers and other human-friendly identifiers can be thought of as additional identifiers assigned to the same physical object, and that's fine -- many physical and abstract instances captured as data records in databases have more than one unique(ish) identifier assigned to them (e.g., see bioguid.org).

But my point is, I think we need to be extremely explicit when trying to distinguish identifiers intended to represent the physical (or abstract) object, vs. identifiers intended to represent a particular digital record about that object.
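The distinction can be sketched as a small data model. This is only an illustration under stated assumptions: the class, field names, and holder labels below are hypothetical and do not come from Darwin Core, IPT, or GBIF. The point it tries to show is that an identifier for the physical object travels unchanged between databases, while an identifier for a digital record belongs to one record in one database.

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DigitalRecord:
    holder: str              # database holding this record (hypothetical label)
    material_sample_id: str  # identifies the physical object; shared across records
    # Identifies this particular digital record only; generated per record.
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def share(record: DigitalRecord, new_holder: str) -> DigitalRecord:
    """Copy a record into another database: new record ID, same object ID."""
    return DigitalRecord(holder=new_holder, material_sample_id=record.material_sample_id)

# One physical sample, described by two different digital records.
sample_id = str(uuid.uuid4())
local = DigitalRecord(holder="museum-db", material_sample_id=sample_id)
aggregated = share(local, "aggregator-db")
```

Here `local` and `aggregated` carry the same `material_sample_id` but different `record_id` values, which is the behaviour one would expect if the two kinds of identifier are kept explicitly separate.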

As to the issue at hand, I'm fine leaving the definition unchanged, mostly for the reasons mentioned by @tucotuco. My personal philosophy of persistent identifiers matches closely the sentiments expressed by @dagendresen.

cboelling · 2021-10-15T11:45:21Z

> I think we need to be extremely explicit when trying to distinguish identifiers intended to represent the physical (or abstract) object, vs. identifiers intended to represent a particular digital record about that object.

I wholeheartedly agree.

An identifier, first and foremost, is like a name for a thing. It is used to refer to that thing and pick it out among other things (without necessarily providing a detailed representation of that thing). The scope of its uniqueness and its suitability for one purpose or another (e.g., human discourse, machine actionability) are important, but orthogonal concerns. While striving for uniqueness (a given identifier is used for one thing and one thing alone, and vice versa) is a worthwhile goal, in many cases a thing will effectively be referred to by more than one identifier (name), if only for legacy reasons.

> Regarding pseudo-identifiers composed of pieces of data, or even worse of pieces of other identifiers, maybe we need something along materialSampleCode, similar to the institutionCode and collectionCode and catalogNumber (!) (could we simply use catalogNumber?)

Using composite keys composed of attribute data is an acceptable way to enforce uniqueness and therefore act as identifier. However, persistence of these is quite problematic (knowledge about things can change) - which is why artificial, opaque identifiers that carry no semantics by themselves are preferred in many use cases.
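A short sketch of the persistence problem, under assumptions: the field names are Darwin Core terms, but the record values, the `opaqueID` key, and the colon-joined format are made up for illustration. It shows a composite key changing when one attribute is corrected, while an opaque identifier stored with the record is unaffected.

```python
import uuid

def composite_key(record: dict) -> str:
    """Composite identifier built from attribute data in the record."""
    fields = ("institutionCode", "collectionCode", "catalogNumber")
    return ":".join(record[f] for f in fields)

# Hypothetical record with an opaque ID assigned once and stored.
record = {
    "opaqueID": str(uuid.uuid4()),
    "institutionCode": "XYZ",
    "collectionCode": "Fish",
    "catalogNumber": "12345",
}
before = composite_key(record)

# Suppose the collection code is later corrected or the collection renamed.
corrected = {**record, "collectionCode": "Ichthyology"}
after = composite_key(corrected)
```

After the correction, `before` and `after` differ, so anything that cited the old composite key no longer matches the record, whereas `opaqueID` is the same in both versions.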

cboelling · 2021-10-15T12:00:03Z

I would be happy with this definition update:

An identifier for a ~~the~~ MaterialSample. (as opposed to a particular digital record of the material sample). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique.

Comments

Recommended best practice is to use a persistent, globally unique identifier.

Jegelewicz · 2021-10-15T16:25:04Z

> But my point is, I think we need to be extremely explicit when trying to distinguish identifiers intended to represent the physical (or abstract) object, vs. identifiers intended to represent a particular digital record about that object.

Jegelewicz commented Apr 4, 2022

Jegelewicz commented Apr 4, 2022

dr-shorthair commented Apr 25, 2022

Jegelewicz commented Apr 26, 2023
