materialSampleID #20
Comments
I vote for no change.
…On 11:52, Thu, Oct 14, 2021 Teresa Mayfield-Meyer ***@***.*** wrote:
Current Definition
https://dwc.tdwg.org/terms/#dwc:materialSampleID
An identifier for the MaterialSample (as opposed to a particular digital
record of the material sample). In the absence of a persistent global
unique identifier, construct one from a combination of identifiers in the
record that will most closely make the materialSampleID globally unique.
Please suggest changes/improvements in this issue.
See also
|
I would prefer removing "In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique." from the definition. |
HMMMM, I think @dagendresen has a point. It feels like we are encouraging not-so-good practices right in the definition. |
I disagree with @dagendresen and @Jegelewicz. After many issues with creating globally unique occurrenceIDs I have come to believe that it's better to create a unique ID using information in the dataset itself as opposed to creating a completely arbitrary GUID. My reason for this is that a unique ID can be recreated when needed (e.g. dataset updates) whereas randomly generated GUIDs cannot be recreated. They must be stored somewhere and that is currently not happening. Until we have some kind of DOI-like system for creating occurrenceIDs, materialSampleIDs, measurementIDs, and all the other IDs I think it's actually a good practice to create a globally unique identifier from a combination of identifiers in the record. |
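The recreatability argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the record carries `institutionCode`, `collectionCode`, and `catalogNumber`, and that joining them with a colon is close enough to globally unique; the function name `composite_id`, the sample values, and the choice of fields are hypothetical and not something the thread prescribes.

```python
# Hypothetical helper: build an identifier from fields already in the record.
# The field choice and the ":" separator are assumptions for illustration.
def composite_id(record, fields=("institutionCode", "collectionCode", "catalogNumber")):
    parts = [str(record.get(f, "")).strip() for f in fields]
    if any(not p for p in parts):
        # A missing component would make the identifier ambiguous.
        raise ValueError("record is missing one of: " + ", ".join(fields))
    return ":".join(parts)


# Made-up example record.
record = {"institutionCode": "XYZ", "collectionCode": "Herps", "catalogNumber": "12345"}

first_export = composite_id(record)
# Nothing has to be stored between exports: the same data yields the same ID.
second_export = composite_id(dict(record))
```

Because the identifier is a pure function of the record, a later dataset update can regenerate it, as long as the underlying field values themselves have not changed.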
I hear you, @albenson-usgs. My reason for not changing is more pragmatic: if we change this one, we should change every DwC ID term that uses the same pattern of definition. I'm not opposed to that. I think it would actually be cleaner to put that part in the usage comments. |
Sorry @tucotuco that my comment was not clear. I agree with you that it should not be changed. I think it's good advice actually. Although if we are talking about taking it out of the definition and moving it to the usage comments instead that would be fine with me. I am not in agreement that it should be removed completely from anywhere in the term. |
I would also be opposed to moving a similar text to the usage comments. I think composite identifiers are really bad advice! |
As a point of clarification: What do you think the prevailing attitude would be now towards pushing a distinction between the physical-object (<= material-sample <= preserved-specimen) and its digital catalog-record (=>information artifact)? The ID on the physical thing, our traditional catalog (aka accession# in botany), ties the specimen to the information record in the catalog (and is often reported in Material Examined of scientific publications); whereas the ID of the digital record containing data about the specimen can be made to comply with modern requirements (persistent, resolvable, globally unique). I added this distinction to the second figure I posted on a wiki page. Is this still viewed as an unnecessary complication, or is there growing compliance in creating and using GUIDs for the digital record? |
@stanblum If I read your question correctly, I do not think we should publicly publish identifiers for the database record (information artifact). I believe most people working in collections would think of the specimen IDs as identifying the physical specimen, and yes, persistently identifying the database record of the specimen would be a confusing complication. Note that I think the Digital Extended Specimen concept, on the other hand, is useful (as in a Digital Specimen concept that anyone in the world can contribute to describing). |
Regarding pseudo-identifiers composed of pieces of data, or even worse of pieces of other identifiers, maybe we need something along the lines of materialSampleCode, similar to the institutionCode and collectionCode and catalogNumber (!) (could we simply use catalogNumber?). I would maybe rather suggest recommending to leave materialSampleID blank if the data publisher has no appropriate persistent identifier to put here ;-) |
To me, your proposal makes sense, and your diagram is intuitively clear and helpful for visualizing the overall concept. @dagendresen : my hope is that using both IDs and gaining experiences with them will make them intuitively understandable and matter-of-fact'ly acceptable. This process towards understanding and acceptance worked in the journal sector. There,
Both IDs are (more or less) human- and machine-readable and -actionable. Twenty years ago, when DOIs were first introduced, I personally had no use for them and thought them a bit superfluous, a tech thing. However, after two decades of practical experience, I, and I would guess other users too, now seem to understand the advantages of both IDs and work with both routinely. For example, to find similar papers bundled in a special issue, the old-fashioned "hardcopy" ID is still a good starting point. On the other hand, it is simply nifty to be able to click on a DOI in a publication's reference list. Yet my impression (from my personal use) is that DOIs, i.e. PIDs, are used more and more, while the old notation might be phased out over the next decades. |
This is a complicated issue. I see a distinction between "computer-generated identifier assigned to represent the conceptual object" (e.g., a UUID), and "ID of the digital record containing data about the specimen". It's a subtle, but important, distinction. For example, I have a field that automatically assigns a UUID to every instance of MaterialSample in our databases. Although the UUID was "born" digitally, and does indeed uniquely represent the digital record for the physical object, the intention of the identifier is that it represents the physical thing, not the digital record for the thing. So, when I generate an IPT dataset that is shared with GBIF, and the contents of that record are captured in the GBIF aggregated dataset, the UUID is retained. If the identifier were for the digital record, then it should not be transmitted to GBIF, because the record in GBIF is actually a different digital record, and so would need a different identifier to represent that different digital record. Indeed, I think that our community only very rarely assigns or uses unique identifiers for digital records, and when they do, it should be extremely explicit (e.g., "this is the identifier assigned to the Bishop Museum database record for this specimen, and this is the identifier assigned to the GBIF database record for the same specimen, [etc.]"). Yes, catalog numbers and accession numbers and sheet numbers and other human-friendly identifiers can be thought of as additional identifiers assigned to the same physical object, and that's fine -- many physical and abstract instances captured as data records in databases have more than one unique(ish) identifier assigned to them (e.g., see bioguid.org). But my point is, I think we need to be extremely explicit when trying to distinguish identifiers intended to represent the physical (or abstract) object vs. identifiers intended to represent a particular digital record about that object.
As to the issue at hand, I'm fine leaving the definition unchanged, mostly for the reasons mentioned by @tucotuco. My personal philosophy of persistent identifiers matches closely the sentiments expressed by @dagendresen. |
I wholeheartedly agree. An identifier, first and foremost, is like a name for a thing. It is used to refer to that thing and pick it out among other things (without necessarily providing a detailed representation of that thing). The scope of its uniqueness and its suitability for one purpose or another (e.g., human discourse, machine actionability) are important, but orthogonal concerns. While striving for uniqueness (a given identifier is used for one thing and one thing alone - and vice versa) is a worthwhile goal, in many cases a thing will effectively be referred to by more than one identifier (name), if only for legacy reasons.
Using composite keys composed of attribute data is an acceptable way to enforce uniqueness and therefore act as identifier. However, persistence of these is quite problematic (knowledge about things can change) - which is why artificial, opaque identifiers that carry no semantics by themselves are preferred in many use cases. |
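The persistence concern can be shown with a small sketch. It assumes a made-up record whose catalog number is later corrected; the names `composite_key` and `opaqueID`, and the sample values, are hypothetical. Python's standard `uuid.uuid4()` stands in for "an artificial, opaque identifier" and is not a recommendation from the thread.

```python
import uuid


# Hypothetical composite key built from attribute data in the record.
def composite_key(record):
    return f"{record['institutionCode']}:{record['catalogNumber']}"


# Made-up record for illustration only.
record = {"institutionCode": "XYZ", "catalogNumber": "A-100"}

# An opaque identifier carries no semantics, so it must be minted once and stored.
opaque_id = str(uuid.uuid4())
record["opaqueID"] = opaque_id

key_before = composite_key(record)

# Knowledge about the thing changes: the catalog number is corrected.
record["catalogNumber"] = "A-101"

key_after = composite_key(record)

# The composite key has silently changed; the stored opaque ID has not.
key_changed = key_before != key_after
opaque_unchanged = record["opaqueID"] == opaque_id
```

The trade-off mirrors the discussion: the composite key needs no storage but breaks when an attribute is edited, while the opaque identifier survives edits but only if it is stored with the record.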
I would be happy with this definition update:
|
See also #6 (comment) |
From 2022-03-16 monthly meeting notes:
Stephen R., Jutta B., and Teresa M. discussed this at length on 2022-03-31 during working hour.
|
Also discussed today with @deepreef: What will happen when institutions have multiple samples (skin, skeleton, tissue) with only a single identifier (catalog number)? We will need methods for making the "split" easy? |
I'd suggest being very clear about which IDs are keys, in what context; |
Closing as no change needed; the identifier class is out of scope for this Task Group. |