Do we need directly-attached stable part identifiers? #5231
Replies: 119 comments 1 reply
-
Thanks @Jegelewicz
Yay: every part could have a way of being uniquely identified, I could avoid some semi-expensive joins, having multiple parts in a "base" container wouldn't necessarily mean they can't still be individually identified.
Maybe not-so-yay: Barcodes are used for lots of things in addition to part IDs, this could be confusing when those are separated (or unstable - so useful only at limited scale - if that's somehow synchronized/maintained).
The "workaround" is containers which exist only for the purposes of serving as part identifiers (for which I'd recommend a dedicated container type). That's nice because it fits into all existing workflows and requires no development, but it also requires some setup (getting the parts into the containers). Perhaps some of that could somehow be automated.
-
So we could auto assign all parts to a virtual container with an
autogenerated stable identifier?
…On Wed, Jun 2, 2021, 10:24 AM dustymc wrote: [quoted comment above]
-
Yes.
No, but maybe we could fake it by semi-automating containering uncontainerized parts in your collection(s) or something.
That's up to you, but nothing in Arctos would change your containers (same as any other containers).
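A minimal sketch of what "semi-automating" could look like, assuming a dedicated single-part container per uncontainerized part. The `Part` fields, the `PID` prefix, and the barcode format are all made up for illustration; none of this is Arctos code or its schema.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, Optional


@dataclass
class Part:
    part_id: int                      # internal key (hypothetical)
    container_barcode: Optional[str]  # None means the part is not in any container


def containerize(parts: list[Part], prefix: str = "PID", start: int = 1) -> list[Part]:
    """Give every uncontainerized part its own single-part container barcode."""
    used = {p.container_barcode for p in parts if p.container_barcode}
    serial: Iterator[int] = count(start)
    for part in parts:
        if part.container_barcode is not None:
            continue  # existing containers are never changed
        barcode = f"{prefix}{next(serial):07d}"
        while barcode in used:  # skip barcodes that are already in use
            barcode = f"{prefix}{next(serial):07d}"
        used.add(barcode)
        part.container_barcode = barcode
    return parts


parts = containerize([Part(1, None), Part(2, "FREEZER-12"), Part(3, None)])
print([p.container_barcode for p in parts])
```

Parts that already sit in a container are left alone, which matches the point that nothing would change your existing containers.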
-
I think we should lean toward autogeneration and stability. Isn't this a path to the ultimate "material sample" that GGBN seeks?
-
That's probably an argument for something more specialized than containers. I think my concerns all center around usability, as above. Nothing good documentation can't bridge....
If we're going there, some sort of resolvable ID - URLs, ARKs, some short Arctos alternate URL that we could buy (and which I'd use for things like JSON), or whatever - would be cool.
The "DWC community" seems to remain at least partially convinced that institution_acronym + collection_cde can do something (it can't) so I'm not really holding my breath, but there is a materialSampleID (https://dwc.tdwg.org/terms/#materialSampleID) with a sane definition in the "core" (extension?? IDK, and IDK how to know!).
GGBN apparently has their own thing (https://terms.tdwg.org/wiki/GGBN_Material_Sample_Vocabulary), but it does NOT carry an ID (that I can find).
In either case, I believe there's at least the presumption of dependence - I don't think it could ever be "correct" to show DWC:MaterialSample data without also showing DWC:Occurrence data (but I'm not a DWCologist, maybe I'm not understanding something).
Arctos has no such inherent limitations, and it's common (at least in entomology) to just "cite" whatever's scribbled on the tube/pin/part no matter what else has been specified or agreed upon. This could be an opportunity for us to make "whatever's scribbled on the tube" something that browsers can use to get to the catalog record (or a subset of it).
That comes back to the usability question - are CMs going to be able to use barcodes up to some point and then switch to "part IDs," or can we find a way to sync those so they don't have to (and what does that do for the possibility of buying pre-printed containers if so), or ???
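To make "whatever's scribbled on the tube" concrete, here is a rough sketch of a resolver that turns a hand-transcribed label into a catalog record URL. The record URL prefix is the one used for Arctos GUID links elsewhere in this thread; the label values, the lookup table, and the normalization rule are invented for illustration only.

```python
from typing import Optional

# Hypothetical lookup table: identifier written on the tube -> catalog record GUID.
# The labels and their pairing with this GUID are invented for illustration.
LABEL_TO_GUID = {
    "NK 12345": "MSB:Mamm:83457",
    "tube-0042": "MSB:Mamm:83457",
}

BASE_URL = "https://arctos.database.museum/guid/"


def normalize(label: str) -> str:
    """Collapse whitespace and case so hand-transcribed labels still match."""
    return " ".join(label.split()).lower()


_INDEX = {normalize(k): v for k, v in LABEL_TO_GUID.items()}


def resolve(label: str) -> Optional[str]:
    """Return the catalog record URL for a tube label, or None if unknown."""
    guid = _INDEX.get(normalize(label))
    return BASE_URL + guid if guid else None


print(resolve("  nk   12345 "))
print(resolve("unknown"))
```

Two labels pointing at the same record is deliberate: several parts can share one catalog record, so the label identifies the part while the URL lands on the record.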
-
We have an incoming collection that wants to assign GUIDs and separate part
identifiers in the field at time of collection. They want to know if they
can use their tissue identifiers for part barcodes. It would be ideal if
we could somehow incorporate this, giving a stable material sample ID at
collection, associated with a GUID/URL organism ID and occurrence ID . . .
Right now the closest thing we have for this is barcodes, and they mostly
work. But they are not / cannot be universally applied due to cost and
resources. If Arctos could provide a list of stable part identifiers that
could be downloaded and made into labels in advance and applied in the
field, and linked to an organism ID, maybe we could bypass NK numbers and
externally supplied barcodes?
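A rough sketch of the "download a list and print labels in advance" idea. It uses random UUIDs purely as a stand-in; ARKs or some other scheme could fill the same role. The column names are hypothetical, and this only shows the minting and export step, not how the identifiers would be registered or kept stable.

```python
import csv
import io
import uuid


def mint_batch(n: int) -> list[str]:
    """Mint n random identifiers; uuid4 collisions are vanishingly unlikely."""
    return [str(uuid.uuid4()) for _ in range(n)]


def to_label_csv(identifiers: list[str]) -> str:
    """Render identifiers as CSV, with an empty column to fill in after collection."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["part_identifier", "organism_id"])
    for identifier in identifiers:
        writer.writerow([identifier, ""])  # the organism ID is linked later
    return buffer.getvalue()


batch = mint_batch(5)
sheet = to_label_csv(batch)
print(sheet)
```

The resulting sheet could go to a label printer before a field trip, with the organism ID column filled in as specimens are collected.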
…On Wed, Jun 2, 2021, 11:33 AM dustymc wrote: [quoted comment above]
-
I haven't seen that work yet, but as long as they can keep their numbers straight it's not a problem for Arctos.
Sure, that's always been possible/recommended, and it (along with good procedures) might be an actual way to "keep their numbers straight."
You've lost me and that seems to conflict with above - explain, please.
I'm still missing a big piece of the puzzle, but ARKs might be an easy way to get those. If you'll settle for a bit less stable, grab a series of barcodes of whatever format you want and do WHATEVER with them - get them printed, print them yourself, attempt to transcribe them, ....
-
OK, this is probably crazy-talk because it occurred to me in the middle of the night, but Arctos is cataloging occurrences (identification at a place and time) NOT collections (I have this thing from this occurrence). @dustymc is always hammering home that we are not cataloging the "item of interest" and this is absolutely true. Almost every prospective institution asks about catalog numbers like 12345.1 so that they can track the various parts associated with some thing (usually a plant or animal, but other stuff too with catalog number 12345) that they are managing. Because we are so focused on the event, the parts are secondary in the system. The problem comes from the fact that for the majority of our collection managers, the parts are really the focus, but we don't number/track them well. We have now created the "part identifier" attribute to get around this, but it only creates more work for collections. Barcodes are great - but they apply to containers, not parts and I think we need to keep that distinction. I think we need to look at MaterialSampleID:
Would it be possible to construct such an ID with the object url + Arctos part number? Or should parts be assigned a "GUID" equal to the Catalog record "GUID" + part number? (Perhaps we need both, one for humans, the other for machines) If Arctos could do this for us in a way that makes it easy for us, that would be GREAT. I think the thing we need to figure out is what this "part number" should be. While the part number assigned in the parts code table is nice, it isn't known until the part is entered. How can we best accomplish this?
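For illustration, the two forms floated above (one for humans, one for machines) might be composed like this. This is only a sketch of the string-building: the `.N` suffix follows the 12345.1 pattern mentioned above, the `/part/` URL path is hypothetical and not an existing Arctos route, and it leaves open where the part number comes from.

```python
BASE_URL = "https://arctos.database.museum/guid/"  # record URL prefix used in this thread


def human_part_id(guid: str, part_number: int) -> str:
    """Human-readable form: catalog GUID plus a part suffix, like 12345.1."""
    if part_number < 1:
        raise ValueError("part numbers start at 1")
    return f"{guid}.{part_number}"


def machine_part_id(guid: str, part_number: int) -> str:
    """Machine-readable form: a URL under the record URL (the path is hypothetical)."""
    if part_number < 1:
        raise ValueError("part numbers start at 1")
    return f"{BASE_URL}{guid}/part/{part_number}"


print(human_part_id("MSB:Mamm:83457", 1))
print(machine_part_id("MSB:Mamm:83457", 1))
```

Both forms embed the catalog GUID, so they only stay valid as long as the part stays on that record.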
-
That's mostly "just UI" - Arctos is truly normalized, seeing it as a part management system (with catalog records as metadata) is a valid viewpoint. (So is seeing it as an event system, if you want to go there.)
Again, parts are 100% containers. The current level of container that can have an exposed identifier isn't in a 1:1 relationship with parts so I'm not suggesting what we have fully does what we should be doing, but there is always and inevitably a container that is in the correct relationship with parts, and it might serve this purpose (depending on what precisely that turns out to be).
The origins of that are a case study in how to not do science. Strongly suggest just avoiding that situation in exploring how to move forward.
I don't think there's anything exactly wrong with that, but it will inevitably get used in the wrong context so I'd rather avoid it.
We need to figure out what it DOES before we think about what it looks like. E.g., deleting catalog records (==destroying GUIDs) is fairly difficult (it would be impossible if I had my way) because those are "citable" - minting them comes with some implicit (it would be explicit in my little fantasy world) promise that they'll be suitable for certain purposes, and that demands certain behavior from the creators. "Minting" UUIDs (or internal keys, etc.) is an act of convenience - once they've served whatever purpose they've been created to serve they can be deleted and nobody cares. I think the first question is, which of those situations is more analogous to what should be done here? If that answer turns out to be what I think, the second question involves our ability to live without (or with limited access to) 'delete part' buttons.
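The difference between the two kinds of identifier can be sketched as a toy registry in which "citable" identifiers can be retired but never deleted, while "convenience" identifiers can simply go away. The class, the kind names, and the retire operation are assumptions made for illustration; this is not how Arctos is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class IdentifierRegistry:
    """Toy registry: 'citable' IDs can be retired but never deleted."""
    _kinds: dict[str, str] = field(default_factory=dict)  # id -> "citable" | "convenience"
    _retired: set[str] = field(default_factory=set)

    def mint(self, identifier: str, kind: str) -> None:
        if kind not in ("citable", "convenience"):
            raise ValueError(f"unknown kind: {kind}")
        if identifier in self._kinds:
            raise ValueError(f"{identifier} is already minted")
        self._kinds[identifier] = kind

    def delete(self, identifier: str) -> None:
        if self._kinds[identifier] == "citable":
            raise PermissionError("citable identifiers cannot be deleted, only retired")
        del self._kinds[identifier]

    def retire(self, identifier: str) -> None:
        """Record that the physical item is gone while keeping the identifier."""
        if identifier not in self._kinds:
            raise KeyError(identifier)
        self._retired.add(identifier)

    def status(self, identifier: str) -> str:
        if identifier not in self._kinds:
            return "unknown"
        return "retired" if identifier in self._retired else "active"


registry = IdentifierRegistry()
registry.mint("part-1", "citable")
registry.mint("tmp-1", "convenience")
registry.delete("tmp-1")   # fine: nobody cares
registry.retire("part-1")  # the identifier survives the part
print(registry.status("part-1"), registry.status("tmp-1"))
```

Living without a 'delete part' button would look like the `PermissionError` branch: the only way out for a citable identifier is a status change that leaves something behind.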
-
@Jegelewicz Not crazy talk!
YES!!!!!
It helps keep track of how parts are used! I want to see parts tied to the outside identifiers. Liver part --> Loan --> Project --> Publication --> GenBank etc. It wasn't the skull or postcranial or the kidney that led to all that extra data about that occurrence.
-
I think we are both pretty sure what the answer is and given @campmlc's comment I think she does too. This also ties in with the Mexican Wolf scenarios and having events tied to the parts they came from. In my mind right now the answer is that we are cataloging the wrong way. A basic catalog record only requires an identification and a locality but NO PART. How does that make sense when we are managing PARTS? It should be the other way around - I should be able to catalog a part with absolutely no other information because the most important thing in that moment is that I can find the part and match it up with all of the other information. OK, before anyone jumps on me, I realize that I can put unknown everywhere (even for part name) but it feels wrong. Not saying we can't train people to do it though. Anyway, I think our problems mostly stem from putting too many parts in a single catalog record. If a part is important enough to have an associated GenBank sequence, maybe it needs its own catalog number. Because all the parts from an event can share an event, we should not be afraid to do this. And yes, it will require a new pricing strategy.... As @dustymc says - catalog the item of interest and apparently that is not Andalgalomys pearsoni dorbignyi but one of these - And by the way, which of these ended up as these? In case anyone is interested - this ties in with tdwg/dwc#314 (comment)
-
FWIW - our new entity module could help here...all the "organism" type attributes could go there and would not need to be re-created in every catalog record.
-
@Jegelewicz Another example! In a paper, the Arctos interns found two UAM no data bison cited. Yeah, they had no data but data has now been generated about those parts. Unfortunately, they were not cataloged and we don't know which is which. The part has the data but also continues to generate MORE data.
Why is this a problem?
That is going to go over like a lead balloon. GGBN has a very similar model where all parts are separated out.
-
I think there are two components of that:
That's but one use case. Catalog records are and always have been "whatever someone felt like cataloging." I don't see any realistic possibility of that changing, and I don't see much reason to attempt to change it. There are usability implications to cataloging a bucket of guppies or each of the 47 slices of liver, but sometimes reality (or tradition) ends up in strange places anyway. Mostly I'm just not sure why you'd want to juggle more data than you have to - this just doesn't make any sense to me.
Catalog numbers are special only because Curators have decided to treat them that way. A part identifier (assuming some decent design and curatorial commitment and all that jazz) can do ~everything a catalog number can do (and some other stuff), just issue them and change your loan agreements.
I think that's almost always the biological individual (where that's easy to define, anyway), and I don't think this one's any different - the focus is population-level stuff, the individual is representative (everyone hopes!) of that, the sample is just a way to get at characteristics of the individual. If nothing else, it's a lot easier to see that 27 methods all fail to reject that critter being a member of Andalgalomys pearsoni when those data are attached to a single data object.
....doesn't make any sense as a replacement for catalog records; it just doesn't have the structure to stand like that. It's not too late to ditch the thing and just let some new value in https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type define entities. (#1966 (comment))
-
You've heard of these guys, right? https://en.wikipedia.org/wiki/Led_Zeppelin
-
I've never proposed that?!?
I don't think emulating our past mistakes is a great model.
"It might be there but we're not telling you" doesn't seem worthy of investment.
Depends on which identifier you use.
My opinions on that haven't changed! We can do something awesome here, but it will require a curatorial commitment ("pre-commitment" might be a better way to view it?), or we can use existing tools to do less-awesome stuff. (Which could still be pretty awesome, but it's not structurally constrained to awesomeness.)
There's no difference, minus the "structurally constrained" bits. Grab an ARK-or-whatever, stuff it in part attributes, demand your loan recipients use it, be careful not to hide it, and you've done exactly what we're proposing here. This would just make "be careful not to hide it" something you don't need to worry about (and maybe make the "grab..." step a bit easier, but we could do that without this).
-
There are legitimate reasons to encumber information that we cannot ignore.
-
I've just proposed prohibiting mask record (and I still don't think this is worth doing without that) - other current or future types of encumbrances would not be affected, as long as they leave SOMETHING behind. A "most everything but still there" encumbrance might even help atone for past sins, although that would of course ultimately be up to the collections.
-
Could we encumber identification, higher geog, locality, collector etc -
the whole shebang- but leave the record shell with URL?
…On Tue, May 3, 2022, 3:00 PM dustymc ***@***.***> wrote:
select count(*) from flat
inner join coll_object_encumbrance on flat.collection_object_id=coll_object_encumbrance.collection_object_id
inner join encumbrance on coll_object_encumbrance.encumbrance_id=encumbrance.encumbrance_id and encumbrance_action='mask record'
inner join citation on flat.collection_object_id=citation.collection_object_id
;
 count
-------
 37462
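To show what the quoted query measures, here is a small sketch that runs the same joins against toy tables in SQLite. The tables contain only the columns the query names; the `citation_id` key, the sample rows, and the second encumbrance action are invented. One thing the toy data makes visible: as written, `count(*)` counts joined rows, so a masked record with two citations contributes two to the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table flat (collection_object_id integer primary key);
    create table encumbrance (encumbrance_id integer primary key, encumbrance_action text);
    create table coll_object_encumbrance (collection_object_id integer, encumbrance_id integer);
    create table citation (citation_id integer primary key, collection_object_id integer);

    insert into flat values (1), (2), (3);
    insert into encumbrance values (10, 'mask record'), (11, 'some other action');
    insert into coll_object_encumbrance values (1, 10), (2, 11), (3, 10);
    -- record 1 is cited twice, record 2 once, record 3 never
    insert into citation (collection_object_id) values (1), (1), (2);
""")

QUERY = """
    select count(*) from flat
    inner join coll_object_encumbrance
        on flat.collection_object_id = coll_object_encumbrance.collection_object_id
    inner join encumbrance
        on coll_object_encumbrance.encumbrance_id = encumbrance.encumbrance_id
        and encumbrance_action = 'mask record'
    inner join citation
        on flat.collection_object_id = citation.collection_object_id
"""

joined_rows = conn.execute(QUERY).fetchone()[0]
distinct_records = conn.execute(
    QUERY.replace("count(*)", "count(distinct flat.collection_object_id)")
).fetchone()[0]
print(joined_rows, distinct_records)
```

In the toy data only record 1 is both masked and cited, so the distinct count is 1 while the row count is 2. Whether the real figure is closer to records or to citations would need the distinct variant run against the actual tables.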
-
Found an example of why we don't want to assign permanent PIDs to parts without validating legacy parts first - see all the false duplicate parts associated with this record (anything without a part location path, which may not be visible to those not logged in): https://arctos.database.museum/guid/MSB:Mamm:83457
-
Quite a few of these are real parts - and once I validate them, I'd like to be able to assign a permanent ID to confirm their validity. I'd rather not be forced to slap an actual barcode onto the vial to do this - that is the point of having the part ID. Possible?
-
Technical: #3630 (comment) (very restricted, heavily documented bulkloader) still looks like the only plausible path to implementation; maybe you think I suggested something else?? Social: This is still pointless until someone commits to demanding citations by partID - there are lots of easier paths (for all of us, in all directions) to "confirm validity."
-
I guess my question is: is the current part ID stable within Arctos? If it
is, I could envision gradually shifting over to using Arctos assigned PIDs
as barcodes, even minting URL-based PIDs. SOMEONE needs to start doing this
before we can start asking users to cite them - we have to be the horse
before the cart.
If it is not, and the PID may randomly change for no reason . . . then that
won't work.
…On Fri, Aug 12, 2022 at 5:55 PM dustymc wrote: [quoted comment above]
-
@campmlc I believe it is completely up to you. You can MAKE the part ID stable. See #3630 (comment) Until someone experiments and DOES this, we are going to keep having circular conversations. I was going to try this in test, but I get this:
-
The "core" of that is functional in a few places, we've discussed the stability (lack thereof) of it many times.
We're having this conversation because y'all convinced me barcodes are not suitable. (And you're right - they wear out, hold lots of parts, hold nothing, hold things that aren't parts, aren't used at all for political reasons, etc.) (Some sort of resolvable PID would still be fabulous barcodes, would seamlessly deal with the 'someone cited the barcode' scenario, let anyone get to at least where parts used to be, etc., but as they're used now they're not quite interchangeable.)
It's going to be a lot of work - but not much innovation - to make them do what they need to do to be stable, and there's a huge curatorial commitment involved. This is maybe closer to buying a horse and cart today (except I'm going to wave my wand and the horse won't conveniently keel over about the time the kids leave for college, your great grandkids will still need to feed it) - if you're not SURE you're going to use it then it'll just hang around and take space and consume resources and maybe make a huge mess from time to time all without really giving anything back.
It is not, this is discussed above, maybe I could work up a summary or something if that's useful, but I think it would just end up being a bad representation of this whole thread, and this whole thread needs read, carefully, before making any decisions.
Yes but no. I'm _probably_ not going to mess with them, but you delete parts (even those that claim to be used) and 'mask record' encumber and such about every day. As is, part IDs are not suitable for citation. They are (usually) suitable for local timely things.
There's not much to experiment with. You say "hey borrower, use this OR ELSE", let me know about that and I make sure the identifier never changes, you go on to make sure you have policies and documentation so you don't toss it out and just not bother deleting it from the DB or reuse the identifier for something else and etc., and now there is in a very real way a physical item attached to a publication. Then we all swoon because that's sciencey. I don't think there's a less-rigorous yet still defensible approach, and maybe that's simply more than you can commit to, maybe even if the current CM, Curator, and Director think it's a fabulous idea. If that's the case then https://arctos.database.museum/info/ctDocumentation.cfm?table=ctspecpart_attribute_type#part_identifier is still available, and I could mint PIDs or get ARKs or something to go there. Those identifiers would be stable (as in, I'm not going to delete them), but there'd be no technology keeping them attached to things that exist and such, you (or your successor) can break a million publication<-->part links with a handwave and I can't stop it. Maybe that's still a decent babystep - if we start seeing those things pop up in GenBank and that changes our view then they could be elevated to some more "forever" structure, if it turns out nobody's going to USE them they can be quietly "sent to the farm" without making the world (rightly) think that Arctos itself is broken.
-
Discussion during office hour today.
-
The answer is yes, but we need a disposition that allows collection managers to flag parts that were entered in error. The Media group discussed this and thinks a disposition of mis-assigned that operates the way missing currently does (greyed out on the catalog record page, not available to loan) would work. These parts could involve a redirect (in a remark or perhaps an attribute) to the correctly assigned part if that would be useful.
-
Media uses DB keys and would have no bearing on this.
-
If I understand correctly, the comment about media is only tangential. The actionable proposal would be to add a disposition of "misassigned", which would grey out on the record page like used up objects, and ideally not publish to aggregators. Or perhaps we could encumber these parts so they are only visible as greyed out parts curatorially. Even better would be to allow redirects for these parts when they get associated with the correct record. I think we should continue to push for stable part identifiers, with some option like this.
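A sketch of how the proposed disposition could behave, under stated assumptions: "misassigned" parts are greyed out, not loanable, withheld from aggregators, and optionally redirect to the correct part. The `Part` fields, the "in collection" value, the `redirect_to` attribute, and treating "used up" as not loanable are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Dispositions shown greyed out and kept out of loans in this sketch.
# "misassigned" is the proposed new value; the others are named in the thread.
GREYED_OUT = {"missing", "used up", "misassigned"}


@dataclass
class Part:
    part_id: str
    disposition: str
    redirect_to: Optional[str] = None  # correctly assigned part, if known


def is_loanable(part: Part) -> bool:
    return part.disposition not in GREYED_OUT


def publish_to_aggregators(part: Part) -> bool:
    """Only misassigned parts are withheld here; other policies are out of scope."""
    return part.disposition != "misassigned"


def resolve(part_id: str, parts: dict[str, Part], max_hops: int = 10) -> Part:
    """Follow redirects from a misassigned part to the part it now points at."""
    part = parts[part_id]
    hops = 0
    while part.redirect_to is not None:
        if hops >= max_hops:
            raise RuntimeError("redirect chain too long (possible loop)")
        part = parts[part.redirect_to]
        hops += 1
    return part


parts = {
    "A": Part("A", "misassigned", redirect_to="B"),
    "B": Part("B", "in collection"),
}
print(is_loanable(parts["A"]), publish_to_aggregators(parts["A"]), resolve("A", parts).part_id)
```

Because the misassigned part is kept rather than deleted, anything that already cites its identifier still lands somewhere useful.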
-
Happy to have a meeting to discuss. There are actionable proposals here.
-
Picking a specific part out of the pile is a lot of work. Once that happens, you can
Barcodes, real or otherwise, serve nicely as unique identifiers. In this case, giving everything a barcode would mean you can add the attributes later, gives you a super-easy way to eventually add to the loan, and probably provides a pathway to whatever you mean by "loan subsamples in the same virtual container."
(I've been wondering if we need directly-attached stable part IDs for a while, and maybe we do - new Issue - but they're not available NOW and barcodes are.)
Originally posted by @dustymc in #3627 (comment)