Organism ID #1966

Jegelewicz · 2019-03-13T17:48:09Z

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.

We have been working with organisms for which we have multiple occurrences, specifically Mexican Wolves in the Mexican Wolf recovery program. Throughout their lives, samples of blood are taken from these animals and deposited in the genomic resources collection at MSB. Traditionally, each set of samples (all from the same day) has been given a single catalog number. This results in multiple cataloged items for a single organism, which we can link to each other using the “same individual as” relationship.

These relationships are nice, but they don't allow us to see ALL events for an individual in one place, and they require the addition of a new relationship for ALL related cataloged items every time a new collection of blood is made. Each cataloged item includes the other ID “Mexican Wolf Studbook Number” and we have modified the Other ID url so that clicking this other ID allows us to find all of the samples from any given animal.

This method works, but there is one issue we need to address.

When our data leaves Arctos and is ingested by aggregators such as GBIF and iDigBio, there is no easy way for anyone using the data there to make the connection that the various cataloged items are all from the same animal. Although the Mexican Wolf Studbook numbers are included in the list of related IDs, the connection just isn’t as tight as we would like it to be.

Describe the solution you'd like

Our proposed solution is to make use of the Darwin Core field “Organism ID”. We envision this as a separate and distinct other ID – one which provides a link to all related specimens (the results of that link would look just like the search result you see when you search one of the Mexican Wolf Studbook numbers):

This identifier would be passed to aggregators in the “Organism ID” field – allowing those using the data there to make the appropriate connection between the related cataloged items. Currently it appears that we are just passing the catalog item to that field

which is what led to the solution we have been attempting to make work in #1545. This has created problems with data entry and maintenance on our end. This new solution will allow us to keep events matched with parts and parts matched with accessions. It will simplify data entry and end the need for the links between events and parts.

We envision a new code table: CTCOLL_ORGANISM_ID set up very much like CTCOLL_OTHER_ID_TYPE where:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

BaseURI = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=
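As a rough illustration of how an entry like this could be turned into the link and label described here, a minimal sketch follows. It assumes the three proposed fields are available as a simple lookup; the dictionary, function names, and in-memory storage are hypothetical and are not how Arctos stores code tables.

```python
from urllib.parse import quote

# Hypothetical in-memory stand-in for one row of the proposed code table.
ORGANISM_ID_TYPES = {
    "Mexican Wolf Studbook Number": {
        "description": "Studbook number assigned by the Mexican Wolf Recovery Program",
        "base_uri": (
            "http://arctos.database.museum/SpecimenResults.cfm"
            "?oidtype=Mexican%20wolf%20studbook%20number&oidnum="
        ),
    },
}


def organism_link(id_type: str, id_value: str) -> str:
    """Return the search URL for one organism: base URI plus the URL-encoded value."""
    entry = ORGANISM_ID_TYPES[id_type]
    return entry["base_uri"] + quote(str(id_value), safe="")


def organism_label(id_type: str, id_value: str) -> str:
    """Return the human-readable text that would be displayed as the link."""
    return f"{id_type}: {id_value}"


print(organism_label("Mexican Wolf Studbook Number", "1216"))
print(organism_link("Mexican Wolf Studbook Number", "1216"))
```

Because the type name comes from the controlled entry rather than being typed by hand, every record for the same animal would produce the same link.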

When the Organism ID is used, there would be no need for all of the “same organism as” relationships, but they could be used if a collection so desired. Every cataloged item that included an Organism ID would instead appear like this:

With the text “Mexican Wolf Studbook Number: 1216” being a link taking you to the search results:

We would hope that this link could also be what appears at the aggregators in their “Organism ID” field:

Describe alternatives you've considered
The major challenge we see with this method is how to assign unique Organism IDs for things where there isn’t an obvious one. The Mexican Wolves (and eventually the Red Wolves that are expected to come in from Arkansas) and NEON recaptures are examples of when we would be using this method. These all have obvious unique identifiers (studbook numbers and NEON sample ID numbers). However, when the skin and skeleton of an animal are at DMNS and the tissues for that same animal are at MSB, there is no obvious organism ID type and we would need to come up with one. We are open to suggestions for how best to accomplish this.

What have we missed?

Additional context
See above

Priority
I would like to have this resolved by date: soonish

Jegelewicz · 2019-03-13T17:56:06Z

I have passed this by John Wieczorek and here is our discussion:

The proposal to use dwc:organismID in Darwin Core resource is right on target. That is exactly what the field is meant for. You are right that Arctos is passing the id for the cataloged item in that field right now. The reasoning was based on the majority of cases, where the cataloged item corresponds to an Organism. Rigorously speaking, I think this is a mistake, because cataloged item does not always correspond to an Organism, and in Arctos, we don't have a fail-proof method of a knowing when it does, and when it doesn't. Given that, I think we should unmap organismID from the cataloged item in all Arctos resources.

I have looked at the proposal for the new code table (CTCOLL_ORGANISM_ID). I think this is unnecessary and unsustainable. I think a sufficient solution, which is also the most scalable, is to add a new type in CTCOLL_OTHER_ID_TYPE, called "organism identifier" or similar. Curators would have the freedom to create a (single) organism identifier, and that should be a persistent resolvable GUID. It could refer to any organism within Arctos, or outside it. Note that in the case of the Mexican Wolf Studbook Number, there would be two entries in the COLL_OTHER_ID table for each cataloged item - one with type "Mexican Wolf Studbook Number", which holds the number, and one with type "organism identifier" with the resolvable GUID to the organism.
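A minimal sketch of the two-entry idea, assuming Other ID rows can be represented as (cataloged item, type, value) records. The class, the function, and the example.org GUID are hypothetical illustrations, not the Arctos schema or export code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ORGANISM_ID_TYPE = "organism identifier"  # type name suggested in the discussion


@dataclass(frozen=True)
class OtherId:
    catalog_guid: str  # cataloged item the identifier is attached to
    id_type: str       # entry from the Other ID type code table
    value: str         # the identifier itself


def dwc_organism_id(rows: Iterable[OtherId], catalog_guid: str) -> Optional[str]:
    """Return the value to publish as dwc:organismID for one cataloged item.

    Returns None when no organism identifier exists, rather than falling
    back to the catalog number.
    """
    matches = [
        r.value
        for r in rows
        if r.catalog_guid == catalog_guid and r.id_type == ORGANISM_ID_TYPE
    ]
    if len(matches) > 1:
        raise ValueError(f"{catalog_guid} has more than one organism identifier")
    return matches[0] if matches else None


# Two entries for the same cataloged item; the GUID value is a made-up placeholder.
rows = [
    OtherId("MSB:Mamm:292063", "Mexican Wolf Studbook Number", "1216"),
    OtherId("MSB:Mamm:292063", ORGANISM_ID_TYPE, "https://example.org/organism/abc123"),
]
print(dwc_organism_id(rows, "MSB:Mamm:292063"))
```

The studbook number stays available as its own Other ID, while only the "organism identifier" row feeds the published field.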

There will be issues of "persistence" and of primacy (if two data publishers have distinct organismIDs, which should be used?), but those will exist outside of the scope of the immediate problem anyway. It's something that could conceivably be solved at a level above the publication of primary occurrence data.

Following what I am proposing above, there would be no need to communicate anything to GBIF, iDigBio, or GGBN. We would be following the intended use of dwc:organismID. The misunderstandings from iDigBio and GGBN are around the conflation of Occurrences by Arctos, not about the concept of Organism. The proposed solutions do not save us with respect to GGBN either. With them the issue is that they want records of tissue samples, while everyone else in the world expects Occurrences, and these are not always the same thing, especially in Arctos. So, we still have to make distinct resources for GGBN, unfortunately.

My response:

I'm not sure I can wrap my brain around the other ID type solution. I feel like what you describe is what we do with the Mexican Wolves now - how would the GUIDs be created and where would they "live"?

I'm not the most technical person, so without a demo, it's just hard for me to see how two independent Other IDs will resolve to a GUID somewhere...but the idea seems the same as what I proposed just technically more stable? If so, I am on board and I agree that we need to stop sending catalog number as Organism ID AND that MSB needs to stop trying to catalog all collections for a single wolf in a single catalog number - which is why I proposed the solution I did - it is just too messy and information is lost in the process.

This is coming to the forefront for other reasons: tdwg/dwc-qa#131

I'd like to create a simple solution to the organism issue - it really shouldn't be that difficult within Arctos. The problem of everyone agreeing on an ID when you consider stuff outside of Arctos is something we need to tackle as a larger community and is related to unique identifiers in general. Let me know how I can help push a solution forward and I'll do everything I can!

John responds:

I'm not sure I can wrap my brain around the other ID type solution. I feel like what you describe is what we do with the Mexican Wolves now - how would the GUIDs be created and where would they "live"?
In Arctos the GUIDs would live in the Coll_Obj_Other_ID_Num table with an OTHER_ID_TYPE of "organism identifier". Curators would be responsible for entering these (read "danger").

I'm not the most technical person, so without a demo, it's just hard for me to see how two independent Other IDs will resolve to a GUID somewhere...but the idea seems the same as what I proposed just technically more stable? If so, I am on board and I agree that we need to stop sending catalog number as Organism ID AND that MSB needs to stop trying to catalog all collections for a single wolf in a single catalog number - which is why I proposed the solution I did - it is just too messy and information is lost in the process.

Two independent Other IDs do not resolve to a GUID somewhere. One of the IDs says "I am this Mexican Wolf Studbook Number", the other says, "my dwc:organismID is this". Hey, maybe that's what to put in the CTOTHER_ID_TYPE table - "dwc:organismID" - it would be quite explicit.
This is coming to the forefront for other reasons: tdwg/dwc-qa#131
I'd like to create a simple solution to the organism issue - it really shouldn't be that difficult within Arctos. The problem of everyone agreeing on an ID when you consider stuff outside of Arctos is something we need to tackle as a larger community and is related to unique identifiers in general. Let me know how I can help push a solution forward and I'll do everything I can!
True. It is a community issue. Arctos is a great resource for pushing the limits of what we are able to do. For many outside it is way too far ahead, despite the fact that for some inside it doesn't do all we might want.

From me:

In Arctos the GUIDs would live in the Coll_Obj_Other_ID_Num table with an OTHER_ID_TYPE of "organism identifier". Curators would be responsible for entering these (read "danger").
The "danger" is what I was hoping to avoid with the separate table for organism ID - using "Mexican Wolf Studbook Number" as the base of the ID means we don't get "Mexican wolf studbook number 1216", "Mex Wolf Studbook No. 1216", etc.

Jegelewicz · 2019-03-13T18:35:45Z

Two independent Other IDs do not resolve to a GUID somewhere. One of the IDs says "I am this Mexican Wolf Studbook Number", the other says, "my dwc:organismID is this". Hey, maybe that's what to put in the CTOTHER_ID_TYPE table - "dwc:organismID" - it would be quite explicit.

To be clear - I don't propose there be two IDs, but to MOVE those other IDs that are truly Organism IDs to the new table.

dustymc · 2019-03-13T18:59:38Z

In general, I think having some sort of "individual ID" would be very useful. It's not at all clear to me why it would be in a separate table; that invites more denormalization (doing the same thing multiple ways), inevitably leading to even bigger messes.

If the scope of this is Arctos, we could exploit relationships to assemble "individuals" and/or individualID without adding any overhead - there's much more discussion on that in #1545 - and see below.

I believe that this is implicitly a proposal to recatalog http://arctos.database.museum/guid/MSB:Mamm:292063 as 5 specimens. At least for some use cases that goes against the "catalog the item of scientific interest" mantra; eventually two of the samples from the same wolf will be compared in a publication. I'm not sure that's more evil than the current situation, where 5 samples collected at different times under different conditions are likely seen as equivalent to 5 tubes from the same liver of another specimen, but it should be acknowledged. I think any consistent documented approach is an improvement.

"Occurrences" are occasionally recorded in different collections, both in and out of Arctos, so cataloging Occurrences rather than individuals would make Arctos data more comparable with the rest of the world. I'm not sure how much weight that should carry, but again it is a consideration that should be addressed.

All of that said, I don't think Arctos can or should dictate how material is cataloged. I think the most we can do is to provide documentation/guidance.

This should extend beyond Arctos. A sample of http://arctos.database.museum/guid/MSB:Mamm:292063 stored in another system and shared with GBIF would ideally bear the same "individual ID" as the record(s) in Arctos. If it did, it would be trivial to assemble the individual in GBIF or similar systems.

The "danger" is in assigning the identifiers, and I don't believe there is any technical solution to that - it's a social problem that needs a social solution. It took seconds to find https://arctos.database.museum/guid/MSB:Mamm:317312 and https://arctos.database.museum/guid/MSB:Mamm:324187 which share a NEON ID and probably are not the same organism. I have never encountered a "number series" that didn't have similar issues, and if that exists the NEON ID cannot do what you want. I think this would be best implemented as GUIDs, and for social reasons those should probably not be minted by Arctos. Drawing those from an independent source would let Curators determine what is or is not an Individual on a case-by-case basis independent of any problems with identifiers assigned by other organizations, and at least maintains some possibility that other collections holding material from the same individuals would buy in and assign those IDs to their specimens. Two candidates are UUIDs, which would not be resolvable or actionable, or ARKs which could be resolvable and could point to some shared view (eg, GBIF, which in turn could point to the various bits and pieces of the individual in various systems/collections).
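To make the UUID option concrete, here is a minimal sketch using Python's standard uuid module. The function names and the EX: catalog GUIDs are hypothetical; the point is only that the identifier is minted once, independently of any number series, and then shared by every record a Curator decides belongs to the same individual.

```python
import uuid


def mint_organism_id() -> str:
    """Mint an opaque identifier as a urn:uuid string (random, version 4)."""
    return uuid.uuid4().urn


def assign_to_records(organism_id: str, catalog_guids: list[str]) -> dict[str, str]:
    """Attach the same organism identifier to every listed cataloged item."""
    return {guid: organism_id for guid in catalog_guids}


organism_id = mint_organism_id()
# Placeholder catalog GUIDs for records a Curator has judged to be one individual.
assigned = assign_to_records(organism_id, ["EX:Mamm:1", "EX:Mamm:2", "EX:Mamm:3"])
for guid, oid in assigned.items():
    print(guid, oid)
```

As noted above, a bare UUID is unique but not resolvable; an ARK or similar identifier would additionally give users somewhere to go when they click it.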

I think that also could be implemented only as guidance; I don't think Arctos can or should prevent someone from using "1" as an IndividualID, but we can help them understand the implications of doing so.

Jegelewicz · 2019-03-13T19:12:49Z

How would this not be denormalization?

organismID = Mexican Wolf Studbook Number 1216
organismID = Mex Wolf Studbook No 1216
organismID = Mexican wolf studbook number 1216

These are all the same organism, but now we have three IDs for it. If we have:

ORGANISM_ID where:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

BaseURI = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=

At least we eliminate the problem of the many ways "Mexican Wolf Studbook Number" might be spelled.

I think this would be best implemented as GUIDs, and for social reasons those should probably not be minted by Arctos.

I agree with this statement - but no one is stepping up to the plate for biological specimens (at least no one I am aware of). While the solution above does not fix the problems of the world, it would be a start for Arctos collections and maybe we could use that to press the issue with the community.

I looked up ARKs and I'm not clear on how that works - if it is a solution, then let's explore, but I need an example because it seems very fuzzy to me and doesn't solve the social problem as far as I can tell.

Jegelewicz · 2019-03-13T19:17:59Z

I believe that this is implicitly a proposal to recatalog http://arctos.database.museum/guid/MSB:Mamm:292063 as 5 specimens. At least for some use cases that goes against the "catalog the item of scientific interest" mantra; eventually two of the samples from the same wolf will be compared in a publication.

Yep - and the cataloging of separate events with one catalog number results in events and parts that are not properly associated with their accessions, their collectors and preparators, nor their attributes. (The event links are OK, but easily broken or incorrectly made).

Jegelewicz · 2019-03-13T19:48:26Z

Should OrganismIDs be a DOI?

dustymc · 2019-03-13T19:56:47Z

I'm still not following. You want another table that's the same structure and does the same thing as OtherIDs??

And yes those data are denormalized - that's a lot easier to deal with than denormalized structure, and one of many reasons a GUID of some sort would be a useful value.

There is no technical solution to social problems. We can make it enticing to assign unifying IDs, but that's about it.

ARKs are functionally much like DOIs, but they're free (and don't come with the buy-in, which I suspect means they also don't come with the persistence).

https://n2t.net/ark:/87299/x6d50k1v

If I had a couple million dollars and nothing better to do, everything in Arctos would have a DOI. DOIs would be great "individualIDs" but I don't think I can supply them. And that would lead back into the whole "controlled by Arctos" thing, which I don't think has any chance of being adopted by anyone outside of Arctos. I can provide tools, but the folks who own these specimens should also own the unifying identifiers.

Jegelewicz · 2019-03-13T20:26:38Z

I'm still not following. You want another table that's the same structure and does the same thing as OtherIDs??

EXCEPT - those IDs would be passed to GBIF and other aggregators as "Organism_ID".

I have also considered just using a check box in the Other_ID table "this is an organism ID"....

dustymc · 2019-03-13T21:11:14Z

Thanks - I might actually get it now!

It's Arctos-centric and not very pretty, but at least it's not denormalization: http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none is a perfectly valid value for other_id_type=OrgID (whatever we call it).

That could be generated by a "this is an orgid" button. I could even abstract it to a saved search or ARK, but that gets us back to the "Arctos-centric" thing.

And again, if the scope of this is just "works for Arctos" then I think we'd be better off doing something with relationships. (@tucotuco pointed out that an ID works from a spreadsheet where a relationship may not, so "something" might be generating a URL that finds ID=value as above - IDK, that's details, I'm totally open to ideas).

campmlc · 2019-03-13T21:44:01Z

"That could be generated by a "this is an orgid" button" - you mean in the code table, correct? Also, we would not want to see the "messy" http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none in the display. We'd want to see "Organism ID: Mexican Wolf Studbook Number: 1216". Possible?

dustymc · 2019-03-13T21:49:30Z

No, in the interface.

http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none is a GUID - and an actionable one at that. There's only one of them on the planet and it's easy to tell what it does. (It's not very pretty and may or may not be very persistent, but that's details.)

Mexican Wolf Studbook Number: 1216 is a string. Anyone can use it for any purpose anywhere; it doesn't natively do anything, and trying to do anything with it comes with a big pile of indefensible assumptions.

Edit for completeness: https://n2t.net/ark:/87299/x68g8hqw currently does the same thing as http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=none. It's prettier and likely more persistent. If I find another Occurrence of "none" I could re-point the ARK to somewhere mutually agreeable (eg, GBIF) in order to build a more complete picture of the Organism. It's a MUCH better solution than the URL, but also likely to take more investment than clicking a button.

2nd edit: I'm throwing ARKs around only because they're not-Arctos and super easy to create. They're not the only possible GUID, just a convenient and functional example.

tucotuco · 2019-03-13T22:03:24Z

...not to mention that the indefensible assumptions would be distinct for every different id type, ergo not scalable.

Jegelewicz · 2019-03-13T23:15:11Z

Mexican Wolf Studbook Number: 1216 is a string. Anyone can use it for any purpose anywhere; it doesn't natively do anything, and trying to do anything with it comes with a big pile of indefensible assumptions.

I don't get how what you propose is different from:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

base URL = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=

tucotuco · 2019-03-13T23:21:26Z

I had been thinking there would be only one allowed organismID. Maybe that is silly. Maybe it is fine to have as many as you like. That way you could include your own AND those of other collections (in or out of Arctos). That way you could also potentially go directly to GBIF to get the set of Occurrences for all matching organismIDs.
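One way to picture how several organismIDs per record could still lead to a single individual: if any record carries two of the IDs, records sharing either one can be merged transitively. The sketch below is an assumption about how a consumer might do this, not something GBIF is known to provide; the record names and ID values are made up.

```python
from collections import defaultdict


def group_by_shared_ids(records: dict[str, set[str]]) -> list[set[str]]:
    """Group records that share any organismID, directly or through a chain.

    records maps a record identifier to the set of organismIDs it carries.
    """
    parent = {rec: rec for rec in records}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen: dict[str, str] = {}  # organismID -> first record that used it
    for rec, ids in records.items():
        for oid in ids:
            if oid in first_seen:
                union(first_seen[oid], rec)
            else:
                first_seen[oid] = rec

    groups: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        groups[find(rec)].add(rec)
    return sorted(groups.values(), key=lambda g: sorted(g))


example = {
    "CollectionA:1": {"id-A"},
    "CollectionB:9": {"id-B"},
    "CollectionC:3": {"id-A", "id-B"},  # carries both IDs, so it bridges A and B
    "CollectionD:7": {"id-C"},
}
print(group_by_shared_ids(example))
```

Without a bridging record like the third one, the first two would remain separate groups, which is the discoverability gap a single shared identifier would avoid.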

Jegelewicz · 2019-03-13T23:26:02Z

HMMMM..I hadn't considered that.

Maybe it is fine to have as many as you like. That way you could include your own AND those of other collections (in or out of Arctos). That way you could also potentially go directly to GBIF to get the set of Occurrences for all matching organismIDs.

BUT when searching AT GBIF, how would they be related - so that some person who was unaware the two organism IDs were the same organism could make the connection?

campmlc · 2019-03-13T23:27:43Z

We were discussing earlier how we could link specimens at MSB and AMNH and Colección Boliviana de Fauna that are all part of the same animal. All share the same field number, they are all the same organism, but how would we relate them in GBIF if AMNH assigns one and MSB assigns a different one? Ideally, we'd use the shared field number as the core ID, or we'd pay for a DOI.

tucotuco · 2019-03-13T23:28:35Z

I don't get how what you propose is different from:

IDType = text “Mexican Wolf Studbook Number”

Description = definition of the IDType Studbook number assigned by the Mexican Wolf Recovery Program

base URL = http://arctos.database.museum/SpecimenResults.cfm?oidtype=Mexican%20wolf%20studbook%20number&oidnum=

It is very different outside the world of Arctos. The organismID would have to be constructed from this, and what would you do to create the organismIDs of the ten collections that have parts of the same plant? Create ten new ID types and base URLS (just to cover that one organism - multiply by all the collections that share any parts of any Organisms in Arctos)?

dustymc · 2019-03-13T23:36:07Z

different

It eliminates data stored in arbitrary places.

only one

Yea, I suspect reality will find a way to stomp all over that, but it would be nice....

link specimens

Arctos can link to anything with a URL, and provides a mechanism for incoming links.

shared field number

Everybody starts at "1." If you want links, you need actionable GUIDs. If you want discoverable, you need shared actionable GUIDs. You might get at "shared" by tracking down the other 40 samples in GBIF and adding their IDs to Arctos, although "here's a nice neutral persistent actionable identifier, would you mind using it so we can talk to each other?" would greatly simplify things.

tucotuco · 2019-03-13T23:38:20Z

All share the same field number, they are all the same organism, but how would we relate them in GBIF if AMNH assigns one and MSB assigns a different one?

I think that is what I am getting at in tdwg/dwc-qa#131 (comment)

tucotuco · 2019-03-13T23:39:44Z

Something akin to IGSNs, but for Organisms instead of for samples.

Jegelewicz · 2019-03-13T23:41:21Z

The organismID would have to be constructed from this, and what would you do to create the organismIDs of the ten collections that have parts of the same plant? Create ten new ID types and base URLS (just to cover that one organism - multiply by all the collections that share any parts of any Organisms in Arctos)?

I don't understand - you would only need one ID type. From any record in Arctos, I can click the link from the Mexican Wolf Studbook Number (no matter what number it is) and I'll get the specimen results page that show all of the wolves that share the same number.

If UTEP or UMNH or any other Arctos collection had a wolf specimen and put the studbook number in the "Mexican Wolf Studbook Number" other ID, then it would show up in the search too, because the link is an actionable guid like Dusty described.

It would be a social issue to decide upon an "ID Type" for the situation that you describe, but we should only need one. The challenge - as I pointed out in the very beginning is assigning the individual organism ID numbers, so that all collections with parts of the same plant would use "Individual Plant ID" = 1, etc.

I guess I am missing something (which doesn't surprise me...) The wolves are easy because they are all here and they have a (somewhat) logical identifier. Everything else will be messy until we have a unique BOI (Biological Organism Identifier).

campmlc · 2019-03-13T23:41:29Z

In all of these situations, there is a shared organism number already that links specimens. Examples currently in use within Arctos and between Arctos and outside collections (AMNH, USNM) are Mexican Wolf Studbook Number, NK number, AF number, Robert L. Rausch collector number, NEON individual ID. These are used to find and create relationships. The problem with relationships is that relationships are pairwise - we need a way to reciprocally link a network, and organism ID would allow us to do that - like the url link to the above IDs allows us to do that now within Arctos. Can we mint DOIs or IGSNs?

dustymc · 2019-03-13T23:52:28Z

organisms, mint compliant ID

Don't half-bake this! - I want those for events, localities, agents, .... too.

Seriously, Arctos is built to plug in to something like that. If we have a local identifier for something it's only because nobody else would do it for us.

relationships are pairwise

Not really - there's always an implied second THING out there, but we don't have to be able to find it. "{whatever relationship of} ABC:XYZ:1234" is fine even if ABC:XYZ isn't online, "{whatever relationship of} NK 1" is fine even if 40 specimens (that we can find) wear "NK 1", etc.

reciprocally

I don't think a lack of reciprocity will ever be Arctos' fault.

I know many of your examples are not capable of acting as unique identifiers, and I suspect that's true of all of them.

Can we mint DOIs

Yes, in limited quantities - there are "get a DOI" links scattered all over the place.

IGSNs

Beats me - if they have a service and are willing to provide access we should be able to.

We could also mint ARKs in unlimited quantities if there's a reason to do so.

Jegelewicz · 2019-03-14T00:01:50Z

relationships are pairwise

Not really - there's always an implied second THING out there, but we don't have to be able to find it. "{whatever relationship of} ABC:XYZ:1234" is fine even if ABC:XYZ isn't online, "{whatever relationship of} NK 1" is fine even if 40 specimens (that we can find) wear "NK 1", etc.

But we WANT to find it! 40 fish with "same lot as" requires 39 relationships on all 40 records and then I have no easy way to see them all in one place (or I just don't know how to do it). In the same way - 20 events of blood samples from Mexican wolf studbook number 1216 require 19 relationships on 20 records (and a relationship needs to be added to ALL of them every time a new set of samples comes in!). It is a lot of work....
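The arithmetic behind this can be sketched as follows, assuming every record must carry a relationship to every other record of the same individual, compared with each record carrying one shared identifier. The function names are illustrative only.

```python
def pairwise_relationship_rows(n_records: int) -> int:
    """Relationship entries needed when each record points at every other one."""
    return n_records * (n_records - 1)


def rows_added_for_new_record(n_existing: int) -> int:
    """Entries to add when one more record joins a fully linked set.

    The new record needs n_existing relationships, and each existing
    record needs one more pointing back at it.
    """
    return 2 * n_existing


def shared_id_rows(n_records: int) -> int:
    """Identifier entries needed when every record carries one shared ID."""
    return n_records


print(pairwise_relationship_rows(40))  # 40 fish: 39 relationships on each of 40 records
print(pairwise_relationship_rows(20))  # 20 wolf events: 19 relationships on each of 20 records
print(rows_added_for_new_record(20))   # cost of adding a 21st event to those 20
print(shared_id_rows(20) + 1)          # shared-ID entries after adding the 21st event
```

With pairwise links the total grows with the square of the number of records, while a shared identifier adds exactly one entry per new record.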

campmlc · 2019-03-14T00:04:34Z

We have litters of pups that are siblings of each other, offspring of two parents, and parents of other litters. Each of these individual organisms in turn may be handled multiple times over their lifetime resulting in multiple catalog numbers of different accessions of parts, potentially at different institutions. We need organism IDs to deal with the latter, and relationships that can deal with the former.

dustymc · 2019-03-14T00:26:24Z

easy way to see them

That's an interface problem.

a relationship needs to be added to ALL of them every time a new set of samples comes in!

That MAY be an interface problem too - eg, MAYBE I could just magic in reciprocals instead of the email. Not much problem technically, but there are social implications.

40 fish

That does occasionally happen, but more normal is a coyote, a beaver, 3 mice (all because the printer stuck), and all of their parasites (for reasons that don't make much sense to me).

siblings

There's an Issue somewhere about making inferences from relationships - also just a display problem.

organism IDs to deal with the latter, and relationships that can deal with the former.

Yea, there's some overlap that I don't think we can avoid. I think we need both anywhere we can - orgID is useless unless all of the bits are accessible, and relationships can't be used to find all the bits in places like GBIF. I'm not real happy with that, but I think it's reality.

ccicero · 2021-05-24T17:26:55Z

OK, I'll be there at noon.

I'm not doing something right. Here is a record with two events:
https://arctos.database.museum/guid/MVZ:Bird:193195

I created an observational record for the second event:
https://arctos.database.museum/guid/MVZObs:Bird:4777

and selected for both an 'Organism ID' identifier
https://arctos.database.museum/entity/0709-02237

(manually entered the URL which I'm sure is not correct, but I didn't see a base URL in the code table)

When I click on the Organism ID link, I get "Entity not found! Please let us know what happened."

dustymc · 2021-05-24T17:33:41Z

"Entity not found!

You didn't create one.

https://handbook.arctosdb.org/documentation/entity.html

I did this for you:

nope not there so

and now you have the bare minimum.

The next step would (ideally - this is now functional) be to add the components.

Then clicking "pull" and accepting whatever it says would add some discoverability.

ewommack · 2021-05-24T23:21:29Z

@Jegelewicz was amazing and added the office hours to the calendar.
Do we want any note or explanation @dustymc?
"Dusty's Office Hours are discussions with Dusty on specific problems and production developments in Arctos. Come join the conversation and help us figure out how to make Arctos better"

dustymc · 2021-05-24T23:36:18Z

Thanks!

I'm up for anything. I'll probably be more useful with some warning, I think we can/should prioritize if someone wants to schedule a topic, otherwise just see what happens?

ewommack · 2021-05-25T00:04:05Z

I'll probably be more useful with some warning, I think we can/should prioritize if someone wants to schedule a topic

How about:
"Dusty's Office Hours are discussions with Dusty on specific problems and production developments in Arctos. Suggest a topic ahead of time in GitHub, or just come join the conversation and help us figure out how to make Arctos better"

dustymc · 2021-05-25T15:39:26Z

From meeting:

clarify search before create functionality
Search is one field, hits everything possible, has usage hint
show derived data (component IDs and such) in some less-central way

Changes

entityID is assigned by Arctos; you get what you get and don't have a fit
entity description (new field in table entity, required, editable, @campmlc will write documentation)
pull is automagic
manage_collection is required to create/edit

Unresolved:

show more dynamic view in search result
DO NOT show more dynamic view in search result

It's less-dynamic for now, not sure we have the CPU to pull everything in anyway. Looking forward, this needs to (theoretically) work for hundreds (zoo critters have a rough life) if not thousands (GPS collar, maybe) of components, which probably demands separate search results and 'details' views.

Needs further discussion:

Entities are but one option for Organism ID, and therefore the code is "Entity-centric." Organism ID can be exported from Entities to catalog records, but Entity ID cannot be exported/created from catalog records. I suggest that this is sufficient; Entities are "super objects" that only need exist when there's something additional to say. If the only goal is a common identifier for Organism ID, there are many options which do not involve Entities (bird banding lab numbers, for example). Entities are "better" identifiers, and making sure that they are in fact "better" requires a small amount of focus.

Yea But Anyway:

Consider something in SpecimenResults-->Manage-->Add All Records to {pick an entity}

I think I'm comfortable with this to ADD, not so sure about CREATE

Needs Clarification

re: "bird banding lab numbers, for example" above: There is confusion around this point, and it needs to be clarified somewhere. A number may/should be used in multiple types, because those types convey different information and have different functionality. For example, to use a BBL number as an Organism ID, the following should be entered (assuming BBL was an OtherID Type in Arctos):

BBL: 12345
Organism ID: BBL 12345

The BBL number supports "find records with a BBL number" (and perhaps value, but free-text fields aren't very good at that), and potentially (should BBL come online) can serve as a link to external resources or additional data.

The Organism ID serves as an Organism ID; it's an identifier that spans multiple Occurrences and links them together as one THING. In this case that link is dependent on users being consistent (eg, not using Organism ID: BBL{nospace}12345 in one of the involved records), and should be recognized as having limited scope (somewhere on the planet, there's probably an unrelated, perhaps even similar, "BBL 12345.") There's no realistic way for machines to determine if BBL{nospace}12345 and BBL 12345 should be the same thing; error detection requires (patient) humans.

Entities (of type Organism) serve the same purpose; they're linking identifiers. They differ in two significant ways:

There's a verifiable "correct" format; identifiers issued by Arctos behave differently than those which were not (eg, typos).
The Entity can carry data of its own, and this data can be used in things like error detection.

tl;dr: Any string can serve as Organism ID, but some can DO THINGS that others cannot.
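The consistency problem with free-text Organism IDs can be sketched as follows. This is an illustration only: the record GUIDs, function names, and the normalization rule are all assumptions, not anything Arctos does. Exact-string grouping splits "BBL 12345" from "BBL12345", and the most a machine can do is flag look-alikes for a (patient) human to review.

```python
import re
from collections import defaultdict


def normalize(organism_id: str) -> str:
    """Collapse case, whitespace, and punctuation so near-identical strings collide."""
    return re.sub(r"[^a-z0-9]", "", organism_id.lower())


def group_exact(records):
    """Group catalog record GUIDs by the Organism ID string exactly as entered."""
    groups = defaultdict(list)
    for guid, organism_id in records:
        groups[organism_id].append(guid)
    return dict(groups)


def suspicious_variants(records):
    """Return lists of distinct Organism ID strings that normalize to the same key.

    These are candidates for human review only; the code cannot know whether
    two similar strings really refer to the same organism.
    """
    by_key = defaultdict(set)
    for _guid, organism_id in records:
        by_key[normalize(organism_id)].add(organism_id)
    return [sorted(ids) for ids in by_key.values() if len(ids) > 1]


# Hypothetical records: (catalog record GUID, Organism ID as typed)
RECORDS = [
    ("EXAMPLE:Bird:1", "BBL 12345"),
    ("EXAMPLE:Bird:2", "BBL 12345"),
    ("EXAMPLE:Bird:3", "BBL12345"),
]
```

Here `group_exact(RECORDS)` yields two groups rather than one, while `suspicious_variants(RECORDS)` reports the two spellings together so someone can decide whether they are the same animal.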

Bulk Tools:

MSB's biopark data is recent and decent, but should have enough problems to be interesting. Try to make and "componentize" Entities from it, with a view towards developing bulk tools. (This may address any gaps left by the entity-centric approach described above.)

Reports:

See if the stuff from edit entity (components don't use entity ID, records using entity ID aren't components) can be made into reports and/or bulkloaders.
All entities should have components or preferred entity ID
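As a rough sketch of what such a report could compute, the two mismatches are just set differences. The data shapes and names here are hypothetical (the real tables are not described in this thread); the function takes the set of records listed as components and the set of records that carry the entity ID as their Organism ID.

```python
def entity_report(entity_id, components, id_users, preferred_id=None):
    """Summarize mismatches for one entity.

    components: set of catalog record GUIDs listed as components of the entity
    id_users: set of catalog record GUIDs that carry the entity ID as Organism ID
    preferred_id: another entity ID this one defers to, or None
    """
    return {
        "entity_id": entity_id,
        # Components that have not been given the entity ID
        "components_without_id": sorted(components - id_users),
        # Records using the entity ID that are not listed as components
        "id_users_not_components": sorted(id_users - components),
        # Entities with neither components nor a preferred entity ID
        "orphan": not components and preferred_id is None,
    }
```

Each of the three keys corresponds to one of the checks listed above; anything non-empty (or `orphan` being true) would be a row in the report for a human to review before any bulkloading.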

Possibilities:

Rather than Export, we could write to the ID loader with status=autoload

Yay: one click instead of ~4
Not so yay: Fixing the giant messes that approach is capable of creating could be a tremendous amount of work (which usually means it'll never happen, and then nobody will use this because it's all a giant mess). Suggest the small amount of review required to manually use the loader is well invested.

"Reports" above has the same implications; we could save a few minutes by automating, which might then require much more than a few minutes to fix the giant mess which could result from a relatively minor error.

@campmlc @Jegelewicz @ccicero what'd I miss/mangle?

dustymc · 2021-05-25T21:26:00Z

There's some new stuff in test, https://handbook.arctosdb.org/documentation/entity.html#the-process-v2 documents creating http://test.arctos.database.museum/entity/2

Questions:

What should I auto-pull into Entity Assertions from catalog records; what data might lead someone to an existing Entity and prevent them from creating a duplicate?
What should I dynamically pull on the detail page; what's useful there?

ewommack · 2021-05-26T04:45:50Z

Not sure if this will be helpful, but here are several references for BBL bands: https://www.usgs.gov/centers/eesc/science/about-federal-bird-bands?qt-science_center_objects=0#qt-science_center_objects

BBL bands always have two sets of numbers XXXX-XXXX or XXXX-XXXXX. The first string relates to the size of the band, and the second string is in sequence numerically assigned to individual banders. I can't find a reference for the numeric codes for the different sizes, but I'm sure it exists somewhere. I could dig deeper if you need me to.
They keep strong track of which of us has which bands, because as you can guess mistakes get made all the time. That way they know who to poke/yell at if a warbler band comes back being reported on a Red-tailed Hawk.
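For what it's worth, the format described above is easy to check mechanically. This is a minimal sketch that assumes "XXXX-XXXX or XXXX-XXXXX" means four digits, a hyphen, then four or five digits; the function name is made up, and it says nothing about whether a given prefix is a real band size.

```python
import re

# Assumption: 4 digits, a hyphen, then 4 or 5 digits.
BAND_PATTERN = re.compile(r"\d{4}-\d{4,5}")


def split_band_number(value: str):
    """Return (prefix, sequence) if value matches the assumed format, else None."""
    match = BAND_PATTERN.fullmatch(value.strip())
    if match is None:
        return None
    prefix, sequence = match.group(0).split("-")
    return prefix, sequence
```

A check like this only catches malformed strings; it cannot tell a warbler band from a Red-tailed Hawk band.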

dustymc · 2021-05-26T14:05:16Z

Thanks. Nothing can really change how unresolvable strings work, but entities could serve as a place to gather identifiers - the Entity itself can hold all the variations that might be found in GBIF-n-such (BBL:XXXX-XXXX; BBL XXXX-XXXX; XXXX-XXXX, XXXXXXXX, etc., etc.) and that has some possibility of leading users to those records if they find the Arctos record.
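A small sketch of that idea, with hypothetical names and a plain dictionary standing in for whatever the Entity would actually store: generate the spellings listed above for one band number, keep them with the Entity, and look entities up by any of them.

```python
def band_variants(prefix: str, sequence: str) -> list[str]:
    """List the spellings named in the comment above for one band number."""
    hyphenated = f"{prefix}-{sequence}"
    return [
        f"BBL:{hyphenated}",
        f"BBL {hyphenated}",
        hyphenated,
        f"{prefix}{sequence}",
    ]


def find_entities(term: str, identifiers_by_entity: dict[str, list[str]]) -> list[str]:
    """Return entity IDs whose stored identifier list contains the term exactly."""
    return sorted(
        entity_id
        for entity_id, identifiers in identifiers_by_entity.items()
        if term in identifiers
    )
```

With the variants stored on the Entity, a search for any one of those spellings leads back to the same Entity, even though the strings themselves stay unresolvable.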

Jegelewicz · 2021-05-26T14:57:19Z

What should I auto-pull into Entity Assertions from catalog records; what data might lead someone to an existing Entity and prevent them from creating a duplicate?

Identification (taxon)
All other identifiers
Attributes (of the catalog record item)

dustymc · 2021-05-28T22:32:55Z

Latest is in production, I rebuilt the two Entities I could, old data is in arctos-assets.

Jegelewicz · 2021-05-28T22:35:56Z

Sorry I haven't worked on this - I've been busy cleaning ichnotaxa and part names.....

dustymc · 2021-05-28T22:37:04Z

I think we've all had our distractions lately!

campmlc · 2021-05-28T23:37:48Z

Yes, I can't wait to try these out. Any way to do a mass entity bulkload for ABQ Biopark?

Jegelewicz · 2021-08-05T20:54:13Z

One problem with the "multiple events for a cataloged organism" model. This one, where NONE of the parts are associated with any one of the 12(!) events.

I can tell you that at GBIF and iDigBio, each of the 12 occurrences includes all 28 parts, which is pretty misleading. Here is one of the GBIF occurrences: https://www.gbif.org/occurrence/1300283344

Also, ALL media are associated with ALL occurrences at GBIF, again misleading. This is sort of true at iDigBio as the "associated media" field links up with a search of media by the catalog number (at least I think that is what is happening), although this link has 9 results and there are 10 images at GBIF.

How does this stuff look at GGBN? Interestingly enough, I was unable to find any Canis lupus baileyi at all through their search page! @campmlc you may want to follow up on why this is so. I did find Canis lupus baileyi x Canis familiaris

I notice that GGBN results include this:

78 records found (unique samples, not counting multiple samples from the same specimen).

Well if all of the samples for this "specimen" get narrowed down to just one vial of blood in search results, then people would be missing out on the "over time" component of the sampling. Not to mention the fact that there may be more than one kind of sample (hair, blood, serum). HOWEVER, there's this

so what exactly is a "specimen"?

If I were someone looking in on this, it just looks like a big pile of things and I don't have the time or inclination to sort it out amongst the 4 different resources (Arctos, GBIF, GGBN, iDigBio). The information for one cataloged item should really not look so incredibly different in all of these resources. Some of that is on the resources, but some of it is on us.

Sorry for this, but I am looking into issues related to MaterialSample and as I was researching, I fell into this rabbit hole. I wanted to document it so when the time is right I can return to it.

Jegelewicz · 2021-08-05T21:43:12Z

And extra infuriating is this. It looks like GGBN takes all of our individual "occurrences" and mashes them together.

See https://www.ggbn.org/ggbn_portal/search/record?unitID=MSB%3AMamm%3A255471&collectioncode=Mamm&institutioncode=MSB

WHY do we have to split everything up for them? I don't understand how they couldn't take the data at GBIF and parse it into the separate "samples".

https://www.gbif.org/occurrence/1229671489

And why in blazes are there only three samples when the "preparations" clearly show 6?

AND the individual samples don't even show what they are, just "tissue".

campmlc · 2021-08-05T22:25:57Z

Great observations - we need a designated discussion on this.
@jldunnum

Jegelewicz · 2021-08-05T22:29:08Z

I'd really like you guys to look at some of your stuff in all the various portals and think about what is happening!

Jegelewicz added Function-Relationship Function-CodeTables Aggregator issues e.g., GBIF, iDigBio, etc labels Mar 13, 2019

Jegelewicz added this to the Needs Discussion milestone Mar 13, 2019

This was referenced Jun 25, 2021

Do we need directly-attached stable part identifiers? #3630

Closed

Feature Request - Use Manage in search results to add all results to an entity #3685

Closed

dustymc mentioned this issue Aug 5, 2021

change help text at top of Agent Search page #3813

Closed

Jegelewicz mentioned this issue Aug 24, 2021

rebuild ggbn stuff #3699

Closed

Jegelewicz mentioned this issue Oct 4, 2021

Other Deliverable - BasisOfRecord review tdwg/material-sample#11

Closed

dustymc closed this as completed Jan 19, 2022

Jegelewicz added this to Arctos Entity Nov 3, 2023

github-project-automation bot moved this to To do in Arctos Entity Nov 3, 2023
