
Organism ID vs Entity ID vs Agent #3765

Closed
campmlc opened this issue Jul 23, 2021 · 119 comments

Comments

@campmlc

campmlc commented Jul 23, 2021

I just went through the process of creating an entity for the first time, which was excruciating. I have no idea if I did anything correctly. https://arctos.database.museum/entity.cfm?action=edit&entity_id=4
The process is opaque, and even if documentation were present this is an incredibly complex operation.
After I created the entity, I added the entity IDs (as url - this is not clear) to each record as an organism ID.
I could then click the entity and see all the records, whew.
Then, I experimented with the same thing by creating an agent for the same animal, "Bernice". She was a chimp at the ABQ zoo.
I added all her IDs as akas to her agent profile.
I added her as an agent to each record (as "subject").
Much easier, much clearer, no need for reciprocally connecting anything.
And voila, all the specimen records and identifiers show up in a single, clear dashboard on her agent page.
I strongly suggest, yet again, that we incorporate the agent model for use with organisms. It works and delivers precisely what we need.

@campmlc
Author

campmlc commented Jul 23, 2021

To minimize confusion, changed preferred name to "Bernice Pan troglodytes" to distinguish from any other Bernices out there.

@Jegelewicz
Member

I changed Bernice's agent type to other agent.

@Jegelewicz
Member

Here is a project for Bernice.

What can it do that an agent can't?

  • no birth death date (unless we use start and end for that, which isn't explicit, but sorta works)
  • no relationships to parents, siblings, children (these could be manually created as related projects)
  • Accessions for an organism would need to include ONLY that organism, probably NOT realistic
  • other identifiers need to be added to every single catalog record manually

What can an Agent do that a project can't?

  • explicit birth and death dates
  • explicit relationships to other agents
  • explicit identifiers as akas, no need to add them to every single catalog record
  • addresses (zoo and experimental animals move around)

What other ways should we evaluate this idea?

@campmlc
Author

campmlc commented Aug 6, 2021

We discussed yesterday in intern meeting - (I also changed type to other agent - not sure why it didn't save).
@mkoo @KyndallH

@Jegelewicz
Member

I had to remove "first name" as other agents cannot have a first name.

@Jegelewicz
Member

@dustymc has always said we should just catalog organisms. We had some discussion about organisms during the TDWG MaterialSample Task Group meeting yesterday. In order to create an organism ID GUID, the Field Museum creates a new "specimen-less" catalog record. Given this and all of our discussions, here is what I propose, because I think we can do slightly better for our community.

  1. We create a community managed collection. (Arctos:Entity)
  2. Arctos:Entity will use the Teach collection code
  3. Arctos:Entity will be managed by those with manage_code_table and requests for new entities will be handled the same way code table requests are
  4. Records in Arctos:Entity will ALWAYS be part-less
  5. GUIDS from Arctos:Entity can be used as values in other identifier = Organism ID
  6. We create a few new event types: birth and death so that we can record these events for organisms when they are known

The reason I propose this special collection for organisms is that it will help prevent the creation of duplicate identifiers in Arctos for any given organism. It also allows us to share the burden of keeping up with them and will not impose additional fees for collections doing the work to connect things. I have settled on "Entity" because cultural collections may have use for this as well - to bring together various parts of a set and so on.

Over time, I think we will find opportunities to seed catalog records from Arctos:Entity and to add data to Arctos:Entity from other catalog records, but we can work those out as they become apparent and useful. We will likely need some rules about when an Arctos:Entity record should be created, but I think they can be fairly simple.

What about this is nuts? I'm sure something is!

@Jegelewicz Jegelewicz modified the milestones: Tabled, Needs Discussion Nov 18, 2021
@campmlc
Author

campmlc commented Nov 18, 2021 via email

@Jegelewicz
Member

How would we integrate this with other platforms and GBIF?

By sending the url for the Arctos:Entity record as the Organism ID for the catalog record item. This is exactly what Field Museum does. We would NOT transmit anything in the Arctos:Entity collection to GBIF (yet). Eventually, we may want to send that information in its own kind of Darwin Core Archive, but for now, it would remain with us (although everyone could see it through the Organism ID link).

@dustymc
Contributor

dustymc commented Nov 18, 2021

@dustymc has always said w

No, the data themselves are and have been doing that. (I suppose it's too late to switch my title from "data janitor" to "Speaker for the Data"?)

community managed collection

I have been saying bigger is better - something bigger than Arctos would be better, but if we must do this then this seems as good as it gets.

managed by those with manage_code_table

This seems entirely unnecessary, I'm not sure what the goals of such restrictions would be, but it's also "details" that can be easily adjusted at any time using familiar tools.

ALWAYS be part-less

I don't get that either (but it's also just more details). Why not accumulate parts? "This critter has blood samples in these 48 places that we know of, none of them have useful public data" seems incredibly useful, condition ("we have no idea") and disposition ("not here") handle the details.

GUIDS from Arctos:Entity can be used as values in other identifier = Organism ID

and

"Entity" because cultural collections

  1. Yes, a GUID in the whatever-its-called collection is just an identifier capable of carrying the kinds of data that keep being mentioned, the collection needs a name but the identifier itself can be used for WHATEVER. "Organism ID" is one of those.
  2. Yep, "a whole bunch of people ran off with chunks of this thing" can't be too rare. (But maybe any of those that anyone cares about have inherent names - I've got a chunk of The Berlin Wall, it wasn't the only wall around and I don't have any fancy numbers scribbled on it but everybody knows what it is anyway.)

few new event types

Attributes - events are interactions with humans, Attributes are - well, Attributes. (I was wondering how we don't have this, turns out we do but we call it "numeric age" and the collecting event after "verbatim preservation date." Might be a good opportunity for some reconciliation.)

prevent the creation of duplicate identifiers in Arctos for any given organism

I think that's an impossible (albeit worthy) goal. The printer stuck, 12 XYZ pages got printed, now there are 273 things that say "XYZ123" on them out there. Some sort of filter seems very useful, but starting out with the idea that you can do something that you won't actually be able to do will be frustrating. Duplicates WILL be created, and this data object is capable of doing something about it: https://handbook.arctosdb.org/documentation/catalog.html#recataloging-specimens

share the burden

Not if doing so requires manage_codetables. This would just be a collection, you can grant access to anyone who can demonstrate that they understand how to use it.

need some rules

I'll again advocate for openness/inclusiveness. If the bar is very high then this just won't get used. Definitely rules and guidelines are needed, but I think they should generally encourage new "entities" when there's some question - "this seems like something that might have a chunk cataloged elsewhere" would be a HUGE benefit for some future researcher who might be willing to dig around in the collection and maybe make everyone's data better.

Anyway, all details, I still don't see a better approach or more appropriate data object for this.

How would we integrate this with other platforms

Such as?

and GBIF

AFAIK they don't accept Organisms but they do Occurrence-->OrganismID - we'd just give them what they can handle.

@Jegelewicz
Member

I have submitted a prospective collection request so we can discuss there.

@dustymc
Contributor

dustymc commented Nov 18, 2021

discuss there

I don't know where "there" is - should I?

@Jegelewicz
Member

I don't know where "there" is - should I?

It will become an issue in the New Collections repo, which is where we make decisions about incoming stuff...

@ewommack

ewommack commented Dec 9, 2021

AWG at 9 Dec 2021 meeting agreed to try it out.

@Jegelewicz
Member

Please review the project - https://github.com/ArctosDB/new-collections/projects/59

@ccicero

ccicero commented Dec 22, 2021

@Jegelewicz @dustymc @campmlc

I don't know if this is the right place to post this, but I just finished uploading MVZObs:Bird records that go with MVZ:Bird records (= same organisms = same entity) - with reciprocal relationships 'same individual as'

Here is the saved search.

This will be a good test case once the new 'entity' collection is set up.

@campmlc
Author

campmlc commented Dec 22, 2021 via email

@Jegelewicz
Member

@dustymc @campmlc @ccicero

The Arctos Entity Collection is live! Before we start writing up procedures and such, I'd like to create a couple of entities so that we can review and talk about how it should work for the community.

I was going to test with Kianga and then one pair of Carla's bird/observation records.

Is everyone OK with me doing that?

@Jegelewicz
Member

MVZ Bird test
https://arctos.database.museum/guid/MVZ:Bird:193195
https://arctos.database.museum/guid/MVZObs:Bird:4777

@campmlc
Author

campmlc commented Jan 7, 2022 via email

@ccicero

ccicero commented Jan 7, 2022

Awesome Teresa, and go for it. Thanks!

Mariel - with regard to your question, we only will be getting the blood samples, no carcasses (even if the bird died). So the data in Arctos are what we have, and I don't expect more for those records.

@dustymc
Contributor

dustymc commented Apr 4, 2022

something about clone/entity accessions - allow pick, pop up a warning, ??

  • would also be useful in general, parasite-host etc.

add:

  • ignore entities checkbox default checked on specimensearch
  • some sort of text in specimenresults - idea is ~"this is an entity record it might not be what you expect" @ccicero to provide text - tentative "click GUID to view components"

instead: on results that includes entities, add a big verbose explanation somewhere (or link to docs or something)

would be cool:

find entities, then find all records that use found entityIDs as OrganismID


new approach

  • catalog the bits, don't worry about entities/individuals/anything but the bits
  • run them through some new tool that builds and adds Entities (via ZIM number, band ID, etc.)
    • eventually: consolidate - all 3 elephant samples from one individual at same time become one record with 3 parts under an Entity, plus redirects for the 2 that get deleted
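
To make the "new approach" concrete, here is a minimal Python sketch of the grouping step, assuming each cataloged bit carries its identifiers as a simple mapping. It is not the actual Arctos tool: the record shape, the function name `propose_entities`, and the sample GUIDs and band numbers are all made up for illustration.

```python
from collections import defaultdict


def propose_entities(records, id_type):
    """Group catalog record GUIDs by the value of one shared identifier type.

    Each distinct identifier value becomes one proposed Entity; the GUIDs
    listed under it are the candidate components.
    """
    groups = defaultdict(list)
    for record in records:
        value = record["identifiers"].get(id_type)
        if value:  # records lacking this identifier cannot be grouped
            groups[value].append(record["guid"])
    return dict(groups)


# Made-up records: two bits from one banded bird, one from another bird,
# and one record with no band at all.
records = [
    {"guid": "Demo:Bird:1", "identifiers": {"band number": "B-001"}},
    {"guid": "Demo:Obs:7", "identifiers": {"band number": "B-001"}},
    {"guid": "Demo:Bird:2", "identifiers": {"band number": "B-002"}},
    {"guid": "Demo:Bird:3", "identifiers": {}},
]

proposed = propose_entities(records, "band number")
for band, guids in proposed.items():
    print(band, "->", guids)
```

The same function would work for a ZIM number or any other identifier type by changing the `id_type` argument.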

@ccicero can you send me a bulkloader file of your "entity components" so I can play with the tool at test? Or @campmlc if your elephants are in test (or you have the bulk file) that could work too.

@Jegelewicz
Member

Carla’s Questions today

  1. Entities need their own accession - but cloning adds them to the original accession.
    • Dusty will mess around to see if we can create an accession while cloning or add to accession while cloning
    • Also useful for parasite/hosts cloned from each other (belong in different collections with different accessions)
    • They need to look special in search results
    • Dusty suggests adding static text to indicate what they are: “Click GUID to view components”
  2. They are confusing in search results
    • Include check box “include entities” but default is no
    • Mariel thinks they should be visible because people won’t know what checking/unchecking the box means/does
    • Dusty - maybe we need more than one UI depending upon use - Mariel pretty much agrees
      • “Your results include entities - do you want to include entities?” Checkbox, remove entities
  3. Can we see something in related items that shows all components?
    • Once we figure out other id metadata
    • Need new relationships (component of, has component)
    • Magic in all related ids?
  4. Could we see all components from multiple entities on a map?
    • Get your entities
    • Need a path to get all components of those entities
  5. We need an entity creation tool (use a bird band or other unique “Organism Id” to create an entity and associate all components).
    • Dusty likes and is going to work on this
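
As a rough illustration of the "path to get all components" question, the Python sketch below goes from a list of entities to their component records by matching the entity URL stored as Organism ID, then keeps only the components that have coordinates to plot. It is a sketch under assumptions, not how Arctos does it: the record fields (`organism_id`, `lat`, `lon`), the function names, and the sample records are hypothetical. Only the URL prefix comes from links in this thread.

```python
# Prefix used by the record URLs quoted in this thread.
GUID_PREFIX = "https://arctos.database.museum/guid/"


def components_of(entity_guids, records):
    """Return records whose Organism ID points at one of the given entities."""
    wanted = {GUID_PREFIX + guid for guid in entity_guids}
    return [r for r in records if r.get("organism_id") in wanted]


def mappable_points(components):
    """Keep only components that have both coordinates, as (guid, lat, lon)."""
    return [
        (r["guid"], r["lat"], r["lon"])
        for r in components
        if r.get("lat") is not None and r.get("lon") is not None
    ]


# Made-up catalog records for illustration only.
records = [
    {"guid": "Demo:Bird:1", "organism_id": GUID_PREFIX + "Demo:Entity:1",
     "lat": 37.87, "lon": -122.26},
    {"guid": "Demo:Obs:7", "organism_id": GUID_PREFIX + "Demo:Entity:1",
     "lat": None, "lon": None},
    {"guid": "Demo:Bird:2", "organism_id": GUID_PREFIX + "Demo:Entity:2",
     "lat": 35.08, "lon": -106.62},
    {"guid": "Demo:Bird:3", "organism_id": None, "lat": 1.0, "lon": 2.0},
]

found = components_of(["Demo:Entity:1", "Demo:Entity:2"], records)
points = mappable_points(found)
print(points)
```

Components without coordinates are dropped from the map list here; whether they should instead be shown some other way is a separate design question.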

@Jegelewicz
Member

We could also register our entity domain on http://bioguid.org

@ccicero

ccicero commented Apr 4, 2022

  1. Entities need their own accession - but cloning adds them to the original accession.

FYI - I just fixed the one bluebird record so it's now in Arctos:Entity accession 4. I'll add the total # records once I figure that out.

  2. They are confusing in search results
    • Include check box “include entities” but default is no
    • Mariel thinks they should be visible because people won’t know what checking/unchecking the box means/does
    • Dusty - maybe we need more than one UI depending upon use - Mariel pretty much agrees
      • “Your results include entities - do you want to include entities?” Checkbox, remove entities

Update: instead of a check box on the Search page, we decided to include them in search but have something on the results page that says something like "Your results include entity records [with link to documentation for what an entity is]. Check box to remove from results set." [where with one click you can remove all entity records]

  5. We need an entity creation tool (use a bird band or other unique “Organism Id” to create an entity and associate all components).

This would be AWESOME! I added the bird band number to Arctos:Entity:33 and will wait for Dusty's magic tool before creating new entity records for the remaining bluebirds.

@dustymc
Contributor

dustymc commented Apr 4, 2022

http://bioguid.org/

Excepting social problems like #4200, Arctos identifiers are generally born actionable and linked via "how the internet works" - what could an additional registry DO for us?

@ccicero see #3765 (comment) - some sort of data in test would be very useful (maybe necessary) for this, do you happen to have the record bulkloader hanging around?

@Jegelewicz
Member

Jegelewicz commented Apr 4, 2022

what could an additional registry DO for us?

People are terrible at using https://arctos.database.museum/guid/Arctos:Entity:13 BUT if they had Arctos:Entity:13 and plugged it into BioGUID (and we had registered our domain there), they could find that they are missing the https://arctos.database.museum/guid/ part of an actionable thing.

BUT it does bring up a good point, because ALL collections registered there from Arctos would use https://arctos.database.museum/guid/ as their "dereference service prefix" and if someone only has MSB Mamm 5000, they will never get where they need to be.
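
For illustration, the missing step is small. The Python sketch below turns a bare identifier into the actionable URL by adding the prefix quoted above. The function name and the lenient handling of spaces versus colons are assumptions for the example, not an existing Arctos or BioGUID feature.

```python
import re

# Prefix taken from the record URLs quoted in this thread.
GUID_PREFIX = "https://arctos.database.museum/guid/"


def to_actionable_url(identifier: str) -> str:
    """Turn 'Arctos:Entity:13' or 'MSB Mamm 5000' into a resolvable URL."""
    identifier = identifier.strip()
    if identifier.startswith(GUID_PREFIX):
        return identifier  # already actionable, nothing to add
    # Hypothetical leniency: accept colons or whitespace between the parts.
    parts = [p for p in re.split(r"[:\s]+", identifier) if p]
    if len(parts) != 3:
        raise ValueError(
            f"expected institution, collection, number: {identifier!r}"
        )
    return GUID_PREFIX + ":".join(parts)


print(to_actionable_url("Arctos:Entity:13"))
print(to_actionable_url("MSB Mamm 5000"))
```

This only fixes the shape of the string. It cannot tell whether the person who wrote "MSB Mamm 5000" meant the same thing the resulting URL points to.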

Just for grins

I put MSB:Mamm:5000 into the BioGUID search, and I got the GBIF occurrence record
image

If you Google MSB:Mamm:5000 you get some GBIF and iDigBio stuff
image

Can we make it so that searching MSB:Mamm:5000 gets the Arctos record into the search results?

@dustymc
Contributor

dustymc commented Apr 4, 2022

People are terrible at

Because we don't and that leaks, see #4200 (and loan instructions and etc etc etc.) This is a social problem and those seldom have satisfactory technical solutions. We're currently demanding cruddy data, and that's what we get.

Arctos:Entity:13

That's probably on the "computers might actually figure it out" end of the spectrum, but it's still a string - anyone can create or use it for any reason, any number of times, and then you can never be truly sure they think it's what you think it might be. We have good identifiers, we just refuse to use them.

Can we make it so that

Given enough resources maybe we could get Google to do better, but the real answer will always revolve around Google doing whatever Google wants to do.

So, maybe bioguid could be useful, but only if we refuse to embrace real IDs (and who is in that group? Hopefully nobody!).

ANYWAY - there's magic in next release, should go out in an hour or so, go break it at test before making a giant mess in production please! I poked around at some elephants and a mouse and can't break it.

@dustymc
Contributor

dustymc commented Apr 4, 2022

Your results include entity records [with link to documentation for what an entity is]. Check box to remove from results set. [where with one click you can remove all entity records]

Done-ish, but the current documentation isn't great so no link.

@Jegelewicz
Member

Magic

Looks cool and a great way to boost your identification stats! :-)

image

Any way we can check for dupes and keep people from making messes?

@Jegelewicz
Member

Your results include entity records

Can we move that closer to the results? I don't think people will notice it up at the very top - I think most tend to scroll past the search parameters right down to the results.

image

@Jegelewicz
Member

the current documentation isn't great so no link.

Where is the best place for this? I'll add to my to-do list but want to make sure I get it in the right place.

@dustymc
Contributor

dustymc commented Apr 5, 2022

Magic

That's old magic. (And no, a bunch of these are the same data, if someone wants to import them for some reason they may consider that to be a feature. Current guidance is "don't use that button at all" but users can make whatever messes they want....)

New multirecord magic is here:

[screenshot]

(And FYI select display_value from coll_obj_other_id_num where other_id_type ='NK' group by display_value having count(*) > 1 will find ~15K potential entities to play with, they're messy enough that I doubt I could script many if I had to actually care about the data but they're convenient targets in test.)
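
For anyone who wants to see what that query does without touching test, here is a self-contained Python sketch that runs the same GROUP BY / HAVING logic against an in-memory SQLite table. The table and column names `coll_obj_other_id_num`, `other_id_type`, and `display_value` come from the query above. The `collection_object_id` column, the sample rows, and the function name are assumptions made up for the example, and the real table certainly has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coll_obj_other_id_num ("
    " collection_object_id INTEGER,"  # hypothetical column for the example
    " other_id_type TEXT,"
    " display_value TEXT)"
)
# Made-up rows: 'NK 100' is shared by two records, 'NK 200' is used once as
# an NK and once under a different identifier type.
conn.executemany(
    "INSERT INTO coll_obj_other_id_num VALUES (?, ?, ?)",
    [
        (1, "NK", "NK 100"),
        (2, "NK", "NK 100"),
        (3, "NK", "NK 200"),
        (4, "collector number", "NK 200"),
    ],
)


def potential_entities(connection, id_type):
    """Return identifier values of one type that appear on more than one row."""
    rows = connection.execute(
        "SELECT display_value FROM coll_obj_other_id_num"
        " WHERE other_id_type = ?"
        " GROUP BY display_value HAVING COUNT(*) > 1"
        " ORDER BY display_value",
        (id_type,),
    )
    return [row[0] for row in rows]


shared = potential_entities(conn, "NK")
print(shared)
```

A value showing up more than once only flags a candidate. As noted above, the real data are messy, so each hit still needs a human to decide whether the records are really one individual.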

move that closer to the results?

I suppose it could be anywhere, but it's annoying and I'm not convinced it's necessary at all so it's up mostly-sorta out of the way.

best place

https://handbook.arctosdb.org/documentation/entity.html

@ccicero

ccicero commented Apr 5, 2022

@dustymc I won't be able to work more on this until after my vacation (gone 4/7-4/16). What do you need from me? I have the bulkloader from the eagle data, but haven't done it yet for the bluebirds. Do you need me to send you the eagle bulkloader?

@dustymc
Contributor

dustymc commented Apr 5, 2022

@ccicero I found some elephants to play with, I think the tool is happy, have fun!

@campmlc
Author

campmlc commented Apr 5, 2022 via email

@dustymc
Contributor

dustymc commented Apr 25, 2022

Add to entity magic form:

  • option to exclude singletons
  • exclude records that already have OrganismID (or "pick an ID")?? Looks like this is going to happen in small batches, build with that in mind.

add docs: be clear when the tool works, when something else is necessary, suggest what that might be

disallow nature of ID pick, just use relationship
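
A minimal Python sketch of the first two form options, assuming candidate groups arrive as a mapping from shared identifier to record GUIDs. The function name `filter_groups`, its arguments, and the sample data are hypothetical and only meant to pin down what "exclude singletons" and "exclude records that already have OrganismID" could mean in combination.

```python
def filter_groups(groups, already_linked, exclude_singletons=True):
    """Drop records that already have an Organism ID, then optionally drop
    groups left with fewer than two records."""
    kept = {}
    for identifier, guids in groups.items():
        remaining = [g for g in guids if g not in already_linked]
        if exclude_singletons and len(remaining) < 2:
            continue  # nothing to tie together for this identifier
        if remaining:
            kept[identifier] = remaining
    return kept


# Made-up candidate groups and one record that already has an Organism ID.
groups = {
    "band-1": ["Demo:Bird:1", "Demo:Bird:2"],
    "band-2": ["Demo:Bird:3"],
    "band-3": ["Demo:Bird:4", "Demo:Bird:5"],
}
already_linked = {"Demo:Bird:5"}

strict = filter_groups(groups, already_linked)
loose = filter_groups(groups, already_linked, exclude_singletons=False)
print(strict)
print(loose)
```

Note the order of operations: removing already-linked records first can turn a pair into a singleton, which the sketch then excludes. A small-batch workflow might prefer the "pick an ID" route for those instead.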

@dustymc
Contributor

dustymc commented Apr 25, 2022

@ccicero I snuck results/manage/entity magic/requery into prod.

@ccicero

ccicero commented Apr 25, 2022

@dustymc cool, thanks! I'll start playing around.

Meanwhile, here is an entity record for an owl where we have the skeleton and AMNH has the skin:
https://arctos.database.museum/guid/Arctos:Entity:33
with component
https://arctos.database.museum/guid/MVZ:Bird:160626

I re-purposed that entity # from a bluebird because I want all the bluebirds to be consecutive.

All good (I think but please check) except for the map. There are no coordinates in the component record, but the map is showing a point off the west coast of Africa. The locality is in Thailand. ????
image

@dustymc
Contributor

dustymc commented Apr 25, 2022

no coordinates

#4546 (comment)

A georeference is two clicks away - click this:

[screenshot]

then save (or I can do that for your collection, or some subset, or whatever).

@campmlc
Author

campmlc commented Apr 25, 2022 via email

@ccicero

ccicero commented Apr 25, 2022

@campmlc ok, but that shouldn't be the case.

@dustymc I can magic click the coordinates, and was going to do that but wanted to show you first. Seems like there should be no map if no coordinates?

@dustymc
Contributor

dustymc commented Apr 25, 2022

This mapping occurs when the entity has no event.

This is true only when the entity is a component of itself, which should probably never be the case (but whatever, I can't and won't try to stop such things, maybe there are good reasons to build self-referencing Entities).

should be no map if no coordinates

Maybe, but I've come around to the idea that a (0,0) point is more informative than a big empty map. If we're going to change that then maybe we need to entirely rethink the prominent map (which I like!).

@campmlc
Author

campmlc commented Apr 25, 2022 via email

@ccicero

ccicero commented Apr 25, 2022

Mapping: A topic for discussion at our next organism meeting on May 9th.

I do like the big prominent map, but if there are no coordinates, then why have a map at all? We don't for records without coordinates.

@dustymc
Contributor

dustymc commented May 3, 2022

This has become unwieldy (I hate this thing: [screenshot]). I opened a new issue for what I think is the only remaining discussion, closing.

@dustymc dustymc closed this as completed May 3, 2022
6 participants