Cleaning up of low information agents #4903

Closed
lin-fred opened this issue Aug 4, 2022 · 72 comments

Comments

@lin-fred
Contributor

lin-fred commented Aug 4, 2022

@dustymc can you post a list of agents that only have remarks you'd like us to start looking through

@lin-fred
Contributor Author

lin-fred commented Aug 4, 2022

also please list any other agent cleaning steps that need to happen for #4554 to be successful

@dustymc
Contributor

dustymc commented Aug 4, 2022

cleaning steps that need to happen for #4554 to be successful

You don't need to do anything - say GO! and I go and we're done....

I think people want to do things - lots of cleanup seems to have happened in the other thread (which is what led to the idea of just getting rid of the clutter that led to those situations), I'm happy to do whatever I can to facilitate that, just let me know what you need.

Here are 34961 agents, created more than a year ago, that are used in collector (table, not role) or less.

temp_agent_clean_fp(1).csv.zip

There are an additional 6820 low-information agents created within the last year.

FYI there are 95973 total agents at the moment.

@lin-fred
Contributor Author

lin-fred commented Aug 4, 2022

Thank you for clarifying that, I thought there had to be something done on our end for you to be able to move forward but now I understand

So our big steps here are, we set up a deadline, once it's past the deadline, these agents get moved into verbatim agents and their remarks get moved into a remarks field for the attribute?

It's a long list and there is no way the agents committee can tackle them all, but my hope is that we can help some collections who would really like their agents to be "fixed" before the merge

Would it be better to focus on the old or new ones? Or maybe it doesn't matter?

@lin-fred
Contributor Author

lin-fred commented Aug 4, 2022

But we will also communicate to these collections that just because the agent has been put into verbatim agent, nothing is lost and there are tools/workflows to help them clean them up and add them as an actual agent

@dustymc
Contributor

dustymc commented Aug 4, 2022

big steps

Sounds reasonable, or I can break things into chunks, move some out of the way while we're still working on others, WHATEVER.

collections who would...

Let me know if you need a different view of the data.

nothing is lost

Yep, that's the intention.

tools/workflows

Yep. Worst case we do absolutely nothing, which still puts cleanup in the context of more than bare strings and seems like a significant improvement to me. Best case, #4872 works as hoped, this all becomes a click (which might be automated).

@Jegelewicz
Member

Here are a few for @wellerjes @droberts49

Aileen Alvarez (person: 21302067)
Alec Acevedo (person: 21302055)
Alvin So (person: 21302070)
Anamylee Ruiz (person: 21302050)
Andrew Moy (person: 21302058)
Brianna Nesbeth (person: 21302068)
Daisy Lara (person: 21302063)
Daliana Soto (person: 21302057)
Davarhe Jones (person: 21302060)
David Dietrich (person: 21302047)
Devontea Roy (person: 21302053)
Dixon O'Banion (person: 21302049)
Evelyn Garcia (person: 21302048)
Izaiah Redd (person: 21302056)
James Majors (person: 21302066)
Justin Peterson (person: 21302065)
K'Von Jackson (person: 21302059)
Liliane Tran (person: 21302069)
McClaran Shirley (person: 21302051)
Mustiqirr Muhammad (person: 21302061)
Natalia Carroll (person: 21302064)
Natavia Barr (person: 21302054)
Paloma Carroll (person: 21302062)
Todd Woods (person: 21302052)

These are all listed as "Student participant in the Chicago Academy of Sciences summer TEENS program." in remarks. If you want to keep them as agents - I suggest creating a project and adding them all to it. I'm happy to do this for you if you want!

Exploring further, there are a whole BUNCH of students in this group that really would be a nice project instead, or a series of related projects if this is some kind of annual thing.

@dustymc I guess group membership would be something that keeps an agent an agent? I really don't like the groups, but perhaps they do serve some purpose as they apparently have here.

@dustymc
Contributor

dustymc commented Aug 4, 2022

Groups are just awful (no metadata) relationships, prioritizing #4555 would simplify things.

@Jegelewicz
Member

I can help a ton with that - Groups are fairly easy to convert to projects - the problem then becomes all the activity of the group and how we manage that. So when the group agent is a collector - OOF

@lin-fred
Contributor Author

lin-fred commented Aug 4, 2022

is there a way for me to get a list of all agents that are connected to my collections?

@dustymc
Contributor

dustymc commented Aug 4, 2022

I don't think so, but if you'll elaborate on that I can probably pull them.

@lin-fred
Contributor Author

lin-fred commented Aug 4, 2022

any agents that are associated with
NMMNH:Bird
NMMNH:Ento
NMMNH:Herb
NMMNH:Herp
NMMNH:Inv

?

@dustymc
Contributor

dustymc commented Aug 4, 2022

From the data above (straightforward) or from anywhere (not straightforward, needs an issue)?

@lin-fred
Contributor Author

lin-fred commented Aug 4, 2022

From the data above (straightforward) or from anywhere (not straightforward, needs an issue)?

but the data above doesn't say which collections the agents are attributed to, there are a lot of them that have NMMNH in the remarks though that I can work on, but I'm curious if there are many others that don't. I'll make a new issue

@Jegelewicz
Member

See my instructions in your other issue - I've been working on UTEP:Herp agents and already found some cool stuff! Check out Ernest A. Liner (who I also added to Bionomia) and Eugene D. Fleharty. It's fun to figure people out! I also spent some time on random Joneses and was able to use remarks to add other data to some of them - still a long way to go in that list though....

@ccicero

ccicero commented Aug 10, 2022

I'm all for cleaning up agents, but if 'low quality' agents (only initials plus last name) get moved to verbatim, what happens to the collector/preparator agent? I hope it's not getting changed to 'unknown' as someone with a last name but only first/middle initials is certainly known more than someone with only sets of initials. ???

@dustymc
Contributor

dustymc commented Aug 10, 2022

'low quality' is "just names and acting only as collector," the format of the name is not involved.

what happens to the collector/preparator agent?

It will be removed. There is no data loss, the 'verbatim agent' attribute can carry all of the information a names-only Agent can carry.

There will be tools to "upgrade" verbatim if more information becomes available, and the intention of this Issue is to recover anything that was entered incorrectly - eg, https://arctos.database.museum/agent/21345767 (entered today) would have been on the chopping block because it had only remarks, @Jegelewicz created a relationship from those remarks, it's no longer "just strings" and therefore will not be involved in any cleanup.

Lots more discussion in #4554, https://github.com/ArctosDB/newsletter/issues/166#issuecomment-1211368414 will become an article.

@ebraker
Contributor

ebraker commented Aug 11, 2022

I wasn't able to attend the issues meeting and just read through the notes. Can someone clarify what will happen to agent_remarks for an agent with remarks but no relationships/addresses/transactions? Will remarks get transferred to verbatim agent attribute remarks? (they won't disappear correct?)

@Jegelewicz
Member

Will remarks get transferred to verbatim agent attribute remarks? (they won't disappear correct?)

Yes - the remarks and also I think aka's will be placed in the remark for the verbatim agent.
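
A minimal sketch of that transfer, assuming hypothetical field names (`attribute_type`, `attribute_value`, `attribute_remark`) and a made-up agent. This is not the actual Arctos schema or migration code; it only illustrates how the name, remarks, and aka's could all fit into one verbatim agent attribute:

```python
def to_verbatim_agent(preferred_name, akas, agent_remarks):
    """Fold a names-and-remarks-only agent into one attribute record.

    All key names here are hypothetical, chosen for illustration.
    """
    remark_parts = []
    if agent_remarks:
        remark_parts.append(agent_remarks)
    if akas:
        # Keep alternate names so nothing entered on the agent is dropped.
        remark_parts.append("aka: " + "; ".join(akas))
    return {
        "attribute_type": "verbatim agent",
        "attribute_value": preferred_name,
        "attribute_remark": " | ".join(remark_parts),
    }


record = to_verbatim_agent(
    "J. Q. Example",                      # made-up agent name
    ["John Q. Example"],                  # made-up aka
    "Collected for the museum in 1950.",  # made-up remark
)
print(record["attribute_remark"])
```

The separator and the "aka:" prefix are arbitrary choices for the example; the point is only that every string on the agent has somewhere to go.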

@ebraker
Contributor

ebraker commented Aug 11, 2022

Are these remarks then visible on the catalog record page - some are loooong and/or pre-date the option to add "curatorial remarks" and probably don't need to be publicly displayed?

@Jegelewicz
Member

@AJLinn I think @dustymc decided to go wild last week and merge a bunch of stuff, but I really want to wait on most merges until AFTER any string-only agents are converted to verbatim agents so that whatever people are using NOW is what they get THEN.

Also, just in case you have missed out. Any agent that has only names and remarks that is ONLY involved in collector roles, will be converted to verbatim agent around the first of next year. Any agents that you want to keep that are in this group need to have either a status, relationship, or address added to keep them agent-worthy. Also note that you can no longer add an agent without one of these things.

This is a plea to PLEASE allow the collections to do this work! Mass-merging stuff now is going to mean people losing verbatim information that they have currently recorded in agents.

@AJLinn

AJLinn commented Sep 13, 2022

Thanks @Jegelewicz - I've had to miss out on a bunch of Arctos stuff because of an 8-week-long seminar I've been involved with (now completed) and have not been tracking agent stuff as closely as I should have. I am fully invested in doing this work but agree, we MUST have time to prioritize the work and it's not going to be a quick fix! Forcing us to drop all of the other time-critical priorities to do these fixes before things get merged is not going to engender a positive working environment! It's going to lead to mistakes and pissed-off users, NOT the intended improvement of agent records.

For example, for just UAM:EH dealing with 918 names on this google spreadsheet assuming it takes an average of 5 minutes per name (some will take much more effort while others will be faster) means 76.5 hours of work!!! I assume others have equally as many entries to review and none of us have two weeks of dedicated time to only devote to this task.

This is a plea to PLEASE allow the collections to do this work! Mass-merging stuff now is going to mean people losing verbatim information that they have currently recorded in agents.

1000-times YES!

@Jegelewicz
Member

just UAM:EH dealing with 918 names on this google spreadsheet assuming it takes an average of 5 minutes per name (some will take much more effort while others will be faster) means 76.5 hours of work!!!

Yep - I can get through about 20-25 in an hour. FWIW I have been spending some time each week just transferring information from remarks to the various status, relationship and address fields, so I am trying to help everyone get some of this done before the deadline. Also, @ArctosDB/agents-committee is meeting half an hour early each month to work on this too.
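
For anyone budgeting time, the arithmetic above can be sketched as follows. The 918 names, the 5-minute average, and the 20-25 per hour rate come from this thread; the function name is just illustrative:

```python
def review_hours(name_count, names_per_hour):
    """Hours needed to review name_count agents at a steady hourly rate."""
    return name_count / names_per_hour


names = 918  # UAM:EH names on the spreadsheet

# 5 minutes per name is 12 names per hour.
print(review_hours(names, 60 / 5))        # 76.5
# At 20-25 names per hour the total is lower.
print(round(review_hours(names, 20), 1))  # 45.9
print(round(review_hours(names, 25), 1))  # 36.7
```

So at the faster pace the UAM:EH list would be roughly 37 to 46 hours rather than 76.5, assuming the rate holds across the whole list.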

@dustymc
Contributor

dustymc commented Sep 13, 2022

allow the collections

That's what's happening, #4930

@krgomez

krgomez commented Sep 14, 2022

I'll work on adding more data to the UAM:Art agent profiles in the list. It can be challenging to research some of our more obscure artists, but I will do what I can. If an agent is a determiner of an attribute, does this disqualify them from being changed into a verbatim agent? Can you clarify, is the list shared in this issue all of the "low quality agents", or are there more?

@Jegelewicz
Member

Jegelewicz commented Sep 14, 2022

If an agent is a determiner of an attribute, does this disqualify them from being changed into a verbatim agent?

Right now, yes.

is the list shared in this issue all of the "low quality agents", or are there more?

All as of the day it was made - but more agents get added every day...

@marecaguthrie

marecaguthrie commented Sep 14, 2022 via email

@Jegelewicz
Member

Jegelewicz commented Sep 14, 2022

@marecaguthrie don't panic! We aren't asking anyone to add any information that isn't already in a biography - just to add it in some more appropriate places! I will make it a personal mission to look through your agents to ensure that you know if any of them are headed for the verbatim agent attribute.

BUT even that would not lose anything! If an agent is "verbatimized" to a verbatim agent attribute any remark associated with the agent (your biography) will also go into the verbatim agent remark and they could be "upgraded" to an agent at any time there is something other than name or remark to identify them.

if the working group has a policy/statement about ethics/privacy for people who have personal info in Arctos? Or maybe something to develop if we don’t?

We really don't and we should but we do encumber certain agent information (all addresses except ORCiD, Wikidata and Library of Congress as those are already public). I will start an issue in the internal repo for this.

@dustymc
Contributor

dustymc commented Sep 14, 2022

There is still some very fundamental misunderstanding, or miscommunication, or misSOMETHING at play here.

Nothing can be lost; the defining characteristic of a verbatim agent is that the information fits in that structure without loss.

"Goes by single initial, prefers anonymity" is a great fit for verbatim agents; what we're doing does what you say you need to do much better than what we're coming from (where "A." would assuredly get credited with a bunch of unrelated low-information activity, and then probably changed to fit those misattributed data) possibly can.

@Jegelewicz
Member

@marecaguthrie here is an example - Litho-Krome Company

Before I did anything, this agent only had the following information

image

Had I left it alone, instead of this on the catalog record:
image

You would have seen
image
verbatim agent

| Agent | method | by | date | remark |
| --- | --- | --- | --- | --- |
| Litho-Krome Company | creator | Karinna Gomez | 2019-02-18 | Lithographic printing company in Columbus, Georgia. |

For a real example:
image

BUT I just took from remarks the "address" Columbus, Georgia and BOOM, now this is worthy of remaining an agent. With a few clicks, I was able to find their LinkedIn, and a Bloomberg page which both included a FULL address - and it appears this company is closed according to Google (plus their website is up for grabs).
image

So, nothing that isn't already public was needed in order to "agentify" this strings-only agent and now nothing in your records will change nor will their public agent page except for the addition of the urls, which are already public. But also, the agent is more complete and others can tell if it is the same Litho-Krome Company they have in their data or if there is a new incarnation of this company name.

Hope that helps!

@AJLinn

AJLinn commented Sep 15, 2022

Question:
If the low-quality agent gets moved to the Verbatim Agent attribute, will they show up if someone searches for the agent from the search page?

Screen Shot 2022-09-14 at 4 48 33 PM

@dustymc
Contributor

dustymc commented Sep 15, 2022

low-quality

Low information - I don't think that's the same as or even a decent proxy for 'quality.'

Screen Shot 2022-09-14 at 6 19 47 PM

@Jegelewicz Jegelewicz changed the title from "Cleaning up of low quality agents" to "Cleaning up of low information agents" Sep 15, 2022
@ewommack

ewommack commented Sep 15, 2022

So to summarize (because I know this can be a bit confusing, and we've been working on this for a long time):

  • Low information agents = agents that do not have an address or other contact information, birth/death dates, or any relationships and associations.
  • Low information agents = does not have anything to do with what makes up an agent's name
  • If your collection's low information agent is moved to a verbatim agent NO DATA WILL BE LOST
  • You can search verbatim agents; no functionality is lost
  • Arctos is working on tools to move data between verbatim agents and agents

@dustymc @Jegelewicz @lin-fred @droberts49 do I have the summary right?

@dustymc
Contributor

dustymc commented Sep 15, 2022

My only objection is around the categorization of "down-graded." "Verbatimizing" is a lateral move, functionally equivalent to any other approach. Bigger-picture it should result in a much more information rich environment where things like duplicates (which prevent giving proper credit) are much less likely to exist, so while the path may not be direct I think the end result is inevitably an up-grading.

@dustymc dustymc closed this as completed Sep 15, 2022
@dustymc dustymc reopened this Sep 15, 2022
@ewommack

@dustymc I changed the wording. What do you think?

@dustymc
Contributor

dustymc commented Sep 15, 2022

Nice, one more request - consider changing

You can search verbatim agents

to

You can search verbatim agents; no functionality is lost

@ewommack

ewommack commented Sep 15, 2022

I think we should be good to go for the summary. Here is a clean version of it so we can link to this comment when we are discussing the issue. I'll try to keep track of developments and add to the summary as things come up:

So to summarize (because I know this can be a bit confusing, and we've been working on this for a long time):

  • Low information agents = agents that do not have an address or other contact information, birth/death dates, or any relationships and associations.
  • Low information agents = does not have anything to do with what makes up an agent's name
  • If your collection's low information agent is moved to a verbatim agent NO DATA WILL BE LOST
  • You can search verbatim agents; no functionality is lost
  • Arctos is working on tools to move data between verbatim agents and agents
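
The criteria in the summary can be sketched as a simple check. This is an illustration only: the `Agent` class, its field names, and the `used_in` labels are assumptions made up for the example, not the Arctos data model or the query used to build the list attached to this issue.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical structure; field names are illustrative only.
    name: str
    remarks: str = ""
    addresses: list = field(default_factory=list)
    statuses: list = field(default_factory=list)       # e.g. birth/death dates
    relationships: list = field(default_factory=list)
    used_in: set = field(default_factory=set)          # places the agent is referenced


def is_low_information(agent: Agent) -> bool:
    """True when the agent has only names/remarks and acts only as collector."""
    has_structured_data = bool(
        agent.addresses or agent.statuses or agent.relationships
    )
    # The format of the name is deliberately not part of the test.
    collector_or_less = agent.used_in <= {"collector"}
    return not has_structured_data and collector_or_less


strings_only = Agent("A. Example", remarks="some remark", used_in={"collector"})
with_address = Agent("Litho-Krome Company", addresses=["Columbus, Georgia"],
                     used_in={"collector"})
determiner = Agent("B. Example", used_in={"collector", "attribute determiner"})

print(is_low_information(strings_only))   # True
print(is_low_information(with_address))   # False
print(is_low_information(determiner))     # False
```

Remarks alone do not change the result, which matches the point made earlier in the thread that moving a detail from remarks into an address, status, or relationship is what keeps an agent off the list.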

@krgomez

krgomez commented Oct 9, 2022

I finished going through and adding more data for the UAM:Art agents on the list attached in this issue.

@mkoo
Member

mkoo commented Oct 10, 2022

Assigning an archival database student to help.
Just so we are all working on the same file to clean up agents, this is what I am sharing with Jihyun.

@marecaguthrie

marecaguthrie commented Oct 11, 2022 via email

@dustymc
Contributor

dustymc commented Jan 17, 2023

I think we're done here.

@dustymc dustymc closed this as completed Jan 17, 2023