Getting Preparator Number out of attributes #4270
Comments
You could turn attributes on and extract that stuff from the JSON, but this is definitely a place where SQL is easier. Let me know if the attached contains what it should, if you want something else included, or whatever I can do to facilitate.
You should be able to split those columns out into bulkloader format (collectors and otherIDs), eg
should result in "A157" (as some ID type) in the otherID loader, plus two lines (one for Bob and one for Virginia) in the collector loader, with collector_role=preparator. |
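As a rough illustration of the split described above, the sketch below turns one already-parsed record into rows for the two loaders. The column names (`guid`, `agent_name`, `collector_role`, `other_id_type`, `other_id_value`), the placeholder GUID, and the `to_loader_rows` helper are assumptions made for this example, not the bulkloaders' actual templates, so check the current templates before loading anything.

```python
def to_loader_rows(guid, preparators, prep_number, id_type="preparator number"):
    """Build (collector_rows, other_id_rows) for one record.

    All column names here are hypothetical stand-ins for the real templates.
    """
    # One collector-loader row per person, all sharing the same GUID.
    collector_rows = [
        {"guid": guid, "agent_name": name, "collector_role": "preparator"}
        for name in preparators
    ]
    # At most one identifier row; skipped when the record has no number.
    other_id_rows = []
    if prep_number:
        other_id_rows.append(
            {"guid": guid, "other_id_type": id_type, "other_id_value": prep_number}
        )
    return collector_rows, other_id_rows


# Placeholder GUID; "Bob", "Virginia" and "A157" come from the example above.
collectors, other_ids = to_loader_rows("UWBM:Mamm:0", ["Bob", "Virginia"], "A157")
```

Each list can then be written out with `csv.DictWriter` as a separate file, one per loader.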
Thanks very much Dusty. Yes that CSV file does appear to contain all I need. I have started the process of splitting the VERBATIMPREPARATOR column into bulkloader format, I am using Excel's "text to columns" tool. Here are about 70 lines where I have done that, can you please look through and confirm I am splitting it into columns that the bulkloader will accept? Notice column I: "JEB questions - Dusty please confirm" where I give questions for that particular record. In particular - for us "Preparator Number" is a thing, it's pretty much our Original Identifier in most cases and what I enter as "Custom ID" at the start of our data entry for each record, so I want to keep that as a single cell (instead of an integer associated with the collector). But I also realize I need to separate out the Preparators, make sure they are all agents and have them labeled as Preparators. I have good SQL help from Elly here in our IT office, but I am hoping to run my plans by you before giving instructions to her, to make sure I am not asking her to do anything I will regret later. After I do this for Preparators, I will do something similar for Collectors. Thanks again! Jeff |
Yep, looks like you're on the right track - should just be a matter of renaming columns and maybe adding static values (eg collector role or ID type) to make that load. You can use group agents if that makes something easier. "Preparator Number" is https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#preparator_number - as long as you agree with that definition, no problem there. (Issue if you don't agree, of course.) You can load bits and pieces, so eg, pulling one column (and GUID) and loading it as some ID Type, then pulling another column as another type should be simple. (Or you can merge it all into one giant load if that works better for you.) At some point you'll need to standardize and clean the agents (eg, "Nat. Museums of Kenya"--->"National Museums of Kenya"). I usually do that after everything has been split up, having all of the agent data together in one column will reveal typos and such. That also often uncovers things like "M. L. Johnson 6803" (from your 2nd preparator column) that can force you to back up a step (a simple correction would lose the ID) - saving LOTS of versions usually deals with that. Let me know if there's anything I can help with. |
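One possible way to surface typos once all agent names sit in a single column is to compare every distinct name against the others and list the close matches. This is only a sketch using Python's standard `difflib`; the `similar_agent_names` helper and the 0.85 threshold are assumptions for illustration, and every pair it reports still needs a human decision.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def similar_agent_names(names, threshold=0.85):
    """Return (name_a, name_b, ratio) for distinct names that look alike."""
    # Collapse the column to distinct, trimmed, non-empty values.
    counts = Counter(n.strip() for n in names if n and n.strip())
    pairs = []
    for a, b in combinations(sorted(counts), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


column = ["Nat. Museums of Kenya", "National Museums of Kenya", "Bob", "Bob "]
for a, b, ratio in similar_agent_names(column):
    print(f"{a!r} ~ {b!r} ({ratio})")
```

Pairwise comparison is quadratic in the number of distinct names, which is usually fine for a single collection's agent column but may need blocking (for example by surname) on much larger lists.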
Thanks Dusty this is great. Cleaning the agents is going to be a long slog. Quick question: can you help me make the group agent "R. L. & V. R. Rausch"? I just tried and got a weird error. Longer question(s), maybe more philosophical: I am going to have a lot of agents to create, many of them only associated with a few records (eg students who prepped a couple specimens during a Mammalogy class). How much time do I spend now to figure out who their associates were (eg so I could include "Student of E. S. Booth" in the Agent Record)? |
https://arctos.database.museum/agents.cfm?agent_id=21336379 Anything that'll resolve to a single agent will work. (Preferred name is unique so that's guaranteed.) If there's one "E. S. Booth" then no problem, if there are 20 then it won't work. That particular example works. Even vague relationships (student of E. S. Booth), status entries (alive in 1970) and addresses (Seattle area) are all potentially useful, but your call - sometimes that's just a price you can't pay. Stuffing that all into remarks might be more workable, and that's better than not recording it at all. Finding two agents representing a person (or whatever) can be surprisingly difficult - they can hide behind a typo for years. Meanwhile someone will have used them for some unrelated thing, and you end up with a huge low-data mess that nobody's ever going to sort out. If the agent (their biographer, whatever) comes looking, they're probably not going to be happy that you've apparently sold all those dead rats that aren't attached to the part of themselves they've found. Noticing that some particular agent was in two places at once is sorta unavoidable - everyone finds everything, it just comes with some extra. Splitting is generally easy. If there must be messes, overloaded agents are worlds easier to deal with - including finding - than 64 more "J. Smith"s. I'd definitely recommend setting them up to be as discoverable as possible, but I'd also not hesitate to attribute 2 "J. Smith" low-data Peros to one of the (many, probably) existing low-data "J. Smith" variants. We'd all like you to spend HOURS on each of these, and we all understand that you can't. |
Hi Dusty - Are you still doing office hours? or can we set up a time to chat about my progress on this? |
I think everything's officially on hold for the summer, but I'm around for the next couple weeks anyway if you want to schedule something. I'll try to answer some stuff here. See #4554 - in general, unless there's some reason to create an Agent (eg, you know their life events, addresses, relationships, etc., or they're doing more than collect) I currently think verbatim collector is a better (and functionally equivalent) solution. I'd leave 'stranding team' where it is. "leave as Verbatim? Or designate as Preparator number?" - mostly a question of why anyone would search for that. If it's more like an Identifier then it should probably be moved. If it's an iffy mess (and it probably is if there's not a clear target) then it probably doesn't matter where it lives for now. "Should Svihlas be a group?" - #4555, I'm hoping to kill that as an agent type, but the functionality will remain via relationships - in any case, "these two folks acting as one" is something that Arctos can handle, it might make sorting the data a bit easier, but it probably makes the data slightly less accessible too. I'd use the agent splitter and give them both direct attribution, or leave them in verbatim - I'm not sure the middle ground makes sense, even if a lot of folks seem to like it.
Then they can have more than one role. I think there are "traditional" reasons we don't do that, but "you prepared it, you collected it" is always most correct.
It would be two (500, whatever) rows using the same GUID in the collector bulkloader.
Your call. Are they that way because you transcribed them wrong, or because the collector was inconsistent? (Usually both I imagine!) You can always enter both, but I can't tell you if that would add or reduce confusion.
String-based loaders need a unique value by which to find the data object, and preferred name is always unique. So yep.
Yes.
or
I'd use the first if the entity is "agent worthy" and the second if it's just "Bob" but you have corrections. (But maybe if you've dug that deep then you'll know something Agent-worthy - this Bob was alive in 1987 is SOMETHING.)
Everyone rejoices, that's what! That's 4554 - "Pope" can't do much by itself, so why make that complicated? Just leave it verbatim, "upgrade" if you find some reason (eg relationships, dates, better-than-collector data) to elevate "Pope" to an agent.
I'd still only create Agents if there's some reason to.
There are no technical restrictions or links between those things, so sure. Whether you should or not is up to you.
I'd 100% stick with verbatim, but Arctos won't prevent you from listing out all the possibilities either.
Again I can only really answer from the technical side, but yes - if it's a finite number of Unicode characters, it can be a preparator number. (But I'd recommend something more generic if you don't know that - https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcoll_other_id_type#other_identifier is nice)
Anyone searching by whatever it should be won't find it. That said, I don't think we know what all our various and sundry nonresolvable identifiers do and I don't think anyone else does either so hopefully everyone's searching by the value and ignoring our bad attempts at pigeonholing, and if that's true then one type (of nonresolvable, those without base_url) is as good as any other. |
@jebrad I'm out this week at SPNHC, but I'll be back next Tuesday. If I end up with time, I'll look at this file before I get back and maybe we can set up a call to discuss. |
OK this is fabulous, thanks Dusty especially for wading through all that red text. I'll chew on all of this and let you know if I have follow-ups. Jeff |
Good chat yesterday, @Jegelewicz - thanks very much. I've pared down my "example spreadsheet" and attached again. It seems that all my remaining questions (red text in that spreadsheet) are mostly an issue of the mechanics of the bulkloading. @dustymc As part of this, I would like my VERBATIMCOLLECTORS and VERBATIMPREPARATORS to be visible in the Agents tab, instead of their current spot on the Attributes tab. Is this easy? Thanks all! |
There is (and should be!) no such thing (that's method)??
Definitely needs a dedicated Issue. Seems sorta "wrong but maybe not entirely EVIL" on the surface, but if it allows me to proceed with #4554 then I'm all in. |
I think this needs its own issue - and probably I misunderstood what @jebrad was asking yesterday! |
@jebrad I've been going through your verbatim collectors - it looks like you guys have added a lot of agents recently in order to match these names up with agents. As I'm working, I noticed a few that I think are duplicates.
These are the kinds of things @dustymc is talking about as we are trying to make better agents and reduce the number of duplicates. Some of these were entered by other people - everyone does this! I have the time and feel empowered to merge or change agents in order to make the data better. If you can give me a week or so, I think I can have your collectors, collector numbers, preparators, and prep numbers ready for upload. I can send you the prepped files, then you can review and load them. Sound good? |
@Jegelewicz Ah.... Yes?!? This certainly sounds very good to me. A bit too good to be true, but I'll take it, thank you! Yes, we (erikajprice) have been creating a lot of agents or finding the correct names in preparation for doing this bulkload. I am fairly sure that most new agents Erika created will have enough info to raise them above low-quality agents (eg "alive on" date), even if we have to go back through to add that info. This spreadsheet will allow us to search for the new ones that Erika created, and either add more info or purge the agents that have none. I've attached the big file of all our records with the GUID, VERBCOLL and VERBPREP; these 3 columns are what I got from Dusty. It also has a few columns showing Erika's progress, agents created, her questions to me, etc. Let me know if you change your mind on this offer, or if you think of ways I can help without being in your way. |
@jebrad yeah - that helps! I have been doing this for a bunch of incoming collections so it is at the top of my mind. If I have questions, I'll ask them here. |
@jebrad here is a first step - the preparator numbers in the bulkload other identifier format. A few notes:
Anyway - you could load this to the other identifier batch tool as is and your preparator numbers would be in. If you want names removed from the numbers or any of these things set up as a different type of identifier, let me know and I'll set it up that way. I am still working on agents for the preparators - I'll get some of them here soon. Ask any questions here! |
Just curious - sure it's R. J. Rausch and not Robert L. Rausch? I'd love to track down more of the latter's specimens. Also, many of his numbers are in as collector numbers or sometimes preparator numbers with RLR as prefix. This is why including as much info as possible is useful. And consistency, at least within a collection, is best, in case something needs to be changed - so it can easily be changed all at once. |
There are A TON of Rausch specimens in this collection - getting those collector/preparator names loaded is a priority. Anything R.J. Rausch is probably Robert L. |
Any way to get that same file with the Rausch prep number added in? Then I
could link the related MSB records.
…On Thu, Aug 4, 2022 at 3:00 PM Teresa Mayfield-Meyer < ***@***.***> wrote:
Also for grins - here is the file you could upload
<https://arctos.database.museum/tools/BulkloadCollector.cfm> to add all
of the Rausch collector and preparator agents.
UWBM_Rausch_agents.csv
<https://github.com/ArctosDB/arctos/files/9263318/UWBM_Rausch_agents.csv>
|
The prep numbers are in the identifier load that he already has. |
OK, I can wait!
|
Ok this is great. I have gone through the files, thanks @Jegelewicz! I am a bit overwhelmed, but also confident that these look good, and that our time on Wednesday will be well spent getting these bulk loaded. Quick question: is there something specific I should be doing to prepare for Wednesday's bulkload? Eg should I try to find and think about all the red cells like this one: UWBM:Mamm:47024 | preparator number | A. E. Perry 657 | self? My instinct is that these will be more easily solved after the bulk loading is done, rather than trying to fix them now. Thanks, Jeff |
Agree - they are just red because the same "preparator number" is being applied to two different records. This might be just fine, but also might indicate a typo or something. |
My guess is that for many of these duplicate prep #s, the skull got one UWBM, and the skin got a different UWBM. Seems like that used to happen a lot.... |
That's fair and fine - BUT it would be good to make sure these records included a relationship with same individual as and that can be the clean-up! |
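A minimal sketch of how those shared preparator numbers could be listed for review, assuming the identifier load is available as (GUID, value) pairs. The `shared_identifiers` helper and the second GUID below are hypothetical; only `UWBM:Mamm:47024` and "A. E. Perry 657" come from the example above.

```python
from collections import defaultdict


def shared_identifiers(rows):
    """Map each identifier value used by more than one GUID to those GUIDs."""
    guids_by_value = defaultdict(set)
    for guid, value in rows:
        guids_by_value[value.strip()].add(guid)
    # Keep only values that appear on two or more distinct records.
    return {v: sorted(g) for v, g in guids_by_value.items() if len(g) > 1}


rows = [
    ("UWBM:Mamm:47024", "A. E. Perry 657"),
    ("UWBM:Mamm:0", "A. E. Perry 657"),  # hypothetical second record (e.g. the skull)
    ("UWBM:Mamm:1", "A157"),             # hypothetical record with a unique number
]
for value, guids in shared_identifiers(rows).items():
    print(value, "->", ", ".join(guids))
```

Each group in the output is a candidate for either a typo fix or a "same individual as" relationship between the listed records.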
Today we loaded the collector and preparator numbers and the Rausch agents. Next up
|
@jebrad I am waiting on something before doing steps 1-3 above and I thought that you might enjoy seeing yourself as an agent! Load the attached file to the Bulkload collector tool, review and set them to autoload, then wait a day or so and check out your agent page. Your numbers will have really gone up! |
@jebrad would you consider giving me access to your collection? I bounce around and do projects and I could be adding stuff to your data, only I don't have access. Here is an example: On this page of Rausch necropsy cards (already in Arctos thanks to @campmlc), I find that your UWBM:Mamm:32065 record is referenced via the Rausch preparator number 044556. I could tag your record in this media and create a link, I could also verify a bunch of trait data from the card and update the determiner for these traits to Rausch (more confidence building than "unknown"). Don't get me wrong, I wouldn't do this full time, but at least when I run across things, I could just do them for you. But also, this would be a FANTASTIC student project - tagging your records in the existing media and verifying traits (and maybe even adding some!). |
I edited that tag so you can see what it should look like. There's also a less-formal tagging possibility. How the heck have you been working blind @Jegelewicz ?! |
Suggest that we modify the tag so that it only includes the first two columns - as this would just include the host record. The entire row includes data for both the host and parasite records. As we come across associated parasites in other collections, they will be tagged to the cestode column (Echinococcus). |
I have an amazing ability to work in sub-optimal conditions.... |
That line also includes traits of the host and in the far right column it clearly states "skin and skull to Burke" - no reason we can't tag the entire line to both host and parasite(s). |
Yea, that whole line is clearly relevant to multiple things. That also sets up the possibility of things like spatial analysis - "these two references overlap, maybe there's some interaction" and maybe on to "this doc is about parasites, bot-create a relationship." |
We do need to set up some sort of working group for this, because that is not how the MSB Rausch records have been tagged. If we are going to overlap the tags, we need some change to the UI because you won't be able to see the parasite tags under/over the host tag on the page. Some changes to the UI to make it easier to find and search these multi-page documents would also be helpful, and it would be good to come up with some recommendations. |
There can be two host records and 3 or more parasite records scattered across multiple institutions PER ROW. We need a recommended way to set up the tags. |
@Jegelewicz Yes just gave you access to UWBM Mamm (I think). Thanks! |
@jebrad I have loaded a file of collectors and preparators that will pass the agent test after #4554 gets implemented to the Bulkload Collector tool. I can review and load these, or you can. Just let me know if you want me to do the clicking. Once these are loaded, I think we are done with this issue! |
I loaded these today - if any are incorrect, we have the file above to do an unload. Forgot we still need to fix verbatim agents. I have an unload and load ready for that too. |
Set all verbatim agents except four to unload. Once this is complete, I will load these. |
@jebrad there are 1,297 entries in the bulkloader for UWBM:Mamm. Some of them were entered in 2017 by Dusty! Can I clean these up? They won't load anymore because things have changed (part names, etc.) since they were initially loaded. I can download them, fix them and reload if you want. There are recent things as well - I can leave those alone for you to approve. |
Sorry for the radio silence and thanks for the work on these @Jegelewicz ! I just got back from some time in the wilderness, and I'll get through some of this in the next few days. I also have Erika this Sunday so we can hopefully make some progress this week. |
Wow! Much progress! I am sifting through to find my Next Step. @Jegelewicz can you please confirm if you are still waiting on action from me from your Sep 3 comment:
I did look at that file, and it looked fine in theory (as much as I could tell from a quick look) - certainly it looks good enough to load. And - Yes please go ahead and work on fixing/reloading the old things in the Bulkloader. I'll try to work on approving the newer ones and try to get that emptied for you. Is it time for another phone call so you can get me up to speed on what I can do to help you? Or I can just check here. I've got some time the next week or so to work on this. Now I need to go look into Geography cleanup #5017, a bunch of those on Dusty's list are mine..... |
@jebrad you can just check in here - the attribute unloader is taking its time, then the agent loader has a lot to do, then I can load back the verbatim agents. I'll let you know when it's all done. I'll go look at the old records in the bulkloader tomorrow or next week. Woohoo! |
@jebrad I believe this is all done. If you agree, please close this, otherwise let me know what's missing. |
Yes thank you! |
Dusty, I need your help and advice please.
![J E Bradley 484](https://user-images.githubusercontent.com/62565118/150011508-57572af4-0d4c-4b58-8647-f554f41dfd1d.jpg)
I am finally tackling the task of getting Preparator Number out of the attributes (you put them there as "Verbatim Collector, remark: preparators" during migration) and into the proper Preparator Number.
In the attached image, J. E. Bradley 1484 needs to become the Preparator Number (Bradley and C. Glenney will each be listed as preparators, R. Gitzen and DEMO will each be collectors, DEMO# 2000-39 will be Collector Number).
I think my next step is to clean up my "Verbatim Collector: Remark preparator" field (so that each name is a correct Agent, separating out additional preparators like C. Glenney in the above case) so they all can be loaded into the Preparator Number field.
And I think that requires you to enable me to Customize my search so that I can see the Remarks for Verbatim Collector, perhaps by adding a box for "Attributes remarks" here:
Otherwise the attached record returns this as Verbatim_collector: R. Glitzen, DEMO# 2000-390; J. E. Bradley 1484, C. Glenney .
Am I thinking about this correctly? I have someone in our IT office who has a lot of SQL experience and will help me on this end, I am just trying to figure out exactly what she needs to do to help me.
Thanks, Jeff
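For whoever ends up doing the parsing, here is a hedged sketch of how the verbatim value quoted above could be pulled apart. It assumes that a semicolon separates collectors from preparators and that a token ending in a number carries an identifier; both are guesses from this single record, so other records will need checking by hand. The `parse_verbatim` helper and its output layout are made up for illustration.

```python
import re

# A token such as "J. E. Bradley 1484" or "DEMO# 2000-390":
# a name, an optional "#", whitespace, then a number (digits and hyphens).
NUMBERED = re.compile(r"^(?P<name>.+?)#?\s+(?P<number>\d[\d-]*)$")


def parse_verbatim(value):
    """Split 'collectors; preparators' into agents and identifiers per role."""
    result = {}
    segments = [s.strip() for s in value.split(";")]
    for role, segment in zip(("collector", "preparator"), segments):
        agents, identifiers = [], []
        for token in (t.strip() for t in segment.split(",") if t.strip()):
            match = NUMBERED.match(token)
            if match:
                agents.append(match.group("name").strip())
                identifiers.append(token)  # keep the whole string as the ID
            else:
                agents.append(token)
        result[role] = {"agents": agents, "identifiers": identifiers}
    return result


parsed = parse_verbatim("R. Glitzen, DEMO# 2000-390; J. E. Bradley 1484, C. Glenney")
print(parsed["preparator"])
```

The identifier keeps the full original token (name plus number) so that the Preparator Number stays a single cell, while the name alone goes to the agent list for later matching.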