component loaders (and data entry) #2974

Closed
dustymc opened this issue Jul 29, 2020 · 100 comments
Labels

  • Blocker: This Issue is blocking other Issue(s). Please reference the issue this is blocking in the comments.
  • Component Loader: Things involved in Round Five of the component loader discussions
  • NeedsDocumentation: When the issue is resolved in Arctos repository, this should be moved to the Documentation-wiki repo
  • Priority-Critical (Arctos is broken): Critical because it is breaking functionality.

Comments

@dustymc
Contributor

dustymc commented Jul 29, 2020

The "data entry extras" functionality isn't as good as it could be: loading large batches of various components (identifiers, parts, identifications, etc.) causes timeouts and then problems/confusion, the "claim" process for managing data entered via "data entry extras" causes problems, etc. Let's fix it.

Very tentative suggestions, which may or may not hold up to reality:

  • Attach all bulkloaders to a scheduled task (like the specimen bulkloader)
  • Replace the "claim" functionality with an ability to change status in "not-yours" records (which would necessarily come with access to records created by users with whom you share collections)
  • Rebuild the loaders to resolve UUIDs without first fetching guid_prefix (UUIDs are unlikely to be replicated, the second identifier isn't necessary)

A normal load would then be

  • load
  • validate (optional)
  • set status to something (probably "autoload")
  • check back later, find either
    • nothing, because it loaded and cleaned up after itself
    • errors, because you didn't validate

"Approving" records loaded by you or your students/techs/associates, via any process including data entry extras, would be

  • set status
  • check back later

I think that would be a significant simplification in both the code and the user experience. "Manage your..." might come with a "pick users" option (a slight increase in complexity), but most of the rest of the complexity (claim, find guid, etc.) that's been introduced for various reasons could be removed.
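The load / set status / check back later cycle described above can be sketched in a few lines. This is only an illustration in Python, not the Arctos implementation: the `Row` shape, the `process_queue` name, and the `load_one` callback are all assumptions made up for the example; only the "autoload" status value comes from this thread.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: int
    username: str
    status: str  # "" = just loaded, "autoload" = approved, anything else = an error message
    data: dict


def process_queue(rows, load_one):
    """One scheduler pass: try every 'autoload' row and keep only what is left over."""
    remaining = []
    for row in rows:
        if row.status != "autoload":
            remaining.append(row)  # not approved yet, or already in error
            continue
        try:
            load_one(row)  # hypothetical callback: validates and inserts one row
        except ValueError as err:
            row.status = f"error: {err}"  # what the user finds on "check back later"
            remaining.append(row)
        # On success the row is simply dropped, i.e. it cleaned up after itself.
    return remaining
```

With this shape, "approving" someone else's rows is nothing more than setting their status to "autoload" and waiting for the next pass.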

This has some urgency: I'd like to use #727 as a proof of concept, so I'm adding scary labels and will interpret a lack of immediate objections as enthusiastic approval.

@dustymc dustymc added the Priority-Critical (Arctos is broken) Critical because it is breaking functionality. label Jul 29, 2020
@dustymc dustymc added this to the Next Task milestone Jul 29, 2020
@dustymc dustymc self-assigned this Jul 29, 2020
@Jegelewicz
Member

I'm up for trying this method. Anything we can do to simplify and make the process consistent across tools would be nice.

@dustymc
Contributor Author

dustymc commented Jul 30, 2020

The basics of this are running in test with bulkload identifications. I think it's worked out even better than anticipated, but timely feedback would be appreciated.

Replace the "claim" functionality with an ability to change status in "not-yours" records

The form is limited to manage_collection in order to safely (I hope!) accommodate this, and there's a new "shares collection" function which DOES NOT exclude users with locked accounts (so you can load things created by former techs & etc.).
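A rough Python sketch of what a "shares collection" lookup that ignores lock state might look like. The `accounts` structure and the function name are invented for illustration; the real function lives in Arctos and its signature is not shown in this thread.

```python
def shares_collection(me, accounts):
    """Return usernames sharing at least one collection with `me`.

    The "locked" flag is deliberately never consulted, so former techs
    with locked accounts are still returned.
    """
    mine = accounts[me]["collections"]
    return sorted(
        user
        for user, info in accounts.items()
        if user != me and mine & info["collections"]
    )


# Hypothetical data: one locked former tech, one unrelated user.
accounts = {
    "curator": {"collections": {"mamm", "bird"}, "locked": False},
    "former_tech": {"collections": {"mamm"}, "locked": True},
    "stranger": {"collections": {"herp"}, "locked": False},
}
shared = shares_collection("curator", accounts)
```

Here `shared` contains `former_tech` despite the locked account, and not `stranger`.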

Rebuild the loaders to resolve UUIDs without first fetching guid_prefix

This is implemented and tested; it needs to be propagated to all other loaders.
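To illustrate the idea of resolving on the UUID alone, here is a minimal Python sketch. The `uuid_index` mapping and the `TEST:Coll:1` identifier are hypothetical stand-ins for whatever lookup the loaders actually use; the point is only that no guid_prefix is needed to find the record.

```python
import uuid


def resolve_by_uuid(value: str, uuid_index: dict):
    """Look up a catalog record by UUID alone; return None if there is no match."""
    try:
        # uuid.UUID normalizes case and formatting, and raises ValueError on junk.
        key = str(uuid.UUID(value))
    except ValueError:
        return None
    return uuid_index.get(key)


# Hypothetical index from normalized UUID to catalog record identifier.
uuid_index = {"123e4567-e89b-12d3-a456-426614174000": "TEST:Coll:1"}
```

Normalizing before the lookup means an upper-case UUID in a user's CSV still matches.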

"validate" is part of the load process; there's no pre-validation. (Having this as a separate step has been a source of confusion for some time; this process facilitates a much simpler go/no-go approach.)

Todo, pending nobody finding a reason to go in a different direction:

  • add component_loader and component_loader_notification to scheduler
  • unschedule autoload_extras and dataentry_extras_notification
  • test with Bulkloading identifications error #2936
  • rebuild all component-loaders to use this system
  • change "extras" notifications; need to report everything in loaders (it's all the same now) rather than attempting to pick out pieces. Use the new 'even if locked' function for this.
  • figure out if we can/should merge "unloaders" into the same system (probably and probably?)

@Jegelewicz
Member

Now I gotta dig up some stuff to load....

@Jegelewicz
Member

/remind me to work on this tomorrow

@reminders reminders bot added the reminder label Jul 30, 2020
@reminders

reminders bot commented Jul 30, 2020

@Jegelewicz set a reminder for Jul 31st 2020

@campmlc

campmlc commented Jul 30, 2020 via email

@reminders reminders bot removed the reminder label Jul 31, 2020
@reminders

reminders bot commented Jul 31, 2020

👋 @Jegelewicz, work on this

@dustymc
Contributor Author

dustymc commented Jul 31, 2020

Another major point for this model: it makes replication easy, there's now a testable locality-loader. I'll stop until I get some feedback, I don't want to replicate any problems.

The loader-scripts aren't scheduled, you can just open http://test.arctos.database.museum/ScheduledTasks/component_loader.cfm to process from the two new loaders.

@Jegelewicz
Member

When I follow that link - I get a white screen.

image

@Jegelewicz
Member

Let's go to Vegas!
image

@dustymc
Contributor Author

dustymc commented Aug 3, 2020

white screen.

Yea it's not very interactive - check back with the data, should be different. https://github.com/ArctosDB/internal/issues/65

Vegas

Sorry, I broke it!

@Jegelewicz
Member

OK, one more observation, when stuff won't load, it would help to get the error along with the csv when you download to fix stuff.

So I was able to load 10 localities - none had coordinates - I'll see if I can find a couple that do to try.

Also, http://test.arctos.database.museum/ScheduledTasks/component_loader.cfm (the page used to process from the two new loaders) needs to have some kind of interactivity: once you go there, you don't get out, and we need people to understand that they have accomplished something. That's assuming this will be true in production.

@dustymc
Contributor Author

dustymc commented Aug 3, 2020

get the error

wilco

interactivity

That's just test - it'll be on the scheduler in production, loading (or errors) will just happen (including for any number of records).

@Jegelewicz
Member

Clarification - when I load a file directly to the tool, if stuff passes all the triggers, does it just load, or will it always show up in the "manage" page first? Don't know why I can't decide what happens....

@dustymc
Contributor Author

dustymc commented Aug 3, 2020

You can load with status, and if you load with it as "autoload" then Arctos will take care of the rest (or make errors). If you follow the instructions and load from a fresh template then you'd need to set status (which gives you an opportunity to notice that you've just loaded 4582 duplicates...). How that's implemented and documented is a little waffly at the moment, but the potential for "stuff just happens" exists.

@dustymc
Contributor Author

dustymc commented Aug 4, 2020

This is in prod, need to integrate eg #2967 (comment) and rebuild all component-loaders under this umbrella.

Dropping priority.

@dustymc dustymc added Priority-Normal (Not urgent) Normal because this needs to get done but not immediately. and removed Priority-Critical (Arctos is broken) Critical because it is breaking functionality. labels Aug 4, 2020
@dustymc dustymc modified the milestones: Next Task, Active Development Aug 4, 2020
@dustymc
Contributor Author

dustymc commented Aug 6, 2020

Need to check throttle; currently set for 10 records per run. It can be upped significantly but needs to be monitored as things are added.
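For a sense of what the throttle means in practice, here is a small Python sketch (not the scheduler's actual code; the function names are made up). Only the default of 10 records per run comes from the comment above.

```python
import math


def next_batch(pending, limit=10):
    """Split pending rows into this run's batch and the rows left for later runs."""
    return pending[:limit], pending[limit:]


def runs_needed(total, limit=10):
    """How many scheduler runs it takes to drain `total` rows at `limit` per run."""
    return math.ceil(total / limit)
```

Raising `limit` shortens the wait for large loads at the cost of more work per run, which is the trade-off that needs monitoring.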

@Jegelewicz Jegelewicz added the NeedsDocumentation When the issue is resolved in Arctos repository, this should be moved to the Documentation-wiki repo label Aug 13, 2020
@Jegelewicz Jegelewicz self-assigned this Aug 13, 2020
@ewommack

Hey Arctos - Reminder to try and test this by next Thursday!

@dustymc
Contributor Author

dustymc commented Dec 11, 2020

See #3300 - make sure status (which can be errors) is urlencoded when necessary
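A quick Python illustration of why the encoding matters when status holds an error message. The status text and the `manage.cfm` page name are invented for the example; in CFML the equivalent step would be done with the platform's own URL-encoding function.

```python
from urllib.parse import quote, unquote

# Hypothetical error text containing characters that are unsafe in a query string.
status = "error: part 'skull; skeleton' not found & guid missing"

# safe="" percent-encodes everything except letters, digits and "_.-~".
encoded = quote(status, safe="")
url = f"manage.cfm?status={encoded}"
```

Without the encoding step, the `&` in the message would be read as the start of a new URL parameter and the status filter would be cut short.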

@dustymc
Contributor Author

dustymc commented Jan 11, 2021

This has served its purpose, there's a template, it's awesome, closing.

@campmlc

campmlc commented Jan 29, 2021

@gradyjt

@dustymc dustymc added the Component Loader Things involved in Round Five of the component loader discussions label Feb 10, 2021
@dustymc
Contributor Author

dustymc commented Feb 16, 2021

The next two tasks on my list (#2556, #2442) rely on this template. I can't seem to reconcile #3413 and the related AWG discussion. Do we love this or hate it? Can I keep building these things or do we need more discussion? Do I need to change something going forward? Do I need to change something with the ~dozen loaders I've already built under this model?

@Jegelewicz
Member

I think the tool is fine. It is just the related "documentation" that needs updating, but others should weigh in.

@campmlc

campmlc commented Feb 16, 2021 via email

@dustymc
Contributor Author

dustymc commented Feb 16, 2021

main bulkloader

If you mean the catalog record bulkloader, these are fundamentally different tools. The catalog record bulkloader is an independent tool - things in it load or error, that's it. "Component loaders" can have dependencies - things can hang around with 'autoload: ....' for weeks, then be processed after related data becomes available. That is, component loaders have three exits:

  • load worked (so delete)
  • load didn't work because data are a mess (so a person needs to become involved)
  • load didn't work because dependent data are MIA (so try again later, no humans required)
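The three exits can be pictured as a small classification step. This Python sketch is only a model of the behavior described above; the exception classes, the `Outcome` names, and the `load_one` callback are all hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    LOADED = "delete the row"
    NEEDS_HUMAN = "report the error to a person"
    RETRY_LATER = "leave as autoload and try again"


class BadData(Exception):
    """The row itself is wrong; retrying will not help."""


class MissingDependency(Exception):
    """Related data is not in the database yet; a later run may succeed."""


def attempt(row, load_one):
    """Try to load one row and report which of the three exits it took."""
    try:
        load_one(row)  # hypothetical callback that raises one of the errors above
    except MissingDependency:
        return Outcome.RETRY_LATER
    except BadData:
        return Outcome.NEEDS_HUMAN
    return Outcome.LOADED
```

The catalog record bulkloader, by contrast, only ever has the first two exits.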

I would be in favor of changing the actionable value of loaded to "autoload" rather than NULL for the catalog record bulkloader, but that should be addressed in a new issue.

@Jegelewicz
Member

Maybe we need to think about some way to let people jump to a specific set of data in tools like https://arctos.database.museum/tools/BulkloadOtherId.cfm

Currently there is an extra-long list of errors in there and if my username was after this person alphabetically, I'd have to scroll forever to get to my stuff.

This is just one page of it
image

Maybe just a table at the top that lists the usernames and lets you jump to a specific user's stuff?
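As a sketch of the data such a jump table would need, something like the following Python would do. The row shape and function name are assumptions; the real page would build this from the loader table.

```python
from collections import Counter


def user_summary(rows):
    """Return (username, row count) pairs sorted by username, one per jump-table line."""
    counts = Counter(row["username"] for row in rows)
    return sorted(counts.items())


# Hypothetical loader rows.
rows = [
    {"username": "zed", "status": "error: bad part name"},
    {"username": "amy", "status": "autoload"},
    {"username": "zed", "status": "error: guid not found"},
]
summary = user_summary(rows)
```

Each pair could then be rendered as a link that filters the page to that user's rows.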

@dustymc
Contributor Author

dustymc commented Feb 23, 2021

See https://github.com/ArctosDB/data-migration/issues/450#issuecomment-784555912 - verbose errors cause a 400 from Lucee; need to POST or truncate errors or something.

Untested workaround: filter only on username, change status to something shorter.

@Jegelewicz

@Jegelewicz
Member

Thanks, that worked for my stuff...

@dustymc
Contributor Author

dustymc commented Feb 25, 2021

v1.1: csv download should include this to strip unnecessary columns

<!--- start from every column in the "mine" query --->
<cfset flds=mine.columnlist>
<!--- drop internal columns that aren't needed in the download --->
<cfif listfindnocase(flds,'key')>
	<cfset flds=listdeleteat(flds,listfindnocase(flds,'key'))>
</cfif>
<cfif listfindnocase(flds,'last_ts')>
	<cfset flds=listdeleteat(flds,listfindnocase(flds,'last_ts'))>
</cfif>
....
<cfset csv = util.QueryToCSV2(Query=mine,Fields=flds)>

@dustymc
Contributor Author

dustymc commented Feb 25, 2021

Moved unfulfilled requests to #3463, closing (again).


7 participants