Default specific locality in specimen search results #862

Closed
AJLinn opened this issue Apr 8, 2016 · 13 comments
Assignees
Labels
Priority-High (Needed for work) High because this is causing a delay in important collection work.

Comments

AJLinn commented Apr 8, 2016

The UAM:EH specimen records typically have one to three specimen events (e.g., place of manufacture, place of use, place of collection), sometimes with three different localities. The specific locality displayed in the search results seems to be randomly selected from those events. I request that the default specific locality displayed be the one associated with the "place of manufacture". Likewise, the georeferenced place of manufacture should be what shows up on the map following a search. Finally, this same information should be the locality information displayed at the top of the specimen record.

jldunnum commented Apr 8, 2016

We are trying to work through this same issue with serial sampling of the same individuals through time and across space (e.g., serial blood sampling of Mexican wolves at the various reintroduction program sites). Not only do you get just a single event in search results, but you cannot download or map the other events either.

campmlc commented Apr 8, 2016

So in our case at MSB, we need to be able to search on, map and download ALL specimen events. Is this possible?


Contributor

dustymc commented Apr 8, 2016

Picking a "favored" event is possible - it's (computationally) expensive but happens asynchronously, so whatever. (It's not random - events with coordinates should float to the top all else being equal, etc. - but it probably looks that way to most users for most specimens!)

The "see all locality" issue is #755. The short version is that "locality data" is a bunch (~100) of columns for every specimen-event, and a specimen can have any number of events. That doesn't fit in anything tabular (results table, download), and having data in maps/queries (things that can deal with variable cardinality) which can't be seen in the table would be extremely confusing.

dustymc self-assigned this Apr 8, 2016
dustymc added this to the Next Task milestone Apr 8, 2016
dustymc added the Priority-High (Needed for work) High because this is causing a delay in important collection work. label Apr 8, 2016
jldunnum commented Apr 8, 2016

Maybe we could have a way to mark records that contain multiple events so at least people will know when they see it in the search results and can go deeper if they wish.

dustymc modified the milestones: Active Development, Next Task Apr 12, 2016
Contributor

dustymc commented Apr 12, 2016

> Maybe we could have a way to mark records that contain multiple events so at least people will know when they see it in the search results and can go deeper if they wish.

Yes, that's the core intent of #755 - and if the "marker" contains the data (e.g., as JSON - and I have no idea if that's practical until I play with it) then having that available should make it somewhat simpler to go deeper - just unwind into the variable-cardinality format of your choice, or flatten it out into DWC Occurrences (which we already create and could make available), or use the clicky-viewer (if we can figure out how to build one), or whatever.

Or maybe nobody (or nobody without access to the writeSQL tool) would make use of the JSON and a simple "this thing has 48 localities see specimen detail" flag is enough??
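To make the two options above concrete, here is a minimal Python sketch, not anything Arctos actually runs. It assumes a hypothetical JSON "marker" holding every event for one specimen, then shows both ways of using it: unwinding it into one flat row per event, and reducing it to a simple "this record has N events" flag. All field names, the example GUID, and the function names are made up for illustration.

```python
import json

# Hypothetical events for one specimen; field names are illustrative only.
EXAMPLE_EVENTS = [
    {"specimen_event_id": 1, "event_type": "place of manufacture", "spec_locality": "Village A"},
    {"specimen_event_id": 2, "event_type": "place of use", "spec_locality": "Village B"},
    {"specimen_event_id": 3, "event_type": "place of collection", "spec_locality": "Village C"},
]


def event_marker(events):
    """Pack all events of one specimen into a single JSON string (one table cell)."""
    return json.dumps({"event_count": len(events), "events": events}, sort_keys=True)


def flatten(guid, marker):
    """Unwind the JSON marker into one flat row per event."""
    data = json.loads(marker)
    return [{"guid": guid, **event} for event in data["events"]]


def simple_flag(marker):
    """Reduce the marker to a short notice, or an empty string for single-event records."""
    count = json.loads(marker)["event_count"]
    if count > 1:
        return f"this record has {count} specimen events; see specimen detail"
    return ""


marker = event_marker(EXAMPLE_EVENTS)
rows = flatten("EXAMPLE:Coll:1", marker)  # hypothetical GUID
print(len(rows), simple_flag(marker))
```

The JSON cell keeps the results table at one row per specimen while still carrying the variable-cardinality data; the flag is the cheaper fallback if nobody would consume the JSON.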

> Picking a "favored" event is possible - it's (computationally) expensive but happens asynchronously, so whatever.

It turns out the "simple" way is REALLY expensive - a small batch update (500 records) went from ~2 seconds to ~7 minutes, which will be disruptive even as an asynchronous process. I'll keep looking....

Contributor

dustymc commented Apr 14, 2016

I may have a workable solution to selectively picking the one specimen event that appears in specimenresults + downloads. Priority currently is:

  1. event_type=place of manufacture
  2. an event linked to a locality with coordinates
  3. just grab one of whatever's left

in all cases excluding "unaccepted place of collection."
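As a rough illustration of that priority order (a Python sketch, not the actual database function), the selection could look like the following. The dictionary keys and the example events are assumptions, and "excluding" is read here as dropping unaccepted events before any other rule is applied.

```python
from typing import Optional

EXCLUDED_TYPE = "unaccepted place of collection"


def pick_event(events: list[dict]) -> Optional[dict]:
    """Pick one event using the three-step priority described above."""
    # Assumption: unaccepted events are never eligible, whatever else they have.
    candidates = [e for e in events if e["event_type"] != EXCLUDED_TYPE]

    # 1. Prefer a place of manufacture.
    for event in candidates:
        if event["event_type"] == "place of manufacture":
            return event

    # 2. Otherwise prefer an event whose locality has coordinates.
    for event in candidates:
        if event.get("dec_lat") is not None:
            return event

    # 3. Otherwise take whatever is left, if anything.
    return candidates[0] if candidates else None


# Hypothetical events for one specimen.
events = [
    {"id": 1, "event_type": "unaccepted place of collection", "dec_lat": 64.8},
    {"id": 2, "event_type": "place of use", "dec_lat": 61.2},
    {"id": 3, "event_type": "place of manufacture", "dec_lat": None},
]
print(pick_event(events)["id"])
```

With these inputs the place of manufacture wins even though it has no coordinates; without it, the georeferenced place of use would be chosen.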

Other requests?

campmlc commented Apr 14, 2016

By date: earliest and most recent.
Can we choose by geographic element, e.g., state?

Contributor

dustymc commented Apr 14, 2016

I was referring to machine behavior - given http://arctos.database.museum/guid/MSB:Mamm:193683, which one of the 5 events is "prioritized" to fit into http://arctos.database.museum/SpecimenResults.cfm?guid=MSB:Mamm:193683? (Current answer: The one with the coordinates, http://arctos.database.museum/guid/MSB:Mamm:193683?seid=593167.)

I don't understand the above comments.

Author

AJLinn commented Apr 14, 2016

Those priorities work for me and seem logical.
How does priority #2 make its selection if there is no #1 and multiple events are linked to localities with coordinates? Does it just go on to #3?

Thank you for working on this. It will make a huge difference for our users.
Angie


Angela J. Linn
Senior Collections Manager, Ethnology & History
University of Alaska Museum of the North

jldunnum commented

Could use date as the next level of hierarchy within those categories. Earliest event gets priority.

Contributor

dustymc commented Apr 14, 2016

https://github.com/ArctosDB/DDL/blob/master/functions/getPrioritySpecimenEvent.sql is now experimentally running at prod - it's a bit slower than the previous revision, but the ~15K specimens with a place of manufacture updated in about 10 minutes, which seems workable. Adding more logic to the ordering, as long as it doesn't use data outside of specimen_event, collecting_event, and locality, should (!) have a minimal impact on performance, and adjusting the function is simple as long as the input and output parameters don't change.

The function is now finding the earliest event (based on began_date) within the winning category.

http://arctos.database.museum/guid/MSB:Mamm:224771 has a bunch of equivalent events (accepted place of collection, no coordinates) and so....

UAM@ARCTOS> select specimen_event.specimen_event_id,collecting_event.began_date, locality.dec_lat from specimen_event,collecting_event,locality where specimen_event.collecting_event_id=collecting_event.collecting_event_id and collecting_event.locality_id=locality.locality_id and collection_object_id=21760431;

SPECIMEN_EVENT_ID BEGAN_DATE    DEC_LAT
----------------- ---------- ----------
          2585775 2010-01-01
          2585778 2011-03-29
          2585777 2010-08-30
          2585779 2012-01-04
          2585782 2014-12-18

5 rows selected.

Elapsed: 00:00:00.01
UAM@ARCTOS> select getPrioritySpecimenEvent(21760431) from dual;

GETPRIORITYSPECIMENEVENT(21760431)
----------------------------------
                           2585775

1 row selected.

... the earliest is returned, which hopefully won't offend anyone.
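The tie-break can be sketched in Python as a single sort key: category rank first, then began_date. This is only an illustration of the ordering described in this comment, not a translation of getPrioritySpecimenEvent. The dictionary layout is an assumption, and dates are compared as ISO 8601 strings, which sort chronologically when they share a format. The five rows are the ones from the query output above.

```python
EXCLUDED_TYPE = "unaccepted place of collection"


def category_rank(event):
    """Lower rank wins: manufacture, then georeferenced, then everything else."""
    if event["event_type"] == "place of manufacture":
        return 0
    if event["dec_lat"] is not None:
        return 1
    return 2


def priority_event_id(events):
    """Return the id of the earliest event within the winning category."""
    candidates = [e for e in events if e["event_type"] != EXCLUDED_TYPE]
    if not candidates:
        return None
    # Tuples compare element by element: rank first, then began_date.
    best = min(candidates, key=lambda e: (category_rank(e), e["began_date"]))
    return best["specimen_event_id"]


# The five equivalent events shown above: accepted place of collection, no coordinates.
rows = [
    (2585775, "2010-01-01"),
    (2585778, "2011-03-29"),
    (2585777, "2010-08-30"),
    (2585779, "2012-01-04"),
    (2585782, "2014-12-18"),
]
events = [
    {
        "specimen_event_id": event_id,
        "event_type": "accepted place of collection",
        "began_date": began_date,
        "dec_lat": None,
    }
    for event_id, began_date in rows
]
print(priority_event_id(events))  # 2585775
```

Because all five events land in the same category, only the date decides, and the sketch agrees with the function's result for this specimen.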

Including State would require one more join (to geography), and if there's no Arctos-wide agreement on which state is most important (seems unlikely) then an additional 3 jumps the other way to get at Collection. There are 1488 unique States in Arctos at the moment, which might be enough to have a noticeable impact on the post-query processing as well (especially if collection is a multiplier). So possible, yes, but likely fairly expensive. ("Expense" can be measured in how long it takes an update to appear in the interfaces and is difficult to quantify, but my wild guess is that adding state would be noticeable/disruptive.)

Member

Jegelewicz commented Dec 14, 2022

I am re-opening this because the solution isn't working for me. See the issue referenced above. We need to be able to tell people that more than one event exists in the search results/download.

Jegelewicz reopened this Dec 14, 2022
Contributor

dustymc commented Jan 6, 2023

> more than one event exists

[Image: screenshot, 2023-01-06 7:44:12 AM]

Or to see data,

[Image: screenshot, 2023-01-06 7:47:35 AM]

> isn't working for me

I'm closing because I don't think anything else can become actionable from here. Please reopen if you have a solution in mind, or open a discussion if you want to look for a solution.

5 participants