
Feature Request - limit query cost #5080

Closed
dustymc opened this issue Sep 20, 2022 · 11 comments
Labels
Enhancement I think this would make Arctos even awesomer! Infrastructure-limited Issue which could be resolved, or more easily resolved, with additional computational power

Comments

@dustymc
Contributor

dustymc commented Sep 20, 2022

Is your feature request related to a problem? Please describe.

See #5078: it's possible to turn on very many specimenresults options, and doing so is very expensive. I'm not sure there's ever a need for everything all at once.

Describe what you're trying to accomplish

Keep Arctos alive without limiting usability.

Describe the solution you'd like

Help!

Describe alternatives you've considered

?

Additional context

@Jegelewicz (and maybe everybody else; I didn't check) seems to have all (or nearly all) custom options turned on, which involves many expensive function calls. A simple query requires 5145 kB of memory and completes in 10271.653 ms (about 10 seconds).

A near-default query for the same records requires an apparently negligible amount of memory and completes in 73.849 ms (less than a tenth of a second).

From another angle, I'm fairly sure I could kill something interesting with the first set of options turned on, and I'm fairly sure I couldn't with the second.

Is there any real reason any user needs to see, say, NAGPRA category and endoparasites detected in the same result set? Can we somehow limit the potential cost of a query?

My only idea is to allow only a limited number of attributes (maybe 10), but something more sophisticated would be nice.
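One way to read the "allow some number of attributes" idea is as a hard cap applied before the query is built. The Python sketch below assumes a cap of 10, as suggested above; the function name, the column names, and the split into kept and dropped lists are hypothetical illustrations, not how Arctos actually builds its results.

```python
# Hypothetical sketch: cap how many attribute columns a results view may include.
# MAX_ATTRIBUTES, the function name, and the column names are illustrative
# assumptions, not Arctos code or schema.
MAX_ATTRIBUTES = 10


def limit_attribute_columns(
    requested: list[str],
    attribute_columns: set[str],
    max_attributes: int = MAX_ATTRIBUTES,
) -> tuple[list[str], list[str]]:
    """Split requested columns into (kept, dropped).

    Columns that are not attributes are always kept. Attribute columns are kept
    in the order requested until the cap is reached; any further ones are dropped.
    """
    kept: list[str] = []
    dropped: list[str] = []
    attributes_kept = 0
    for column in requested:
        if column in attribute_columns:
            if attributes_kept >= max_attributes:
                dropped.append(column)
                continue
            attributes_kept += 1
        kept.append(column)
    return kept, dropped


if __name__ == "__main__":
    # Example: 2 ordinary columns plus 15 hypothetical attribute columns.
    attributes = [f"attribute_{i}" for i in range(15)]
    kept, dropped = limit_attribute_columns(["guid", "country"] + attributes, set(attributes))
    print(len(kept), len(dropped))  # 12 5
```

Silently dropping the excess columns is only one option; rejecting the request or telling the user which columns were left out would fit the same structure.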

Priority

I think relatively low, but it would be good to have a discussion before, e.g., #2745.

See also #1623 which failed to address this from another angle.

@dustymc dustymc added Enhancement I think this would make Arctos even awesomer! Infrastructure-limited Issue which could be resolved, or more easily resolved, with additional computational power labels Sep 20, 2022
@dustymc dustymc added this to the Needs Discussion milestone Sep 20, 2022
@Jegelewicz
Member

I probably had all of this turned on either to demo something or to research something. No, I don't always need everything, but occasionally I do. It would be nice if I could reset to some basic set of columns, or select the stuff I need BEFORE doing a search. And the basic set of columns won't be the same for everyone. In fact, as it is set up now, it is VERY occurrence-centric.

Catalog Record Results
GUID | other identifier | Identified As | country | state/province | specific locality | verbatim date | Decimal Latitude | Decimal Longitude | Coordinate Error (m)

@Jegelewicz
Member

Sometimes, I don't care where or when something was collected - it is more important to know where it is in the collection or if it was examined for parasites. The fact that the locality stuff is ALWAYS there can honestly be annoying sometimes.

@dustymc
Contributor Author

dustymc commented Sep 20, 2022

BEFORE doing a search.

That definitely seems plausible via #2745 (although the general approach will be more 'stateless', so "before" won't be quite such a defined thing), thanks.

VERY occurrence-centric

Yep, at least in that they both fit nicely into a tabular format, which means things like manufacture/use/collection just don't get represented usefully. I'm completely open to radical ideas.

locality stuff is ALWAYS there

This also seems like something that could be addressed via #2745, assuming we fail to come up with that radical idea and there's still a table with locality columns.

@dustymc
Contributor Author

dustymc commented Sep 23, 2022

Some recent entries in the logs suggest reasons to prioritize this:

[Screenshot of log entries, 2022-09-23 10:03:03 AM]

Turning lots of junk on leads to timeouts. "We" presumably know that and are happy to accept it, but others may get a bad impression of Arctos from such experiences.

And given that, perhaps we could (or should) somehow limit things only for public users, if there's any legitimate reason to allow everything at all (I think @Jegelewicz is saying there is?).

@ewommack

So to make sure I understand:
User profiles can have a bunch of attributes for objects turned on during a search, which means every time we search it is still trying to find all of them and may time out.
So I have collection number always turned on to display, and so no matter the collection I am searching it automatically also searches for collection number?

Or is this more because we are selecting a large number of attributes to search at once in the search field (e.g. a taxonomy, a state, a county, an agent, and an ID), and that is forcing the timeout? Would the solution there be to search with simple terms first and narrow down before throwing all the terms into the mix?

@dustymc
Contributor Author

dustymc commented Sep 23, 2022

It's mostly cataloged item attributes, which are flattened inline with an expensive function.

Turning on stuff that's cached in FLAT (state, agents, etc.) results in more data being transmitted, which can plug the tubes and might not lead to a great user experience. That probably only affects users who know they have an antique computer and/or a slow connection, and it can't break Arctos.
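The difference between FLAT-cached columns and function-built attribute columns can be pictured as a rough cost model: every column has some per-record cost, cached columns are cheap to read, and attribute columns pay an expensive function call for every record returned. Because the total scales with the record count, a query that is fine for 100 records can become dangerous for 10,000. The Python sketch below is purely illustrative; the cost numbers and column names are made-up assumptions, not measurements from Arctos.

```python
from dataclasses import dataclass

# Hypothetical relative costs; illustrative numbers, not Arctos measurements.
FLAT_READ_COST = 1  # reading a value that is already cached in FLAT


@dataclass(frozen=True)
class Column:
    name: str
    cached_in_flat: bool
    function_cost: int = 0  # per-record cost of building the value with a function


def per_record_cost(column: Column) -> int:
    """Cost of producing one record's value for this column."""
    return FLAT_READ_COST if column.cached_in_flat else column.function_cost


def estimate_query_cost(columns: list[Column], record_count: int) -> int:
    """Rough relative cost: summed per-record column costs times the record count."""
    return sum(per_record_cost(c) for c in columns) * record_count


if __name__ == "__main__":
    cached = [Column("state_prov", True), Column("collectors", True)]
    computed = [
        Column("nagpra_category", False, function_cost=500),
        Column("endoparasites_detected", False, function_cost=500),
    ]
    for n in (100, 10_000):
        # Columns: record count, cached-only cost, cached + computed cost.
        print(n, estimate_query_cost(cached, n), estimate_query_cost(cached + computed, n))
```

In this toy model the cached-only cost stays small at any record count, while the computed columns dominate and grow a hundredfold when the record count does.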

@ewommack

It's mostly cataloged item attributes, which are flattened inline with an expensive function.

So that means it is: User profiles can have a bunch of attributes for objects turned on during a search, which means every time we search it is still trying to find all of them and may time out.
So I have collection number always turned on to display, and so no matter the collection I am searching it automatically also searches for collection number?

?

Just double-checking, because the ways we come up with solutions will depend on the answer. And it is Friday. My brain is attempting to leave for the weekend.

@dustymc
Contributor Author

dustymc commented Sep 23, 2022

User profiles can have a bunch of attributes for objects turned on during a search, which means every time we search it is still trying to find all of them and may time out.

Yes, and queries that work with small numbers of records will fail with more records, and these could be used maliciously.

So I have collection number always turned on to display, and so no matter the collection I am searching it automatically also searches for collection number?

Yes, and that will come from FLAT. The number of records will have very little effect on anything (except moving the data to your browser, but that's a whole separate thing), and this isn't expensive enough to be very dangerous.

My brain is attempting to leave for the weekend.

I think mine wandered off a while ago....

@ewommack

Yes, and queries that work with small numbers of records will fail with more records, and these could be used maliciously.

So this is the problem? We've got too many things turned on to view when the initial search page comes up?
Does the same problem happen if you've already done a search and then select more columns to show on the search results page?

@dustymc
Contributor Author

dustymc commented Nov 16, 2022

I'm struggling to incorporate the "legacy" system (ssrch_field_doc) into #2745, so I'm going to build something new that includes a cost factor and see where that leads.
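A cost factor could be used as a budget check: each selectable column gets a cost factor, the factors of the requested columns are summed, and the total is compared with a limit. Following the earlier suggestion of limiting things only for public users, this Python sketch gives public users a smaller budget than logged-in users. Every column name, cost factor, and budget in it is a hypothetical placeholder, and it is not meant to describe what was actually built.

```python
# Hypothetical cost-factor check. Column names, cost factors, and budgets are
# illustrative assumptions, not Arctos values.
COST_FACTORS: dict[str, int] = {
    "guid": 1,
    "country": 1,
    "state_prov": 1,
    "verbatim_date": 1,
    "nagpra_category": 10,         # attribute, flattened by a function
    "endoparasites_detected": 10,  # attribute, flattened by a function
}
DEFAULT_COST = 10  # assume unknown columns are expensive rather than free

# Tighter budget for public users than for logged-in users (hypothetical numbers).
BUDGETS: dict[str, int] = {"public": 20, "logged_in": 100}


def query_cost(columns: list[str]) -> int:
    """Sum the cost factors of the requested columns."""
    return sum(COST_FACTORS.get(column, DEFAULT_COST) for column in columns)


def check_query(columns: list[str], user_class: str) -> tuple[bool, int]:
    """Return (allowed, total_cost) against the budget for this user class."""
    total = query_cost(columns)
    return total <= BUDGETS[user_class], total


if __name__ == "__main__":
    request = ["guid", "country", "nagpra_category", "endoparasites_detected"]
    print(check_query(request, "public"))     # (False, 22)
    print(check_query(request, "logged_in"))  # (True, 22)
```

Charging unknown columns the expensive default means a newly added column can't slip past the budget just because nobody assigned it a factor yet.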

@dustymc
Contributor Author

dustymc commented Jan 20, 2023

done

@dustymc dustymc closed this as completed Jan 20, 2023

3 participants