
limited effect of caching #3

Closed
AladW opened this issue Apr 6, 2022 · 5 comments
AladW commented Apr 6, 2022

According to the README, some kind of caching is implemented for package data. My expectation would be

  • a small delay at the beginning (to retrieve the AUR metadata dumps)
  • none when navigating between packages or search terms.

What I get instead is:

  • a delay for each package on the first visit (this includes when switching terms, e.g. foo -> bar -> foo gives new delays on "foo" matches)
  • none when navigating between visited packages on the current search term
moson-mo (Owner) commented Apr 6, 2022

The search does not fetch all of the package data that is visible in the details panel. With the default config, the fastest API call, type=suggest, is used to get a list of matching package names from the AUR.

Now, when you navigate through the search-result (package) list, a delay of 500 ms is applied before further information is requested (this only applies to AUR packages), so that no "unnecessary" API calls are made to fetch package information while you quickly move through the list. This delay is 500 ms by default and can be changed in the settings: "AUR search delay (ms)".
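That delay behaves like a debounce: each navigation event resets a timer, and the info request only fires once the selection has been stable for the full interval. A minimal sketch of the idea (names and structure are illustrative, not pacseek's actual Go code):

```python
import threading
import time

class Debouncer:
    """Run `fn` only after `delay_s` seconds with no further trigger() calls."""

    def __init__(self, delay_s, fn):
        self.delay_s = delay_s
        self.fn = fn
        self._timer = None

    def trigger(self, *args):
        # Cancel the pending call: the previously highlighted package was skipped over.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay_s, self.fn, args)
        self._timer.start()

# Example: rapidly moving over "foo" and "bar" fetches info only for "baz".
fetched = []
d = Debouncer(0.05, fetched.append)
for name in ("foo", "bar", "baz"):
    d.trigger(name)
time.sleep(0.2)
print(fetched)  # ['baz']
```

Only the package the cursor finally rests on triggers an API call, which is exactly the "unnecessary calls" saving described above.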

You might ask: why not use the type=search API when searching and immediately get the information for all packages, instead of querying package data individually? Because it only contains very limited information (Dependencies, Provides, Conflicts, etc. are not included); hence the search and the display of additional information (while navigating through the results) are separated.
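For reference, the two RPC calls being contrasted here differ only in their query parameters. A sketch of how the URLs are built against the AUR's /rpc endpoint (the endpoint and parameters are the AUR's real RPC v5 interface; the helper names are mine):

```python
from urllib.parse import urlencode

AUR_RPC = "https://aur.archlinux.org/rpc/"

def suggest_url(term):
    # Fast: returns just a list of matching package names, no metadata.
    return AUR_RPC + "?" + urlencode({"v": 5, "type": "suggest", "arg": term})

def info_url(names):
    # Full metadata (Depends, Provides, Conflicts, ...) for one or more packages.
    pairs = [("v", 5), ("type", "info")] + [("arg[]", n) for n in names]
    return AUR_RPC + "?" + urlencode(pairs)

print(suggest_url("pacseek"))
# https://aur.archlinux.org/rpc/?v=5&type=suggest&arg=pacseek
print(info_url(["pacseek", "yay"]))
```

Note that type=info accepts repeated arg[] parameters, which is what makes the batched lookup discussed later in this thread possible.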

Caching:

Once you have performed a search or navigated to another package to show its details, this data is stored in a cache.
Hence, when you navigate back to a package you have already looked at, the data is retrieved from the cache instead of performing the AUR lookup again.
How long this data remains in the cache, before it is requested from the /rpc endpoint again, can be configured in the settings: "Cache expiry (m)".
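A cache with that kind of expiry boils down to a map of value-plus-timestamp entries. A minimal sketch (hypothetical names; pacseek's real implementation is Go, not Python):

```python
import time

class TTLCache:
    """Cache entries expire after `expiry_s` seconds, forcing a fresh lookup."""

    def __init__(self, expiry_s):
        self.expiry_s = expiry_s
        self._store = {}  # package name -> (fetched_at, data)

    def put(self, name, data):
        self._store[name] = (time.monotonic(), data)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        fetched_at, data = entry
        if time.monotonic() - fetched_at > self.expiry_s:
            del self._store[name]  # stale: caller must hit /rpc again
            return None
        return data

cache = TTLCache(expiry_s=600)  # "Cache expiry (m)" = 10 minutes
cache.put("pacseek", {"Name": "pacseek"})
print(cache.get("pacseek"))  # {'Name': 'pacseek'}
```

A stale hit is treated exactly like a miss, so the /rpc lookup path stays the single source of truth.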

AladW (Author) commented Apr 6, 2022 via email

moson-mo (Owner) commented Apr 7, 2022

Good point. I more or less want to request / transfer information only when it is needed.
But immediately fetching the "info" part for all search results could make sense.
It could be that making one "big request" instead of multiple small ones is more efficient in the end. I'll play around with that...

Regarding the metadata dumps: I never really liked that approach because, as you mentioned, you'd transfer 9 MB of data. Even with a reasonably fast machine and internet connection, that process (download, decompress and load the data) already takes about 2 to 3 seconds.

That's quite a waste of resources if you just intend to do a quick search for a single package or so (where you just need a couple of KB of data).

Actually, Manjaro does it that way with their "pamac". Once that got implemented, the traffic limit of the AUR web server was reached within a week or so. Since then, they self-host packages-meta-ext-v1.json.gz 😉
Not that pacseek will ever have as big a user base as pamac, though.

IMHO the /rpc endpoint is the way to go where you just fetch what you need.
With pacseek's default config, it uses my own self-hosted, in-memory implementation of the /rpc endpoint, so it won't stress the AUR web/database server with those lookups.

moson-mo added a commit that referenced this issue Apr 7, 2022
Instead of fetching package info data when a package is highlighted,
the lookup is now performed immediately after searching.
Once the search result is returned, another request is made to
retrieve data for ALL packages returned by the search.

Technically this seems to be the better approach than firing off several
search queries for individual packages.

See issue #3 for more information on this topic.
moson-mo (Owner) commented Apr 7, 2022

I ran a couple of benchmarks and it turns out that fetching the info for multiple packages at once seems to be the better option.
Doing the benchmarks locally (without network transfer) showed:

Fetching information for 100 packages vs. 1 single package is about 10 times slower (1800 requests/s vs. 18000)
10 vs. 1 is 3x slower (6000 r/s vs. 18000)

However, server to server (2.5 GBit/s) with reverse proxy and TLS encryption, 100 vs. 1 is just about 2 times slower.
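Put in per-package terms, the batched call wins clearly even in the local benchmark: 1800 requests/s at 100 packages per request moves far more package records per second than 18000 single-package requests. A quick check of the arithmetic on the numbers above:

```python
# Local benchmark numbers from above: requests/s at each batch size.
rates = {100: 1800, 10: 6000, 1: 18000}

# Package records served per second = requests/s * packages per request.
throughput = {size: rps * size for size, rps in rates.items()}
print(throughput)  # {100: 180000, 10: 60000, 1: 18000}
```

So per package, the 100-at-once call is roughly 10x faster locally, and the gap only widens once real network latency and TLS handshakes are added per request.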

So I've now changed the implementation to immediately run the info call with all packages returned by the suggest/search one and throw these things into the cache. With that, there are only 2 calls performed per search no matter how much you navigate through the result list...
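So a full search now boils down to two round-trips, however long you browse the results afterwards. A sketch of that flow with hypothetical placeholder helpers (the real fetching and caching live in pacseek's Go code):

```python
def rpc_suggest(term):
    # Placeholder for call 1, the type=suggest lookup: term -> package names.
    return ["foo", "foo-git"] if term == "foo" else []

def rpc_info(names):
    # Placeholder for call 2, ONE batched type=info lookup for all result names.
    return {n: {"Name": n, "Depends": []} for n in names}

cache = {}

def search(term):
    names = rpc_suggest(term)       # call 1: name suggestions
    cache.update(rpc_info(names))   # call 2: full info for every result, cached
    return names

def show_details(name):
    # Navigating the result list is now cache-only: no further API calls.
    return cache.get(name)

search("foo")
print(show_details("foo-git")["Name"])  # foo-git
```

Every highlight in the result list after the initial search is served from the cache, which is why the number of calls no longer depends on how you navigate.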

Most likely I'll push another release today with the changes.

AladW (Author) commented Apr 7, 2022 via email

@AladW AladW closed this as completed Apr 7, 2022