
limited effect of caching #3

Closed
AladW opened this issue Apr 6, 2022 · 5 comments
AladW commented Apr 6, 2022

According to the README, some kind of caching is implemented for package data. My expectation would be

  • a small delay at the beginning (to retrieve the AUR metadata dumps)
  • none when navigating between packages or search terms.

What I get instead is:

  • a delay for each package on the first visit (this includes when switching terms, e.g. foo -> bar -> foo gives new delays on "foo" matches)
  • none when navigating between visited packages on the current search term
moson-mo (Owner) commented Apr 6, 2022

The search does not fetch all of the package data that is visible in the details panel. With the default config, the fastest API call, type=suggest, is used to get a list of matching package names from the AUR.

Now, when you navigate through the search-result (package) list, a delay of 500 ms is applied before further information is requested (this only applies to AUR packages), so that no "unnecessary" API calls are made to fetch package information while you quickly move through the list. This delay is 500 ms by default and can be changed in the settings: "AUR search delay (ms)".
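That delay behaves like a debounce: each navigation event resets a timer, and the info request only fires once the selection has been stable for the full interval. A minimal sketch of the idea (names and structure are illustrative, not pacseek's actual Go code):

```python
import threading
import time

class Debouncer:
    """Run `fn` only after `delay_s` seconds with no further trigger() calls."""

    def __init__(self, delay_s, fn):
        self.delay_s = delay_s
        self.fn = fn
        self._timer = None

    def trigger(self, *args):
        # Cancel the pending call: the previously highlighted package was skipped over.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay_s, self.fn, args)
        self._timer.start()

# Example: rapidly moving over "foo" and "bar" fetches info only for "baz".
fetched = []
d = Debouncer(0.05, fetched.append)
for name in ("foo", "bar", "baz"):
    d.trigger(name)
time.sleep(0.2)
print(fetched)  # ['baz']
```

Only the package the cursor finally rests on triggers an API call, which is exactly the "unnecessary calls" saving described above.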

You might ask: why not use the type=search API when searching and immediately get the information for all packages, instead of querying package data individually? Because it only contains very limited information (Dependencies, Provides, Conflicts, etc. are not included); hence the search and the display of additional information (while navigating through the results) are separated.
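For reference, the two RPC calls being contrasted here differ only in their query parameters. A sketch of how the URLs are built against the AUR's /rpc endpoint (the endpoint and parameters are the AUR's real RPC v5 interface; the helper names are mine):

```python
from urllib.parse import urlencode

AUR_RPC = "https://aur.archlinux.org/rpc/"

def suggest_url(term):
    # Fast: returns just a list of matching package names, no metadata.
    return AUR_RPC + "?" + urlencode({"v": 5, "type": "suggest", "arg": term})

def info_url(names):
    # Full metadata (Depends, Provides, Conflicts, ...) for one or more packages.
    pairs = [("v", 5), ("type", "info")] + [("arg[]", n) for n in names]
    return AUR_RPC + "?" + urlencode(pairs)

print(suggest_url("pacseek"))
# https://aur.archlinux.org/rpc/?v=5&type=suggest&arg=pacseek
print(info_url(["pacseek", "yay"]))
```

Note that type=info accepts repeated arg[] parameters, which is what makes the batched lookup discussed later in this thread possible.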

Caching:

Once you have performed a search or navigated to another package to show its details, this data is stored in a cache.
Hence, when you navigate back to a package you have already looked at, the data is retrieved from the cache instead of performing the AUR lookup again.
How long this data remains in the cache, before it is requested from the /rpc endpoint again, can be configured in the settings: "Cache expiry (m)".
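A cache with that kind of expiry boils down to a map of value-plus-timestamp entries. A minimal sketch (hypothetical names; pacseek's real implementation is Go, not Python):

```python
import time

class TTLCache:
    """Cache entries expire after `expiry_s` seconds, forcing a fresh lookup."""

    def __init__(self, expiry_s):
        self.expiry_s = expiry_s
        self._store = {}  # package name -> (fetched_at, data)

    def put(self, name, data):
        self._store[name] = (time.monotonic(), data)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        fetched_at, data = entry
        if time.monotonic() - fetched_at > self.expiry_s:
            del self._store[name]  # stale: caller must hit /rpc again
            return None
        return data

cache = TTLCache(expiry_s=600)  # "Cache expiry (m)" = 10 minutes
cache.put("pacseek", {"Name": "pacseek"})
print(cache.get("pacseek"))  # {'Name': 'pacseek'}
```

A stale hit is treated exactly like a miss, so the /rpc lookup path stays the single source of truth.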

AladW (Author) commented Apr 6, 2022 via email

moson-mo (Owner) commented Apr 7, 2022

Good point. I more or less want to request / transfer information only when it is needed.
But immediately fetching the "info" part for all search results could make sense.
It could be that making one "big request" instead of multiple small ones is more efficient in the end. I'll play around with that...

Regarding the metadata dumps: I never really liked that approach because, as you mentioned, you'd transfer 9 MB of data. Even with a reasonably fast machine and internet connection, that process (download, decompress and load the data) already takes about 2 to 3 seconds.

That's quite a waste of resources if you just intend to do a quick search for a single package or so (where you just need a couple of KB of data).

Actually, Manjaro does it that way with their "pamac". Once that got implemented, the traffic limit of the AUR web server was reached within a week or so. Since then, they self-host packages-meta-ext-v1.json.gz 😉
Not that pacseek will ever have as big a user base as pamac, though.

IMHO the /rpc endpoint is the way to go where you just fetch what you need.
With pacseek's default config, it uses my own self-hosted, in-memory implementation of the /rpc endpoint, so it won't stress the AUR web/database server with those lookups.

moson-mo added a commit that referenced this issue Apr 7, 2022
Instead of fetching package info data when a package is highlighted,
the lookup is now performed immediately after searching.
Once the search result is returned, another request is made to
retrieve data for ALL packages returned by the search.

Technically this seems to be the better approach than firing off several
search queries for individual packages.

See issue #3 for more information on this topic.
moson-mo (Owner) commented Apr 7, 2022

I ran a couple of benchmarks and it turns out that fetching the info for multiple packages at once seems to be the better option.
Doing the benchmarks locally (without network transfer) showed:

Fetching information for 100 packages vs. 1 single package is about 10 times slower (1800 requests/s vs. 18000)
10 vs. 1 is 3x slower (6000 r/s vs. 18000)

However, server to server (2.5 GBit/s) with reverse proxy and TLS encryption, 100 vs. 1 is just about 2 times slower.
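Put in per-package terms, the batched call wins clearly even in the local benchmark: 1800 requests/s at 100 packages per request moves far more package records per second than 18000 single-package requests. A quick check of the arithmetic on the numbers above:

```python
# Local benchmark numbers from above: requests/s at each batch size.
rates = {100: 1800, 10: 6000, 1: 18000}

# Package records served per second = requests/s * packages per request.
throughput = {size: rps * size for size, rps in rates.items()}
print(throughput)  # {100: 180000, 10: 60000, 1: 18000}
```

So per package, the 100-at-once call is roughly 10x faster locally, and the gap only widens once real network latency and TLS handshakes are added per request.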

So I've now changed the implementation to immediately run the info call with all packages returned by the suggest/search one and throw these things into the cache. With that, there are only 2 calls performed per search no matter how much you navigate through the result list...
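So a full search now boils down to two round-trips, however long you browse the results afterwards. A sketch of that flow with hypothetical placeholder helpers (the real fetching and caching live in pacseek's Go code):

```python
def rpc_suggest(term):
    # Placeholder for call 1, the type=suggest lookup: term -> package names.
    return ["foo", "foo-git"] if term == "foo" else []

def rpc_info(names):
    # Placeholder for call 2, ONE batched type=info lookup for all result names.
    return {n: {"Name": n, "Depends": []} for n in names}

cache = {}

def search(term):
    names = rpc_suggest(term)       # call 1: name suggestions
    cache.update(rpc_info(names))   # call 2: full info for every result, cached
    return names

def show_details(name):
    # Navigating the result list is now cache-only: no further API calls.
    return cache.get(name)

search("foo")
print(show_details("foo-git")["Name"])  # foo-git
```

Every highlight in the result list after the initial search is served from the cache, which is why the number of calls no longer depends on how you navigate.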

Most likely I'll push another release today with the changes.

AladW (Author) commented Apr 7, 2022 via email

@AladW AladW closed this as completed Apr 7, 2022