Improve performance when sorting tags with stats by visits or short URLs count #1346
Comments
Solutions already tried and discarded:
Solutions not fully verified:
The approach used for the query has recently changed. There are now sub-queries calculating the visits count (and also the non-bot visits count). It may be possible to apply limit/offset to these new sub-queries, as is already done with the tags sub-query, and see if that improves performance. If it does, consider also moving the calculation of the short URLs count to a sub-query, so that it uses the same approach.
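To make the limit/offset idea above more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`tags`, `short_urls_in_tags`, `visits`, `potential_bot`) are assumptions made for illustration and are not taken from the project's actual schema or query. The point is only that the paging happens inside the sub-query that computes the counts, so the outer query joins a handful of rows.

```python
import sqlite3

# Hypothetical schema and query, for illustration only.
# LIMIT/OFFSET are applied inside the sub-query that aggregates the visits,
# so the outer query only has to resolve tag names for one page of rows.
# Note: tags with no short URLs do not appear in this simplified version.
PAGED_TAGS_SQL = """
SELECT t.name, s.visits_count, s.non_bot_visits_count
FROM (
    SELECT st.tag_id,
           COUNT(v.id) AS visits_count,
           COUNT(CASE WHEN v.potential_bot = 0 THEN 1 END) AS non_bot_visits_count
    FROM short_urls_in_tags AS st
    LEFT JOIN visits AS v ON v.short_url_id = st.short_url_id
    GROUP BY st.tag_id
    ORDER BY visits_count DESC, st.tag_id ASC
    LIMIT ? OFFSET ?
) AS s
JOIN tags AS t ON t.id = s.tag_id
ORDER BY s.visits_count DESC, s.tag_id ASC
"""


def build_demo_db():
    """Create an in-memory database with a few tags, short URLs and visits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE short_urls_in_tags (short_url_id INTEGER, tag_id INTEGER);
        CREATE TABLE visits (
            id INTEGER PRIMARY KEY,
            short_url_id INTEGER,
            potential_bot INTEGER NOT NULL DEFAULT 0
        );
    """)
    conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                     [(1, "foo"), (2, "bar"), (3, "baz")])
    conn.executemany("INSERT INTO short_urls_in_tags VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2), (3, 3)])
    conn.executemany("INSERT INTO visits (short_url_id, potential_bot) VALUES (?, ?)",
                     [(1, 0), (1, 0), (1, 1), (2, 0), (2, 0), (3, 1)])
    conn.commit()
    return conn


def tags_sorted_by_visits(conn, limit, offset):
    """Return one page of (name, visits, non-bot visits), most visited first."""
    return conn.execute(PAGED_TAGS_SQL, (limit, offset)).fetchall()
```

Calling `tags_sorted_by_visits(build_demo_db(), 2, 0)` returns the first page of two tags. Whether this is faster in practice depends on the database engine and its query planner, so it would need to be measured against real data.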
New idea: Track the number of visits and short URLs per tag in a separate table, holding a value that is incremented every time a new visit happens or a short URL is created. That would remove the need to compute aggregates at runtime. There is a big caveat, though: to guarantee data consistency, those tables need to be locked while being updated, and they would have to be updated on every visit. Servers with a lot of visits would see a performance impact. Ways to mitigate this:
EDIT: After reading #1346 (comment), I realized I had already evaluated this option, but perhaps #2036 could be another alternative, or offer some improvement.
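The counter-table idea from this comment could look roughly like the sketch below, again using Python's `sqlite3`. Everything here is hypothetical: the `tag_stats` table, its columns and the `track_visit` helper are invented for illustration, and the sketch assumes a `tag_stats` row is created together with each tag. The visit insert and the counter increment share one transaction, which is what keeps the counters consistent and also what introduces the locking cost mentioned above.

```python
import sqlite3

# Hypothetical schema: tag_stats holds pre-computed counters per tag.
SCHEMA = """
CREATE TABLE short_urls_in_tags (short_url_id INTEGER, tag_id INTEGER);
CREATE TABLE visits (id INTEGER PRIMARY KEY, short_url_id INTEGER);
CREATE TABLE tag_stats (
    tag_id INTEGER PRIMARY KEY,
    visits_count INTEGER NOT NULL DEFAULT 0,
    short_urls_count INTEGER NOT NULL DEFAULT 0
);
"""


def track_visit(conn, short_url_id):
    """Store a visit and bump the counter of every tag on that short URL.

    Both statements run in a single transaction: either the visit and the
    counters are all written, or none of them are.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO visits (short_url_id) VALUES (?)", (short_url_id,))
        conn.execute(
            "UPDATE tag_stats SET visits_count = visits_count + 1 "
            "WHERE tag_id IN ("
            "    SELECT tag_id FROM short_urls_in_tags WHERE short_url_id = ?"
            ")",
            (short_url_id,),
        )


def tags_by_visits(conn, limit, offset):
    """List (tag_id, visits_count) pages without aggregating at read time."""
    return conn.execute(
        "SELECT tag_id, visits_count FROM tag_stats "
        "ORDER BY visits_count DESC, tag_id ASC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
```

With this layout, sorting by visits becomes a plain ordered read of `tag_stats`. The trade-off is that every visit now performs an extra write.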
I'm going to close this as completed, as the most pressing problem, very slow queries, should now be fixed. In the future, we can give this another round of thinking and try to simplify the query, in an attempt to improve performance even further when fetching only a subset.
Due to the complexity of the underlying query used to list tags with stats, the performance when sorting by visits or short URLs count is not good.
A way to improve this would be to add a short-lived query result cache, which could be used to cache the result of a specific subset (limit+offset) with a specific set of conditions (API key or not, search term or not, etc.).
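As a rough illustration of such a cache, the following Python sketch keys each entry on the requested subset and conditions and expires it after a fixed time. The class name, the `cache_key` helper and its parameters are all hypothetical. A real implementation would more likely sit on top of whatever cache layer the application already uses.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


def cache_key(limit: int, offset: int, api_key: Optional[str] = None,
              search_term: Optional[str] = None,
              order_by: Optional[str] = None) -> Tuple:
    """Build a key from the subset (limit+offset) and the query conditions."""
    return (limit, offset, api_key, search_term, order_by)


class ShortLivedCache:
    """In-memory result cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable, so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, or run compute() and cache it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive query
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller would wrap the expensive listing query, for example `cache.get_or_compute(cache_key(10, 0, search_term="fo"), run_query)`, where `run_query` stands for whatever executes the real query.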
However, this could result in eventually inconsistent data, until the cache expires. Some ways to mitigate this: