Improve performance when sorting tags with stats by visits or short URLs count #1346

acelaya · 2022-01-23T09:19:09Z

Due to the complexity required on the underneath query used to list tags with stats, te performance when sorting by visits or short URLs count is not good.

A way to improve this would be to add some short-lived query result cache, which could be used to cache the result of a sepcific subset (limit+offset) with a specific set of conditions (API kye or not, search term or not, etc).

However, this could result in eventually inconsistent data, until the cache expires. Some ways to mitigate this:

Allow to opt-in to this feature, so that there's no cache by default, and only when the amount of data grows then you can enable it knowing the side effects.
- Users will have to enable redis or APCu.
Implement some cache invalidation that clears it when a short URL is created or edited. Visits are not that important because are not created by admins, so the inconsistency might pass unnoticed until cache expires naturally.
- Might be harder to achieve than it seems, as the cache key would need to include the tags it affects, and invalidation would have to apply for potentially multiple cache entries.
Both of the above.

acelaya · 2022-01-23T09:40:41Z

Solutions already tried and discarded:

Re-think query, using a different base table: Not possible, as when need to get all tags, always, so tags table has to be the main one.
Use a view: Does not improve performance, as it still runs the query every time.
Use an aggregate table that gets updated periodically (pseudo-cache):
- It would be very complex to maintain, as values can change based on new visits, new short URLs, deleted short URLs and edited short URLs.
- Depending on the API key permissions, the aggregate counts are different, and it's not possible to foresee all possible combinations to have them pre-calculated.
- The migration filling-up this table might take long to run.
Try to join instead of left-join: Performance is muuuuch lower.

Solutions not fully verified:

Use a cross-table composite index: Not yet verified if all supported DB engines allow it.

acelaya · 2023-02-12T18:57:08Z

The approach used for the query has been recently changed. Now there are sub-queries calculating the visits count (and also the non-bot visits count).

Maybe it's possible to apply limit/offset to the new sub-queries, as it's done with the tags sub-query, and see if that improves performance.

In case it does, consider moving the calculation of the short URLs count also to a sub-query to use the same approach.

acelaya · 2024-02-25T12:19:33Z

New idea: Track amount of visits and short URLs per tag in a separate table which has a value that's incremented every time a new visit happens or a short URL is created.

That would remove the need to do aggregates at runtime (COUNT(...)), and should also drastically simplify the queries and remove the need to use sub-queries.

It has a big consideration though. In order to guarantee data consistency, those tables need to lock while being updated, and they will have to be updated on every visit. Servers with a lot of visits will get impacted on performance.

Ways to mitigate this:

Have separate tables to count the amount of short URLs and the amount of visits, as the latter will probably update much more frequently.
Spread the total value in multiple rows per tag, instead of just one. When the value needs to be incremented, we pick one of them randomly.
That will significantly reduce the impact of database locking, and then we can get the actual total by doing an aggregate SUM() and all the partial totals for all rows that belong to the same tag.

EDIT: By reading #1346 (comment), I realized I already evaluated this option, but perhaps #2036 could be another alternative, or present some improvement.

acelaya · 2024-03-28T18:24:45Z

After implementing #2074, the overall performance has been drastically improved with #2078.

However, it still takes almost as much time regardless of how many tags are fetched. With the improvements of the new visits count table, the overall query could/should be reworked in order to address this.

acelaya · 2024-04-01T08:38:20Z

I'm going to close this as completed, as the most outstanding problem, very slow queries, should now be fixed.

In future, we can give this another round of thinking, and try to simplify the query, in an attempt to make the performance even better when fetching only a subset.

acelaya added bug experimental labels Jan 23, 2022

acelaya added this to the 3.1.0 milestone Jan 23, 2022

acelaya modified the milestones: 3.1.0, 3.2.0 Apr 23, 2022

acelaya modified the milestones: 3.2.0, 3.3.0 Jul 28, 2022

acelaya added this to Shlink Aug 8, 2022

acelaya moved this to Todo 🗒️ in Shlink Aug 8, 2022

acelaya removed this from the 3.3.0 milestone Sep 9, 2022

acelaya removed this from Shlink Sep 9, 2022

acelaya added this to Shlink Sep 25, 2022

acelaya moved this to Todo 🗒️ in Shlink Sep 25, 2022

acelaya removed the status in Shlink Sep 25, 2022

acelaya modified the milestones: 3.6.0, 3.5.3 Feb 12, 2023

acelaya moved this to Todo in Shlink Feb 16, 2023

acelaya modified the milestones: 3.5.3, 3.5.4 Mar 31, 2023

acelaya removed this from the 3.5.4 milestone Apr 11, 2023

acelaya removed the status in Shlink Apr 11, 2023

acelaya added this to the 3.6.0 milestone Apr 16, 2023

acelaya moved this to Todo in Shlink Apr 16, 2023

acelaya modified the milestones: 3.6.0, 3.7.0 May 21, 2023

acelaya removed this from Shlink May 21, 2023

acelaya added this to Shlink Jul 31, 2023

acelaya moved this to Todo in Shlink Jul 31, 2023

acelaya removed the status in Shlink Oct 20, 2023

acelaya removed this from the 3.7.0 milestone Oct 20, 2023

acelaya mentioned this issue Jan 25, 2022

Use server-side pagination/filtering/ordering for tags when using list mode and consuming Shlink 3.0.0 shlinkio/shlink-web-component#238

Open

acelaya added this to the 4.1.0 milestone Feb 25, 2024

acelaya changed the title ~~Improve performance when sorting short tags with stats by visits or short URLs count~~ Improve performance when sorting tags with stats by visits or short URLs count Feb 25, 2024

acelaya mentioned this issue Feb 29, 2024

Track amount of visits per short URL so that we don't have to do COUNT aggregates at runtime #2036

Closed

This was referenced Mar 28, 2024

Feature/slotted counts #2074

Merged

Improve tags stats performance by using the new short_url_visits_counts table #2078

Merged

acelaya moved this to In Progress in Shlink Mar 31, 2024

acelaya closed this as completed Apr 1, 2024

github-project-automation bot moved this from In Progress to Done in Shlink Apr 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance when sorting tags with stats by visits or short URLs count #1346

Improve performance when sorting tags with stats by visits or short URLs count #1346

acelaya commented Jan 23, 2022 •

edited

Loading

acelaya commented Jan 23, 2022

acelaya commented Feb 12, 2023

acelaya commented Feb 25, 2024 •

edited

Loading

acelaya commented Mar 28, 2024 •

edited

Loading

acelaya commented Apr 1, 2024

Improve performance when sorting tags with stats by visits or short URLs count #1346

Improve performance when sorting tags with stats by visits or short URLs count #1346

Comments

acelaya commented Jan 23, 2022 • edited Loading

acelaya commented Jan 23, 2022

acelaya commented Feb 12, 2023

acelaya commented Feb 25, 2024 • edited Loading

acelaya commented Mar 28, 2024 • edited Loading

acelaya commented Apr 1, 2024

acelaya commented Jan 23, 2022 •

edited

Loading

acelaya commented Feb 25, 2024 •

edited

Loading

acelaya commented Mar 28, 2024 •

edited

Loading