Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance when sorting tags with stats by visits or short URLs count #1346

Closed
acelaya opened this issue Jan 23, 2022 · 5 comments
Closed

Comments

@acelaya
Copy link
Member

acelaya commented Jan 23, 2022

Due to the complexity required on the underneath query used to list tags with stats, te performance when sorting by visits or short URLs count is not good.

A way to improve this would be to add some short-lived query result cache, which could be used to cache the result of a sepcific subset (limit+offset) with a specific set of conditions (API kye or not, search term or not, etc).

However, this could result in eventually inconsistent data, until the cache expires. Some ways to mitigate this:

  • Allow to opt-in to this feature, so that there's no cache by default, and only when the amount of data grows then you can enable it knowing the side effects.
    • Users will have to enable redis or APCu.
  • Implement some cache invalidation that clears it when a short URL is created or edited. Visits are not that important because are not created by admins, so the inconsistency might pass unnoticed until cache expires naturally.
    • Might be harder to achieve than it seems, as the cache key would need to include the tags it affects, and invalidation would have to apply for potentially multiple cache entries.
  • Both of the above.
@acelaya acelaya added this to the 3.1.0 milestone Jan 23, 2022
@acelaya
Copy link
Member Author

acelaya commented Jan 23, 2022

Solutions already tried and discarded:

  • Re-think query, using a different base table: Not possible, as when need to get all tags, always, so tags table has to be the main one.
  • Use a view: Does not improve performance, as it still runs the query every time.
  • Use an aggregate table that gets updated periodically (pseudo-cache):
    • It would be very complex to maintain, as values can change based on new visits, new short URLs, deleted short URLs and edited short URLs.
    • Depending on the API key permissions, the aggregate counts are different, and it's not possible to foresee all possible combinations to have them pre-calculated.
    • The migration filling-up this table might take long to run.
  • Try to join instead of left-join: Performance is muuuuch lower.

Solutions not fully verified:

  • Use a cross-table composite index: Not yet verified if all supported DB engines allow it.

@acelaya acelaya modified the milestones: 3.1.0, 3.2.0 Apr 23, 2022
@acelaya acelaya modified the milestones: 3.2.0, 3.3.0 Jul 28, 2022
@acelaya acelaya added this to Shlink Aug 8, 2022
@acelaya acelaya moved this to Todo 🗒️ in Shlink Aug 8, 2022
@acelaya acelaya removed this from the 3.3.0 milestone Sep 9, 2022
@acelaya acelaya removed this from Shlink Sep 9, 2022
@acelaya acelaya added this to Shlink Sep 25, 2022
@acelaya acelaya moved this to Todo 🗒️ in Shlink Sep 25, 2022
@acelaya acelaya removed the status in Shlink Sep 25, 2022
@acelaya
Copy link
Member Author

acelaya commented Feb 12, 2023

The approach used for the query has been recently changed. Now there are sub-queries calculating the visits count (and also the non-bot visits count).

Maybe it's possible to apply limit/offset to the new sub-queries, as it's done with the tags sub-query, and see if that improves performance.

In case it does, consider moving the calculation of the short URLs count also to a sub-query to use the same approach.

@acelaya acelaya modified the milestones: 3.6.0, 3.5.3 Feb 12, 2023
@acelaya acelaya moved this to Todo in Shlink Feb 16, 2023
@acelaya acelaya modified the milestones: 3.5.3, 3.5.4 Mar 31, 2023
@acelaya acelaya removed this from the 3.5.4 milestone Apr 11, 2023
@acelaya acelaya removed the status in Shlink Apr 11, 2023
@acelaya acelaya added this to the 3.6.0 milestone Apr 16, 2023
@acelaya acelaya moved this to Todo in Shlink Apr 16, 2023
@acelaya acelaya modified the milestones: 3.6.0, 3.7.0 May 21, 2023
@acelaya acelaya removed this from Shlink May 21, 2023
@acelaya acelaya added this to Shlink Jul 31, 2023
@acelaya
Copy link
Member Author

acelaya commented Feb 25, 2024

New idea: Track amount of visits and short URLs per tag in a separate table which has a value that's incremented every time a new visit happens or a short URL is created.

That would remove the need to do aggregates at runtime (COUNT(...)), and should also drastically simplify the queries and remove the need to use sub-queries.

It has a big consideration though. In order to guarantee data consistency, those tables need to lock while being updated, and they will have to be updated on every visit. Servers with a lot of visits will get impacted on performance.

Ways to mitigate this:

  • Have separate tables to count the amount of short URLs and the amount of visits, as the latter will probably update much more frequently.
  • Spread the total value in multiple rows per tag, instead of just one. When the value needs to be incremented, we pick one of them randomly.
    That will significantly reduce the impact of database locking, and then we can get the actual total by doing an aggregate SUM() and all the partial totals for all rows that belong to the same tag.

EDIT: By reading #1346 (comment), I realized I already evaluated this option, but perhaps #2036 could be another alternative, or present some improvement.

@acelaya acelaya added this to the 4.1.0 milestone Feb 25, 2024
@acelaya acelaya changed the title Improve performance when sorting short tags with stats by visits or short URLs count Improve performance when sorting tags with stats by visits or short URLs count Feb 25, 2024
@acelaya
Copy link
Member Author

acelaya commented Mar 28, 2024

After implementing #2074, the overall performance has been drastically improved with #2078.

However, it still takes almost as much time regardless of how many tags are fetched. With the improvements of the new visits count table, the overall query could/should be reworked in order to address this.

@acelaya acelaya moved this to In Progress in Shlink Mar 31, 2024
@acelaya
Copy link
Member Author

acelaya commented Apr 1, 2024

I'm going to close this as completed, as the most outstanding problem, very slow queries, should now be fixed.

In future, we can give this another round of thinking, and try to simplify the query, in an attempt to make the performance even better when fetching only a subset.

@acelaya acelaya closed this as completed Apr 1, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Shlink Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

1 participant