Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

analytics - improve performance on big logs #556

Closed
fvanderbiest opened this issue Jan 8, 2014 · 8 comments
Closed

analytics - improve performance on big logs #556

fvanderbiest opened this issue Jan 8, 2014 · 8 comments
Assignees
Milestone

Comments

@fvanderbiest
Copy link
Member

When the ogc-statistics database gets huge, analytics takes too much time to display the stats.

@fvanderbiest
Copy link
Member Author

The culprit is this seq scan:

->  Seq Scan on ogc_services_log  (cost=0.00..1437477.65 rows=982 width=30) (actual time=478.463..12177.526 rows=1163185 loops=1)

...because we're not using the index with this where clause:
https://github.com/georchestra/georchestra/blob/14.06/analytics/src/main/java/org/georchestra/analytics/model/OGCStatsModel.java#L17

Better use

date >= '2014-08-01 0:0'::timestamp and date < '2014-09-01 0:0'::timestamp

which enables a far speedier index scan.

@fvanderbiest
Copy link
Member Author

Monthly requests optimized with f67a9a1

For the global stats, we might need to consolidate the results in a separate table.

@pmauduit
Copy link
Member

Or maybe a calculated table maintained regularly by the analytics apps ? (just some random thoughts)

@fvanderbiest fvanderbiest removed this from the 14.12 milestone Nov 20, 2014
@fvanderbiest fvanderbiest assigned Vampouille and unassigned pmauduit Dec 1, 2015
@fvanderbiest
Copy link
Member Author

@Vampouille
Copy link
Contributor

I saw several indexes on log table, maybe we can speed up insertion by removing those indexes (for example : layer, service) without slowing down "select" queries

@fvanderbiest
Copy link
Member Author

+1, thanks

@Vampouille
Copy link
Contributor

I just finished some tests with 6M records on my desktop. Query duration changes from 2500ms to 80ms with the following modifications :

  • create a hierarchy of tables (inherited postgres tables) on date field (partitions)
  • remove execution of prepared statements and replace with native query : executeQuery()
  • I also remove all indexes

I think I will commit those changes in the beginning of next week.

@fvanderbiest
Copy link
Member Author

Mostly done with ad0ba48 but not yet merged into master (related PR is #1141)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants