Performance improvement: object database query caching? #1100

Open
2 tasks
abitmore opened this issue Jun 26, 2018 · 3 comments
Labels
  • 1a Epic — High level concept to be addressed. Description should contain a list referencing child User Stories.
  • 2a Discussion Needed — Prompt for team to discuss at next stand up.
  • 6 Performance Impacts — flag identifying system/user efficiency, performance, etc.
  • performance

Comments

@abitmore
Member

Most of the data stored in the object database lives in red-black trees. Currently, every query walks from the root to a leaf, so each lookup is O(log(n)). In practice some data is much hotter than the rest, so it may make sense to cache the hottest results (pointers or addresses) to avoid repeating the same root-to-leaf search for the same data again and again.

Things to be done:

  • profiling / query-pattern analysis: find out what is queried most often, and whether a cache would help
  • caching design

Thoughts?

@abitmore added the performance, 1a Epic, 2a Discussion Needed, and 6 Performance Impacts labels on Jun 26, 2018
@jmjatlanta
Contributor

Here's a somewhat dated article about cache design. It lays out some high-level architecture ideas and points to consider, though the code shows its age: ipcc99.pdf

Things to think about: staleness due to age, staleness due to an operation, memory footprint, lookups of cached data, and the cost of deciding whether to consult the database or the cache.

How smart do we make it? Does it learn what to cache, or do we tell it what to cache?

@abitmore
Member Author

abitmore commented Jul 5, 2018

I guess an LRU cache would work in most cases.

The question is whether now is the right time to play with pointers, and whether the gain is worth the effort and risk. There is similar discussion in #1095.
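For reference, the LRU idea mentioned above can be sketched in a few dozen lines (this is a generic illustration, not code from the repository): a `std::unordered_map` gives O(1) average lookup, and a `std::list` keeps entries ordered by recency so the least-recently-used entry can be evicted in O(1). Note this sketch caches value copies; caching raw tree-node pointers, as proposed in the issue, would additionally require the invalidation machinery discussed later in this thread.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch. The list holds (key, value) pairs with the
// most recently used entry at the front; the map points into the list.
template <typename Key, typename Value>
class lru_cache {
public:
    explicit lru_cache(std::size_t capacity) : _capacity(capacity) {}

    // Returns nullptr on a miss; on a hit, refreshes the entry's recency.
    const Value* get(const Key& k) {
        auto it = _index.find(k);
        if (it == _index.end()) return nullptr;
        _order.splice(_order.begin(), _order, it->second);  // move to front, O(1)
        return &it->second->second;
    }

    void put(const Key& k, Value v) {
        auto it = _index.find(k);
        if (it != _index.end()) {                 // update existing entry
            it->second->second = std::move(v);
            _order.splice(_order.begin(), _order, it->second);
            return;
        }
        if (_index.size() == _capacity) {         // evict least recently used
            _index.erase(_order.back().first);
            _order.pop_back();
        }
        _order.emplace_front(k, std::move(v));
        _index[k] = _order.begin();
    }

private:
    std::size_t _capacity;
    std::list<std::pair<Key, Value>> _order;
    std::unordered_map<Key,
        typename std::list<std::pair<Key, Value>>::iterator> _index;
};
```

The hard part, as the thread notes, is not the cache itself but deciding when cached pointers become stale.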

@clockworkgr
Member

clockworkgr commented Jul 25, 2018

I may be completely off the mark here, seeing as I'm not a C++ dev and I'm going purely on documentation I've read, so if I'm completely wrong just let me know.

What about Boost's notifying indices? It seems to me that they would allow for precise cache invalidation. I have no idea whether they are implementable here or what the performance hit (if any) would be; I just know that we use Boost containers, and according to the docs, notifying indices provide callbacks on object modification.

Combined with a fixed cache size and eviction of entries by last-access time, it seems as if it would self-optimise.

UPDATE: Apologies... it appears notifying indices are planned but not yet included in Boost.
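Even without notifying indices in Boost, the same idea can be hand-rolled: have the write path fire callbacks so any cache can drop the affected entry. The sketch below is a generic illustration of that pattern — none of these names are Boost or bitshares-core APIs, and a real integration would hook the object database's modify path instead.

```cpp
#include <functional>
#include <map>
#include <vector>

// Hand-rolled "notifying" store sketch: modifications invoke registered
// callbacks, giving caches a precise invalidation signal. All names are
// hypothetical, for illustration only.
template <typename Key, typename Value>
class notifying_store {
public:
    using callback = std::function<void(const Key&)>;

    void subscribe(callback cb) { _subscribers.push_back(std::move(cb)); }

    void modify(const Key& k, const Value& v) {
        _data[k] = v;
        for (auto& cb : _subscribers) cb(k);  // tell caches this key changed
    }

    const Value* find(const Key& k) const {
        auto it = _data.find(k);
        return it == _data.end() ? nullptr : &it->second;
    }

private:
    std::map<Key, Value> _data;        // stand-in for the object index
    std::vector<callback> _subscribers;
};
```

A cache would subscribe with a lambda that erases the modified key, so stale entries are removed the moment the underlying object changes rather than on a timer.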

No branches or pull requests · 3 participants