DBENGINE v2 - improvements part 8 #14319

Merged: 23 commits, Jan 24, 2023

@ktsaou (Member) commented Jan 24, 2023

Minimize use of malloc() for pages

A Netdata parent receives 1M points per second from about 200 Netdata children.

Every few minutes, the parent stops ingesting data for about 5 seconds and then catches up:

[image]

The cause: about 60k new pages were added to the main cache, and more than 150k pages (probably smaller ones) had to be evicted to make room:

[image]

We traced the problem to malloc(): the large volume of page allocations and deallocations caused these stalls.

This PR keeps deallocated pages as cached buffers and reuses them when new pages are needed, avoiding fresh allocations.

Three page sizes are cached: the ones needed by the storage tiers.

Every second, a cleanup pass removes one page of each size from the cache. The system therefore keeps trying to shrink the cache, but because the allocation/deallocation cycle repeats, the cache automatically stabilizes at the size actually needed.
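The reuse-and-trim scheme above can be sketched as one free list per page size. This is a minimal, single-threaded sketch, not the dbengine code: the three page sizes, the names `page_cache_get()`, `page_cache_put()` and `page_cache_trim()`, and the fallback to plain `malloc()`/`free()` are hypothetical, and a real implementation would also need locking.

```c
#include <stdlib.h>

/* Hypothetical page sizes, one per storage tier (values are illustrative only). */
#define NUM_SIZES 3
static const size_t page_sizes[NUM_SIZES] = { 4096, 2048, 1024 };

/* A cached buffer stores the free-list link inside its own memory. */
struct free_page { struct free_page *next; };

struct size_cache {
    struct free_page *head;  /* stack of released buffers of this size */
    size_t count;            /* number of buffers currently cached */
};

static struct size_cache caches[NUM_SIZES];

/* Hand out a cached buffer if one exists, otherwise fall back to malloc(). */
void *page_cache_get(int size_idx) {
    struct size_cache *c = &caches[size_idx];
    if (c->head) {
        struct free_page *p = c->head;
        c->head = p->next;
        c->count--;
        return p;
    }
    return malloc(page_sizes[size_idx]);
}

/* Instead of free(), keep the buffer for the next page of the same size. */
void page_cache_put(int size_idx, void *buf) {
    struct size_cache *c = &caches[size_idx];
    struct free_page *p = buf;
    p->next = c->head;
    c->head = p;
    c->count++;
}

/* Hypothetically called once per second: release at most one buffer per size. */
void page_cache_trim(void) {
    for (int i = 0; i < NUM_SIZES; i++) {
        struct size_cache *c = &caches[i];
        if (c->head) {
            struct free_page *p = c->head;
            c->head = p->next;
            c->count--;
            free(p);
        }
    }
}
```

With this design, a workload that keeps releasing and re-requesting pages refills the free lists faster than the trim drains them, so each list settles near its working size; when demand drops, the lists shrink by one buffer per size per second.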


Other improvements

  • database rotation now runs at startup when needed, so retention is accurate right after startup
  • when a query switches plans (tiers), the plans now overlap, so the query result has no gaps
  • when database rotations run in parallel, the first time of a metric can now only move forward
  • replication ensures future timestamps are not propagated from the child to the parent
  • replication sends chart states atomically, while the data collection lock is held
  • no more data loss on shutdown: all pages are now flushed before exiting
  • fixed a bug where retention recalculation at database rotation did not update retention at all
  • added negative caching in the main cache for invalid pages found on disk and for query gaps found after the journal v2 scan, so repeated queries on timeframes with gaps run faster, bypassing dbengine query planning steps 2 and 3
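The rule that a metric's first time can only move forward, even when rotations run in parallel, maps naturally onto an atomic "advance only" update. The sketch below is hypothetical: it assumes the first time is a single C11 `_Atomic int64_t` holding Unix seconds and uses a compare-and-swap loop; `struct metric` and `first_time_advance()` are illustrative names, not Netdata APIs.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-metric retention start, in Unix seconds. */
struct metric {
    _Atomic int64_t first_time_s;
};

/* Move first_time_s forward to new_first_s, never backwards.
 * Returns true if this call changed the value. Safe to call from
 * several rotation threads concurrently. */
bool first_time_advance(struct metric *m, int64_t new_first_s) {
    int64_t current = atomic_load(&m->first_time_s);
    while (current < new_first_s) {
        /* On failure (including spurious failure), current is reloaded
         * with the latest value and the loop re-checks the condition. */
        if (atomic_compare_exchange_weak(&m->first_time_s, &current, new_first_s))
            return true;
    }
    return false;
}
```

A plain store would let a slower rotation thread overwrite a newer first time with an older one; the compare-and-swap loop only ever replaces a value that is smaller than the proposed one.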

…ion receivers ignore chart and dimension states when rbegin is also ignored