Further memory investigation #287

Closed
martinsumner opened this issue Jul 12, 2019 · 8 comments

@martinsumner
Owner

Had a Riak instance in production where there were a lot of leveled_cdb processes, each referencing a large amount of binary memory. This could be cleared by calling garbage_collect() explicitly, but garbage collection didn't otherwise seem to be attempting this.

The binaries referenced may have been in the active journal. This was after a Riak restart, so perhaps it is related to the scan on startup?

@martinsumner
Owner Author

martinsumner commented Jul 17, 2019

There are two directions from which to look at this problem - the (over-)use of memory by the database, and the (under-)use of memory by the OS page cache.

Firstly, the situation in which the problem arose was one where the database had been left idle for a long time (days), and then a volume test was started. At that stage an unexpectedly high proportion of memory is taken by Riak, and within that the majority appears to be binary references held by leveled_cdb processes that are actually ready for garbage collection.

I've struggled to find concise references on how GC is triggered for a process (and of course BEAM memory management also changed significantly between R16 and OTP 20, so any reference may no longer be relevant). However, there are references to idle processes not triggering garbage collection, and to idle processes needing to be hibernated because of this.

It has been guessed that scan_over_file in CDB has a high risk of producing lots of GC'able binary references, but this hasn't been proven at non-trivial scale in tests. GC may be triggered at the end of a scan to eliminate this as a possibility, but it is difficult to be certain that this will actually make a real difference.
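A minimal sketch of what triggering GC at the end of a scan could look like (the function and variable names here are illustrative, not the actual leveled_cdb code):

```erlang
%% Illustrative sketch only: force a collection of the scanning process once
%% the fold over the file has completed, so that sub-binaries created while
%% reading do not keep the larger file-read binaries alive.
scan_then_collect(ScanFun, Handle, Acc0) ->
    Acc = ScanFun(Handle, Acc0),        % e.g. the fold performed by scan_over_file
    true = erlang:garbage_collect(),    % explicit GC of the calling process
    Acc.
```

If the scan runs inside the leveled_cdb gen_server process itself, the same erlang:garbage_collect/0 call could instead be made at the end of the relevant callback clause.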

In terms of the under-use of memory by the OS page cache, fadvise should be our friend. When we start up the database, we would normally want the ledger to be in the page cache - so an option is to fadvise it as willneed on startup. However, this may have an impact on startup times, as the fadvise may involve a synchronous read (https://stackoverflow.com/questions/4936520/posix-fadvisewillneed-makes-io-slower). Will this punitively impact startup times?
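For reference, a sketch of issuing the willneed advice from Erlang using file:advise/4 (the helper name is illustrative; this is not the leveled implementation):

```erlang
-include_lib("kernel/include/file.hrl").

%% Illustrative sketch: advise the OS to pre-load a ledger file into the
%% page cache shortly after startup.
preload_file(FileName) ->
    {ok, IO} = file:open(FileName, [read, raw, binary]),
    {ok, #file_info{size = Size}} = file:read_file_info(FileName),
    ok = file:advise(IO, 0, Size, will_need),
    file:close(IO).
```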

Note that the database was left idle after startup - hence the assumption that adjusting startup behaviour may have a positive impact.

@martinsumner
Owner Author

martinsumner commented Jul 17, 2019

Note that in volume tests where we start an empty store, and then load it as part of the volume test, there are no issues with memory allocation. Riak will take a minimal memory footprint in comparison to the page cache.

If this is specifically an issue with non-GC when idle - is hibernate a better answer? Should leveled_cdb files hibernate after an inactivity timeout?
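As a sketch of the hibernate approach, assuming a standard gen_server (the timeout value and callback clauses are illustrative, not the actual leveled_cdb code):

```erlang
-define(HIBERNATE_TIMEOUT, 60000).  % e.g. hibernate after 60s of inactivity

%% Returning a timeout from each callback means a 'timeout' message is
%% delivered if no other message arrives within that period ...
handle_call(Msg, _From, State) ->
    Reply = do_request(Msg, State),    % placeholder for the real work
    {reply, Reply, State, ?HIBERNATE_TIMEOUT}.

%% ... at which point the process can hibernate, which also forces a full
%% garbage collection and shrinks the heap.
handle_info(timeout, State) ->
    {noreply, State, hibernate}.
```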

@martinsumner
Owner Author

If there is to be a page cache load via fadvise on startup, this needs to be configurable, as it won't necessarily be of help when leveled is used as an AAE backend.

@martinsumner
Owner Author

martinsumner commented Jul 17, 2019

Regarding the earlier question of whether an fadvise willneed on the ledger at startup will punitively impact startup times:

Testing this, it makes a negligible difference to startup times. With fadvise there is higher disk utilisation in the minutes following startup - so it looks like this activity is pushed to the background.

However, this is when restarting, when the pages are likely to be already cached.

Doing this after a reboot, the node startup time was 30% slower, but there was a lot more disk activity post-startup (about 3x as much disk I/O).

@martinsumner
Owner Author

When testing in an NHS pre-production environment, the following was observed:

  • Following startup (with a full data set) and prior to sending new load, memory usage is low;
  • Following a load test, memory usage is very high, much higher than has been observed during basho_bench tests.

The additional memory is all on the binary heap (not the process heaps), and the files with the largest amount of binary memory referenced are SST files. Further investigation reveals the binary references are related to the cache of header information (an array of binaries).

This cache is built lazily after startup - by splitting out the header binary from the overall block when the block is first loaded. However, there is no binary copy - so the header is a sub-binary that retains a reference to the whole block. Hence the unexpectedly high volume of binary memory referenced in the nodes following startup.

See 5bef21d, where a unit test has been added to demonstrate this (and the issue fixed by using binary:copy whenever a header binary is added to the array).
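The shape of the problem and of the fix, as a sketch (the slot/array handling here is simplified and illustrative rather than the actual leveled_sst code):

```erlang
%% Splitting the header out of a block binary creates a sub-binary that keeps
%% the whole block alive; binary:copy/1 allocates a fresh binary so that only
%% the header bytes remain referenced from the cache.
cache_header(SlotID, BlockBin, HeaderSize, HeaderArray) ->
    <<Header:HeaderSize/binary, _Rest/binary>> = BlockBin,
    array:set(SlotID, binary:copy(Header), HeaderArray).
```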

@martinsumner
Owner Author

#288

@martinsumner
Owner Author

Further to the PR: to close this issue, the level at which the page cache is pre-loaded needs to be configurable.
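A sketch of how such an option might look (the option name cache_preload_level and the default are hypothetical, not the actual leveled configuration):

```erlang
%% Illustrative only: pre-load (via the will_need advice) files whose level
%% number is at or below a configurable maximum, so that AAE backends can
%% opt out of or limit the pre-load work.
maybe_preload(FileName, Level, Opts) ->
    MaxLevel = proplists:get_value(cache_preload_level, Opts, 0),
    case Level =< MaxLevel of
        true  -> preload_file(FileName);   % as sketched earlier in the thread
        false -> ok
    end.
```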

@martinsumner
Owner Author

#292
