Further memory investigation #287
There are two directions from which to look at this problem: the (over) use of memory by the database, and the (under) use of memory by the OS page cache.

Firstly, the situation where the problem arose was one where the database had been left idle for a long time (days) before a volume test was started. At that stage an unexpectedly high proportion of memory is taken by Riak, and within that the majority appears to be binary references in leveled_cdb processes that are actually ready for garbage collection. I've struggled to find concise references on how GC is triggered on a process (and BEAM memory management also changed significantly between R16 and OTP 20, so older references may no longer apply). However, there are references to idle processes not triggering garbage collection, and to idle processes needing to be hibernated because of this. It has been guessed that scan_over_file in CDB has a high risk of producing lots of GC'able binary references, but this hasn't been proven at a non-trivial scale in tests. GC may be triggered at the end of a scan to eliminate this as a possibility, but it is difficult to be certain that this will actually make a real difference.

In terms of the under-use of memory by the OS page cache, fadvise should be our friend. When we start up the database, we would normally want the ledger to be in the page cache, so an option is to fadvise the ledger files on startup. Note that the database was left idle after startup - hence the assumption that adjusting startup behaviour may have a positive impact.
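The idea of triggering GC at the end of a scan can be sketched as below. This is a hypothetical illustration, not leveled's actual code: `scan_gc_demo` and its simulated scan are invented for the example; the real work happens in leveled_cdb's scan_over_file.

```erlang
-module(scan_gc_demo).
-export([scan/0]).

%% Simulate a scan that creates many short-lived sub-binaries, as
%% scan_over_file is suspected to do. Each binary:part/3 result keeps
%% a reference to its 4096-byte parent block until collected.
scan() ->
    Refs = [binary:part(binary:copy(<<0>>, 4096), 0, 8)
            || _ <- lists:seq(1, 1000)],
    Sum = lists:sum([byte_size(B) || B <- Refs]),
    %% Explicit GC at the end of the scan, so the parent blocks can
    %% be released before the process goes idle for days.
    erlang:garbage_collect(),
    Sum.
```

The trade-off noted above applies: the explicit collection is cheap to add, but it is hard to prove it makes a difference without a non-trivial-scale test.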
Note that in volume tests where we start an empty store and then load it as part of the volume test, there are no issues with memory allocation: Riak takes a minimal memory footprint in comparison to the page cache. If this is specifically an issue with non-GC when idle, is hibernate a better answer? Should leveled_cdb files hibernate after an inactivity timeout?
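The hibernate-after-inactivity idea fits the standard gen_server pattern of returning a timeout from each callback and hibernating when it fires. A minimal sketch (the module name and 60s timeout are assumptions for illustration, not leveled's actual values):

```erlang
-module(cdb_hibernate_demo).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(IDLE_TIMEOUT, 60000).  %% assumed 60s inactivity timeout

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, #{}, ?IDLE_TIMEOUT}.

%% Every interaction re-arms the inactivity timeout.
handle_call(Req, _From, State) ->
    {reply, {ok, Req}, State, ?IDLE_TIMEOUT}.

handle_cast(_Msg, State) ->
    {noreply, State, ?IDLE_TIMEOUT}.

%% No message within the timeout: hibernate. Hibernation performs a
%% full GC and discards the old heap, releasing binary references
%% that an idle process would otherwise keep alive.
handle_info(timeout, State) ->
    {noreply, State, hibernate};
handle_info(_Other, State) ->
    {noreply, State, ?IDLE_TIMEOUT}.
```

The appeal here is that hibernation addresses exactly the idle case described above, at the cost of a heap rebuild when the next message arrives.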
If there is to be a page cache load via fadvise on startup, this needs to be configurable, as it won't necessarily be of help when leveled is used as an AAE backend.
Testing this, it makes a negligible difference to startup times. With fadvise there is higher disk utilisation in the minutes following startup, so it looks as if this activity is pushed to the background. However, this was when restarting, when the pages were likely already cached. Doing this after a reboot, node startup time was 30% slower, but there was a lot more disk activity post-startup (about 3x as much disk I/O).
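Since the Erlang file module exposes no posix_fadvise binding, one portable approximation is simply to read the ledger files sequentially at startup, which pulls their pages into the OS page cache. A sketch (module and function names are invented for illustration; this is not how leveled implements its pre-load):

```erlang
-module(pagecache_warm).
-export([warm/1]).

%% Read a file sequentially in 64KB chunks so its pages enter the OS
%% page cache - a rough stand-in for posix_fadvise(POSIX_FADV_WILLNEED).
%% Returns {ok, BytesRead}.
warm(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary]),
    try read_all(Fd, 0)
    after file:close(Fd)
    end.

read_all(Fd, Bytes) ->
    case file:read(Fd, 65536) of
        {ok, Bin} -> read_all(Fd, Bytes + byte_size(Bin));
        eof -> {ok, Bytes}
    end.
```

Unlike fadvise, this is synchronous and competes with startup I/O, which is consistent with the 30% slower post-reboot startup observed above; a real fadvise hint would let the kernel schedule the read-ahead in the background.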
When testing in an NHS pre-production environment, the following was observed:
The additional memory is all on the binary heap (not the process heap), and the processes with the largest amount of binary memory referenced are SST files. Further investigation reveals the binary references are related to the cache of header information (an array of binaries). This cache is built lazily after startup, by splitting the header binary out of the overall block when the block is first loaded. However, there is no binary copy, so each cached header retains a reference to the whole block. Hence the unexpectedly high volume of binary memory referenced in the nodes following startup. See 5bef21d, where a unit test has been added to demonstrate this (and the issue fixed by using binary:copy whenever a header binary is added to the array).
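The retention effect can be observed directly with `binary:referenced_byte_size/1`. A minimal sketch of the problem and the binary:copy fix (the module, the 4096-byte "block" and 32-byte "header" sizes are illustrative assumptions, not leveled's actual SST layout):

```erlang
-module(subbin_demo).
-export([demo/0]).

demo() ->
    %% A 4096-byte "block", standing in for a block read from an SST
    %% file (large enough to be an off-heap, reference-counted binary).
    Block = binary:copy(<<0>>, 4096),
    %% Splitting out a 32-byte "header" without copying yields a
    %% sub-binary that keeps the whole parent block alive:
    Header = binary:part(Block, 0, 32),
    4096 = binary:referenced_byte_size(Header),
    %% binary:copy/1 detaches the header from the parent, so caching
    %% the copy does not pin the block in memory:
    Copied = binary:copy(Header),
    32 = binary:referenced_byte_size(Copied),
    ok.
```

This is the same shape as the unit test added in 5bef21d: without the copy, an array of small cached headers can pin an entire file's worth of blocks.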
Further to the PR: to close this issue, the level at which the page cache is pre-loaded needs to be configurable.
Had a Riak instance in production where there were a lot of leveled_cdb processes that had a large amount of binary memory referenced. This could be cleared through garbage_collect(), but garbage collection didn't seem to be doing this on its own.
The binaries referenced may have been in the active journal. This was after a Riak restart, so perhaps related to the scan on startup?
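The manual intervention described here can be reproduced from a remote shell by ranking processes by referenced binary memory (via `process_info(Pid, binary)`) and force-collecting the worst offenders. A sketch, with invented module and function names:

```erlang
-module(binary_mem).
-export([top/1, collect_top/1]).

%% Processes ranked by the total off-heap binary memory they
%% reference. process_info(P, binary) returns {binary, [{Id, Size,
%% RefCount}, ...]}; processes that die mid-scan return 'undefined'
%% and are filtered out by the generator's pattern match.
top(N) ->
    Sized = [{lists:sum([Size || {_Id, Size, _RefC} <- Bins]), P}
             || P <- erlang:processes(),
                {binary, Bins} <- [erlang:process_info(P, binary)]],
    lists:sublist(lists:reverse(lists:sort(Sized)), N).

%% Force-collect the worst offenders, as was done manually with
%% garbage_collect() on the leveled_cdb processes here.
collect_top(N) ->
    [erlang:garbage_collect(P) || {_Bytes, P} <- top(N)].
```

This is diagnostic tooling rather than a fix, but it makes it easy to confirm whether the referenced binaries are genuinely collectable, as they were in this instance.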