mv fadvise control

Matthew Von-Maszewski edited this page Dec 5, 2013 · 14 revisions

Status

  • merged to master
  • code complete December 4, 2013
  • development started December 3, 2013

History / Context

This branch contains code for the Riak 1.4.4 release. There are three distinct pieces:

  1. backport from 2.0 of eleveldb fix for iterator Prev and Next operations: this change is discussed here https://github.com/basho/eleveldb/issues/52

  2. backport from 2.0 of leveldb fix to Compaction::ShouldStopBefore(): a code change in 1.4.0 that limits the total number of keys in any .sst table file to 75,000 had a side effect. It disabled the function's primary purpose, which is to split a new .sst file when its keys overlap too many .sst files at the next higher level. Too much overlap creates very large compactions later (some multi-gigabyte compactions were seen).

  3. add app.config flag "fadvise_willneed" and pass that flag through eleveldb to leveldb: leveldb 1.3 and 1.4 each incrementally improved the fadvise() logic that manages the Linux page cache. The improvements helped the page cache flush all newly compacted user data to disk more quickly, leaving more page cache space for random disk operations. However, the page cache management assumes that user servers have physical RAM that is much smaller than the database size. A user with 200 Gbytes of RAM and a smaller database requested an option to disable the page cache management so that all user data would remain in the page cache. Setting fadvise_willneed to true in app.config on systems where physical RAM exceeds the database size will improve some random read performance and reduce disk operations.
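
The effect of the fadvise_willneed flag in item 3 can be sketched in C++ as below. This is a minimal illustration, not the actual leveldb code: the helper names ChooseAdvice and AdviseWholeFile are hypothetical, and the mapping of the flag to POSIX_FADV_WILLNEED versus POSIX_FADV_DONTNEED is an assumption based on the description above. Only posix_fadvise() and its advice constants are real POSIX/Linux API.

```cpp
#include <fcntl.h>  // posix_fadvise, POSIX_FADV_WILLNEED, POSIX_FADV_DONTNEED

// Hypothetical helper (not from leveldb): pick the page cache hint for a
// newly written .sst file. Assumption: "true" asks the kernel to keep the
// pages cached, "false" tells it the pages can be dropped after write-out.
inline int ChooseAdvice(bool fadvise_willneed) {
    return fadvise_willneed ? POSIX_FADV_WILLNEED : POSIX_FADV_DONTNEED;
}

// Hypothetical helper: apply the hint to the whole file. An offset of 0 with
// a length of 0 means "from the start to the end of the file" on Linux.
// Returns 0 on success or an error number on failure, as posix_fadvise does.
inline int AdviseWholeFile(int fd, bool fadvise_willneed) {
    return posix_fadvise(fd, 0, 0, ChooseAdvice(fadvise_willneed));
}
```

Under these assumptions, a server whose RAM exceeds the database size would call AdviseWholeFile(fd, true) so that compacted data stays resident, while the default (false) keeps page cache space free for random reads.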

Branch description

basho/leveldb mv-fadvise-control changes

db_impl.cc

The global gFadviseWillNeed is populated from Options.fadvise_willneed every time a database is opened. The global immediately impacts all open databases, not just the one being opened. With Riak, this is ok and even desired. Globals are tacky programming, but currently options do not get passed to the lower level objects needing this setting.
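
The behavior of the global can be sketched roughly as follows. This is a simplified stand-in, not the code in db_impl.cc: the names gFadviseWillNeed and Options.fadvise_willneed come from the description above, while OpenDatabase and LowLevelWriterKeepsPagesCached are hypothetical functions invented for the illustration.

```cpp
// Simplified stand-in for leveldb's Options; only the relevant field is shown.
struct Options {
    bool fadvise_willneed = false;
};

// Process-wide flag read by low-level file objects that never see Options.
bool gFadviseWillNeed = false;

// Hypothetical open path: every open overwrites the global, so the most
// recently opened database decides the setting for all open databases.
void OpenDatabase(const Options& options) {
    gFadviseWillNeed = options.fadvise_willneed;
}

// Hypothetical low-level consumer: it consults the global because the
// Options object is not passed down to this layer.
bool LowLevelWriterKeepsPagesCached() {
    return gFadviseWillNeed;
}
```

In this sketch, opening a second database with a different fadvise_willneed value changes what LowLevelWriterKeepsPagesCached() returns for the first one as well. That is harmless when every database in the process is opened with the same app.config setting.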
