
mv bucket expiry

Matthew Von-Maszewski edited this page Nov 19, 2016 · 15 revisions

Status

  • merged to master -
  • code complete - November 18, 2016
  • development started - November 11, 2016

History / Context

This branch is a partial implementation of Basho's expiry by bucket types for use within Riak. This branch contains the subset of the total design that is sufficient for an immediate customer need. Subsequent branches will complete the expiry by bucket type feature.

The complete feature is intended for Basho's enterprise edition products, not open source. Enterprise edition products require paid support. Therefore a substantial portion of this feature is isolated within the private leveldb_ee repository. Portions of the feature are also within Basho's open source eleveldb repository. A successful build requires the mv-bucket-expiry branch from all three repositories: basho/eleveldb, basho/leveldb, and basho/leveldb_ee. The term "this branch" refers to the collective set of code changes within all three repositories.

The key feature of this branch is deriving the expiry "write time" from data within the Riak Object being written to leveldb. The branch adds two write time sources: the Riak Object's LastModTime field and an optional, customer-supplied X-Riak-Meta-Expiry-Base-Seconds property. The priorities for write time sources are:

Priority  Source                           Where valid
1         X-Riak-Meta-Expiry-Base-Seconds  enterprise only
2         LastModTime                      enterprise only
3         "current minute"                 enterprise and open source
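The priority order behaves like a simple fallback chain. The sketch below is a minimal, hypothetical illustration. `WriteTimeSources` and `SelectWriteTime` are invented names, and it assumes every source has already been reduced to seconds since the epoch. It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the write-time sources. An empty optional means
// the source was absent (or invalid) in the Riak Object.
struct WriteTimeSources {
  std::optional<uint64_t> expiry_base_seconds;  // X-Riak-Meta-Expiry-Base-Seconds
  std::optional<uint64_t> last_mod_seconds;     // Riak Object LastModTime
};

// enterprise == false models the open source build, which only ever uses
// the "current minute".
uint64_t SelectWriteTime(const WriteTimeSources& src, bool enterprise,
                         uint64_t current_minute_seconds) {
  if (enterprise) {
    if (src.expiry_base_seconds) return *src.expiry_base_seconds;  // priority 1
    if (src.last_mod_seconds) return *src.last_mod_seconds;        // priority 2
  }
  return current_minute_seconds;                                   // priority 3
}
```

The open source build ignores both Riak Object sources even when they are present. This matches the "Where valid" column.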

The "write time" is used in conjunction with a user-specified expiry time in minutes. leveldb considers an object expired (logically deleted) once the current time is past the "write time" plus the expiry minutes setting.
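The expiry rule reduces to a single comparison. The following is a hypothetical helper, not the leveldb API. As an assumption, it keeps every timestamp in seconds and converts the expiry setting from minutes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true once now_seconds is strictly past
// write_seconds plus the expiry setting (given in minutes).
bool IsExpired(uint64_t write_seconds, uint64_t expiry_minutes,
               uint64_t now_seconds) {
  return now_seconds > write_seconds + expiry_minutes * 60;
}
```

For example, take a record whose write time is 1478342700 (November 5, 2016, 10:45 UTC) and a 365-day expiry (525,600 minutes). That record becomes expired once the current time passes 1478342700 + 31,536,000 = 1509878700 seconds.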

X-Riak-Meta-Expiry-Base-Seconds is a property that the user can set when the data being loaded needs to be "back dated" for the purpose of expiry. Example: a user loads 12 months of historical data to support computations against new live data, but wants a rolling 12-month expiry. The historical data needs this explicit property so that it expires at the proper time. The value of this property is seconds since the Unix epoch (Jan 1, 1970 UTC). The valid range of dates is Jan 1, 1980 to Jan 1, 2080. Examples:

X-Riak-Meta-Expiry-Base-Seconds: 1245494500 is June 20, 2009 at 10:41:40 UTC (6:41am US Eastern)
X-Riak-Meta-Expiry-Base-Seconds: 1478342700 is November 5, 2016 at 10:45 UTC (6:45am US Eastern)
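The following is a minimal validation sketch for the property. It assumes the value arrives as decimal text and that both range endpoints are inclusive; the source does not state either detail. `ParseExpiryBaseSeconds` and the constant names are hypothetical. The 2080 bound exceeds INT32_MAX, so the value needs 64-bit storage.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Jan 1, 1980 and Jan 1, 2080 (00:00 UTC) in seconds since the Unix epoch.
constexpr uint64_t kMinExpiryBase = 315532800ULL;
constexpr uint64_t kMaxExpiryBase = 3471292800ULL;  // > INT32_MAX

// Hypothetical parser: accepts only plain decimal digits within range.
bool ParseExpiryBaseSeconds(const std::string& text, uint64_t* out) {
  assert(out != nullptr);
  // strtoull would accept leading spaces and '-', so require a digit first.
  if (text.empty() || text[0] < '0' || text[0] > '9') return false;
  errno = 0;
  char* end = nullptr;
  unsigned long long value = std::strtoull(text.c_str(), &end, 10);
  if (errno == ERANGE || *end != '\0') return false;  // overflow or junk
  if (value < kMinExpiryBase || value > kMaxExpiryBase) return false;
  *out = value;
  return true;
}
```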

The "current minute" comes from the system clock, but is only updated roughly every 60 seconds. The expiry clock is not intended to be precise, and overall throughput improves by using the imprecise clock.
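One way to build such an imprecise clock is a background thread that refreshes a cached value. Readers on the write path then pay only for an atomic load. This sketch is hypothetical: `CoarseMinuteClock` is an invented name, and it does not describe how leveldb actually maintains its clock. The accessor name only mirrors GetTimeMinutes() from the text.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

class CoarseMinuteClock {
 public:
  CoarseMinuteClock()
      : minutes_(ReadSystemMinutes()), stop_(false), worker_([this] { Run(); }) {}

  ~CoarseMinuteClock() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();  // wake the worker so shutdown does not wait 60 s
    worker_.join();
  }

  CoarseMinuteClock(const CoarseMinuteClock&) = delete;
  CoarseMinuteClock& operator=(const CoarseMinuteClock&) = delete;

  // Minutes since the Unix epoch; may lag the true time by up to ~60 s.
  uint64_t GetTimeMinutes() const {
    return minutes_.load(std::memory_order_relaxed);
  }

 private:
  static uint64_t ReadSystemMinutes() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<minutes>(system_clock::now().time_since_epoch()).count());
  }

  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stop_) {
      // Sleeps up to 60 s; returns early if the destructor sets stop_.
      cv_.wait_for(lock, std::chrono::seconds(60), [this] { return stop_; });
      if (!stop_) minutes_.store(ReadSystemMinutes(), std::memory_order_relaxed);
    }
  }

  std::atomic<uint64_t> minutes_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_;           // guarded by mutex_
  std::thread worker_;  // declared last so it starts after the other members
};
```

The trade-off matches the text. Every write gets a cheap read, and the cost is a write time that can be stale by up to about a minute.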

Branch description

The distinction between enterprise edition and open source expiry was unclear prior to this branch. The open source files (leveldb_os/expiry_os.cc, leveldb_os/expiry_os.h, leveldb_os/expiry_os_test.cc) were directly copied into the enterprise edition repository with only the file names changed (leveldb_ee/expiry_ee.cc, leveldb_ee/expiry_ee.h, leveldb_ee/expiry_ee_test.cc). This branch moves the open source files to the util directory. The enterprise edition then derives its class (ExpiryModuleEE) from the open source class (ExpiryModuleOS). The new organization eliminates having the same code in two places.

leveldb's enterprise edition requires the following command to retrieve the leveldb_ee repository:

git submodule update --init

then:

make clean
make

eleveldb automates the above steps when the BASHO_EE environment variable is set to 1 (important: no spaces between BASHO_EE, the equal sign, and the number 1):

export BASHO_EE=1

then:

make clean
make

eleveldb: c_src/eleveldb.cc

parse_open_options() now uses a factory function to create the ExpiryModule object. leveldb::ExpiryModule::CreateExpiryModule() is compiled from different leveldb source files depending on whether leveldb is built for enterprise edition or open source.

leveldb: db/penalty_test.cc

A prior branch added this file, but it was not validated on all 16 open source platforms, and it failed to compile on five of them. Within Google's test harness, the older compilers rejected comparing an integer constant against a function returning a volatile integer. All such comparisons are now performed directly, and the true/false result is tested afterward.

leveldb: include/leveldb/expiry.h

The constructor is now protected, and the copy constructor and assignment operator are now declared private. This is an incomplete attempt to force the use of reference pointers instead of direct creation. The unit tests require changes before all ExpiryModule derived classes can be fully protected. Those changes will be in a future branch.
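The access pattern can be sketched as follows. All class names here are hypothetical, and `std::shared_ptr` stands in for the reference pointers the text mentions; leveldb's own pointer type may differ. The private, never-defined copy members are the pre-C++11 idiom; in C++11 and later, `= delete` expresses the same intent.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

class ExpiryModuleSketch {
 public:
  virtual ~ExpiryModuleSketch() = default;
  // Factory: the only supported way to obtain an instance.
  static std::shared_ptr<ExpiryModuleSketch> Create();
  virtual const char* Name() const { return "base"; }

 protected:
  ExpiryModuleSketch() = default;  // only derived classes may construct

 private:
  // Declared but never defined: copies are rejected at compile time.
  ExpiryModuleSketch(const ExpiryModuleSketch&);
  ExpiryModuleSketch& operator=(const ExpiryModuleSketch&);
};

class ExpiryModuleOSSketch : public ExpiryModuleSketch {
 public:
  const char* Name() const override { return "os"; }

 protected:
  ExpiryModuleOSSketch() = default;
  friend class ExpiryModuleSketch;  // lets the factory construct it
};

std::shared_ptr<ExpiryModuleSketch> ExpiryModuleSketch::Create() {
  return std::shared_ptr<ExpiryModuleSketch>(new ExpiryModuleOSSketch());
}

// Compile-time checks that direct creation and copying are blocked.
static_assert(!std::is_default_constructible<ExpiryModuleSketch>::value, "");
static_assert(!std::is_copy_constructible<ExpiryModuleSketch>::value, "");
```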

leveldb: leveldb_os/expiry_os_stub.cc

Everything in the leveldb_os directory compiles only if there are no files in the leveldb_ee directory. The leveldb_ee directory only contains files if the private leveldb_ee repository is added via "git submodule".

This stub file contains ExpiryModule::CreateExpiryModule(), an object factory. It creates an open source object only when the open source directory is compiled. A similar object factory in leveldb_ee creates enterprise edition expiry modules.
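The key property is that exactly one definition of the factory gets compiled, so callers such as eleveldb never need to know which edition they are linked against. In leveldb, the build chooses between the leveldb_os and leveldb_ee sources. This hypothetical sketch collapses that choice into a single file guarded by an invented macro, `EXPIRY_SKETCH_EE`.

```cpp
#include <cassert>
#include <memory>

struct ExpiryModule {
  virtual ~ExpiryModule() = default;
  virtual bool IsEnterprise() const = 0;
  // Declared once; defined by whichever edition is compiled.
  static std::unique_ptr<ExpiryModule> CreateExpiryModule();
};

#if defined(EXPIRY_SKETCH_EE)  // stand-in for "leveldb_ee sources present"
struct ExpiryModuleEE : ExpiryModule {
  bool IsEnterprise() const override { return true; }
};
std::unique_ptr<ExpiryModule> ExpiryModule::CreateExpiryModule() {
  return std::unique_ptr<ExpiryModule>(new ExpiryModuleEE());
}
#else  // open source stub
struct ExpiryModuleOS : ExpiryModule {
  bool IsEnterprise() const override { return false; }
};
std::unique_ptr<ExpiryModule> ExpiryModule::CreateExpiryModule() {
  return std::unique_ptr<ExpiryModule>(new ExpiryModuleOS());
}
#endif
```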

leveldb: tools/sst_scan.cc

This is not a production tool. It is used during development to check information within individual leveldb .sst table files. This branch adds the "v" option, which creates a hex dump of the data values. Two new blocks of code are currently disabled within #if 0/#endif sections. These blocks are stubs that need cleanup and correction in a future expiry branch.
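A hex dump of a value can be produced with a few lines of code. This helper is hypothetical; the source does not describe the exact layout the "v" option prints. Here each row shows an offset followed by up to 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical formatter: rows of 16 hex bytes, each prefixed by its offset.
std::string HexDump(const std::string& value) {
  std::string out;
  char buf[32];
  for (size_t i = 0; i < value.size(); ++i) {
    if (i % 16 == 0) {
      if (i != 0) out += '\n';
      std::snprintf(buf, sizeof(buf), "%08zx:", i);  // row offset
      out += buf;
    }
    std::snprintf(buf, sizeof(buf), " %02x", static_cast<unsigned char>(value[i]));
    out += buf;
  }
  if (!value.empty()) out += '\n';
  return out;
}
```

For a stored Riak Object, the first byte of an Erlang-encoded term is typically 0x83 (131), which makes such dumps easy to orient.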

leveldb: util/expiry_os.cc/.h (was leveldb_os/expiry_os.cc/.h)

MemTableInserterCallback() previously called GetTimeMinutes() directly. Now it calls a virtual function, GenerateWriteTime(), which differs between open source and enterprise edition. The open source version simply makes the GetTimeMinutes() call that MemTableInserterCallback() previously made directly. The enterprise edition version attempts to retrieve the "write time" from within the Riak Object.
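The change follows the template-method pattern. Shared code lives once in the open source class and calls a virtual hook. The enterprise class overrides only that hook and falls back to the base behavior. This is a simplified, hypothetical shape; leveldb's real signatures differ. The Riak Object decoding is replaced by a stand-in that treats an all-digit value as the write time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

class ExpiryModuleOS {
 public:
  explicit ExpiryModuleOS(uint64_t current_minute) : current_minute_(current_minute) {}
  virtual ~ExpiryModuleOS() = default;

  // Shared by both editions; only the write-time source differs.
  uint64_t MemTableInserterCallback(const std::string& key,
                                    const std::string& value) const {
    return GenerateWriteTime(key, value);  // previously: GetTimeMinutes()
  }

 protected:
  // Open source: always the (imprecise) current minute.
  virtual uint64_t GenerateWriteTime(const std::string& /*key*/,
                                     const std::string& /*value*/) const {
    return current_minute_;
  }

 private:
  uint64_t current_minute_;  // injected so the sketch is deterministic
};

class ExpiryModuleEE : public ExpiryModuleOS {
 public:
  using ExpiryModuleOS::ExpiryModuleOS;

 protected:
  uint64_t GenerateWriteTime(const std::string& key,
                             const std::string& value) const override {
    // Stand-in for Riak Object decoding: an all-digit value is the write time.
    if (!value.empty() && value.size() <= 19 &&
        value.find_first_not_of("0123456789") == std::string::npos) {
      return std::stoull(value);
    }
    return ExpiryModuleOS::GenerateWriteTime(key, value);  // fall back
  }
};
```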

leveldb_ee: expiry_ee.cc/.h

These source files were previously straight duplicates of expiry_os.cc and expiry_os.h. This branch removes all the duplicate functions and instead inherits them by deriving from the open source ExpiryModuleOS. There are only three unique functions for enterprise edition:

ExpiryModule::CreateExpiryModule():  object factory that returns a new ExpiryModuleEE object
ExpiryModuleEE::Dump():  LOG data dump
ExpiryModuleEE::GenerateWriteTime():  enterprise edition virtual function that reads Riak Object for write time

leveldb_ee: riak_object.cc/.h

These are new source files. They provide two functions: KeyGetBucket() and ValueGetLastModTime(). Only the latter function is used in this branch. KeyGetBucket() will be integral to the next bucket expiry branch.

ValueGetLastModTime() and its supporting routines decode the Riak Object to the limited degree necessary for extracting the LastModTime value and potentially the "X-Riak-Meta-Expiry-Base-Seconds" property.
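Once LastModTime has been extracted, turning it into a write time is simple arithmetic. This sketch assumes the decoded value is the Erlang timestamp triple {MegaSecs, Secs, MicroSecs}, the shape returned by os:timestamp(). It also assumes microseconds can be dropped because expiry works at minute granularity. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: {MegaSecs, Secs, MicroSecs} -> whole seconds since the epoch.
uint64_t LastModToSeconds(uint32_t mega_secs, uint32_t secs,
                          uint32_t /*micro_secs: ignored*/) {
  return static_cast<uint64_t>(mega_secs) * 1000000u + secs;
}
```

For example, {1478, 342700, _} corresponds to 1478342700, the November 5, 2016 example above.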

Information on the Erlang term encoding is found here: http://erlang.org/doc/apps/erts/erl_ext_dist.html
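The page linked above defines the tag bytes that a decoder dispatches on. This minimal sketch handles only the three integer encodings: SMALL_INTEGER_EXT (97), INTEGER_EXT (98), and SMALL_BIG_EXT (110) up to 64 bits. It assumes the caller has already skipped the leading version byte (131) of a complete term. It is illustrative and is not the riak_object.cc code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decodes one integer term at p. On success, stores the value in *out and
// the number of bytes read (tag included) in *consumed.
bool DecodeErlangInteger(const uint8_t* p, size_t len, int64_t* out,
                         size_t* consumed) {
  if (len < 1) return false;
  switch (p[0]) {
    case 97:  // SMALL_INTEGER_EXT: 1-byte unsigned
      if (len < 2) return false;
      *out = p[1];
      *consumed = 2;
      return true;
    case 98: {  // INTEGER_EXT: 4-byte big-endian signed
      if (len < 5) return false;
      uint32_t u = (uint32_t(p[1]) << 24) | (uint32_t(p[2]) << 16) |
                   (uint32_t(p[3]) << 8) | uint32_t(p[4]);
      *out = static_cast<int32_t>(u);  // two's complement reinterpretation
      *consumed = 5;
      return true;
    }
    case 110: {  // SMALL_BIG_EXT: n, sign, then n little-endian digit bytes
      if (len < 3) return false;
      size_t n = p[1];
      bool negative = p[2] != 0;
      if (n > 8 || len < 3 + n) return false;
      uint64_t mag = 0;
      for (size_t i = 0; i < n; ++i) mag |= uint64_t(p[3 + i]) << (8 * i);
      if (mag > uint64_t(INT64_MAX)) return false;  // sketch: int64 only
      *out = negative ? -static_cast<int64_t>(mag) : static_cast<int64_t>(mag);
      *consumed = 3 + n;
      return true;
    }
    default:
      return false;  // not an integer term
  }
}
```

Values above 2^31 - 1, such as the 2080 upper bound 3471292800, do not fit INTEGER_EXT and therefore arrive as SMALL_BIG_EXT.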
