ahshilshah edited this page Jun 27, 2019 · 16 revisions

Background

BerkeleyDB is a NoSQL database that implements an efficient and reliable key-value store, with guarantees such as atomic transactions.
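
To make "key-value store with atomic transactions" concrete, here is a minimal sketch in standard C++. It is a toy in-memory model, not BerkeleyDB's API: the names `ToyStore`, `Txn`, and `demo_atomicity` are hypothetical and chosen only for illustration. Writes are staged in a transaction and become visible either all together (commit) or not at all (abort).

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy in-memory model (NOT BerkeleyDB's API): writes are staged in a
// transaction and reach the store only when the transaction commits.
class ToyStore {
public:
  class Txn {
  public:
    explicit Txn(ToyStore& store) : store_(store) {}
    // Stage a write; the store itself is not modified yet.
    void put(const std::string& key, const std::string& value) {
      staged_[key] = value;
    }
    // Apply every staged write to the store.
    void commit() {
      for (const auto& kv : staged_) store_.data_[kv.first] = kv.second;
      staged_.clear();
    }
    // Discard every staged write.
    void abort() { staged_.clear(); }
  private:
    ToyStore& store_;
    std::map<std::string, std::string> staged_;
  };

  // Look up a key; returns false if the key is absent.
  bool get(const std::string& key, std::string& value) const {
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    value = it->second;
    return true;
  }
private:
  std::map<std::string, std::string> data_;
};

// Usage sketch: an aborted transaction leaves no trace, a committed one
// makes all of its writes visible.
bool demo_atomicity() {
  ToyStore db;
  std::string v;

  ToyStore::Txn failed(db);
  failed.put("chr1", "peak");
  failed.put("chr2", "peak");
  failed.abort();  // neither write becomes visible
  bool none_visible = !db.get("chr1", v) && !db.get("chr2", v);

  ToyStore::Txn ok(db);
  ok.put("chr1", "peak");
  ok.put("chr2", "background");
  ok.commit();  // both writes become visible
  bool both_visible =
      db.get("chr1", v) && db.get("chr2", v) && v == "background";

  assert(none_visible);
  return none_visible && both_visible;
}
```

A real database additionally makes committed data durable on disk and safe under concurrent access, which this sketch does not attempt.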

Related work

The RBerkeley package provided an R interface to BerkeleyDB but it was removed from CRAN in 2017.

Coding project: RBerkeley back on CRAN

It would be useful to have BerkeleyDB available on CRAN in order to support applications, packages, and algorithms that require disk-based storage. For example, PeakSegDisk used to depend on BerkeleyDB STL, which provides an easy-to-use API for on-disk STL containers. PeakSegDisk provides an on-disk implementation of an optimal changepoint detection algorithm for genomic data, which scales to huge data sets because it is not limited by memory. However, it was a real pain to get PeakSegDisk to compile on CRAN/win-builder, because they do not provide the BerkeleyDB headers/libraries. It would have been great to be able to simply write LinkingTo: RBerkeley in the PeakSegDisk DESCRIPTION file, and be done. Instead, it turned out to be easier to rewrite the required functionality in standard C++. The moral of the story is that R needs a package that provides BerkeleyDB.
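
To illustrate what "disk-based storage in standard C++" can look like, here is a minimal sketch of a file-backed array of doubles. It is not PeakSegDisk's actual code and not the BerkeleyDB STL API; the class `DiskVector` and the function `sum_of_squares_on_disk` are hypothetical names used only for this example. Elements live in a binary scratch file, so the number of elements is bounded by disk space rather than by memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical disk-backed array of doubles (illustration only).
class DiskVector {
public:
  explicit DiskVector(const std::string& path) : path_(path), size_(0) {
    // Create (or truncate) the backing file, then reopen it for read/write.
    std::ofstream(path_, std::ios::binary | std::ios::trunc).close();
    file_.open(path_, std::ios::in | std::ios::out | std::ios::binary);
    if (!file_) throw std::runtime_error("cannot open " + path_);
  }
  ~DiskVector() {
    file_.close();
    std::remove(path_.c_str());  // the file is scratch space only
  }
  // Append one element at the end of the file.
  void push_back(double x) {
    file_.seekp(static_cast<std::streamoff>(size_ * sizeof(double)));
    file_.write(reinterpret_cast<const char*>(&x), sizeof(double));
    if (!file_) throw std::runtime_error("write failed");
    ++size_;
  }
  // Read element i back from disk.
  double at(std::size_t i) {
    if (i >= size_) throw std::out_of_range("DiskVector::at");
    double x;
    file_.seekg(static_cast<std::streamoff>(i * sizeof(double)));
    file_.read(reinterpret_cast<char*>(&x), sizeof(double));
    if (!file_) throw std::runtime_error("read failed");
    return x;
  }
  std::size_t size() const { return size_; }
private:
  std::string path_;
  std::fstream file_;
  std::size_t size_;
};

// Usage sketch: store the squares 0, 1, 4, ... on disk, then sum them back.
double sum_of_squares_on_disk(const std::string& path, int n) {
  assert(n >= 0);
  DiskVector v(path);
  for (int i = 0; i < n; ++i) v.push_back(static_cast<double>(i) * i);
  double total = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i) total += v.at(i);
  return total;
}
```

A sketch like this covers only sequential appends and random reads of a fixed-size type; the appeal of a BerkeleyDB-backed container is that it would also provide keyed lookup, caching, and transactional guarantees without hand-written file handling.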

  • address the issues that made CRAN remove RBerkeley.
  • set up a git repo with CI / code coverage.
  • either fork https://github.com/hrbrmstr/RBerkeley or continue development there if possible.
  • add C code to support BerkeleyDB Standard Template Library (STL); see the blog and the API docs.
  • write more tests to increase code coverage.
  • you can use this old version https://github.com/tdhock/PeakSegDisk/commit/190ce1c5e7774f27c38304e43a74cb0d860686c5 of PeakSegDisk as an example of how RBerkeley should/could be used – make a new repo with this code, add LinkingTo: RBerkeley, and use it for testing.
  • add a vignette explaining how to use RBerkeley in R and in C++.

Expected impact

After this GSoC project the RBerkeley package will be back on CRAN, and package developers will be able to build algorithms/functions that take advantage of this powerful library.

Mentors

Students, please contact mentors below after completing at least one of the tests below.

  • Bob Rudis <bob@rud.is> is the maintainer of the last version that was on CRAN, and has agreed to mentor.
  • Jeff Ryan <jeff.a.ryan@gmail.com> is the original author and can mentor.
  • Toby Hocking <toby.hocking@r-project.org> proposed this project, and would be a user of RBerkeley if it were on CRAN.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: download the most recent version of RBerkeley and create an Rmd/html page showing R code and results of how to use RBerkeley.
  • Medium:
  • Hard:

Solutions of tests

Students, please post a link to your test results here.

Name: Basil Singh

University: Indian Institute of Technology Kanpur (India)

Degree: B.S. Economic Sciences

Solution for easy test: Solution, Corresponding HTML page


Name: Abhinav Agarwal

College: Manipal Institute of Technology

Degree: B.Tech in Information Technology

Link: Easy

