-
Notifications
You must be signed in to change notification settings - Fork 622
Environment
This page contains information that is intended to help start hacking on Peloton. It covers a wide set of topics related to software development, like development environment, debugging, version control etc. We are only attempting to provide biased suggestions that we think are invaluable for low-level software development. Feel free to deviate from them in case you are comfortable with other tools that solve similar problems.
We prefer the Linux
OS for development and testing. You can use any other OS for development as long as your modified source code can be built and run on Linux.
In case you are a Mac OS
or Windows
user, we encourage you to consider using Virtual Box
and Vagrant
. Virtual Box
is a hypervisor for x86 computers, and Vagrant
is a tool for sharing easy to configure virtual development environments. More information is available here.
We are developing Peloton in C++
. In particular, we are following the C++11
standard. C++ provides a lot of leeway in DBMS development compared to other high-level languages. For instance, it supports both manual and automated memory management, varied styles of programming, stronger type checking, different kinds of polymorphism etc.
Here's a list of useful references :
-
CPP Reference is an online reference of the powerful
Standard Template Library
(STL). - C++ FAQ covers a lots of topics.
Here's a list of C++11 features that you might want to make use of:
-
auto
type inference - Range-based
for
loops - Smart pointers, in particular
unique_ptr
. - STL data structures, such as
unordered_set
,unordered_map
, etc. - Threads, deleted member functions, lambdas, etc.
Please comment your code. Comment all the class definitions, non-trivial member functions and variables, and all the steps in your algorithms. Here's an example.
We follow the Google C++ style guide. As they mention in that guide, these rules exist to keep the code base manageable while still allowing coders to use C++ language features productively.
Make sure that you follow the naming rules. For instance, use class UpperCaseCamelCase
for type names, int lower_case_with_underscores
for variable/method/function names.
Please refrain from using any libraries other than the STL
(and googletest
for unit testing) without contacting us.
Organize source code files into relevant folders based on their functionality. Separate binary files from source files, and production code from testing code.
In general, the src
directory contains all the production code, the test
directory contains all the testing code, and the build
directory contains all the built binary files. Within the src
directory, src/storage
contains files related to our storage engine, while src/index
contains files related to the indexes, etc..
We exclusively use git
for source code management, and github
for collaboration. Here's a simple guide for using it. If you have never used a tool like git, I am sure that you will soon wonder how you managed to live without it. In case you want to learn more, here's a free book.
Please update or add a .gitignore
file to exclude unwanted files (e.g. the build directory with binary files, backup/temporary files of your editor/IDE, large files containing test data, etc.) from the repository.
We use cmake
for building Peloton. In particular, we use cmake
for automatically generating Makefile
files. You will probably not need to add file names to existing cmake files (with extension CMakeLists.txt
) since the cmake script automatically collects all files with cpp
extension.
In case you are curious about cmake
, here's a short introduction.
We use the g++
compiler, which is part of the GNU compiler collection. Here's more information on the status of C++11 support in GCC. In particular, we suggest g++ compilers with version higher than 4.7. Here's a list of useful compiler flags:
-
-std=c++11
: Enables C++11 support -
-g
: Produces debugging information in the OS's native format. -
-ggdb
: Produces debugging information specifically intended for gdb. -
-O0
: Optimize option that reduces compilation time and makes debugging more reliable. -
-O3
: Increases both the compilation time and the performance of the generated code. Use this when running benchmarks. -
-Wall
: Generate helpful warnings. Do not ignore them! In fact, force yourself to deal with warnings by handling them as errors with the-Werror
compiler flag.
We encourage you to use a debugger to find bugs, rather than relying on printf
statements. Specifically, we recommend gdb
, the GNU debugger. Most development platforms have a graphical gdb
front-end, but the command line can already be very helpful for debugging program crashes. You can consider using the curses-based interface for gdb, called cgdb
.
Here's some information on debugging Peloton using gdb.
If your program behaves in "mysterious" (a.k.a. "non-deterministic" ways), valgrind
is your friend. It can automatically detect many memory management and threading bugs. In particular, its memcheck
tool is useful for finding illegal accesses to memory, uninitialized reads and much more. Check out this blog post on the interaction between GDB and Valgrind.
Here's some information on debugging Peloton using valgrind.
It is a good habit to make sure that Peloton passes all the memchecked unit tests every time you commit a set of changes with git
.
Eclipse
is an awesome IDE that provides a nice project workspace, and has an extensible plug-in system for customizing the environment. It is really hard to match the capabilities of Eclipse, in particular its refactoring features, with editors like emacs
or vim
and a command line.
Here's some information on setting up Eclipse to develop Peloton.
Unit tests are critical for ensuring the correct functionality of your modules and reduce time spent on debugging. It can help prevent regressions. We use googletest
, a nice unit-testing framework for C++ projects.
We encourage you to write unit test cases for each class/algorithm that you have added or modified. For instance, look at this test suite for testing the functionality of tuples. Try to come up with test cases that make sure that the module exhibits the desired behavior. Some developers even suggest writing the unit tests before implementing the code. Make sure that you include corner cases, and try to find off-by-one errors.
gcov
is a code coverage analysis tool that tells us how much of the code base is covered by the unit tests. We have already integrated it in our build system.
Profilers help better understand the performance of Peloton and the environment in which it is running. We suggest perf
and strace
tools for profiling. Here's some information on profiling Peloton using perf.
A terminal multiplexer lets you switch easily between several programs in one terminal. This is particularly useful if you are benchmarking or developing on a remote server. We recommend the tmux
tool. Here's a short tutorial, and here's a nice cheat sheet.
We suggest the zsh
shell for development. In particular, along with oh-my-zsh, it provides an amazing command line experience with its autocomplete features. Here's more information on improving the terminal experience.
If you are already comfortable with most of these tools (and maybe even more), and want to hack on Peloton -- then that's awesome ! Drop me a line at mailto:jarulraj@cs.cmu.edu.