microGBT is a minimalistic (606 LOC) Gradient Boosting Trees implementation in C++11 that follows xgboost's paper, i.e., the tree-building process is driven by the gradient and Hessian vectors (Newton-Raphson method).
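To make the gradient/Hessian idea concrete, here is a small sketch of the second-order statistics used in xgboost-style tree building, assuming the logistic loss. The function names are illustrative only and are not part of microgbt's API:

```python
import math

def gradient_hessian(y_true, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))  # predicted probability
    grad = p - y_true                       # first derivative (gradient)
    hess = p * (1.0 - p)                    # second derivative (Hessian)
    return grad, hess

def leaf_weight(grad_sum, hess_sum, lambd):
    """Optimal leaf value -G / (H + lambda) from xgboost's derivation,
    where G and H are sums of gradients/Hessians over the leaf's instances."""
    return -grad_sum / (hess_sum + lambd)
```

Each boosting round fits a tree to these per-instance statistics instead of raw residuals, which is what makes the method a Newton (second-order) step.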
A minimalist Python API is available using pybind11. To use it:

```python
import microgbtpy

params = {
    "gamma": 0.1,
    "lambda": 1.0,
    "max_depth": 4.0,
    "shrinkage_rate": 1.0,
    "min_split_gain": 0.1,
    "learning_rate": 0.1,
    "min_tree_size": 3,
    "num_boosting_rounds": 100.0,
    "metric": 0.0
}

gbt = microgbtpy.GBT(params)

# Training
gbt.train(X_train, y_train, X_valid, y_valid, num_iters, early_stopping_rounds)

# Predict
y_pred = gbt.predict(x, gbt.best_iteration())
```
The main goal of the project is to be educational and provide a minimalistic codebase that allows experimentation with Gradient Boosting Trees.
Currently, the following loss functions are supported:

- Logistic loss for binary classification (`logloss.h`)
- Root Mean Squared Error (RMSE) for regression (`rmse.h`)

Set the parameter `metric` to 0.0 for logistic loss or 1.0 for RMSE.
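For readability, the numeric metric codes can be wrapped in a small helper. The mapping below is a hypothetical convenience, not part of microgbtpy, which only accepts the numeric values:

```python
# Hypothetical mapping of task names to microgbt's numeric metric codes.
METRIC_CODES = {"logloss": 0.0, "rmse": 1.0}

def make_params(task, base_params):
    """Return a copy of base_params with "metric" set for the given task."""
    params = dict(base_params)
    params["metric"] = METRIC_CODES[task]
    return params
```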
To install locally:

```shell
pip install git+https://github.com/zouzias/microgbt.git
```
Then, follow the instructions below to run the Titanic classification example.

```shell
git clone https://github.com/zouzias/microgbt.git
cd microgbt
docker-compose build microgbt
docker-compose run microgbt
./runBuild
```
A binary classification example using the Titanic dataset. Run

```shell
cd examples/
./test-titanic.py
```

The output should include:

```
              precision    recall  f1-score   support

           0       0.75      0.96      0.84        78
           1       0.91      0.55      0.69        56

   micro avg       0.79      0.79      0.79       134
   macro avg       0.83      0.76      0.77       134
weighted avg       0.82      0.79      0.78       134
```
To run the LightGBM regression example, type

```shell
cd examples/
./test-lightgbm-example.py
```

The output should end with:

```
2019-05-19 22:54:04,825 - __main__ - INFO - *************[Testing]*************
2019-05-19 22:54:04,825 - __main__ - INFO - ******************************
2019-05-19 22:54:04,825 - __main__ - INFO - * [Testing]RMSE=0.447120
2019-05-19 22:54:04,826 - __main__ - INFO - * [Testing]R^2-Score=0.194094
2019-05-19 22:54:04,826 - __main__ - INFO - ******************************
```
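The two reported numbers follow the standard definitions of RMSE and the R² (coefficient of determination) score. A plain-Python sketch of those definitions (not microgbt's internal code):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches
    always predicting the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```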