What is Sarus DP-XGBoost?

This is a fork of XGBoost that adds differential privacy to gradient boosted trees.

The theory and methods are explained in detail in: Grislain, Nicolas and Joan Gonzalvez. “DP-XGBoost: Private Machine Learning at Scale.” (2021).

Quick Start

You can start using dp-xgboost with the following notebook. Other Python examples which build a DP model are given in sarus/python/.

Installing Sarus DP-XGBoost

To install DP-XGBoost, run: pip install dp-xgboost

Usage

Python examples which build a DP model are given in sarus/python/.

The main parameters involved in DP learning are:

tree_method: must be set to approxDP to use Sarus XGBoost DP tree learning.
dp_epsilon_per_tree: the privacy budget of a single tree.
min_child_weight: the minimum weight needed to construct a leaf; this influences the DP noise.
subsample: the fraction of the dataset randomly sampled for each tree; subsampling improves privacy.
num_boost_rounds: the number of trees built.
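As a rough sketch, the parameters listed above can be collected in a plain Python dictionary before training. The helper name build_dp_params, the range checks, and the numeric values are illustrative assumptions and are not part of the library; only the parameter names and the approxDP value come from this README.

```python
def build_dp_params(epsilon_per_tree, subsample, min_child_weight):
    """Collect the DP-related parameters named in this README (hypothetical helper)."""
    # These sanity checks are an assumption of this sketch, not library behaviour.
    if epsilon_per_tree <= 0:
        raise ValueError("dp_epsilon_per_tree must be positive")
    if not 0 < subsample <= 1:
        raise ValueError("subsample must be in (0, 1]")
    return {
        "tree_method": "approxDP",           # selects the DP tree updater
        "dp_epsilon_per_tree": epsilon_per_tree,
        "min_child_weight": min_child_weight,
        "subsample": subsample,
    }


# Illustrative values only; suitable settings depend on the dataset.
params = build_dp_params(epsilon_per_tree=0.5, subsample=0.2, min_child_weight=500)
n_trees = 20  # the number of trees built

# Assumption: if the fork keeps the upstream XGBoost Python API, training
# would look like xgb.train(params, dtrain, num_boost_round=n_trees).
```

The examples in sarus/python/ remain the reference for how the parameters are actually passed.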

The privacy queries used during training are stored in the model and can be accessed by saving it with booster.save_model().

Privacy consumption

Note that the total privacy consumption of the boosted trees is given by:

$$n \log{ \left( 1 + \gamma(e^{\epsilon} - 1) \right) }$$

where $n$ is the number of trees, $\gamma$ is the subsample fraction (between 0 and 1), and $\epsilon$ is the budget per tree. See the explanatory article in doc/sarus for more details on privacy consumption.
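The formula above is easy to evaluate when planning a budget. The following is a minimal sketch in plain Python; the function names total_epsilon and epsilon_per_tree_for are hypothetical and not part of the library. The second function simply inverts the formula algebraically to find the per-tree budget that meets a target total.

```python
import math


def total_epsilon(n_trees, subsample, epsilon_per_tree):
    """Total privacy consumption: n * log(1 + gamma * (exp(eps) - 1))."""
    # expm1 and log1p keep precision when epsilon_per_tree is small.
    return n_trees * math.log1p(subsample * math.expm1(epsilon_per_tree))


def epsilon_per_tree_for(total, n_trees, subsample):
    """Invert the formula: per-tree budget that yields a given total."""
    return math.log1p(math.expm1(total / n_trees) / subsample)


# Illustrative numbers: 20 trees, 10% subsampling, budget 1.0 per tree.
spent = total_epsilon(n_trees=20, subsample=0.1, epsilon_per_tree=1.0)
print(f"total epsilon: {spent:.4f}")
```

With no subsampling ($\gamma = 1$) the expression reduces to $n\epsilon$, so any subsample fraction below 1 gives a smaller total for the same per-tree budget.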

Differential Privacy in the C++ library

DP is added at three levels in the XGBoost C++ shared library (under the src directory): to construct sketches (with a histogram query), for split selection (with an exponential mechanism), and for leaf values (with a Laplace mechanism). The mechanisms are located in include/xgboost/mechanisms.h.

The relevant classes are in src/tree/updater_histmaker.cc, in particular the DPHistMaker class, which is the DP tree updater called when tree_method is set to approxDP.
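For readers unfamiliar with the two mechanisms named in this section, here is a minimal textbook sketch in plain Python. It is not the code in include/xgboost/mechanisms.h; the function names, the sensitivity arguments, and the sampling approach are assumptions made for illustration.

```python
import math
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Return value plus Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def exponential_mechanism(utilities, sensitivity, epsilon, rng):
    """Pick an index with probability proportional to exp(eps * u / (2 * sensitivity))."""
    top = max(utilities)  # subtracting the max avoids overflow in exp
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    return rng.choices(range(len(utilities)), weights=weights, k=1)[0]


rng = random.Random(0)
noisy_leaf = laplace_mechanism(value=10.0, sensitivity=1.0, epsilon=1.0, rng=rng)
chosen_split = exponential_mechanism([0.2, 1.5, 0.9], sensitivity=1.0, epsilon=1.0, rng=rng)
```

In this sketch a larger epsilon means less noise on the leaf value and a stronger preference for the highest-utility split candidate.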

Building for the JVM

To use with Spark, please follow https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html.

Requirements: Java JDK 1.8, Spark 2.12, Maven 3
Set the JAVA_HOME environment variable first, for example: export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/
In the jvm-packages folder, run: mvn package install -DskipTests -Dmaven.test.skip=true

This should build the xgboost4j and xgboost4j-spark jars, which can then be passed to spark-submit. The sarus/spark folder contains an example Spark project in Scala, with a POM file, that should compile and launch Sarus XGBoost with 2 workers.

Developer guide

Get the submodules (such as dmlc):

git submodule sync
git submodule update --init --recursive

(Optional) Install prerequisites (such as cmake, g++, libomp).
Build:

mkdir build
cd build
cmake ..

