
Add SharpLearning.XGBoost project #68

Merged
merged 62 commits into from
May 20, 2018
Conversation


@mdabros mdabros commented May 10, 2018

Add a more efficient alternative to SharpLearning.GradientBoost. XGBoost is faster on CPU and also supports GPU learning. However, it has native dependencies, so it might not be ideal for all platforms and situations.

A small test compares the RegressionXGBoostLearner and the RegressionGradientBoostLearner from SharpLearning on a medium-sized regression task.

Dataset: YearPredictionMSD
Rows: 515345
Cols: 90

Hardware:
CPU: Core i7-4770
GPU: GTX-1070

Model parameters:
MaximumTreeDepth: 7
Estimators: 152
colSampleByTree: 0.45
colSampleByLevel: 0.77

Training time compared using XGBoost in histogram and exact mode on GPU and CPU:

[Figure: training times for XGBoost in histogram and exact mode, on GPU and CPU, compared with SharpLearning.GradientBoost]

As can be seen, XGBoost can be up to 70 times faster when using the histogram-based tree method. Using the exact method, which is closer to the approach in SharpLearning.GradientBoost, the speedup is still around 10x on GPU and 5x on CPU.
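To make the comparison above concrete, here is a small sketch of the parameter sets behind the four timing configurations, written as plain Python dictionaries using the parameter names from the core XGBoost library (the .NET wrapper forwards these to the native library). The `make_params` helper is hypothetical and only illustrates how the shared model parameters combine with a tree-construction method; at the time of this PR, GPU variants were selected via the `gpu_hist` / `gpu_exact` method names.

```python
# Shared model parameters from the benchmark above, mapped to the
# core XGBoost parameter names (SharpLearning names in comments).
BASE_PARAMS = {
    "max_depth": 7,             # MaximumTreeDepth
    "n_estimators": 152,        # Estimators
    "colsample_bytree": 0.45,   # colSampleByTree
    "colsample_bylevel": 0.77,  # colSampleByLevel
}

def make_params(tree_method, device):
    """Combine the shared parameters with a tree-construction method.

    tree_method: 'hist' (histogram-based, the fastest configuration above)
                 or 'exact' (enumerates all split candidates, closest to
                 SharpLearning.GradientBoost).
    device:      'cpu' or 'gpu'; GPU variants prefix the method name.
    """
    method = tree_method if device == "cpu" else "gpu_" + tree_method
    return {**BASE_PARAMS, "tree_method": method}
```

The histogram method buckets continuous features into discrete bins before searching for splits, which is what produces the large speedup over the exact method's full enumeration of split candidates.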

Missing tasks before the PR can be completed:

  • Add argument checks to learners.
  • Add unit test of conversion class.
  • Add index support for learners, and input checks.
  • Add probability support for classifier model.
  • Add more learner and model tests.
  • In the classification model, consider removing the targetNameToTargetIndex member, and adopt XGBoost's requirement of sequential class labels starting at 0. Checks can be added to alert users before learning starts.
  • Complete pull request to XGBoost.Net to enable GPU use and Booster selection.
  • Add VariableImportance support to XGBoost models.
  • Split objectives into regression and classification, so only compatible objectives are available for the learners.
  • Consider splitting learners into Linear, Tree and Dart, to only show relevant hyperparameters for each in the constructors.
  • Add enums for the DART specific parameters.
  • Complete pull request to XGBoost.Net to add DART parameters.
  • Complete pull request to XGBoost.Net to fix Booster.Dispose().
  • Get XGBoost.Net to publish new nuget package.
  • Change from local reference to updated XGBoost.Net package.
  • Update readme.
  • Check cross-validation and learning curves loops with XGBoost models (disposable).
  • Package SharpLearning.XGBoost during build to avoid the issue with "dotnet pack" and how the native dll is included in picnet.xgboost.net.
  • Add probability interfaces to xgboost classification learner.
  • Add model converter from XGBoost to SharpLearning.GradientBoost. (Added but not completed).
  • Consider using the SharpLearning.GradientBoost.Models instead of the XGBoost equivalents. This would enable standard serialization and features, and avoid having to deal with native resources when using the XGBoost models. (For now, it has been decided to use the XGBoost models and leave the conversion for another pull request).
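One of the tasks above notes that XGBoost requires sequential class labels starting at 0, which is what the targetNameToTargetIndex member compensates for. A minimal sketch of that idea, with a hypothetical `encode_targets` helper (not part of SharpLearning or XGBoost.Net), would encode arbitrary labels before learning and keep the inverse mapping for translating predictions back:

```python
def encode_targets(targets):
    """Map arbitrary class labels to sequential indices 0..numberOfClasses-1,
    as XGBoost requires. Returns the encoded targets and the index-to-label
    mapping needed to translate predictions back to the original labels.
    """
    unique = sorted(set(targets))
    label_to_index = {label: i for i, label in enumerate(unique)}
    index_to_label = {i: label for label, i in label_to_index.items()}
    encoded = [label_to_index[t] for t in targets]
    return encoded, index_to_label

# Labels 3, 5, 7 become indices 0, 1, 2.
encoded, index_to_label = encode_targets([3, 7, 3, 5])
```

If the mapping member is removed as proposed, a check like `sorted(set(targets)) == list(range(len(set(targets))))` could run before learning starts to alert users whose labels are not already in the required form.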

mdabros added 30 commits May 8, 2018 18:09
…to use the filesystem directly because of how the XGBoost models are saved. Also, the target name to target index mapping needs to be saved separately. XGBoost expects the mapping to be sequential from [0;numberOfClasses].
…tem directly because of how XGBoost saves its models
@mdabros mdabros changed the title [WIP] Add Sharplearning.XGBoost project [WIP] Add SharpLearning.XGBoost project May 16, 2018
@mdabros mdabros changed the title [WIP] Add SharpLearning.XGBoost project Add SharpLearning.XGBoost project May 20, 2018
@mdabros mdabros merged commit bdbaa0d into master May 20, 2018
@mdabros mdabros deleted the sharplearning-xgboost branch May 22, 2018 15:34
@fstandhartinger

Wow Mads, this is great news! Excellent work! Thank you
