fix(deps): update dependency xgboost to v2 #70
Merged
This PR contains the following updates:

| Package | Change |
| --- | --- |
| xgboost | `^1.7.5` -> `^2.0.0` |
### Release Notes

**dmlc/xgboost (xgboost)**

### v2.0.0: Release 2.0.0 stable

2.0.0 (2023 Sep 12)
We are excited to announce the release of XGBoost 2.0. This note will begin by covering some overall changes and then highlight specific updates to the package.
#### Initial work on multi-target trees with vector-leaf outputs

We have been working on vector-leaf tree models for multi-target regression, multi-label classification, and multi-class classification in version 2.0. Previously, XGBoost would build a separate model for each target. However, with this new feature that is still being developed, XGBoost can build one tree for all targets. The feature has multiple benefits and trade-offs compared to the existing approach. It can help prevent overfitting, produce smaller models, and build trees that consider the correlation between targets. In addition, users can combine vector-leaf and scalar-leaf trees during a training session using a callback. Please note that the feature is still a work in progress, and many parts are not yet available. See #9043 for the current status. Related PRs: (#8538, #8697, #8902, #8884, #8895, #8898, #8612, #8652, #8698, #8908, #8928, #8968, #8616, #8922, #8890, #8872, #8889, #9509). Note that only the `hist` (default) tree method on CPU can be used for building vector-leaf trees at the moment.
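As a rough illustration, a minimal sketch of vector-leaf training for multi-target regression could look like the following. The `multi_strategy="multi_output_tree"` parameter comes from the 2.0 parameter documentation and is not named in this note; the data is synthetic and purely illustrative.

```python
import numpy as np
import xgboost as xgb

# Toy multi-target regression data: three correlated targets per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X[:, :3] + rng.normal(scale=0.1, size=(256, 3))

# multi_strategy="multi_output_tree" requests one vector-leaf tree per boosting
# round instead of one scalar tree per target; requires tree_method="hist" on CPU.
reg = xgb.XGBRegressor(
    tree_method="hist",
    multi_strategy="multi_output_tree",
    n_estimators=32,
)
reg.fit(X, y)
print(reg.predict(X[:2]).shape)  # (2, 3): one prediction per target
```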
#### New `device` parameter

A new `device` parameter replaces the existing `gpu_id`, `gpu_hist`, `gpu_predictor`, `cpu_predictor`, `gpu_coord_descent`, and the PySpark-specific parameter `use_gpu`. From now on, users need only the `device` parameter to select which device to run on, along with the ordinal of that device. For more information, please see our document page (https://xgboost.readthedocs.io/en/stable/parameter.html#general-parameters). For example, with `device="cuda", tree_method="hist"`, XGBoost will run the `hist` tree method on GPU. (#9363, #8528, #8604, #9354, #9274, #9243, #8896, #9129, #9362, #9402, #9385, #9398, #9390, #9386, #9412, #9507, #9536). The old behavior of `gpu_hist` is preserved but deprecated. In addition, the `predictor` parameter is removed.
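For instance, a minimal sketch of GPU training with the new parameter (random data purely for illustration; running it requires a CUDA-capable GPU):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 8)
y = np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)

# 1.x style was tree_method="gpu_hist"; in 2.0 the algorithm and the device
# are selected separately. Use device="cuda:1" to pick a specific GPU ordinal.
params = {"tree_method": "hist", "device": "cuda"}
booster = xgb.train(params, dtrain, num_boost_round=50)
```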
#### `hist` is now the default tree method

Starting from 2.0, the `hist` tree method is the default. In previous versions, XGBoost chose `approx` or `exact` depending on the input data and training environment. The new default can help XGBoost train models more efficiently and consistently. (#9320, #9353)
#### GPU-based approx tree method

There is initial support for using the `approx` tree method on GPU. The performance of `approx` is not yet well optimized, but it is feature complete except for the JVM packages. It can be accessed through the parameter combination `device="cuda", tree_method="approx"`. (#9414, #9399, #9478). Please note that the Scala-based Spark interface is not yet supported.
#### Optimize and bound the size of the histogram on CPU, to control memory footprint

XGBoost has a new parameter `max_cached_hist_node` for users to limit the CPU cache size for histograms. It can help prevent XGBoost from caching histograms too aggressively. Without the cache, performance is likely to decrease. However, the size of the cache grows exponentially with the depth of the tree, so the limit can be crucial when growing deep trees. In most cases, users need not configure this parameter, as it does not affect the model's accuracy. (#9455, #9441, #9440, #9427, #9400)

Along with the cache limit, XGBoost also reduces the memory usage of the `hist` and `approx` tree methods on distributed systems by cutting the size of the cache by half. (#9433)
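A minimal sketch of bounding the histogram cache when growing deep trees (the value 1024 is an arbitrary illustration, not a recommendation):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 32)
y = np.random.rand(10_000)
dtrain = xgb.DMatrix(X, label=y)

# Cap the number of cached histogram nodes; deeper trees would otherwise
# grow the cache roughly exponentially with depth.
params = {
    "tree_method": "hist",
    "max_depth": 12,
    "max_cached_hist_node": 1024,
}
booster = xgb.train(params, dtrain, num_boost_round=20)
```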
#### Improved external memory support

There is some exciting development around external memory support in XGBoost. It is still an experimental feature, but the performance has been significantly improved with the default `hist` tree method. We replaced the old file IO logic with memory mapping. In addition to performance, we have reduced CPU memory usage and added extensive documentation. Beginning from 2.0.0, we encourage users to try it with the `hist` tree method when the memory saving by `QuantileDMatrix` is not sufficient. (#9361, #9317, #9282, #9315, #8457)
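The external memory interface is driven by a user-defined data iterator. A minimal sketch following the documented iterator pattern (the `DataIter` subclass and `cache_prefix` usage are based on the external-memory tutorial, not this note; in-memory batches stand in for data that would normally be read from disk):

```python
import numpy as np
import xgboost as xgb


class BatchIter(xgb.DataIter):
    """Yield pre-generated in-memory batches; real use would load chunks from disk."""

    def __init__(self, batches):
        self._batches = batches
        self._it = 0
        # cache_prefix tells XGBoost where to place the on-disk cache.
        super().__init__(cache_prefix="cache")

    def next(self, input_data):
        if self._it == len(self._batches):
            return 0  # no more batches
        X, y = self._batches[self._it]
        input_data(data=X, label=y)
        self._it += 1
        return 1

    def reset(self):
        self._it = 0


rng = np.random.default_rng(0)
batches = [(rng.random((1_000, 16)), rng.random(1_000)) for _ in range(4)]
dtrain = xgb.DMatrix(BatchIter(batches))  # data is staged through the external cache
booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10)
```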
#### Learning to rank

We created a brand-new implementation for the learning-to-rank task. With the latest version, XGBoost gained a set of new features for the ranking task, including:

- `lambdarank_pair_method` for choosing the pair construction strategy.
- `lambdarank_num_pair_per_sample` for controlling the number of samples for each group.
- Unbiased learning-to-rank via the `lambdarank_unbiased` parameter.
- Custom gain for `NDCG` using the `ndcg_exp_gain` parameter.
- `NDCG` is now the default objective function.
- Support for scikit-learn utilities in `XGBRanker`.

For more information, please see the tutorial. Related PRs: (#8771, #8692, #8783, #8789, #8790, #8859, #8887, #8893, #8906, #8931, #9075, #9015, #9381, #9336, #8822, #9222, #8984, #8785, #8786, #8768)
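A minimal ranking sketch with the scikit-learn interface (parameter values are illustrative only, and the synthetic query ids and labels are placeholders):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 5, size=300)               # graded relevance labels
qid = np.sort(rng.integers(0, 20, size=300))   # query ids must be grouped together

ranker = xgb.XGBRanker(
    objective="rank:ndcg",              # the default ranking objective in 2.0
    lambdarank_pair_method="topk",      # how training pairs are constructed
    lambdarank_num_pair_per_sample=8,   # pairs sampled per document
    n_estimators=50,
)
ranker.fit(X, y, qid=qid)
scores = ranker.predict(X[:5])
```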
#### Automatically estimated intercept

In the previous version, `base_score` was a constant that could be set as a training parameter. In the new version, XGBoost can automatically estimate this parameter based on input labels for optimal accuracy. (#8539, #8498, #8272, #8793, #8607)
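A small sketch of inspecting the estimated intercept after training. The JSON key path used below (`learner.learner_model_param.base_score` in the `save_config()` output) is an assumption about the config layout, not something stated in this note:

```python
import json
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 4)
y = 3.0 + np.random.rand(500)  # labels with a clear offset

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10)

# base_score is no longer fixed at 0.5; it is estimated from the labels.
config = json.loads(booster.save_config())
print(config["learner"]["learner_model_param"]["base_score"])
```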
#### Quantile regression

The XGBoost algorithm now supports quantile regression, which involves minimizing the quantile loss (also called "pinball loss"). Furthermore, XGBoost allows for training with multiple target quantiles simultaneously, with one tree per quantile. (#8775, #8761, #8760, #8758, #8750)
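A minimal sketch of the new objective. The `reg:quantileerror` objective and `quantile_alpha` parameter names are taken from the 2.0 documentation rather than stated in this note, and the data is synthetic:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(400, 6)
y = np.random.rand(400)
dtrain = xgb.DMatrix(X, label=y)

# Train the 25th, 50th, and 75th percentiles at once: one tree per quantile.
params = {
    "objective": "reg:quantileerror",
    "quantile_alpha": np.array([0.25, 0.5, 0.75]),
    "tree_method": "hist",
}
booster = xgb.train(params, dtrain, num_boost_round=30)
preds = booster.predict(dtrain)  # shape (400, 3): one column per quantile
```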
#### L1 and quantile regression now support learning rate

Both objectives use adaptive trees due to the lack of proper Hessian values. In the new version, XGBoost can scale the leaf value with the learning rate accordingly. (#8866)
#### Export cut value

Using the Python or the C package, users can export the quantile values (not to be confused with quantile regression) used for the `hist` tree method. (#9356)
#### Column-based split and federated learning

We made progress on column-based data split for federated learning. In 2.0, `approx`, `hist`, and `hist` with vector leaf can all work with column-based data split, along with support for vertical federated learning. Work on GPU support is still ongoing; stay tuned. (#8576, #8468, #8442, #8847, #8811, #8985, #8623, #8568, #8828, #8932, #9081, #9102, #9103, #9124, #9120, #9367, #9370, #9343, #9171, #9346, #9270, #9244, #8494, #8434, #8742, #8804, #8710, #8676, #9020, #9002, #9058, #9037, #9018, #9295, #9006, #9300, #8765, #9365, #9060)
#### PySpark

After the initial introduction of the PySpark interface, it has gained some new features and optimizations in 2.0.

- `use_gpu` is deprecated; the `device` parameter is preferred.
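A rough sketch of the preferred style with the PySpark estimator, assuming an existing SparkSession and that the estimator accepts `device` in 2.0 (which the deprecation of `use_gpu` implies; the tiny DataFrame is purely illustrative):

```python
from pyspark.sql import SparkSession
from xgboost.spark import SparkXGBClassifier

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(0.1, 1.0, 0), (0.9, 0.2, 1), (0.4, 0.6, 0), (0.8, 0.3, 1)],
    ["f1", "f2", "label"],
)

# Previously: SparkXGBClassifier(use_gpu=True, ...); now request the device directly.
clf = SparkXGBClassifier(features_col=["f1", "f2"], label_col="label", device="cuda")
model = clf.fit(df)
```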
#### Other General New Features

Here's a list of new features that don't have their own section and yet are general to all language bindings.
#### Other General Optimization

These optimizations are general to all language bindings. For language-specific optimization, please visit the corresponding sections.

- Handling of `array_interface` inputs on CPU (like `numpy`) is significantly improved. (#9090)
#### Notable breaking change

Other than the aforementioned change to the `device` parameter, here's a list of breaking changes affecting all packages.

- We suggest using in-memory data structures such as `numpy.ndarray` instead of relying on text inputs. See https://github.com/dmlc/xgboost/issues/9472 for more info.
#### Notable bug fixes

Some noteworthy bug fixes that are not related to specific language bindings are listed in this section.

- `inf` is checked during data construction. (#8911)
- Fixed the behavior when the `updater` parameter is used instead of the `tree_method` parameter. (#9355)
- Fixed handling of `\t\n` in feature names for the JSON model dump. (#9474)
- Fixed expansion of `~` in paths on Unix (#9463). In addition, all path inputs are required to be encoded in UTF-8. (#9448, #9443)
#### Documentation

Aside from documents for new features, we have many smaller updates to improve the user experience, from troubleshooting guides to typo fixes.
#### Python package

- Changes to the `plot_importance` plot. (#8540)
- Support for the `__half` type, and no data copy is made. (#8487, #9207, #8481)
- Support for pandas `Series` and Python primitive types in `inplace_predict` and `QuantileDMatrix`. (#8547, #8542)
- Changes around `sample_weight`. (#8706)
- Changes to `xgboost.dask.train`. (#9421)
- Use `QuantileDMatrix` for efficiency. (#8666, #9445)
- `setup.py` is now replaced with the new configuration file `pyproject.toml`. Along with this, XGBoost now supports Python 3.11. (#9021, #9112, #9114, #9115) Consult the latest documentation for the updated instructions to build and install XGBoost.
- `DataIter` now accepts only keyword arguments. (#9431)
- `DaskXGBClassifier.classes_` is changed to an array. (#8452)
- `best_iteration` is set only if early stopping is used, to be consistent with documented behavior. (#9403)
- As mentioned in the `device` parameter section, the `predictor` parameter is now removed. (#9129)
- Changes to the `save_model` call for the scikit-learn interface. (#8963)
- Removed `ntree_limit` in the Python package. This has been deprecated in previous versions. (#8345)
- Use `black` and `isort` for code formatting. (#8420, #8748, #8867)
- `enable_categorical` set to True in predict. (#8592)
#### R package

- `NA` handling. (#9522)
#### JVM packages

Following are changes specific to various JVM-based packages.

- Changed `ResultStage` to `ShuffleMapStage`. (#9423)
- Revised support for `flink`. (#9046)

Breaking changes:

- Renamed `DeviceQuantileDmatrix` into `QuantileDMatrix`. (#8461)

Maintenance: (#9253, #9166, #9395, #9389, #9224, #9233, #9351, #9479)
#### CI bot PRs

We employed the GitHub Dependabot to help us keep the dependencies up to date for the JVM packages. With the help of the bot, we have cleared up all the dependencies that were lagging behind. (#8501, #8507)

Here's a list of dependency update PRs, including those made by the bot: (#8456, #8560, #8571, #8561, #8562, #8600, #8594, #8524, #8509, #8548, #8549, #8533, #8521, #8534, #8532, #8516, #8503, #8531, #8530, #8518, #8512, #8515, #8517, #8506, #8504, #8502, #8629, #8815, #8813, #8814, #8877, #8876, #8875, #8874, #8873, #9049, #9070, #9073, #9039, #9083, #8917, #8952, #8980, #8973, #8962, #9252, #9208, #9131, #9136, #9219, #9160, #9158, #9163, #9184, #9192, #9265, #9268, #8882, #8837, #8662, #8661, #8390, #9056, #8508, #8925, #8920, #9149, #9230, #9097, #8648, #9203, #8593)
#### Maintenance

Maintenance work includes refactoring and fixing small issues that don't affect end users. (#9256, #8627, #8756, #8735, #8966, #8864, #8747, #8892, #9057, #8921, #8949, #8941, #8942, #9108, #9125, #9155, #9153, #9176, #9447, #9444, #9436, #9438, #9430, #9200, #9210, #9055, #9014, #9004, #8999, #9154, #9148, #9283, #9246, #8888, #8900, #8871, #8861, #8858, #8791, #8807, #8751, #8703, #8696, #8693, #8677, #8686, #8665, #8660, #8386, #8371, #8410, #8578, #8574, #8483, #8443, #8454, #8733)
#### CI
#### Additional artifacts

You can verify the downloaded packages by running the following command on your Unix shell:

- Experimental binary packages for R with CUDA enabled
- Source tarball
### Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Mend Renovate. View repository job log here.