Implement multi-target for hist.

Initial commit. Predictor. Compile. fixes. Cleanup. Moving code around. Start working on cat features. Start working on model IO. Fix. Revert. cleanup. Rebase. Reverse cleanup. rename. Fix rebase. small cleanup. inc Merge it into reg tree. Strategy. Extract the cat matrix. Use array in predictor. Use array in scalar. Merge two kernels. QDM. inplace predict. cleanup. naming. cleanup. cleanup. sampler. copy. cleanup. compile test. Hide the tree. Hide from the partitioner. Hide init root. layer to trees. check. Remove old sampling func. leaf partition. use linalg. remove grad stats. ro5 reverse. Don't support prediction cache for now. col sampler. Cleanup. Cleanup. Cleanup histogram. t Cleanup evaluation. ic. Cleanup. start working on io. is valid. basic io. dispatch. Basic IO. Cleanup node sum. cleanup. Extract the updater. Merge it into quantile hist. cleanup. Cleanup. restore checks. Cleanup. remove num_target. fix tests. Fix. fixes. Type deduction. R package. Predict leaf. Predict leaf. cleanup. Add a test to sampling. check. cleanup. cleanup. parallel. Cleanup Fix root. column-major. fewer right. Cleanup. Initial work on merging the updaters. Fix. Merge update tree. Consistent naming. HD. Unify sampling. Fix build. Fix build. CUDA build. Fix GPU SHAP tests. fix. fix rebase. nd. update rebase errors. configuration. Lint. Fix segfault. split up groups and targets. Fix. Fix. Remove targets. cleanup. Cleanup linalg. fix test. revert. Rebase. interaction constraint. try to use constant. work on merging the parameter into tree. work on tree json model. Initialization. remove fixme. Pass the model parameter in. Cleanup. Fix size. Checks. lint. Update document. Pass obj info instead of model parameter. make clang happy. fix rebase. Cleanup. Tests.
dmlc · Mar 14, 2023 · fd670a8
1 parent 72e8331

Showing 31 changed files with 1,199 additions and 917 deletions.
diff --git a/demo/guide-python/multioutput_regression.py b/demo/guide-python/multioutput_regression.py
@@ -40,11 +40,18 @@ def gen_circle() -> Tuple[np.ndarray, np.ndarray]:
     return X, y
 
 
-def rmse_model(plot_result: bool):
+def rmse_model(plot_result: bool, strategy: str):
     """Draw a circle with 2-dim coordinate as target variables."""
     X, y = gen_circle()
     # Train a regressor on it
-    reg = xgb.XGBRegressor(tree_method="hist", n_estimators=64)
+    reg = xgb.XGBRegressor(
+        tree_method="hist",
+        n_estimators=128,
+        n_jobs=16,
+        max_depth=8,
+        multi_strategy=strategy,
+        subsample=0.6,
+    )
     reg.fit(X, y, eval_set=[(X, y)])
 
     y_predt = reg.predict(X)
@@ -88,9 +95,10 @@ def rmse(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
         {
             "tree_method": "hist",
             "num_target": y.shape[1],
+            "multi_strategy": "monolithic",
         },
         dtrain=Xy,
-        num_boost_round=100,
+        num_boost_round=128,
         obj=squared_log,
         evals=[(Xy, "Train")],
         evals_result=results,
@@ -107,6 +115,9 @@ def rmse(predt: np.ndarray, dtrain: xgb.DMatrix) -> Tuple[str, float]:
     parser.add_argument("--plot", choices=[0, 1], type=int, default=1)
     args = parser.parse_args()
     # Train with builtin RMSE objective
-    rmse_model(args.plot == 1)
+    # one model per output
+    rmse_model(args.plot == 1, "composite")
+    # one model for all outputs
+    rmse_model(args.plot == 1, "monolithic")
     # Train with custom objective.
     custom_rmse_model(args.plot == 1)
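
The demo above passes `multi_strategy` both through the scikit-learn wrapper and through the native parameter dictionary. A minimal sketch of assembling that dictionary follows. `make_multi_target_params` is a hypothetical helper, not part of XGBoost, and the accepted values `composite` and `monolithic` are the names used in this commit, which may differ in the release you have installed.

```python
from typing import Any, Dict

# Strategy names as they appear in this commit's doc/parameter.rst.
STRATEGIES = ("composite", "monolithic")


def make_multi_target_params(n_targets: int, strategy: str = "composite") -> Dict[str, Any]:
    """Build a dict shaped like the one in custom_rmse_model (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"multi_strategy must be one of {STRATEGIES}, got {strategy!r}")
    if n_targets < 1:
        raise ValueError("n_targets must be positive")
    return {
        "tree_method": "hist",  # the tutorial in this commit pairs monolithic with hist
        "num_target": n_targets,
        "multi_strategy": strategy,
    }


params = make_multi_target_params(n_targets=2, strategy="monolithic")
print(params)
```

The resulting dictionary has the same keys as the one the demo passes to `xgb.train`; validating the strategy up front surfaces a typo before training starts.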
diff --git a/doc/parameter.rst b/doc/parameter.rst
@@ -226,6 +226,13 @@ Parameters for Tree Booster
     list is a group of indices of features that are allowed to interact with each other.
     See :doc:`/tutorials/feature_interaction_constraint` for more information.
 
+* ``multi_strategy``, [default = ``composite``]
+
+  - The strategy used for training multi-target models.
+
+    - ``composite``: One model for each target.
+    - ``monolithic``:  Use multi-target trees.
+
 .. _cat-param:
 
 Parameters for Categorical Feature

diff --git a/doc/tutorials/multioutput.rst b/doc/tutorials/multioutput.rst
@@ -11,7 +11,11 @@ can be simultaneously classified as both sci-fi and comedy.  For detailed explan
 terminologies related to different multi-output models please refer to the
 :doc:`scikit-learn user guide <sklearn:modules/multiclass>`.
 
-Internally, XGBoost builds one model for each target similar to sklearn meta estimators,
+**********************************
+Training with One-Model-Per-Target
+**********************************
+
+By default, XGBoost builds one model for each target similar to sklearn meta estimators,
 with the added benefit of reusing data and other integrated features like SHAP.  For a
 worked example of regression, see
 :ref:`sphx_glr_python_examples_multioutput_regression.py`. For multi-label classification,
@@ -36,3 +40,26 @@ dense matrix for labels.
 
 
 The feature is still under development with limited support from objectives and metrics.
+
+*************************
+Training with Vector Leaf
+*************************
+
+.. versionadded:: 2.0
+
+.. note::
+
+   This is highly experimental and many features are missing.
+
+
+XGBoost can optionally build multi-output trees in which the size of each leaf equals the
+number of targets. The behavior is controlled by the ``multi_strategy`` training
+parameter, which takes the value ``composite`` (the default) or ``monolithic``. Specify
+``monolithic`` and use ``tree_method=hist`` to enable this feature.
+
+
+.. code-block:: python
+
+  clf = xgb.XGBClassifier(tree_method="hist", multi_strategy="monolithic")
+
+See :ref:`sphx_glr_python_examples_multioutput_regression.py` for a worked example.
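
The tutorial text above contrasts one model per target with a single tree whose leaves hold one value per target. The toy sketch below illustrates only that structural difference. It is not XGBoost's implementation: the function names, the split threshold, and all leaf values are made up for illustration.

```python
from typing import List

THRESHOLD = 0.5  # made-up split point shared by every toy stump below


def predict_composite(x: float) -> List[float]:
    """One scalar-leaf stump per target: two trees for two targets."""
    stump_target0 = (1.0, 3.0)   # (left leaf, right leaf) for target 0
    stump_target1 = (-2.0, 4.0)  # (left leaf, right leaf) for target 1
    go_left = x < THRESHOLD
    return [
        stump_target0[0] if go_left else stump_target0[1],
        stump_target1[0] if go_left else stump_target1[1],
    ]


def predict_monolithic(x: float) -> List[float]:
    """A single stump whose leaves are vectors with one entry per target."""
    left_leaf = [1.0, -2.0]
    right_leaf = [3.0, 4.0]
    return list(left_leaf if x < THRESHOLD else right_leaf)


print(predict_composite(0.2), predict_monolithic(0.2))
print(predict_composite(0.9), predict_monolithic(0.9))
```

The two functions agree here because the toy stumps were given identical splits. In general, per-target trees are free to split differently, whereas a vector-leaf tree shares one structure across all targets.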
diff --git a/include/xgboost/base.h b/include/xgboost/base.h
@@ -89,19 +89,19 @@
 namespace xgboost {
 
 /*! \brief unsigned integer type used for feature index. */
-using bst_uint = uint32_t;  // NOLINT
+using bst_uint = std::uint32_t;  // NOLINT
 /*! \brief integer type. */
-using bst_int = int32_t;    // NOLINT
+using bst_int = std::int32_t;    // NOLINT
 /*! \brief unsigned long integers */
-using bst_ulong = uint64_t;  // NOLINT
+using bst_ulong = std::uint64_t;  // NOLINT
 /*! \brief float type, used for storing statistics */
 using bst_float = float;  // NOLINT
 /*! \brief Categorical value type. */
-using bst_cat_t = int32_t;  // NOLINT
+using bst_cat_t = std::int32_t;  // NOLINT
 /*! \brief Type for data column (feature) index. */
-using bst_feature_t = uint32_t;  // NOLINT
+using bst_feature_t = std::uint32_t;  // NOLINT
 /*! \brief Type for histogram bin index. */
-using bst_bin_t = int32_t;  // NOLINT
+using bst_bin_t = std::int32_t;  // NOLINT
 /*! \brief Type for data row index.
  *
  * Be careful `std::size_t' is implementation-defined.  Meaning that the binary

diff --git a/include/xgboost/linalg.h b/include/xgboost/linalg.h
@@ -530,17 +530,17 @@ class TensorView {
   /**
    * \brief Number of items in the tensor.
    */
-  LINALG_HD [[nodiscard]] std::size_t Size() const { return size_; }
+  [[nodiscard]] LINALG_HD std::size_t Size() const { return size_; }
   /**
    * \brief Whether this is a contiguous array, both C and F contiguous returns true.
    */
-  LINALG_HD [[nodiscard]] bool Contiguous() const {
+  [[nodiscard]] LINALG_HD bool Contiguous() const {
     return data_.size() == this->Size() || this->CContiguous() || this->FContiguous();
   }
   /**
    * \brief Whether it's a c-contiguous array.
    */
-  LINALG_HD [[nodiscard]] bool CContiguous() const {
+  [[nodiscard]] LINALG_HD bool CContiguous() const {
     StrideT stride;
     static_assert(std::is_same<decltype(stride), decltype(stride_)>::value);
     // It's contiguous if the stride can be calculated from shape.
@@ -550,7 +550,7 @@ class TensorView {
   /**
    * \brief Whether it's a f-contiguous array.
    */
-  LINALG_HD [[nodiscard]] bool FContiguous() const {
+  [[nodiscard]] LINALG_HD bool FContiguous() const {
     StrideT stride;
     static_assert(std::is_same<decltype(stride), decltype(stride_)>::value);
     // It's contiguous if the stride can be calculated from shape.

diff --git a/include/xgboost/task.h b/include/xgboost/task.h
@@ -1,15 +1,15 @@
-/*!
- * Copyright 2021-2022 by XGBoost Contributors
+/**
+ * Copyright 2021-2023 by XGBoost Contributors
  */
 #ifndef XGBOOST_TASK_H_
 #define XGBOOST_TASK_H_
 
-#include <xgboost/base.h>
+#include <xgboost/base.h>  // for XGBOOST_DEVICE
 
-#include <cinttypes>
+#include <cstdint>         // for uint8_t
 
 namespace xgboost {
-/*!
+/**
  * \brief A struct returned by objective, which determines task at hand.  The struct is
  *        not used by any algorithm yet, only for future development like categorical
  *        split.
@@ -23,7 +23,7 @@ namespace xgboost {
  */
 struct ObjInfo {
   // What kind of problem are we trying to solve
-  enum Task : uint8_t {
+  enum Task : std::uint8_t {
     kRegression = 0,
     kBinary = 1,
     kClassification = 2,
@@ -41,7 +41,7 @@ struct ObjInfo {
   /**
    * \brief Use adaptive tree if the objective doesn't have valid hessian value.
    */
-  XGBOOST_DEVICE bool UpdateTreeLeaf() const { return zero_hess; }
+  [[nodiscard]] XGBOOST_DEVICE bool UpdateTreeLeaf() const { return zero_hess; }
 };
 }  // namespace xgboost
 #endif  // XGBOOST_TASK_H_
diff --git a/include/xgboost/tree_model.h b/include/xgboost/tree_model.h
@@ -101,14 +101,14 @@ struct RTreeNodeStat {
   /*! \brief weight of current node */
   bst_float base_weight;
   /*! \brief number of child that is leaf node known up to now */
-  int leaf_child_cnt {0};
+  int leaf_child_cnt{0};
 
   RTreeNodeStat() = default;
-  RTreeNodeStat(float loss_chg, float sum_hess, float weight) :
-      loss_chg{loss_chg}, sum_hess{sum_hess}, base_weight{weight} {}
+  RTreeNodeStat(float loss_chg, float sum_hess, float weight)
+      : loss_chg{loss_chg}, sum_hess{sum_hess}, base_weight{weight} {}
   bool operator==(const RTreeNodeStat& b) const {
-    return loss_chg == b.loss_chg && sum_hess == b.sum_hess &&
-           base_weight == b.base_weight && leaf_child_cnt == b.leaf_child_cnt;
+    return loss_chg == b.loss_chg && sum_hess == b.sum_hess && base_weight == b.base_weight &&
+           leaf_child_cnt == b.leaf_child_cnt;
   }
   // Swap byte order for all fields. Useful for transporting models between machines with different
   // endianness (big endian vs little endian)
@@ -433,11 +433,9 @@ class RegTree : public Model {
    * \param leaf_right_child  The right child index of leaf, by default kInvalidNodeId,
    *                          some updaters use the right child index of leaf as a marker
    */
-  void ExpandNode(bst_node_t nid, unsigned split_index, bst_float split_value,
-                  bool default_left, bst_float base_weight,
-                  bst_float left_leaf_weight, bst_float right_leaf_weight,
-                  bst_float loss_change, float sum_hess, float left_sum,
-                  float right_sum,
+  void ExpandNode(bst_node_t nid, unsigned split_index, bst_float split_value, bool default_left,
+                  bst_float base_weight, bst_float left_leaf_weight, bst_float right_leaf_weight,
+                  bst_float loss_change, float sum_hess, float left_sum, float right_sum,
                   bst_node_t leaf_right_child = kInvalidNodeId);
   /**
    * \brief Expands a leaf node into two additional leaf nodes for a multi-target tree.
@@ -587,7 +585,6 @@ class RegTree : public Model {
     [[nodiscard]] bool IsMissing(size_t i) const;
     [[nodiscard]] bool HasMissing() const;
 
-
    private:
     /*!
      * \brief a union value of value and flag
@@ -627,9 +624,7 @@ class RegTree : public Model {
   /*!
    * \brief Get split types for all nodes.
    */
-  [[nodiscard]] std::vector<FeatureType> const& GetSplitTypes() const {
-    return split_types_;
-  }
+  [[nodiscard]] std::vector<FeatureType> const& GetSplitTypes() const { return split_types_; }
   [[nodiscard]] common::Span<uint32_t const> GetSplitCategories() const {
     return split_categories_;
   }

diff --git a/include/xgboost/tree_updater.h b/include/xgboost/tree_updater.h
@@ -22,6 +22,8 @@
 #include <vector>                        // for vector
 
 namespace xgboost {
+struct ObjInfo;
+struct Context;
 namespace tree {
 struct TrainParam;
 }

diff --git a/python-package/xgboost/sklearn.py b/python-package/xgboost/sklearn.py
@@ -624,6 +624,7 @@ def __init__(
         feature_types: Optional[FeatureTypes] = None,
         max_cat_to_onehot: Optional[int] = None,
         max_cat_threshold: Optional[int] = None,
+        multi_strategy: Optional[str] = None,
         eval_metric: Optional[Union[str, List[str], Callable]] = None,
         early_stopping_rounds: Optional[int] = None,
         callbacks: Optional[List[TrainingCallback]] = None,
@@ -670,6 +671,7 @@ def __init__(
         self.feature_types = feature_types
         self.max_cat_to_onehot = max_cat_to_onehot
         self.max_cat_threshold = max_cat_threshold
+        self.multi_strategy = multi_strategy
         self.eval_metric = eval_metric
         self.early_stopping_rounds = early_stopping_rounds
         self.callbacks = callbacks

diff --git a/src/c_api/c_api.cc b/src/c_api/c_api.cc
@@ -991,9 +991,8 @@ XGB_DLL int XGBoosterPredictFromDMatrix(BoosterHandle handle,
   xgboost_CHECK_C_ARG_PTR(out_dim);
   xgboost_CHECK_C_ARG_PTR(out_shape);
 
-  CalcPredictShape(strict_shape, type, p_m->Info().num_row_,
-                   p_m->Info().num_col_, chunksize, learner->Groups(), rounds,
-                   &shape, out_dim);
+  CalcPredictShape(strict_shape, type, p_m->Info().num_row_, p_m->Info().num_col_, chunksize,
+                   learner->Groups(), rounds, &shape, out_dim);
   *out_shape = dmlc::BeginPtr(shape);
   API_END();
 }

diff --git a/src/c_api/c_api_utils.h b/src/c_api/c_api_utils.h
@@ -5,9 +5,9 @@
 #define XGBOOST_C_API_C_API_UTILS_H_
 
 #include <algorithm>
-#include <cstddef>
+#include <cstddef>  // for size_t
 #include <functional>
-#include <memory>  // std::shared_ptr
+#include <memory>   // for shared_ptr
 #include <string>
 #include <vector>
 
@@ -18,10 +18,11 @@
 #include "xgboost/learner.h"
 #include "xgboost/linalg.h"       // ArrayInterfaceHandler
 #include "xgboost/logging.h"
-#include "xgboost/string_view.h"  // StringView
+#include "xgboost/string_view.h"  // for StringView
 
 namespace xgboost {
-/* \brief Determine the output shape of prediction.
+/**
+ * \brief Determine the output shape of prediction.
  *
  * \param strict_shape Whether should we reshape the output with consideration of groups
  *                     and forest.
@@ -34,14 +35,14 @@ namespace xgboost {
  * \param out_shape    Output shape
  * \param out_dim      Output dimension
  */
-inline void CalcPredictShape(bool strict_shape, PredictionType type, size_t rows, size_t cols,
-                             size_t chunksize, size_t groups, size_t rounds,
-                             std::vector<bst_ulong> *out_shape,
+inline void CalcPredictShape(bool strict_shape, PredictionType type, std::size_t rows,
+                             std::size_t cols, std::uint32_t chunksize, std::uint32_t n_groups,
+                             std::size_t rounds, std::vector<bst_ulong> *out_shape,
                              xgboost::bst_ulong *out_dim) {
   auto &shape = *out_shape;
   if (type == PredictionType::kMargin && rows != 0) {
     // When kValue is used, softmax can change the chunksize.
-    CHECK_EQ(chunksize, groups);
+    CHECK_EQ(chunksize, n_groups);
   }
 
   switch (type) {
@@ -55,13 +56,14 @@ inline void CalcPredictShape(bool strict_shape, PredictionType type, size_t rows
       *out_dim = 2;
       shape.resize(*out_dim);
       shape.front() = rows;
-      shape.back() = std::min(groups, chunksize);
+      // chunksize can be 1 if it's softmax
+      shape.back() = std::min(n_groups, chunksize);
     }
     break;
   }
   case PredictionType::kApproxContribution:
   case PredictionType::kContribution: {
-    if (groups == 1 && !strict_shape) {
+    if (n_groups == 1 && !strict_shape) {
       *out_dim = 2;
       shape.resize(*out_dim);
       shape.front() = rows;
@@ -70,14 +72,14 @@ inline void CalcPredictShape(bool strict_shape, PredictionType type, size_t rows
       *out_dim = 3;
       shape.resize(*out_dim);
       shape[0] = rows;
-      shape[1] = groups;
+      shape[1] = n_groups;
       shape[2] = cols + 1;
     }
     break;
   }
   case PredictionType::kApproxInteraction:
   case PredictionType::kInteraction: {
-    if (groups == 1 && !strict_shape) {
+    if (n_groups == 1 && !strict_shape) {
       *out_dim = 3;
       shape.resize(*out_dim);
       shape[0] = rows;
@@ -87,7 +89,7 @@ inline void CalcPredictShape(bool strict_shape, PredictionType type, size_t rows
       *out_dim = 4;
       shape.resize(*out_dim);
       shape[0] = rows;
-      shape[1] = groups;
+      shape[1] = n_groups;
       shape[2] = cols + 1;
       shape[3] = cols + 1;
     }
@@ -98,7 +100,7 @@ inline void CalcPredictShape(bool strict_shape, PredictionType type, size_t rows
       shape.resize(4);
       shape[0] = rows;
       shape[1] = rounds;
-      shape[2] = groups;
+      shape[2] = n_groups;
       auto forest = chunksize / (shape[1] * shape[2]);
       forest = std::max(static_cast<decltype(forest)>(1), forest);
       shape[3] = forest;
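
The contribution and interaction branches of `CalcPredictShape` shown in the hunks above can be sketched in Python as follows. This is a rough port for illustration, not the library's code. The function names are hypothetical, and the trailing `cols + 1` dimensions in the non-strict branches fall on lines the diff does not display, so they are assumed by analogy with the strict branches.

```python
from typing import Tuple


def contribution_shape(rows: int, cols: int, n_groups: int, strict_shape: bool) -> Tuple[int, ...]:
    """Output shape for kContribution / kApproxContribution."""
    if n_groups == 1 and not strict_shape:
        # out_dim == 2 in the diff; the last dimension is an assumption (line not shown).
        return (rows, cols + 1)
    # out_dim == 3: one (cols + 1) vector per output group; the extra column is the bias.
    return (rows, n_groups, cols + 1)


def interaction_shape(rows: int, cols: int, n_groups: int, strict_shape: bool) -> Tuple[int, ...]:
    """Output shape for kInteraction / kApproxInteraction."""
    if n_groups == 1 and not strict_shape:
        # out_dim == 3 in the diff; the two trailing dimensions are assumptions (lines not shown).
        return (rows, cols + 1, cols + 1)
    # out_dim == 4: one (cols + 1) x (cols + 1) matrix per output group.
    return (rows, n_groups, cols + 1, cols + 1)


print(contribution_shape(rows=100, cols=4, n_groups=3, strict_shape=False))
print(interaction_shape(rows=100, cols=4, n_groups=1, strict_shape=True))
```

With `strict_shape` enabled, the group axis is kept even for a single group, so callers can index the result the same way regardless of the number of outputs.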