[RFC] Unify device configuration. #7308
Comments
cc @RAMitchell @hcho3 @wbo4958 @JohnZed @dantegd @vepifanov @ShvetsKS @pseudotensor. This might be the most significant breaking change in a long time. Please help with comments and suggestions.

A previous attempt is at https://github.com/dmlc/xgboost/pull/6971/files . I have written some thoughts in the code comments there, but they are largely summarized here.

Looks like JVM can follow your suggestion easily.

The first example looks good to me.

It is going to be slightly tedious to implement, as we have to change every language binding, but it seems like a very positive change.

For implementing the change, I would like to create an independent branch in dmlc during development so that we can run CI with incremental changes.
This is the one last PR for removing the omp global variable.

* Add a context object to the `DMatrix`. This bridges `DMatrix` with #7308.
* Require the context to be available at booster construction time.
* Add `n_threads` support for the R CSC DMatrix constructor.
* Remove `omp_get_max_threads` in the R glue code.
* Remove threading utilities that rely on the omp global variable.
This is a continuation of #4600
Overview
Use global configuration
From my perspective, this method is cleaner and covers both `DMatrix` and `Booster`, so it's listed first. An easier-to-implement solution is described in the next section.

Define a new `device` parameter for XGBoost as a global configuration option and remove the existing parameters, including `gpu_hist`, `gpu_id`, and `predictor`. For the native Python interface, it will look like this:
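Roughly along these lines; this is a minimal sketch of the proposed API, where `device` as a key of the global configuration is the proposal rather than the existing API:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(128, 10)
y = np.random.rand(128)

# Proposed (hypothetical) global `device` parameter: everything inside the
# context runs on the first CUDA device.
with xgb.config_context(device="CUDA:0"):
    Xy = xgb.DMatrix(X, label=y)
    booster = xgb.train({"tree_method": "hist"}, Xy, num_boost_round=10)
    # Prediction also runs on CUDA:0, regardless of where the input lives.
    predt = booster.predict(xgb.DMatrix(X))
```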
The above code snippet should run on the first CUDA device, using the GPU implementation of the `hist` tree method. Also, the prediction should run on the same device regardless of the location of the input data. The scikit-learn interface will look like this:
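Perhaps something like this sketch, where the `device` keyword on the estimator is the proposed addition, not a parameter that exists today:

```python
from sklearn.datasets import make_classification
import xgboost as xgb

X, y = make_classification(n_samples=128, n_features=10, random_state=0)

# Hypothetical `device` keyword on the estimator; internally it would be
# translated into the global config context around fit/predict.
clf = xgb.XGBClassifier(tree_method="hist", device="CUDA:0", n_estimators=10)
clf.fit(X, y)
predt = clf.predict(X)
```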
while the config context is created internally in each function of `XGBClassifier`. For R users, we also have the `xgb.set_config` function that changes global parameters.

JVM packages are lagging behind, but in theory we can have something similar. For the Java binding, we can define functions similar to the R or Python `xgb.set_config` to set the global parameter. For the Scala binding, we have high-level estimators like `XGBClassifier` in Python, so we can handle the configuration internally.

Last but not least, the C interface is the basis of all other interfaces, so its implementation should be trivial.
For handling existing code, my suggestion would be simply to throw an informative error. For example, if the user has specified `gpu_hist`, then we require `device` also to be set.
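A rough sketch of how that could surface to users; the exact message and error behavior here are assumptions, not a settled design:

```python
import numpy as np
import xgboost as xgb

Xy = xgb.DMatrix(np.random.rand(128, 10), label=np.random.rand(128))

# Under the proposal, legacy `gpu_hist` without a matching `device` would
# fail loudly instead of silently picking a GPU.
try:
    xgb.train({"tree_method": "gpu_hist"}, Xy, num_boost_round=10)
except xgb.core.XGBoostError as err:
    print(err)  # e.g. a hint asking the user to also set `device=CUDA:<ordinal>`
```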
Alternative solution

This might be more practical in the short term. The `device` parameter doesn't have to be a global parameter; like the currently available `gpu_id`, it can be a parameter of the `booster` object. Hence we can keep it that way and reuse the `gpu_id` parameter. This is still a breaking change due to the other removed parameters, but it requires fewer changes. For the native Python interface, it will look like this:
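Perhaps along these lines; a sketch in which GPU selection via `tree_method="hist"` plus `gpu_id` is the proposed behavior, not what current releases do:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(128, 10)
y = np.random.rand(128)
Xy = xgb.DMatrix(X, label=y)

# The device is configured per booster via the existing `gpu_id` parameter;
# under the proposal, `tree_method="hist"` would then pick the GPU
# implementation internally.
booster = xgb.train({"tree_method": "hist", "gpu_id": 0}, Xy, num_boost_round=10)
predt = booster.predict(Xy)
```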
Motivation

Device management has been a headache in the past. We removed the `n_gpus` parameter in the 1.0 release, which helped clean up the code a little bit, but there are still many other issues in the current device management configuration. The most important one is that we need a single source of information about the device ordinal. Currently, the device is chosen based on the following inputs:

* `gpu_id` parameter: the supposed only authority, which was rarely honored.
* `tree_method` parameter: `gpu_hist` or not.
* `predictor` parameter: `gpu_predictor` or not.
or not.As one might see, there are too many correlated factors influencing the decision of device ordinal, and sometimes they are conflicting with each other. For instance, Setting "gpu_hist" leads to
gpu_id >= 0
then if a user wants to run prediction on the CPU, the predictor might be set:
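For example, with today's parameters (a sketch of the status quo being criticized, not the proposal):

```python
import numpy as np
import xgboost as xgb

Xy = xgb.DMatrix(np.random.rand(128, 10), label=np.random.rand(128))

# Current behavior: `gpu_hist` implies gpu_id >= 0 for training ...
booster = xgb.train({"tree_method": "gpu_hist"}, Xy, num_boost_round=10)
# ... while the predictor is then pointed back at the CPU.
booster.set_param({"predictor": "cpu_predictor"})
predt = booster.predict(Xy)
```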
Then what's the current `gpu_id`? I don't know. The problem gets worse with inplace prediction and GPU data inputs. Also, with the `OneAPI` proposal from Intel, we have a growing number of configurations, and the existing approach simply cannot handle the complexity.

Implementation
Depending on which solution is chosen, global parameter or booster parameter, we might opt for a different implementation, but the general state transitions should be the same:
* If `gpu_predictor` or `gpu_hist` is chosen, a consistent `device` must also be specified; otherwise, there will be an error. By consistent, it means the `device` should be set as `CUDA:x`. This is a breaking change, but it can be handled with a crafted error message.
* If `device` is selected to be CUDA, then the tree method must be one of {`hist`, `gpu_hist`, `auto`}. All of them will become `gpu_hist` internally. For any other tree method, XGBoost will throw a not-implemented error. We can have `approx` running on GPU if needed, but that's beyond the scope of this RFC (see the sketch after this list).
* For inplace prediction, the location of the input data should be consistent with `device`. Or we simply revert the configuration and let the user decide whether inplace prediction is desired. This one is a bit trickier, as inplace prediction helps reduce memory usage and latency dramatically, especially for Dask. We could use more thought on this.
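To make these rules concrete, here is an illustrative sketch; the `device` configuration key and the exact error behavior are assumptions of this sketch, not existing API:

```python
import numpy as np
import xgboost as xgb

Xy = xgb.DMatrix(np.random.rand(128, 10), label=np.random.rand(128))

# Hypothetical `device` option, following the rules above.
with xgb.config_context(device="CUDA:0"):
    # `hist`, `gpu_hist`, and `auto` would all become `gpu_hist` internally.
    xgb.train({"tree_method": "hist"}, Xy, num_boost_round=10)
    # Any other tree method (e.g. `exact`) would raise a not-implemented error:
    # xgb.train({"tree_method": "exact"}, Xy, num_boost_round=10)
```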
Based on these rules, we have removed `predictor`, `tree_method`, the memory-conservation heuristic, and the data input type from the decision-making process. Lastly, there are the environment and the custom objective; these two can continue to be handled as they are.