Define the new `device` parameter. #9362

trivialfis · 2023-07-06T01:54:39Z

Related: #7308

A new `device` parameter replaces the previous `gpu_id`. It accepts three forms of value plus two aliases:

cpu
cuda
cuda:<ordinal> with <ordinal> as an integer
gpu alias to cuda
gpu:<ordinal> alias to cuda:<ordinal>
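
To make the accepted spellings listed above concrete, here is a minimal, standard-library-only sketch of a parser for them. It is not XGBoost's implementation: `Device`, `parse_device`, and the regular expression are hypothetical names introduced for illustration, and treating only lower-case spellings as valid is an assumption of this sketch.

```python
import re
from typing import NamedTuple, Optional


class Device(NamedTuple):
    """Hypothetical container for a parsed device string."""
    kind: str               # "cpu" or "cuda"
    ordinal: Optional[int]  # None when no ordinal was given


# Hypothetical pattern covering cpu, cuda, cuda:<ordinal>, gpu, gpu:<ordinal>.
_DEVICE_RE = re.compile(r"(cpu|cuda|gpu)(?::(\d+))?")


def parse_device(spec: str) -> Device:
    """Parse a device string, normalizing the `gpu` alias to `cuda`."""
    match = _DEVICE_RE.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid device: {spec!r}")
    kind, ordinal = match.groups()
    if kind == "cpu":
        # Assumption of this sketch: `cpu` never carries an ordinal.
        if ordinal is not None:
            raise ValueError("cpu does not take an ordinal")
        return Device("cpu", None)
    # Both `cuda` and its alias `gpu` end up as the same kind.
    return Device("cuda", int(ordinal) if ordinal is not None else None)
```

With this sketch, `parse_device("gpu:1")` and `parse_device("cuda:1")` produce the same value, which is the aliasing behaviour the list describes.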

Compatibility with `gpu_id` is kept; a warning is emitted when it is used. Along with the new parameter, `gpu_hist` is deprecated.
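
For user code that still passes the old parameters, the migration can be sketched as a small helper. This is a hypothetical illustration rather than what the library does internally: `migrate_params` is an invented name, the warning texts are made up, and dropping a negative `gpu_id` without choosing a device is an assumption of this sketch.

```python
import warnings
from typing import Any, Dict


def migrate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` that uses `device` instead of the old names."""
    out = dict(params)
    # An explicitly given `device` always wins over the deprecated parameters.
    explicit_device = "device" in out

    if out.get("tree_method") == "gpu_hist":
        warnings.warn("`gpu_hist` is deprecated; use `hist` with `device`.")
        out["tree_method"] = "hist"
        if not explicit_device:
            out["device"] = "cuda"

    if "gpu_id" in out:
        gpu_id = int(out.pop("gpu_id"))
        warnings.warn("`gpu_id` is deprecated; use `device` instead.")
        # Assumption: a negative gpu_id is dropped without selecting a device.
        if not explicit_device and gpu_id >= 0:
            out["device"] = f"cuda:{gpu_id}"

    return out
```

For example, `{"tree_method": "gpu_hist", "gpu_id": 1}` becomes `{"tree_method": "hist", "device": "cuda:1"}`, with one warning per deprecated parameter.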

After this PR, many changes are still needed in the documents and demonstrations. I will leave them to follow-up PRs.

TODOs:

"GPU" lower case.
Spark
PySpark
Test custom obj with deprecated gpu_hist and the new hist + device. (prediction cache, gradient)
Test inplace predict with mismatched data type and device.

Port the changes. Start working on tests. basic tests. gpu id. doc. Remove `gpu_id` from configuration. gpu id. cpu build. -1. lower case. alias. Fix tests. pyspark. sklearn. Cleanup. cpu build & tests. make. Set device. Cleanup. R scala. spark. spark tests.

trivialfis · 2023-07-12T03:54:16Z

@WeichenXu123 Would you like to take a look at the changes in the pyspark module?

.../xgboost4j-spark-gpu/src/main/scala/ml/dmlc/xgboost4j/scala/rapids/spark/GpuPreXGBoost.scala

wbo4958 · 2023-07-12T06:40:44Z

It looks good from the Spark side.

trivialfis · 2023-07-13T01:19:27Z

regex hangs on mingw ...

doc/gpu/index.rst

napetrov · 2023-07-13T09:28:42Z

doc/parameter.rst

+    + ``cpu``: Use CPU.
+    + ``cuda``: Use a GPU (CUDA device).
+    + ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU do you want to use if you have more than one devices).
+    + ``gpu``: Same as ``cuda``.


I understand that these are currently equivalent from the XGBoost perspective, but since we are talking about an API that will be used for many years, would it make sense not to introduce this restriction?

Absolutely, that's why we have both gpu and cuda. It's equal to cuda "for now"; in the future, others can make variants based on the GPU devices available at run time.

@napetrov Feel free to share your suggestions. ;-) Would like to get more opinions.

I mean that, from the API perspective, users would write code assuming that a GPU device is always CUDA, and this would result in a breaking change once there is another GPU backend.

It might be worth pointing out that gpu is not the same as cuda but a convenient way of selecting the default GPU device, so in the long term it would dispatch to a GPU regardless of the particular hardware. That is, gpu is not the same as 'cuda' even now: it is the default GPU device selector, although it can currently select only from ['cuda']. This way we would clearly define what to expect from the API and leave room for extensions.

Thank you for sharing, will change the document as suggested.

napetrov · 2023-07-13T09:40:54Z

doc/parameter.rst

+    + ``cpu``: Use CPU.
+    + ``cuda``: Use a GPU (CUDA device).
+    + ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU do you want to use if you have more than one devices).
+    + ``gpu``: Same as ``cuda``.


Suggested change

+ ``gpu``: Same as ``cuda``.

+ ``gpu``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.

Something along these lines.
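
The distinction drawn in this thread, with `gpu` acting as a selector over supported backends rather than as a synonym for `cuda`, can be sketched as follows. This is a hypothetical illustration only: `SUPPORTED_GPU_BACKENDS` and `resolve_gpu` are invented names, and passing the available backends in as a list is an assumption of the sketch.

```python
from typing import Sequence

# Backends that `gpu` may dispatch to; only CUDA today, per the discussion.
SUPPORTED_GPU_BACKENDS = ("cuda",)


def resolve_gpu(spec: str, available: Sequence[str]) -> str:
    """Resolve `gpu` or `gpu:<ordinal>` to a concrete backend string."""
    name, sep, ordinal = spec.partition(":")
    if name != "gpu":
        # Explicit devices such as `cpu` or `cuda:1` pass through unchanged.
        return spec
    for backend in SUPPORTED_GPU_BACKENDS:
        if backend in available:
            # Keep the ordinal, if any, when switching to the concrete backend.
            return backend + sep + ordinal
    raise RuntimeError("no supported GPU backend is available")
```

Under this reading, adding another backend would only extend `SUPPORTED_GPU_BACKENDS`; code that passes `gpu` would keep working, while code that passes `cuda` would stay pinned to CUDA.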

trivialfis marked this pull request as draft July 6, 2023 01:54

trivialfis mentioned this pull request Jul 11, 2023

Unify the hist tree method for different devices. #9363

Merged

trivialfis force-pushed the device-ordinal-ctx branch from d1a5337 to ff12bad Compare July 11, 2023 06:50

trivialfis added 9 commits July 11, 2023 23:35

Tests.

26d02c3

small cleanups.

887c1c4

use device.

79b88ff

cleanup

8119408

Tests.

2a49e07

Remove workaround.

eb7a17b

Cleanup prediction tests.

15b71cc

Remove workaround.

49f1e91

Document.

00e39b1

trivialfis added 2 commits July 12, 2023 12:55

remove warning test.

63e632c

Test type combination.

9260e6a

wbo4958 reviewed Jul 12, 2023

.../xgboost4j-spark-gpu/src/main/scala/ml/dmlc/xgboost4j/scala/rapids/spark/GpuPreXGBoost.scala

trivialfis added 2 commits July 12, 2023 14:57

reviewer's comment.

ebbb852

give up.

99a6a48

trivialfis marked this pull request as ready for review July 12, 2023 08:25

trivialfis mentioned this pull request Jul 12, 2023

In place prediction with cupy on windows crashes #5793

Open

trivialfis added 5 commits July 13, 2023 10:06

mingw, string view.

dea8a75

lint.

1d91534

strings.

46add85

scala formatted string.

5e8f7f0

Constants.

45454b0

RAMitchell approved these changes Jul 13, 2023

doc/gpu/index.rst

napetrov reviewed Jul 13, 2023

Reviewer's comment.

871dcbc

trivialfis mentioned this pull request Jul 13, 2023

[RFC] Unify device configuration. #7308

Closed

trivialfis merged commit 04aff3a into dmlc:master Jul 13, 2023

trivialfis deleted the device-ordinal-ctx branch July 13, 2023 11:30

trivialfis mentioned this pull request Jul 16, 2023

[jvm-packages] Add the new device parameter. #9385

Merged

ShellLM mentioned this pull request Aug 11, 2024

Xgboost 2.0.0 · dmlc/xgboost irthomasthomas/undecidability#878

Open
