Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define the new device parameter. #9362

Merged
merged 20 commits into from
Jul 13, 2023
Merged

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Jul 6, 2023

Related: #7308

A new device parameter for replacing the previous gpu_id. The new parameter can take three different values and two additional aliases:

  • cpu
  • cuda
  • cuda:<ordinal> with <ordinal> as an integer
  • gpu alias to cuda
  • gpu:<ordinal> alias to cuda

The compatibility with gpu_id is kept. A warning is emitted when it's used. Along with the new parameter, gpu_hist is sunset.

After this PR, we still have many necessary changes in documents and demonstrations. I will leave them as follow-up PRs.

TODOs:

  • "GPU"
  • lower case.
  • Spark
  • PySpark
  • Test custom obj with deprecated gpu_hist and the new hist + device. (prediction cache, gradient)
  • Test inplace predict with mismatched data type and device.

Port the changes.

Start working on tests.

basic tests.

gpu id.

doc.

Remove `gpu_id` from configuration.

gpu id.

cpu build.

-1.

lower case.

alias.

Fix tests.

pyspark.

sklearn.

Cleanup.

cpu build & tests.

make.

Set device.

Cleanup.

R

scala.

spark.

spark tests.
@trivialfis trivialfis force-pushed the device-ordinal-ctx branch from d1a5337 to ff12bad Compare July 11, 2023 06:50
@trivialfis
Copy link
Member Author

@WeichenXu123 Would you like to take a look at the changes in the pyspark module?

@wbo4958
Copy link
Contributor

wbo4958 commented Jul 12, 2023

it's good from spark side.

@trivialfis trivialfis marked this pull request as ready for review July 12, 2023 08:25
@trivialfis
Copy link
Member Author

regex hangs on mingw ...

doc/gpu/index.rst Show resolved Hide resolved
+ ``cpu``: Use CPU.
+ ``cuda``: Use a GPU (CUDA device).
+ ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU do you want to use if you have more than one devices).
+ ``gpu``: Same as ``cuda``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that now this are equivalent from XGBoost perspective, but as far we are talking about API used in next many years would it make sense to not introduce this restriction?

Copy link
Member Author

@trivialfis trivialfis Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, that's why we have gpu and cuda. It's equal to cuda "for now", in the future others can make variants based on available GPU devices at run time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@napetrov Feel free to share your suggestions. ;-) Would like to get more opinions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean from API perspective users would be writing code assuming GPU device are always CUDA, but this would results in breaking change when there would be another GPU backend.

Might be it worth to point that this is not the same but just a convenient way for selecting default GPU device - so in long-term this would result in GPU dispatching regardless of particular HW. i.e. now this is not same as 'cuda' but is default GPU device selector although it can select only from ['cuda']. In this way we would clearly define expectations from API and would allow extensions here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing, will change the document as suggested.

+ ``cpu``: Use CPU.
+ ``cuda``: Use a GPU (CUDA device).
+ ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU do you want to use if you have more than one devices).
+ ``gpu``: Same as ``cuda``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ ``gpu``: Same as ``cuda``.
+ ``gpu``: Default GPU device selection from the list of available and supported devices. Only ''cuda'' devices are supported currently.

Something along this lines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants