Optimize adapter element counting on GPU. #9209

Merged
trivialfis merged 2 commits into dmlc:master from the opt-entry-counts branch on May 30, 2023

Conversation

trivialfis
Member

@trivialfis trivialfis commented May 27, 2023

  • Implement a simple IterSpan for passing iterators with size.
  • Use shared memory for column size counts.
  • Use one thread for each sample in row count to reduce atomic operations.

The first two items are from #9194.
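For readers unfamiliar with the pattern, the first and third items can be sketched on the host in plain C++. This is only an illustration under stated assumptions, not the code in this PR: the `IterSpan` layout, the `CountValidPerRow` helper, the dense row-major layout, and the NaN-or-sentinel definition of "missing" are all hypothetical. Each iteration of the outer loop stands in for what a single GPU thread would do for one sample: it accumulates a private count and writes it once, so no atomic increment is needed per element.

```cpp
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an "iterator with size": it bundles any
// random-access iterator with an element count so both travel together.
template <typename It>
class IterSpan {
 public:
  using value_type = typename std::iterator_traits<It>::value_type;

  IterSpan(It begin, std::size_t size) : begin_{begin}, size_{size} {}

  std::size_t size() const { return size_; }
  It begin() const { return begin_; }
  It end() const { return begin_ + size_; }
  value_type operator[](std::size_t i) const { return begin_[i]; }

 private:
  It begin_;
  std::size_t size_;
};

// Assumption: a value is "missing" if it is NaN or equals the sentinel.
inline bool IsValid(float v, float missing) {
  return !std::isnan(v) && v != missing;
}

// Count valid entries per row of a dense row-major matrix.
// The body of the outer loop is the work one thread would do for one sample:
// a local counter, then a single store, instead of one atomic per element.
template <typename It>
std::vector<std::size_t> CountValidPerRow(IterSpan<It> values,
                                          std::size_t n_rows,
                                          std::size_t n_cols,
                                          float missing) {
  std::vector<std::size_t> counts(n_rows, 0);
  for (std::size_t row = 0; row < n_rows; ++row) {
    std::size_t local = 0;  // private to this "thread"
    for (std::size_t col = 0; col < n_cols; ++col) {
      std::size_t idx = row * n_cols + col;
      if (idx < values.size() && IsValid(values[idx], missing)) {
        ++local;
      }
    }
    counts[row] = local;  // one write per row
  }
  return counts;
}
```

Calling `CountValidPerRow` with an `IterSpan<const float*>` over a 2 x 3 buffer returns one count per row. On a GPU the outer loop would be replaced by the thread index, which is where the reduction in atomic operations comes from.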

Time spent in the row-count kernel after (first row) and before (second row) this PR:

 Time (%)  Total Time (ns)  Instances     Avg (ns)     Med (ns)   Min (ns)   Max (ns)  StdDev (ns)  Name
      2.1       44,654,359         32  1,395,448.7  1,393,922.5  1,329,936  1,466,870     61,552.5  void dh::LaunchNKernel<unsigned long xgboost::data::GetRowCounts<xgboost::data::CupyAdapterBatch>(T…
      5.1      121,245,208         32  3,788,912.8  3,793,333.0  3,609,087  3,896,713     94,741.1  void dh::LaunchNKernel<unsigned long xgboost::data::GetRowCounts<xgboost::data::CupyAdapterBatch>(T…

@trivialfis trivialfis merged commit 17fd3f5 into dmlc:master May 30, 2023
@trivialfis trivialfis deleted the opt-entry-counts branch May 30, 2023 15:28