Improve Symbolic Cholesky performance #1758

Draft
wants to merge 33 commits into
base: develop

@upsj upsj commented Dec 19, 2024

This improves the symbolic Cholesky performance by preprocessing the matrix on the GPU with a Minimum Spanning Tree algorithm.

Example rgg_22 from SuiteSparse with METIS nested dissection on H100:

  • Before: 0.76 s
  • After: 0.5 s

The performance improvements are split between the device-to-host transfer (only a spanning tree is transferred instead of the full matrix) and the elimination tree computation (which now operates on a sparser graph).
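To illustrate why a spanning tree can stand in for the full matrix here, the following is a minimal Python sketch, not the code from this PR. It assumes (the description above does not state this) that each off-diagonal entry (i, j) is treated as an edge with weight max(i, j), and that the elimination tree is built with the classic path-compression algorithm. The function names `elimination_tree` and `skeleton_edges` and the small example graph are hypothetical.

```python
def elimination_tree(n, edges):
    """Elimination tree of a symmetric pattern via path compression.

    Returns parent[j] for every column j, with -1 marking a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # compressed shortcut towards the current root
    rows = [[] for _ in range(n)]
    for i, j in edges:
        lo, hi = min(i, j), max(i, j)
        if lo != hi:  # ignore diagonal entries
            rows[hi].append(lo)
    for i in range(n):
        for j in rows[i]:
            # Walk from j up to its current root, redirecting shortcuts to i.
            while j != -1 and j != i:
                nxt = ancestor[j]
                ancestor[j] = i
                if nxt == -1:  # j was a root, so i becomes its parent
                    parent[j] = i
                j = nxt
    return parent


def skeleton_edges(n, edges):
    """Kruskal spanning forest with the ASSUMED edge weight max(i, j)."""
    rep = list(range(n))

    def find(x):
        while rep[x] != x:
            rep[x] = rep[rep[x]]  # path halving
            x = rep[x]
        return x

    kept = []
    for i, j in sorted(edges, key=lambda e: max(e)):
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so keep it
            rep[ri] = rj
            kept.append((i, j))
    return kept


# Hypothetical 6x6 symmetric pattern, given as off-diagonal entries.
n = 6
edges = [(0, 2), (1, 2), (0, 4), (2, 4), (1, 5), (3, 5), (4, 5), (2, 5)]

full_tree = elimination_tree(n, edges)
skeleton = skeleton_edges(n, edges)
skel_tree = elimination_tree(n, skeleton)

print(len(edges), "edges in the full graph,", len(skeleton), "in the skeleton")
print("same elimination tree:", full_tree == skel_tree)
```

Under this weighting, Kruskal's algorithm visits edges in the same row order as the elimination tree algorithm and drops exactly the edges that would have been no-ops, so both inputs give the same tree while the skeleton has at most n - 1 edges to transfer and process.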

@upsj upsj requested a review from a team December 19, 2024 20:44
@upsj upsj self-assigned this Dec 19, 2024
@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Dec 19, 2024
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. reg:benchmarking This is related to benchmarking. type:factorization This is related to the Factorizations reg:helper-scripts This issue/PR is related to the helper scripts mainly concerned with development of Ginkgo. mod:all This touches all Ginkgo modules. labels Dec 19, 2024
@upsj upsj marked this pull request as draft December 27, 2024 00:18

@yhmtsai yhmtsai left a comment

For the OpenMP part, does it also perform better than the previous implementation?
The previous one seems to be much shorter than the current one, so maybe we can keep the old one if the new one does not give better performance.

core/factorization/cholesky.cpp (outdated)
core/factorization/ic.cpp (outdated)
core/test/components/range_minimum_query.cpp (outdated)
core/test/components/range_minimum_query.cpp
core/test/components/range_minimum_query.cpp (outdated)
core/test/components/range_minimum_query.cpp (outdated)
core/test/components/range_minimum_query.cpp
test/factorization/cholesky_kernels.cpp (outdated)

upsj commented Dec 27, 2024

The OpenMP part is not yet parallelized, and the skeleton tree computation is not enabled by default.

3 participants