Check for support of CUDA Memory Pools at runtime (#4679) #6440

Merged

@ao2 ao2 commented Oct 21, 2023

Type

  • Bug fix (non-breaking change which fixes an issue): Fixes CUDA runtime error - open3D v0.14.1 #4679
  • New feature (non-breaking change which adds functionality). Resolves #
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

Some CUDA GPUs, like the Quadro M3000M, don't support Memory Pools operations like cudaMallocAsync/cudaFreeAsync even on driver versions newer than 11020 (CUDA 11.2), and this can result in errors like:

CUDA runtime error: operation not supported

So check for support at runtime instead of compile time.

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results
      (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

A new core::cuda::SupportsMemoryPools() helper function is added, and it is used to decide whether to use cudaMallocAsync/cudaFreeAsync or not.
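For illustration, a runtime query of this kind could look roughly like the sketch below. It is not the code from this PR: the `sketch` namespace and the `device_id` parameter are assumptions made for the example. The only CUDA calls used are `cudaDeviceGetAttribute` and the `cudaDevAttrMemoryPoolsSupported` attribute, both available from CUDA 11.2. The CUDA path is only compiled under nvcc; with a plain host compiler the function conservatively reports no support.

```cpp
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

namespace sketch {  // hypothetical namespace, not Open3D's core::cuda

// Returns true only if the CUDA runtime reports that the given device
// supports memory pools (cudaMallocAsync / cudaFreeAsync).
// device_id is an assumed parameter for this example.
inline bool SupportsMemoryPools(int device_id) {
#if defined(__CUDACC__) && defined(CUDART_VERSION) && (CUDART_VERSION >= 11020)
    int supported = 0;
    const cudaError_t err = cudaDeviceGetAttribute(
            &supported, cudaDevAttrMemoryPoolsSupported, device_id);
    // Treat any query failure (e.g. an invalid device) as "not supported".
    return err == cudaSuccess && supported != 0;
#else
    // Built without CUDA, or with a toolkit older than 11.2: the attribute
    // does not exist, so report no support.
    (void)device_id;
    return false;
#endif
}

}  // namespace sketch
```

Treating a failed query as "not supported" keeps the caller on the plain cudaMalloc/cudaFree path, which is the safe default on GPUs such as the Quadro M3000M.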

Initially I thought about adding this to the Device class (e.g. Device::IsCUDAMemoryPoolsSupported()), but this is quite CUDA specific so I decided not to.

Please note that I currently cannot test the code on non-Quadro GPUs, so I would like someone else to verify that it does not break cases that already work.

Thanks,
Antonio



update-docs bot commented Oct 21, 2023

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

@ao2 ao2 force-pushed the ao2/check-support-for-memory-pools-at-runtime branch from aa1fb43 to 3967155 Compare October 21, 2023 09:26
@ao2 ao2 mentioned this pull request Oct 21, 2023
3 tasks
@ao2 ao2 force-pushed the ao2/check-support-for-memory-pools-at-runtime branch from 3967155 to 1ef0a48 Compare October 21, 2023 09:29
@ssheorey ssheorey requested a review from benjaminum October 24, 2023 16:25
@benjaminum

Thank you for this contribution! Please have a look at the checks. There are some problems with unknown identifiers, e.g. error C3861: 'cudaFreeAsync': identifier not found.

@ao2 ao2 force-pushed the ao2/check-support-for-memory-pools-at-runtime branch from 4915a8a to 95447f4 Compare October 27, 2023 16:05
ao2 commented Oct 27, 2023

I pushed an update with a compilation fix and I was able to build and run locally.

@ao2 ao2 force-pushed the ao2/check-support-for-memory-pools-at-runtime branch from 95447f4 to 25c31f0 Compare October 28, 2023 17:58
ao2 commented Oct 28, 2023

OK, in the latest version I kept the compile time check as well, to be able to build with CUDA versions older than 11.2. Apparently some CI jobs are using 11.0.3:

CUDA_VERSION: 11.0.3

Some CUDA GPUs, like the Quadro M3000M don't support Memory Pools
operations like cudaMallocAsync/cudaFreeAsync even on driver versions
newer than 11.2, and this can result in errors like:

  CUDA runtime error: operation not supported

So check for support at runtime instead of compile time.

Still keep the compile time check to support building with CUDA versions
older than 11.2.
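
As a rough sketch of how the two checks described in this commit message can be combined (not the actual Open3D code), the compile time guard keeps cudaMallocAsync/cudaFreeAsync out of builds against toolkits older than 11.2, while a runtime flag decides per device. The names `sketch`, `AllocPath`, `ChooseAllocPath`, `Malloc`, `Free` and the `device_supports_pools` parameter are hypothetical; the flag is assumed to come from a runtime query of cudaDevAttrMemoryPoolsSupported.

```cpp
#include <cstddef>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

namespace sketch {  // hypothetical names, for illustration only

enum class AllocPath { kAsyncPool, kSync };

// Pure policy: use the pool allocator only when the toolkit is new enough
// (CUDART_VERSION 11020 corresponds to CUDA 11.2) AND the device reports
// memory pool support at runtime.
constexpr AllocPath ChooseAllocPath(int cudart_version,
                                    bool device_supports_pools) {
    return (cudart_version >= 11020 && device_supports_pools)
                   ? AllocPath::kAsyncPool
                   : AllocPath::kSync;
}

#ifdef __CUDACC__
// Allocation wrapper: compile time guard plus runtime decision.
inline cudaError_t Malloc(void** ptr, std::size_t bytes, cudaStream_t stream,
                          bool device_supports_pools) {
    (void)stream;
    (void)device_supports_pools;
#if CUDART_VERSION >= 11020
    if (ChooseAllocPath(CUDART_VERSION, device_supports_pools) ==
        AllocPath::kAsyncPool) {
        return cudaMallocAsync(ptr, bytes, stream);
    }
#endif
    return cudaMalloc(ptr, bytes);
}

// Matching free wrapper; must use the same decision as Malloc.
inline cudaError_t Free(void* ptr, cudaStream_t stream,
                        bool device_supports_pools) {
    (void)stream;
    (void)device_supports_pools;
#if CUDART_VERSION >= 11020
    if (ChooseAllocPath(CUDART_VERSION, device_supports_pools) ==
        AllocPath::kAsyncPool) {
        return cudaFreeAsync(ptr, stream);
    }
#endif
    return cudaFree(ptr);
}
#endif  // __CUDACC__

}  // namespace sketch
```

With this policy, a build against CUDA 11.0.3 (CUDART_VERSION 11000) always takes the synchronous path, and a build against 11.2 or newer takes it on devices that report no pool support.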
@ao2 ao2 force-pushed the ao2/check-support-for-memory-pools-at-runtime branch from 25c31f0 to 51a6195 Compare October 30, 2023 08:27
ao2 commented Oct 30, 2023

Pushed another update to also guard cudaDevAttrMemoryPoolsSupported behind CUDA version >= 11.2.

Hopefully this fixes the CI Windows build issues with older CUDA versions.

I have no clue about the failure in the headless-docs job tho, any suggestion?

Thanks, Antonio

@benjaminum

Thanks! The headless-docs problem does not seem related to this PR

@benjaminum benjaminum left a comment

LGTM

Reviewable status: 0 of 4 files reviewed, all discussions resolved

@ssheorey ssheorey merged commit ad0edd0 into isl-org:master Oct 31, 2023
35 of 36 checks passed
@ao2 ao2 deleted the ao2/check-support-for-memory-pools-at-runtime branch October 31, 2023 18:24
Development

Successfully merging this pull request may close these issues.

CUDA runtime error - open3D v0.14.1
3 participants