[cuda] Add CUDA version check #3249
taichi/python/taichi/lang/__init__.py Lines 991 to 1009 in 53e04c6
For current archs, env variables Currently, error info in file |
Awesome! I only got a few nits
python/taichi/lang/__init__.py
Outdated
@@ -1004,7 +1004,7 @@ def is_arch_supported(arch):
         arch = _ti_core.arch_name(arch)
         _ti_core.warn(
             f"{e.__class__.__name__}: '{e}' occurred when detecting "
-            f"{arch}, consider add `export TI_WITH_{arch.upper()}=0` "
+            f"{arch}, consider add `TI_ENABLE_{arch.upper()}=0` "
-            f"{arch}, consider add `TI_ENABLE_{arch.upper()}=0` "
+            f"{arch}, consider adding `TI_ENABLE_{arch.upper()}=0` "
python/taichi/lang/__init__.py
Outdated
@@ -1004,7 +1004,7 @@ def is_arch_supported(arch):
         arch = _ti_core.arch_name(arch)
         _ti_core.warn(
             f"{e.__class__.__name__}: '{e}' occurred when detecting "
-            f"{arch}, consider add `export TI_WITH_{arch.upper()}=0` "
+            f"{arch}, consider add `TI_ENABLE_{arch.upper()}=0` "
             f" to environment variables to depress this warning message.")
// not your fault :-)
-            f" to environment variables to depress this warning message.")
+            f" to environment variables to suppress this warning message.")
taichi/backends/cuda/cuda_driver.cpp
Outdated
// CUDA versions should >= 10.
if (version < 10000) {
  cuda_version_valid = false;
  TI_WARN(
Let's show a bit more information here:
TI_WARN("The Taichi CUDA backend requires at least CUDA 10.0, got {}", version);
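A raw value such as 9020 is hard to read in a warning. As a minimal sketch, assuming `version` uses the CUDA driver encoding of 1000 * major + 10 * minor (so 10000 means CUDA 10.0), the check and a readable version string could look like this. The names `kMinCudaVersion`, `format_cuda_version`, and `cuda_version_supported` are hypothetical and not part of Taichi.

```cpp
#include <string>

// Assumption: the integer follows the CUDA driver convention
// 1000 * major + 10 * minor, so 10000 means CUDA 10.0.
constexpr int kMinCudaVersion = 10000;  // hypothetical constant name

// Turns an encoded version such as 11020 into "11.2".
std::string format_cuda_version(int version) {
  const int major = version / 1000;
  const int minor = (version % 1000) / 10;
  return std::to_string(major) + "." + std::to_string(minor);
}

// Returns true when the encoded version meets the minimum requirement.
bool cuda_version_supported(int version) {
  return version >= kMinCudaVersion;
}
```

The formatted string could then be passed to the warning in place of the raw integer.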
taichi/backends/cuda/cuda_driver.h
Outdated
@@ -119,6 +121,8 @@ class CUDADriver {
   std::unique_ptr<DynamicLoader> loader_;

   std::mutex lock_;
+
+  bool cuda_version_valid = true;
FYI, our CPP guide is at https://docs.taichi.graphics/lang/articles/contribution/cpp_style
-  bool cuda_version_valid = true;
+  bool cuda_version_valid_{false};
Here I use `true` to initialize the flag. 😂 It's tricky, but I think this approach solves the problem.
The return value of `detected()` depends on both the CUDA version and library loading:
- If the library fails to load, `cuda_version_valid` stays `true`, but `detected()` always returns `false`.
- If the library loads successfully, `cuda_version_valid` is set according to the actual status, and the return value depends on `cuda_version_valid`.
This is all because `detected()` is first called while the singleton instance of `CUDADriver` is being constructed. 😂
Maybe I could decouple `detected()` from the `CUDADriver` constructor. The value of `cuda_version_valid` would then be safer.
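The two cases above can be sketched as a small stand-alone class. This is only an illustration of the described behaviour, not the actual `CUDADriver` code: `OriginalDriverSketch`, its constructor parameters, and `detected_during_construction()` are hypothetical, and library loading is reduced to a boolean.

```cpp
// Hypothetical stand-in for the approach described above; not the Taichi class.
class OriginalDriverSketch {
 public:
  OriginalDriverSketch(bool library_loaded, int version)
      : library_loaded_(library_loaded) {
    // First call: happens during construction, before the version is known.
    // The flag is still true here, so the result reflects library loading only.
    detected_during_construction_ = detected();
    if (!detected_during_construction_) {
      return;  // Library missing: the flag keeps its initial value.
    }
    // Library loaded: record the real status of the version check.
    cuda_version_valid_ = (version >= 10000);
  }

  // Later calls depend on both library loading and the version flag.
  bool detected() const { return library_loaded_ && cuda_version_valid_; }

  bool detected_during_construction() const {
    return detected_during_construction_;
  }

 private:
  bool library_loaded_;
  bool cuda_version_valid_ = true;  // Must start as true for the first call.
  bool detected_during_construction_ = false;
};
```

The sketch makes the fragile part visible: the flag only means something after the constructor has finished.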
Thanks for the detailed explanation!
The fact that `detected()` is called during initialization makes things a bit tricky. Its value is supposed to be constant throughout the lifetime of `CUDADriver`. To improve the situation a bit, we can do this:
- Cache the result of `TI_ENABLE_CUDA` in another bool member.
- Don't call `detected()` inside the ctor.
So the pseudocode could look something like this:
class CUDADriver {
 public:
  CUDADriver() {
    disabled_by_env_ = (get_environ_config("TI_ENABLE_CUDA", 1) == 0);
    if (disabled_by_env_) {
      return;
    }
    loader_ = ...;
    if (!loader_->loaded()) {
      return;
    }
    // Read version from the loader
    if (version < 10000) {
      // cuda_version_valid_ stays false
      return;
    }
    cuda_version_valid_ = true;
  }
  bool detected() {
    // disabled_by_env_ is checked first, so loader_ is not touched when unset
    return !disabled_by_env_ && cuda_version_valid_ && loader_->loaded();
  }
 private:
  bool disabled_by_env_{false};
  bool cuda_version_valid_{false};
};
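For readers who want to try the control flow in isolation, here is a compilable sketch of the same idea. Everything in it is a stand-in: `FakeLoader` replaces the real `DynamicLoader`, `environ_config_or` is a hypothetical helper modeled on `get_environ_config`, and the constructor takes its inputs as arguments so that no CUDA installation is needed.

```cpp
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for the dynamic library loader.
struct FakeLoader {
  bool is_loaded;
  int driver_version;  // Assumed encoding: 1000 * major + 10 * minor.
  bool loaded() const { return is_loaded; }
};

// Hypothetical helper: reads an integer environment variable,
// returning default_value when the variable is not set.
int environ_config_or(const char* name, int default_value) {
  const char* raw = std::getenv(name);
  return raw == nullptr ? default_value : std::atoi(raw);
}

class DriverSketch {
 public:
  DriverSketch(int enable_flag, bool library_present, int driver_version) {
    // Cache the environment decision once; detected() never re-reads it.
    disabled_by_env_ = (enable_flag == 0);
    if (disabled_by_env_) {
      return;
    }
    loader_ = std::make_unique<FakeLoader>(
        FakeLoader{library_present, driver_version});
    if (!loader_->loaded()) {
      return;
    }
    // The flag starts as false and becomes true only after a passing check.
    cuda_version_valid_ = (loader_->driver_version >= 10000);
  }

  // Constant for the lifetime of the object; not called by the constructor.
  bool detected() const {
    return !disabled_by_env_ && cuda_version_valid_ && loader_ &&
           loader_->loaded();
  }

 private:
  std::unique_ptr<FakeLoader> loader_;
  bool disabled_by_env_{false};
  bool cuda_version_valid_{false};
};
```

A caller could construct it as `DriverSketch driver(environ_config_or("TI_ENABLE_CUDA", 1), true, 11020);`. Because both flags default to `false`, every early return leaves the object in a "not detected" state without extra bookkeeping.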
Thanks very much!!
I changed the implementation.
LGTM, thanks!
Related issue = fixes #3195
Issue
`CUDADriver` has two behaviours:
- In the past, an error was thrown when loading a CUDA function failed. Because there was no CUDA version check, the driver assumed that loading CUDA functions would always succeed.
- After this fix, the behaviour of `with_cuda` is consistent with `with_opengl`. If CUDA is not supported, the arch falls back to CPU directly and no error is thrown as before.
Implementation
I implemented it this way because `driver_get_version()` is one of the CUDA functions in the file `cuda_driver_functions.inc.h`, so I extracted this function from that file.
`detected()` is called twice. The first time, it checks the library loading status while `CUDADriver` is being constructed. The second time, it checks whether the CUDA driver is ready. Thus, I added a member flag `cuda_version_valid`, initialized to `true`. While `CUDADriver` is being constructed, this flag is meaningless. After the instance is constructed, the return value of `detected()` depends on the actual CUDA driver status.
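The fallback described under Issue can be illustrated with a short sketch. It is not how Taichi selects an arch internally; the `Arch` enum and `resolve_arch` are hypothetical names used only to show the intended outcome: an unsupported backend quietly resolves to CPU instead of raising an error.

```cpp
// Hypothetical subset of backends, for illustration only.
enum class Arch { cpu, cuda, opengl };

// Returns the arch to use: the requested one when its backend was
// detected, otherwise the CPU fallback. No error is raised.
Arch resolve_arch(Arch requested, bool cuda_detected, bool opengl_detected) {
  if (requested == Arch::cuda && !cuda_detected) {
    return Arch::cpu;
  }
  if (requested == Arch::opengl && !opengl_detected) {
    return Arch::cpu;
  }
  return requested;
}
```

With this shape, CUDA and OpenGL are handled the same way, which is the consistency the description asks for.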