[cuda] Add CUDA version check #3249

Merged: 4 commits merged into taichi-dev:master on Oct 24, 2021
Conversation

@0xzhang (Collaborator) commented Oct 22, 2021

Related issue = fixes #3195

Issue
CUDADriver has two behaviours:

  1. Load the library;
  2. Load the CUDA functions.

Previously, an error was thrown when loading the CUDA functions failed, because there was no CUDA version check: the code assumed that loading the CUDA functions would always succeed.

After this fix, the behaviour of with_cuda is consistent with with_opengl: if CUDA is not supported, the arch falls back to cpu directly instead of throwing an error as it did before.

Implementation
I implemented it this way because:

  1. Version checking should be done after the library has been successfully loaded.
  2. Version checking should be done before the many CUDA functions are loaded.
  3. A lower-version library can load successfully, while the version check fails and the functions cannot be loaded.
  4. driver_get_version() is one of the CUDA functions in cuda_driver_functions.inc.h, so I extracted this function from that file.
  5. detected() is called twice. The first time, it checks the status of library loading while CUDADriver is being constructed. The second time, it checks whether the CUDA driver is ready. Therefore I added a member flag cuda_version_valid initialized to true. While CUDADriver is being constructed this flag is meaningless; after the instance is constructed, the return value of detected() depends on the actual CUDA driver status. (A sketch of this flow follows below.)
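
Below is a minimal sketch of that flow (illustrative only; load_library(), load_functions() and the exact call sites are placeholders, not the PR's actual code). The library is loaded first, driver_get_version() is resolved and called before the rest of cuda_driver_functions.inc.h, and detected() combines the load status with the version flag:

// Sketch only; the real logic lives in Taichi's CUDADriver.
class CUDADriverSketch {
 public:
  CUDADriverSketch() {
    loaded_ = load_library();      // 1. try to load the CUDA driver library
    if (!loaded_)
      return;                      // detected() already returns false here
    int version = 0;
    driver_get_version(&version);  // 2. query the version before loading
                                   //    the remaining CUDA functions
    if (version < 10000) {         // 3. driver older than CUDA 10.0
      cuda_version_valid = false;
      return;
    }
    load_functions();              // 4. safe to load the full function set
  }

  bool detected() const {
    // False while the library is missing; otherwise follows the version flag.
    return loaded_ && cuda_version_valid;
  }

 private:
  bool load_library() { return false; }        // placeholder
  void driver_get_version(int *v) { *v = 0; }  // placeholder
  void load_functions() {}                     // placeholder
  bool loaded_{false};
  bool cuda_version_valid{true};  // starts true, as described in point 5 above
};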

netlify bot commented Oct 22, 2021

✔️ Deploy Preview for jovial-fermat-aa59dc canceled.

🔨 Explore the source changes: 5c4ac74

🔍 Inspect the deploy log: https://app.netlify.com/sites/jovial-fermat-aa59dc/deploys/61744aaaa833ef000799ae9d

@0xzhang (Collaborator, Author) commented Oct 22, 2021

arch_table = {
    cuda: _ti_core.with_cuda,
    metal: _ti_core.with_metal,
    opengl: _ti_core.with_opengl,
    cc: _ti_core.with_cc,
    vulkan: lambda: _ti_core.with_vulkan(),
    wasm: lambda: True,
    cpu: lambda: True,
}
with_arch = arch_table.get(arch, lambda: False)
try:
    return with_arch()
except Exception as e:
    arch = _ti_core.arch_name(arch)
    _ti_core.warn(
        f"{e.__class__.__name__}: '{e}' occurred when detecting "
        f"{arch}, consider add `export TI_WITH_{arch.upper()}=0` "
        f" to environment variables to depress this warning message.")
    return False

For the current archs, the env variables TI_ENABLE_METAL/TI_ENABLE_OPENGL/TI_ENABLE_CUDA are available. Env variables are not considered in the cc and vulkan implementations; they only use the C++ macros TI_WITH_CC and TI_WITH_VULKAN to decide the return value.

Currently, the error info in __init__.py only works for metal, opengl and cuda. Support for the env variables TI_ENABLE_VULKAN/TI_ENABLE_CC for vulkan and cc may need to be implemented; a sketch follows below.
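
For illustration, here is a minimal sketch (not Taichi's actual code; with_vulkan_sketch and the way TI_WITH_VULKAN is consulted are assumptions) of how a backend availability check could combine the compile-time macro with a runtime TI_ENABLE_VULKAN check:

#include <cstdlib>
#include <cstring>

// Sketch: built-in support is gated by the TI_WITH_VULKAN macro at compile
// time, and can additionally be disabled at runtime via TI_ENABLE_VULKAN=0.
bool with_vulkan_sketch() {
#ifdef TI_WITH_VULKAN
  const char *env = std::getenv("TI_ENABLE_VULKAN");
  if (env != nullptr && std::strcmp(env, "0") == 0) {
    return false;  // built with Vulkan, but disabled via the environment
  }
  return true;  // built with Vulkan and not disabled
#else
  return false;  // built without Vulkan support
#endif
}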

@k-ye (Member) left a comment

Awesome! I only got a few nits

@@ -1004,7 +1004,7 @@ def is_arch_supported(arch):
         arch = _ti_core.arch_name(arch)
         _ti_core.warn(
             f"{e.__class__.__name__}: '{e}' occurred when detecting "
-            f"{arch}, consider add `export TI_WITH_{arch.upper()}=0` "
+            f"{arch}, consider add `TI_ENABLE_{arch.upper()}=0` "
Suggested change
-            f"{arch}, consider add `TI_ENABLE_{arch.upper()}=0` "
+            f"{arch}, consider adding `TI_ENABLE_{arch.upper()}=0` "

@@ -1004,7 +1004,7 @@ def is_arch_supported(arch):
         arch = _ti_core.arch_name(arch)
         _ti_core.warn(
             f"{e.__class__.__name__}: '{e}' occurred when detecting "
-            f"{arch}, consider add `export TI_WITH_{arch.upper()}=0` "
+            f"{arch}, consider add `TI_ENABLE_{arch.upper()}=0` "
             f" to environment variables to depress this warning message.")
// not your fault :-)

Suggested change
-            f" to environment variables to depress this warning message.")
+            f" to environment variables to suppress this warning message.")

  // CUDA versions should >= 10.
  if (version < 10000) {
    cuda_version_valid = false;
    TI_WARN(
Let's show a bit more information here:

TI_WARN("The Taichi CUDA backend requires at least CUDA 10.0, got {}", version);

@@ -119,6 +121,8 @@ class CUDADriver {
   std::unique_ptr<DynamicLoader> loader_;
 
   std::mutex lock_;
 
+  bool cuda_version_valid = true;
@k-ye (Member) commented on Oct 22, 2021:
FYI, our CPP guide is at https://docs.taichi.graphics/lang/articles/contribution/cpp_style

Suggested change
-  bool cuda_version_valid = true;
+  bool cuda_version_valid_{false};

@0xzhang (Collaborator, Author) replied on Oct 22, 2021:

Here I intentionally initialize it with true. 😂 It's tricky, but I think this is a way to solve the problem.

The return value of detected() depends on the CUDA version and the library loading status together.

  • If the library fails to load, cuda_version_valid stays true, but detected() always returns false.
  • If the library loads successfully, cuda_version_valid is set according to the actual status, and the return value depends on cuda_version_valid.

It's all because detected() is first called while the singleton CUDADriver instance is being constructed. 😂

Maybe I could decouple detected() from the CUDADriver constructor; then the value of cuda_version_valid would feel safer.

@k-ye (Member) replied on Oct 23, 2021:

Thanks for the detailed explanation!

The fact that detect() is called during initialization makes things a bit tricky -- Its value is supposed to be constant throughout the lifetime of CUDADriver. To improve the situation a bit, we can do this:

  1. Cache the result of TI_ENABLE_CUDA in another bool member
  2. Don't call detect() inside the ctor.

So the pseudo code can be something like this:

class CUDADriver {
 public:
  CUDADriver() {
    disabled_by_env_ = (get_environ_config("TI_ENABLE_CUDA", 1) == 0);
    if (disabled_by_env_) {
      // return directly
    }
    loader_ = ...;
    if (!loader_->loaded()) {
      // return directly
    }
    // Read version from the loader
    if (version < 10000) {
      cuda_version_valid_ = false;
      // return directly
    }
    cuda_version_valid_ = true;
  }

  bool detected() {
    return !disabled_by_env_ && cuda_version_valid_ && loader_->loaded();
  }

 private:
  bool disabled_by_env_{false};
  bool cuda_version_valid_{false};
};

@0xzhang (Collaborator, Author) replied on Oct 23, 2021:

Thanks very much!!
I changed the implementation.

@k-ye (Member) left a comment:

LGTM, thanks!

@k-ye merged commit 7787245 into taichi-dev:master on Oct 24, 2021
Successfully merging this pull request may close these issues.

Error info about export TI_WITH_CUDA=0 maybe inappropriate.