Error with CUDA_ERROR_ILLEGAL_ADDRESS #313
Comments
Python version: 3.8.10. A related issue: https://github.com/tensorflow/tensorflow/issues/50735. I have tried CUDA_LAUNCH_BLOCKING=1, but with no luck.
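For reference, CUDA_LAUNCH_BLOCKING=1 only has an effect if it is already in the environment when the CUDA runtime initializes in the training process. A minimal sketch, assuming training is started from a script (the name `train.py` in the comment is hypothetical, as is the helper `run_with_launch_blocking`), runs the command in a child process with the variable set:

```python
import os
import subprocess
import sys


def run_with_launch_blocking(cmd):
    """Run cmd in a child process with CUDA_LAUNCH_BLOCKING=1 in its environment.

    Synchronous kernel launches tend to make the reported error location
    more meaningful; they do not fix the underlying fault.
    """
    env = dict(os.environ)              # copy, so the parent environment is untouched
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Hypothetical usage; substitute the real training entry point:
# result = run_with_launch_blocking([sys.executable, "train.py"])
# print(result.stderr)
```

Setting the variable in the child's environment avoids the case where it is exported too late, after TensorFlow has already initialized the GPU.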
What GPU and CPU hardware does this platform use? It looks as if there is too little GPU or CPU memory, or the hardware itself is causing the problem.
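One way to answer the hardware question from inside the instance is to ask `nvidia-smi` for the GPU name and memory. The sketch below assumes `nvidia-smi` is on the PATH of the image; the helper names `parse_gpu_rows` and `query_gpus` are made up for illustration.

```python
import shutil
import subprocess

# Query name plus total and used memory (MiB), one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Parse 'name, total, used' CSV rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.rsplit(",", 2)]
        if len(parts) != 3:
            continue
        name, total, used = parts
        try:
            rows.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
        except ValueError:
            # Skip rows where the driver reports a non-numeric value.
            continue
    return rows


def query_gpus():
    """Return GPU info, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_rows(out.stdout) if out.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result means either no NVIDIA driver is visible in the container or the query failed, which is itself useful to report in the thread.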
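The hardware configuration is as follows; in theory it should be sufficient. Also, it crashes right at the start of the run. Instance type: B1.large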
Can you choose the TensorFlow version yourself? Try configuring a different version.
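I switched to a TensorFlow 2.8.0 image, and the result is the same. I also found that the image I was using before was TensorFlow 2.10.1. However, running the command below in both environments returns version 2.8.4. I suspect I am not using it correctly...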
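I now know the cause of the problem above: the TensorFlow 2.8.4 requirement is written in requirements.txt, so I need to change that file. I am a bit puzzled, though, about why the version requirements in requirements.txt are so strict. Every package is pinned to one exact version, instead of listing only the main packages and letting the rest be installed as dependencies. As it stands, changing the TensorFlow version makes other dependencies fail, and I have to comment out several lines to get the install through. After changing the TensorFlow version, running train still reports CUDA_ERROR_ILLEGAL_ADDRESS. Checking the version also produced some errors. I will ask the platform community; perhaps the way I installed TensorFlow is wrong.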
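To see which exact pins in requirements.txt disagree with what the image already provides, the pins can be compared with the installed package metadata. This is a rough sketch using only the standard library (`importlib.metadata`, available from Python 3.8). It handles only simple `name==version` lines, and the helper names are hypothetical.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Collect 'name==version' pins; comments and unpinned lines are ignored."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        version = version.split(";", 1)[0].strip()  # drop environment markers
        pins[name.strip().lower()] = version
    return pins


def find_mismatches(pins):
    """Map package name to (pinned, installed) wherever the two differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as fh:
        for name, (wanted, installed) in find_mismatches(parse_pins(fh.read())).items():
            print(f"{name}: pinned {wanted}, installed {installed}")
```

Running it before `pip install -r requirements.txt` shows which pins would replace a package that the image ships, such as its TensorFlow build.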
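I tried the TensorFlow 2.8.0, 2.9.3, and 2.10.1 builds provided by the images (not installed via pip), and all of them report CUDA_ERROR_ILLEGAL_ADDRESS. I have no further ideas for now. One note: with TensorFlow provided by the image, I installed only matplotlib and scipy separately via pip and did not install requirements.txt. I get the impression that the full list in requirements.txt is not really necessary. Only a few packages actually need to be installed, and the other dependencies are resolved automatically.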
I have successfully run training on Ubuntu 22.04 without a GPU.
However, training fails on platform.virtaicloud: it aborts with CUDA_ERROR_ILLEGAL_ADDRESS.
What should I do? Anyone who has an idea can use the public environment image above to debug.