python MI_Proposed_CNNs_Architecture.py execution error #10

Open

pioneerRick opened this issue Mar 13, 2024 · 5 comments
@pioneerRick

Traceback (most recent call last):
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[{{node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "MI_Proposed_CNNs_Architecture.py", line 582, in
sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 0.50})
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]

Caused by op 'Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 301, in
train_step = tf.train.AdamOptimizer(1e-5).minimize(loss)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
grad_loss=grad_loss)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 408, in _MaybeCompile
return grad_fn() # Exit early
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 814, in
lambda: grad_fn(op, *out_grads))
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py", line 1130, in _MatMulGrad
grad_a = gen_math_ops.mat_mul(grad, b, transpose_b=True)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

...which was originally created as op 'Output_Layer/prediction/MatMul', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 290, in
prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2057, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(128, 4), b.shape=(512, 4), m=128, n=512, k=4
[[node Train_Optimizer/gradients/Output_Layer/prediction/MatMul_grad/MatMul (defined at MI_Proposed_CNNs_Architecture.py:301) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Train_Optimizer/gradients/Output_Layer/prediction/add_grad/tuple/control_dependency, Output_Layer/W_fc2/Variable/read)]]

Hello author, I am a third-year undergraduate student, and I have recently been reproducing your paper for inspiration. When I run (under a Python 3.6 environment) $ python MI_Proposed_CNNs_Architecture.py, I hit the error above. I have searched extensively without finding a solution; my best guess so far is a mismatch between the TensorFlow version and the CUDA version, but I am not sure whether that is correct.

My machine configuration is as follows:
NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2
NVIDIA GeForce RTX 4090, 24 GB VRAM

The conda environment (running under Python 3.6.13) is as follows:

absl-py 0.15.0
astor 0.8.1
certifi 2021.5.30
coverage 5.5
Cython 0.29.24
dataclasses 0.8
et-xmlfile 1.1.0
gast 0.5.3
grpcio 1.36.1
h5py 2.10.0
importlib-metadata 4.8.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
mkl-fft 1.3.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
openpyxl 3.1.2
pandas 1.1.5
pip 20.0.2
protobuf 3.17.2
python-dateutil 2.9.0.post0
pytz 2024.1
scipy 1.5.2
setuptools 36.4.0
six 1.16.0
tensorboard 1.12.2
tensorflow 1.12.0
termcolor 1.1.0
typing-extensions 4.1.1
Werkzeug 2.0.3
wheel 0.37.1
xlrd 1.2.0
zipp 3.6.0
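
One commonly reported workaround for "Blas GEMM launch failed" with TF 1.x on newer GPUs is to stop TensorFlow from reserving the whole card at startup by enabling GPU memory growth. A minimal sketch against the TF 1.x session API (how the original script creates its session is assumed, not shown here):

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving the whole card
# at startup; this often avoids "Blas GEMM launch failed" when cuBLAS
# cannot obtain a workspace.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Alternatively, cap the fraction of GPU memory TF may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    pass  # build the graph and train as in the original script
```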

@pioneerRick (Author)

[screenshot of the error]
When I run with batch_size set to 64 or above, I hit the error above.

@pioneerRick (Author)

But when I reduce batch_size to 16, I instead get an error about insufficient GPU memory:
Traceback (most recent call last):
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node Convolutional_1/h_conv1/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input/Reshape_Data/Reshape, Convolutional_1/W_conv1/Variable/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node loss/Euclidean_Distance/Mean/_69}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_372_loss/Euclidean_Distance/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "MI_Proposed_CNNs_Architecture.py", line 584, in
train_acc, train_loss = sess.run([Global_Average_Accuracy, loss], feed_dict={x: train_data, y: train_labels, keep_prob: 1.0})
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Convolutional_1/h_conv1/Conv2D (defined at MI_Proposed_CNNs_Architecture.py:101) = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input/Reshape_Data/Reshape, Convolutional_1/W_conv1/Variable/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node loss/Euclidean_Distance/Mean/_69}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_372_loss/Euclidean_Distance/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'Convolutional_1/h_conv1/Conv2D', defined at:
File "MI_Proposed_CNNs_Architecture.py", line 101, in
h_conv1 = tf.nn.conv2d(x_Reshape, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 957, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/root/miniconda3/envs/oldMotor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[134487,32,32,20] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node Convolutional_1/h_conv1/Conv2D (defined at MI_Proposed_CNNs_Architecture.py:101) = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input/Reshape_Data/Reshape, Convolutional_1/W_conv1/Variable/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node loss/Euclidean_Distance/Mean/_69}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_372_loss/Euclidean_Distance/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info
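
Note that this second failure is raised by the evaluation call at line 584, which feeds the entire training set (134487 samples, per the OOM tensor shape) through the network in a single sess.run; it is independent of the training batch_size. A minimal sketch of chunked evaluation, reusing the tensor names from the traceback (the exact graph in MI_Proposed_CNNs_Architecture.py is assumed):

```python
import numpy as np

def evaluate_in_batches(sess, acc_op, loss_op, x, y, keep_prob,
                        data, labels, batch_size=128):
    """Average accuracy/loss over mini-batches so the whole
    training set never has to fit on the GPU at once."""
    accs, losses, weights = [], [], []
    for start in range(0, len(data), batch_size):
        end = start + batch_size
        a, l = sess.run([acc_op, loss_op],
                        feed_dict={x: data[start:end],
                                   y: labels[start:end],
                                   keep_prob: 1.0})
        accs.append(a)
        losses.append(l)
        weights.append(len(data[start:end]))
    return (np.average(accs, weights=weights),
            np.average(losses, weights=weights))

# train_acc, train_loss = evaluate_in_batches(
#     sess, Global_Average_Accuracy, loss, x, y, keep_prob,
#     train_data, train_labels)
```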

@pioneerRick (Author)

[screenshot]
This might also give you a clue to guide me: since my MATLAB pipeline only produced Excel .xlsx files, I replaced all of your original .csv files with .xlsx files for reading. I do not think this can be the cause of my error, but if it suggests anything to you, please reply.
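
If the .xlsx switch is suspected, it can be checked in isolation: the change should only affect the loading call, not the resulting arrays. A minimal pandas sketch (the file name is a placeholder, not one of the repository's actual files; xlrd 1.2.0 from the environment above still reads .xlsx, newer xlrd releases do not):

```python
import pandas as pd

# CSV load as in the original repository (hypothetical file name):
# data = pd.read_csv("train_data.csv", header=None).values

# Equivalent .xlsx load; with pandas 1.1.5 this goes through xlrd/openpyxl.
data = pd.read_excel("train_data.xlsx", header=None).values

# Sanity-check that the array matches what the CSV version produced.
print(data.shape, data.dtype)
```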

@pioneerRick (Author)

There is actually one more possible source of error: you used a Windows environment, while I am on Ubuntu 16.04. Could this also be a cause of my error?

@pioneerRick (Author)

Hello, after switching to a larger GPU, a Tesla V100-32GB (CUDA 12.2, Python 3.6.3, tensorflow-gpu=1.13.1), the two errors above were resolved, so they can be attributed to GPU memory size.
However, once those two problems were solved, I ran into another one:
[screenshot of the error]
From what I found, the input is too small: each convolution shrinks the input further, and once it is too small it "cannot be convolved" any more, which raises the error.
My tentative fix is to change the size of each network layer. Does this mean that the original number of layers, or the input and output sizes of each layer, are wrong?
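
One way to confirm the "input too small to convolve" diagnosis is to trace the spatial size through the layer stack by hand: with 'SAME' padding a stride-1 convolution preserves size, while 'VALID' convolutions and pooling shrink it until the kernel no longer fits. A small sketch of that arithmetic (the kernel/stride values are illustrative, not the paper's actual architecture; the 32x32 input comes from the OOM tensor shape above):

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer, per TF's rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    # 'VALID': the kernel must fit entirely inside the input.
    return math.ceil((size - kernel + 1) / stride)

size = 32  # input is reshaped to 32x32 per the OOM traceback
for name, k, s, pad in [("conv1", 3, 1, "SAME"),
                        ("pool1", 2, 2, "VALID"),
                        ("conv2", 3, 1, "SAME"),
                        ("pool2", 2, 2, "VALID")]:
    size = conv_out(size, k, s, pad)
    print(f"{name}: {size}x{size}")
    if size < 1:
        print("input has shrunk below 1x1 -- layer stack too deep")
        break
```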
