RuntimeError: CUDNN_STATUS_EXECUTION_FAILED #51

Open
mehranjeelani opened this issue Mar 16, 2021 · 8 comments

Comments

@mehranjeelani

I get the following error when I use your trained model to test on the Vid4 dataset. I was able to compile the deformable convolution, and I have torch version = 0.3.1 and python = 3.6 with cuda = 9.
Kindly help!
Traceback (most recent call last):
File "eval.py", line 117, in <module>
output, _ = model(lr)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
output = module(*input, **kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/data2/superresolution/video_sr/TDAN-VSR/model.py", line 225, in forward
out = self.relu(self.conv_first(y))
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

@YapengTian
Owner

Did you change any code, and did you run the given test examples? It seems that the issue comes from GPU data parallelism. Sorry for the very late response.
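
One way to check whether the data-parallel wrapper is involved is to run the wrapped network directly on a single device. The sketch below is only an illustration under assumptions, not code from the repository: `unwrap_data_parallel` is a hypothetical helper, and it relies only on `torch.nn.DataParallel` keeping the wrapped network in its `module` attribute and its GPU list in `device_ids`. It is duck-typed, so it can be tried without a GPU, and it may or may not make the cuDNN error go away.

```python
def unwrap_data_parallel(model):
    """Return the network wrapped by a DataParallel-style container.

    Hypothetical helper (not from the TDAN-VSR repository). A
    torch.nn.DataParallel object stores the wrapped network in `.module`
    and the target GPUs in `.device_ids`; any other object is returned
    unchanged.
    """
    if hasattr(model, "module") and hasattr(model, "device_ids"):
        return model.module
    return model


# Assumed usage in eval.py, right after the checkpoint is loaded:
#   model = torch.load(model_path)
#   model = unwrap_data_parallel(model).cuda()
#   model.eval()
```

If the error disappears with the unwrapped model, the wrapper (or the device list stored in the checkpoint) is a likely suspect. If it persists, the cause is probably elsewhere, for example in the CUDA/cuDNN installation.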

@mehranjeelani
Author

mehranjeelani commented Apr 6, 2021

Yes, I am changing the code a bit. I am actually testing on my custom dataset. My test directory is just the path to the folder containing all the frames, and I am accordingly changing the code.
Here is my Python code for eval.py:

import argparse
import sys
import scipy
import os
from PIL import Image
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import numpy as np
from skimage import io, transform
from model import ModelFactory
from torch.autograd import Variable
import time
description='Video Super Resolution pytorch implementation'

def forward_x8(lr, forward_function=None):
        def _transform(v, op):
            v = v.float()

            v2np = v.data.cpu().numpy()
            #print(v2np.shape)
            if op == 'v':
                tfnp = v2np[:, :, :, :, ::-1].copy()
            elif op == 'h':
                tfnp = v2np[:, :, :, ::-1, :].copy()
            elif op == 't':
                tfnp = v2np.transpose((0, 1, 2, 4, 3)).copy()
	
            ret = Variable(torch.Tensor(tfnp).cuda())
            #ret = ret.half()

            return ret

        def _transform_back(v, op):
       		
            if op == 'v':
                tfnp = v[:, :, :, ::-1].copy()
            elif op == 'h':
                tfnp = v[:, :, ::-1, :].copy()
            elif op == 't':
                tfnp = v.transpose((0, 1, 3, 2)).copy()
	
            return tfnp

        
        x = [lr]
        for tf in 'v', 'h': x.extend([_transform(_x, tf) for _x in x])
       
        list_r = []
        for k in range(len(x)):
            z = x[k]
            r, _ = forward_function(z)
            r = r.data.cpu().numpy()
            if k % 4 > 1:
                r = _transform_back(r, 'h')
            if (k % 4) % 2 == 1:
                r = _transform_back(r, 'v')
            list_r.append(r)
        y = np.sum(list_r,  axis=0)/4.0
       
        y = Variable(torch.Tensor(y).cuda())
        if len(y) == 1: y = y[0]
        return y
def quantize(img, rgb_range):
    return img.mul(255 / rgb_range).clamp(0, 255).round()


parser = argparse.ArgumentParser(description=description)

parser.add_argument('-m', '--model', metavar='M', type=str, default='TDAN',
                    help='network architecture.')
parser.add_argument('-s', '--scale', metavar='S', type=int, default=4, 
                    help='interpolation scale. Default 4')
parser.add_argument('-t', '--test-set', metavar='NAME', type=str, default='../datasets/KLE_1519',
                    help='dataset for testing.')
parser.add_argument('-mp', '--model-path', metavar='MP', type=str, default='model',
                    help='model path.')
parser.add_argument('-sp', '--save-path', metavar='SP', type=str, default='res/KLE_1519_sr',
                    help='saving directory path.')
args = parser.parse_args()

model_factory = ModelFactory()
model = model_factory.create_model(args.model)
dir_LR = args.test_set
#lis = sorted(os.listdir(dir_LR))
model_path = os.path.join(args.model_path, 'model.pt')
if not os.path.exists(model_path):
    raise Exception('Cannot find %s.' %model_path)
model = torch.load(model_path)
model.eval()
path = args.save_path
if not os.path.exists(path):
    os.makedirs(path)

#for i in range(len(lis)):
for i in range(1):
    #print(lis[i])
    LR = dir_LR
    ims = sorted(os.listdir(LR))
    num = len(ims)
    # number of the seq
    num = len(ims)
    image = io.imread(os.path.join(LR, ims[0]))
    row, col, ch = image.shape
    frames_lr = np.zeros((5, int(row), int(col), ch))
    for j in range(num):
        for k in range(j-2, j + 3):
            idx = k-j+2
            if k < 0:
                k = -k
            if k >= num:
                k = num - 3
            frames_lr[idx, :, :, :] = io.imread(os.path.join(LR, ims[k]))
        start = time.time()
        frames_lr = frames_lr/255.0 - 0.5
        lr = torch.from_numpy(frames_lr).float().permute(0, 3, 1, 2)
        lr = Variable(lr.cuda()).unsqueeze(0).contiguous()
        output, _ = model(lr)
        #output = forward_x8(lr, model)
        output = (output.data + 0.5)*255
        output = quantize(output, 255)
        output = output.squeeze(dim=0)
        elapsed_time = time.time() - start
        print(elapsed_time)
        img_name = os.path.join(path,ims[j])
        Image.fromarray(np.around(output.cpu().numpy().transpose(1, 2, 0)).astype(np.uint8)).save(img_name)
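
The frame loop in the script above builds a five-frame window around each frame `j` and remaps indices that fall outside the sequence. A minimal sketch of just that index logic, pulled out into a standalone function so it can be checked without images or a GPU (the name `window_indices` is illustrative and does not appear in the repository):

```python
def window_indices(j, num):
    """Return the five frame indices used for centre frame `j`.

    Mirrors the loop in the posted eval.py: indices before the first
    frame are reflected (k -> -k), and indices past the last frame are
    all mapped to `num - 3`. Assumes the sequence has at least 3 frames.
    """
    indices = []
    for k in range(j - 2, j + 3):
        if k < 0:
            k = -k
        if k >= num:
            k = num - 3
        indices.append(k)
    return indices


print(window_indices(0, 10))  # [2, 1, 0, 1, 2]
print(window_indices(5, 10))  # [3, 4, 5, 6, 7]
print(window_indices(9, 10))  # [7, 8, 9, 7, 7]
```

Note that the two boundaries are handled differently: the start of the sequence is reflected, while at the end both out-of-range neighbours map to the same frame, `num - 3`.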
        

        

@Jin-97

Jin-97 commented Jun 15, 2021

I have the same problem. Did you solve it?

@mehranjeelani
Author

Hi. No, I actually used another model, which gave better results, so I didn't bother to fix this.

@YapengTian
Owner

Sorry, I missed it. @Jin-97 Do you still have the problem? It is pretty weird to see a parallel issue, since only one GPU is used.

@Jin-97

Jin-97 commented Jun 17, 2021

I reconfigured the dependencies (python=3.6.6, torch=0.3.1, cuda=9.1), and that seems to have solved the problem, because a new problem has emerged: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58.
I am just running the test code, and my GPU is a GTX 1080. Where can I change the batch size? Or do you have any suggestions?

@YapengTian
Owner

If you are training the model, using a smaller batch size is a good choice. If you are running testing, I would suggest using the chop_forward function in the solver https://github.com/YapengTian/TDAN-VSR-CVPR-2020/blob/master/solver.py , which splits the video frames into smaller patches.
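
The general idea of patch-wise testing can be sketched as follows. This is not the repository's `chop_forward` (see solver.py for that). It is a simplified, NumPy-only illustration, and the names `tiled_upscale`, `nearest_center`, `tile`, and `pad` are made up for this example. The frames are split into spatial tiles, each tile is processed with a few pixels of surrounding context, and the upscaled results are stitched back together, so only one tile has to fit in GPU memory at a time.

```python
import numpy as np


def tiled_upscale(frames, fn, scale=4, tile=64, pad=8):
    """Run `fn` on overlapping spatial tiles and stitch the outputs.

    frames: array of shape (T, C, H, W) holding the low-resolution frames.
    fn:     callable mapping (T, C, h, w) -> (C, h * scale, w * scale).
    """
    _, c, h, w = frames.shape
    out = np.zeros((c, h * scale, w * scale), dtype=np.float32)
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            bottom, right = min(top + tile, h), min(left + tile, w)
            # Extend the tile by `pad` pixels of context, clipped to the frame.
            t0, l0 = max(top - pad, 0), max(left - pad, 0)
            b0, r0 = min(bottom + pad, h), min(right + pad, w)
            sr = fn(frames[:, :, t0:b0, l0:r0])
            # Crop the context away again, in high-resolution coordinates.
            y0, x0 = (top - t0) * scale, (left - l0) * scale
            y1 = y0 + (bottom - top) * scale
            x1 = x0 + (right - left) * scale
            out[:, top * scale:bottom * scale,
                left * scale:right * scale] = sr[:, y0:y1, x0:x1]
    return out


def nearest_center(x, scale=4):
    """Toy stand-in for the network: nearest-neighbour upscale of the centre frame."""
    center = x[x.shape[0] // 2]
    return center.repeat(scale, axis=1).repeat(scale, axis=2)
```

With a real model, `fn` would wrap the network call (convert the tile to a CUDA tensor, run the model, and move the result back to NumPy). A smaller `tile` lowers peak memory at the cost of more forward passes, and `pad` gives each tile some surrounding context, which may reduce visible seams.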

@Jin-97

Jin-97 commented Jun 17, 2021

Thanks~~~


3 participants