
Some error occurred while running torchrun.py #19

Open
seven-sent opened this issue Apr 26, 2021 · 8 comments

Comments

@seven-sent

Thank you for sharing your project.
However, I ran into some trouble running the training code.

Traceback (most recent call last):
File "torchrun.py", line 325, in <module>
net_manager.train()
File "torchrun.py", line 145, in train
outputs = model(x, *y)
File "D:\download\Anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "D:\Study\3D\ExampleCode\2021\PRNet-PyTorch-master\torchmodel.py", line 63, in forward
x = self.encoder(x)
File "D:\download\Anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "D:\download\Anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
input = module(input)
File "D:\download\Anaconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "D:\Study\3D\ExampleCode\2021\PRNet-PyTorch-master\torchmodule.py", line 118, in forward
assert (s.shape == out.shape)
AssertionError

The shapes at the failing assert:
s.shape: torch.Size([15, 32, 128, 128])
out.shape: torch.Size([15, 32, 129, 129])

I did not change any code.
How can I solve it?

@reshow
Owner

reshow commented Apr 26, 2021

I think this is because your pytorch version is not v1.2. I ran into this error too with pytorch v1.6.

@seven-sent
Author

> I think this is because your pytorch version is not v1.2. I ran into this error too with pytorch v1.6.

Yes. My pytorch version is v1.7.
I then ran the code successfully in another environment with pytorch 1.1.

Is there a way to make the code work with higher pytorch versions?

@reshow
Owner

reshow commented Apr 26, 2021

I have not solved this problem yet. A barely satisfactory way is to manually pad the convolution results, along the lines of the sketch below.
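
For reference, a minimal sketch of what that manual padding could look like (a hypothetical helper of mine, not code from this repo; the name match_spatial is made up):

import torch
import torch.nn.functional as F

def match_spatial(out, ref):
    # Crop or pad `out` on the bottom/right so its H/W match `ref`.
    dh = ref.shape[-2] - out.shape[-2]
    dw = ref.shape[-1] - out.shape[-1]
    # F.pad order is (left, right, top, bottom); negative values crop.
    return F.pad(out, (0, dw, 0, dh))

# The shapes from the traceback above:
out = match_spatial(torch.randn(15, 32, 129, 129), torch.randn(15, 32, 128, 128))
print(out.shape)  # torch.Size([15, 32, 128, 128])

Applied to out just before the assert (s.shape == out.shape) in PRNResBlock.forward, this would make the residual addition line up again, though cropping circularly padded features is somewhat lossy.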

@super3kl

I got the same problem.
There is a bug in torch when you use nn.Conv2d with padding_mode='circular', as in:

kernel_size=kernel_size, padding=kernel_size - 1, padding_mode='circular'),

Different versions of torch produce feature maps of different sizes from the same input.
Another contributing factor may be this line:

self.layer0 = Conv2d_BN_AC(in_channels=3, out_channels=feature_size, kernel_size=4, stride=1, padding=1)  # 256 x 256 x 16

whose feature map comes out 255x255 instead of 256x256 (floor((256 + 2*1 - 4)/1) + 1 = 255).
For this repo it's fine to install torch 1.2.0 and torchvision 0.4.0 without any problem.
For higher versions of torch, changing the Conv2d settings does work.
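
To make the version dependence concrete, here is a small repro sketch (my own snippet, not repo code). As far as I can tell, circular padding behavior changed around PyTorch 1.5: older versions split the padding value across the two sides (3 pixels in total for padding=3), while newer ones pad that amount on each side (6 in total), so the same layer yields different output sizes:

import torch
import torch.nn as nn

# The stride-2 circular conv inside PRNResBlock (kernel_size=4 -> padding=3).
conv = nn.Conv2d(16, 16, kernel_size=4, stride=2, padding=3, padding_mode='circular')
x = torch.randn(1, 16, 255, 255)  # 255x255 is what layer0 actually outputs
print(torch.__version__, conv(x).shape)
# torch 1.2: torch.Size([1, 16, 128, 128]) -- matches the shortcut branch
# torch 1.7: torch.Size([1, 16, 129, 129]) -- the 129 in the AssertionError above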

@seven-sent
Author

> I got the same problem. There is a bug in torch when you use nn.Conv2d with padding_mode='circular' […] For higher versions of torch, changing the Conv2d settings does work.

Thank you so much for your help!!
However, to avoid further problems, I have decided to stay on the lower pytorch version.

@HeBangYan

> I got the same problem. There is a bug in torch when you use nn.Conv2d with padding_mode='circular' […] For higher versions of torch, changing the Conv2d settings does work.

Hello, I met the same problem. How did you modify the Conv2d settings?

@maltempoLuca

maltempoLuca commented Apr 28, 2022

> Hello, I met the same problem. How did you modify the Conv2d settings?

@HeBangYan, @seven-sent, @super3kl

Hi, after a lot of hours and tears I have found the correct settings; with them the entire architecture runs on PyTorch 1.11 (March 2022).
In torchmodel.py the only modified line is self.layer0; you can use either of the two kernel_size/padding combinations noted in the comment:

# torchmodel.py (excerpt). Conv2d_BN_AC, PRNResBlock, ConvTranspose2d_BN_AC and
# InitLoss are the repo's own modules, defined elsewhere in the project.
import torch.nn as nn

class InitPRN2(nn.Module):
    def __init__(self):
        super(InitPRN2, self).__init__()
        self.feature_size = 16
        feature_size = self.feature_size
        self.layer0 = Conv2d_BN_AC(in_channels=3, out_channels=feature_size, kernel_size=4, stride=1,
                                   padding='same')  # 256 x 256 x 16. run with {kernel_size:3, padding:1} or {4, 'same'}
        self.encoder = nn.Sequential(
            PRNResBlock(in_channels=feature_size, out_channels=feature_size * 2, kernel_size=4, stride=2, with_conv_shortcut=True),  # 128 x 128 x 32
            PRNResBlock(in_channels=feature_size * 2, out_channels=feature_size * 2, kernel_size=4, stride=1, with_conv_shortcut=False),  # 128 x 128 x 32
            PRNResBlock(in_channels=feature_size * 2, out_channels=feature_size * 4, kernel_size=4, stride=2, with_conv_shortcut=True),  # 64 x 64 x 64
            PRNResBlock(in_channels=feature_size * 4, out_channels=feature_size * 4, kernel_size=4, stride=1, with_conv_shortcut=False),  # 64 x 64 x 64
            PRNResBlock(in_channels=feature_size * 4, out_channels=feature_size * 8, kernel_size=4, stride=2, with_conv_shortcut=True),  # 32 x 32 x 128
            PRNResBlock(in_channels=feature_size * 8, out_channels=feature_size * 8, kernel_size=4, stride=1, with_conv_shortcut=False),  # 32 x 32 x 128
            PRNResBlock(in_channels=feature_size * 8, out_channels=feature_size * 16, kernel_size=4, stride=2, with_conv_shortcut=True),  # 16 x 16 x 256
            PRNResBlock(in_channels=feature_size * 16, out_channels=feature_size * 16, kernel_size=4, stride=1, with_conv_shortcut=False),  # 16 x 16 x 256
            PRNResBlock(in_channels=feature_size * 16, out_channels=feature_size * 32, kernel_size=4, stride=2, with_conv_shortcut=True),  # 8 x 8 x 512
            PRNResBlock(in_channels=feature_size * 32, out_channels=feature_size * 32, kernel_size=4, stride=1, with_conv_shortcut=False),  # 8 x 8 x 512
        )
        self.decoder = nn.Sequential(
            ConvTranspose2d_BN_AC(in_channels=feature_size * 32, out_channels=feature_size * 32, kernel_size=4, stride=1),  # 8 x 8 x 512
            ConvTranspose2d_BN_AC(in_channels=feature_size * 32, out_channels=feature_size * 16, kernel_size=4, stride=2),  # 16 x 16 x 256
            ConvTranspose2d_BN_AC(in_channels=feature_size * 16, out_channels=feature_size * 16, kernel_size=4, stride=1),  # 16 x 16 x 256
            ConvTranspose2d_BN_AC(in_channels=feature_size * 16, out_channels=feature_size * 16, kernel_size=4, stride=1),  # 16 x 16 x 256
            ConvTranspose2d_BN_AC(in_channels=feature_size * 16, out_channels=feature_size * 8, kernel_size=4, stride=2),  # 32 x 32 x 128
            ConvTranspose2d_BN_AC(in_channels=feature_size * 8, out_channels=feature_size * 8, kernel_size=4, stride=1),  # 32 x 32 x 128
            ConvTranspose2d_BN_AC(in_channels=feature_size * 8, out_channels=feature_size * 8, kernel_size=4, stride=1),  # 32 x 32 x 128
            ConvTranspose2d_BN_AC(in_channels=feature_size * 8, out_channels=feature_size * 4, kernel_size=4, stride=2),  # 64 x 64 x 64
            ConvTranspose2d_BN_AC(in_channels=feature_size * 4, out_channels=feature_size * 4, kernel_size=4, stride=1),  # 64 x 64 x 64
            ConvTranspose2d_BN_AC(in_channels=feature_size * 4, out_channels=feature_size * 4, kernel_size=4, stride=1),  # 64 x 64 x 64
            ConvTranspose2d_BN_AC(in_channels=feature_size * 4, out_channels=feature_size * 2, kernel_size=4, stride=2),
            ConvTranspose2d_BN_AC(in_channels=feature_size * 2, out_channels=feature_size * 2, kernel_size=4, stride=1),
            ConvTranspose2d_BN_AC(in_channels=feature_size * 2, out_channels=feature_size * 1, kernel_size=4, stride=2),
            ConvTranspose2d_BN_AC(in_channels=feature_size * 1, out_channels=feature_size * 1, kernel_size=4, stride=1),
            ConvTranspose2d_BN_AC(in_channels=feature_size * 1, out_channels=3, kernel_size=4, stride=1),
            ConvTranspose2d_BN_AC(in_channels=3, out_channels=3, kernel_size=4, stride=1),
            ConvTranspose2d_BN_AC(in_channels=3, out_channels=3, kernel_size=4, stride=1, activation=nn.Tanh())
        )
        self.loss = InitLoss()

The last thing to modify is in torchmodule.py; this is the new PRNResBlock class:

# torchmodule.py -- Conv2d_BN_AC comes from the same file. forward() is not
# shown; presumably it is unchanged from the repo's original PRNResBlock.
import torch.nn as nn

class PRNResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, with_conv_shortcut=False,
                 activation=nn.ReLU(inplace=True)):
        super(PRNResBlock, self).__init__()

        if kernel_size % 2 == 1:  # odd kernel: the usual symmetric zero padding works
            self.pipe = nn.Sequential(
                Conv2d_BN_AC(in_channels=in_channels, out_channels=out_channels // 2, stride=1, kernel_size=1),
                Conv2d_BN_AC(in_channels=out_channels // 2, out_channels=out_channels // 2, stride=stride,
                             kernel_size=kernel_size, padding=(kernel_size - 1) // 2),
                nn.Conv2d(in_channels=out_channels // 2, out_channels=out_channels, stride=1, kernel_size=1,
                          bias=False))
        else:  # even kernel: circular padding overshoots, so a trailing conv trims the size
            if stride == 1:
                self.pipe = nn.Sequential(
                    Conv2d_BN_AC(in_channels=in_channels, out_channels=out_channels // 2, stride=1, kernel_size=1),
                    Conv2d_BN_AC(in_channels=out_channels // 2, out_channels=out_channels // 2, stride=stride,
                                 kernel_size=kernel_size, padding=kernel_size - 1, padding_mode='circular'),
                    nn.Conv2d(in_channels=out_channels // 2, out_channels=out_channels, stride=1,
                              kernel_size=kernel_size, bias=False))
            elif stride == 2:
                self.pipe = nn.Sequential(
                    Conv2d_BN_AC(in_channels=in_channels, out_channels=out_channels // 2, stride=1, kernel_size=1),
                    Conv2d_BN_AC(in_channels=out_channels // 2, out_channels=out_channels // 2, stride=stride,
                                 kernel_size=kernel_size, padding=kernel_size - 1, padding_mode='circular'),
                    nn.Conv2d(in_channels=out_channels // 2, out_channels=out_channels, stride=1,
                              kernel_size=kernel_size - 1, bias=False))
            else:
                raise ValueError('unsupported stride: %d' % stride)

        # identity shortcut by default (for both kernel parities)
        self.shortcut = nn.Sequential()
        if with_conv_shortcut:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels=in_channels, out_channels=out_channels, stride=stride, kernel_size=1, bias=False),
            )
        self.BN_AC = nn.Sequential(
            nn.BatchNorm2d(out_channels, eps=0.001, momentum=0.5),
            activation
        )

As you can see, in the even-kernel case you need a different kernel_size in the trailing nn.Conv2d depending on the stride if you want to keep the correct output size after the layer. A quick sanity check of that arithmetic is sketched below.
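
Here is the sanity check on a recent torch (my own sketch, using plain nn.Conv2d instead of the repo's Conv2d_BN_AC wrapper):

import torch
import torch.nn as nn

x = torch.randn(1, 8, 128, 128)
# stride 2: circular conv (k=4, pad=3 per side) gives N/2 + 2; the trailing k=3 conv trims it to N/2
s2 = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=4, stride=2, padding=3, padding_mode='circular'),
    nn.Conv2d(8, 8, kernel_size=3, stride=1))
# stride 1: circular conv (k=4, pad=3 per side) gives N + 3; the trailing k=4 conv trims it back to N
s1 = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=4, stride=1, padding=3, padding_mode='circular'),
    nn.Conv2d(8, 8, kernel_size=4, stride=1))
print(s2(x).shape)  # torch.Size([1, 8, 64, 64])
print(s1(x).shape)  # torch.Size([1, 8, 128, 128])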

You have to retrain the CNN; you cannot use the saved model given by @reshow.
Hope you are still interested ahahahahah
Have a nice day :D

@jinwkim

jinwkim commented Sep 12, 2022

@maltempoLuca Thank you for sharing the workaround. I was able to train the model from scratch using your modifications, but the model I got has terrible output. Do you have your retrained model available to share? I'd like to try yours and see how it is.
