ConvNext in_channels > 3 leads to RuntimeError #1869
-
Hi there! I'm trying to use ConvNeXt on multichannel audio spectrograms but am hitting an error:
Is this just an issue with the spectrogram shape, or am I missing something? For reference, running the same code but using …
Replies: 1 comment
-
I don't think this is related to the number of channels, but rather to the other dimensions (specifically, the small last dimension). To see why, note that the following should run with no problems:

```python
import torch
import timm

encoder = timm.create_model("convnextv2_base", in_chans=8, pretrained=False)
input = torch.ones((1, 8, 224, 224))
encoder.forward_features(input)
```

but if instead you use too low a value for the last dimension (such as 26), we run into an issue, i.e., the below will not work:

```python
small_input = torch.ones((1, 8, 224, 26))
encoder.forward_features(small_input)
```
The reason your code works with […]. You can also see this directly from the error message: the (2 x 2) kernel size referenced there is larger than the padded input size in the last dimension:
```
>>> RuntimeError: Calculated padded input size per channel: (14 x 1). Kernel size: (2 x 2). Kernel size can't be greater than actual input size
```
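To make the shape constraint concrete, here is a small sketch (an assumption about the architecture, not taken from the thread): ConvNeXt's stem downsamples each spatial dimension by 4, and each of the three later stages begins with a stride-2 downsampling conv, so a dimension shrinks by a total factor of 32. Tracing the sizes shows why a last dimension of 26 fails while 224 works:

```python
# Sketch, assuming ConvNeXt's downsampling schedule: a 4x4/stride-4 stem
# conv followed by three 2x2/stride-2 downsampling convs. The function
# `downsampled_sizes` is a hypothetical helper, not part of timm.
def downsampled_sizes(size):
    """Trace one spatial dimension through the four downsampling convs."""
    sizes = [size]
    for stride in (4, 2, 2, 2):
        size = size // stride  # floor division mirrors the conv output size
        sizes.append(size)
    return sizes

print(downsampled_sizes(224))  # [224, 56, 28, 14, 7] -- every stage sees size >= 2
print(downsampled_sizes(26))   # [26, 6, 3, 1, 0] -- the final 2x2 conv sees size 1
```

In the real model the last step does not produce 0: the 2x2 downsampling conv raises the RuntimeError above as soon as a dimension drops below the kernel size, which matches the `(14 x 1)` padded input in the traceback. Under this assumption, padding or resizing the spectrogram so its last dimension is at least 32 should avoid the error.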