Some notes on parameter tuning #23

BboyHanat · 2020-06-22T03:30:18Z

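channels=64 performs far worse than channels=512; with channels=512 the attention works very well. The cost is that the CAM module becomes very large: its parameter count grows from 5M to 96M, and the whole model comes to about 178M parameters.
With CAM channels=512, I swapped the backbone for Darknet-light and still got very good results. Replacing the two-layer GRU with a single-layer LSTM also improved results noticeably: I removed prelstm and replaced GRUCell with LSTMCell.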

The datasets are LSVT, ReCTS, ICDAR2017RCTW, ART, plus some data I generated myself.
I hope this issue saves later readers some detours.

Thanks to the authors for open-sourcing this!
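
To get a feel for why widening the CAM from 64 to 512 channels inflates it so sharply, here is a rough back-of-the-envelope sketch. It assumes plain 3x3 convolutions with bias, and the helper `conv_params` is hypothetical rather than taken from the DAN code. The real CAM mixes several layer types, so the per-layer factor below (about 64x) is only an upper-end illustration, not a reproduction of the reported 5M to 96M growth.

```python
def conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights plus biases of one 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


# Same layer shape, two channel widths (assumed values for illustration).
narrow = conv_params(64, 64)
wide = conv_params(512, 512)

# Channel width grows 8x, so the weight count grows roughly 8 * 8 = 64x.
print(narrow, wide, round(wide / narrow, 1))
```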

Yuliang-Liu · 2020-06-22T03:58:46Z

@BboyHanat May I ask roughly how much it improved? Are you doing a pure recognition task on these datasets? And roughly how long does training take?

BboyHanat · 2020-06-22T07:55:00Z

On my own dataset it is currently a 7-point improvement. This is a business dataset, evaluated the same way in both cases; training took 5 days on 600k samples.

Yuliang-Liu · 2020-06-22T23:32:33Z

Thanks for letting us know.

Wang-Tianwei · 2020-06-24T02:37:09Z

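Thanks for your interest.
On ReCTS I never tried making the CAM as large as 512; I still use 8 layers with 64 channels, a Res-34 backbone, an input size of 64*512, and 8x convolutional downsampling. The results are decent.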

BboyHanat · 2020-06-24T03:11:37Z

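Yes, your model is really quite good. I tried a 320 * 320 input. At first I used your parameters, then tried enlarging the CAM, which gave a big improvement; it works well for Chinese OCR. The remaining problem is making the model smaller: I have switched the backbone to DarkNet-Light, set the channels back to 64, and am currently doing model distillation.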

Wang-Tianwei · 2020-06-24T03:31:46Z

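@BboyHanat That is an interesting finding. My earlier thinking was that the CAM performs visual alignment and does not need to learn character-specific features, so I never tried giving it many channels. That increasing the CAM channels improves performance is unexpected to me.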

BboyHanat · 2020-06-24T03:41:29Z

Yes, perhaps on arbitrarily shaped text it can learn a better sequence attention.

Wang-Tianwei · 2020-06-24T04:22:43Z

That makes sense. The tasks in the paper all have fairly regular layouts; more channels may be more of an advantage for arbitrary shapes.

whuhangzhang · 2020-07-02T08:27:47Z

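@BboyHanat Hello, apart from changing the GRU to LSTMCell, did you modify anything else in the network? After making that change I ran into some problems, and I would like to ask how to set up the LSTMCell.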

BboyHanat · 2020-07-06T02:43:11Z

I removed prelstm, changed gru -> lstm, enlarged the CAM channels, and replaced the backbone. Those are all the changes.

whuhangzhang · 2020-07-06T07:26:58Z

My problem is with gru -> lstm. If I directly change this line:

Decoupled-attention-network/DAN.py

Line 175 in 64806d7

self.rnn = nn.GRUCell(nchannel * 2, nchannel)

to:

self.rnn = nn.LSTMCell(nchannel * 2, nchannel)

then this line:

Decoupled-attention-network/DAN.py

Line 203 in 64806d7

hidden = self.rnn(torch.cat((C[i, :, :], prev_emb), dim = 1),

raises an error. How should that call be written?

Wang-Tianwei · 2020-07-06T07:30:51Z

@whuhangzhang LSTMCell takes one more input than GRUCell, the cell state c, so a direct swap will not work.

BboyHanat · 2020-07-06T07:38:19Z

stats = torch.zeros(nB, self.nchannel).type_as(C.data)
hidden, stats = self.rnn(torch.cat((C[step, :, :], prev_emb), dim=1), (hidden, stats))
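
For readers who want to try the pattern in the two lines above in isolation, here is a minimal, self-contained sketch of stepping an `nn.LSTMCell` through a sequence. The sizes `nchannel`, `nB`, and `nT`, the random context tensor `C`, and feeding `hidden` back in as `prev_emb` are all assumptions made for illustration; in the real decoder `prev_emb` would be the embedding of the previously predicted character, and `cell` plays the role of `stats` above.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only so the sketch runs quickly.
nchannel, nB, nT = 8, 2, 5

rnn = nn.LSTMCell(nchannel * 2, nchannel)

C = torch.randn(nT, nB, nchannel)      # stand-in for the per-step context
prev_emb = torch.zeros(nB, nchannel)   # stand-in for the previous char embedding
hidden = torch.zeros(nB, nchannel).type_as(C)
cell = torch.zeros(nB, nchannel).type_as(C)  # the extra state GRUCell does not have

outputs = []
for step in range(nT):
    # LSTMCell takes and returns a (hidden, cell) pair instead of a single tensor.
    hidden, cell = rnn(torch.cat((C[step, :, :], prev_emb), dim=1), (hidden, cell))
    outputs.append(hidden)
    prev_emb = hidden  # placeholder only; not what the DAN decoder does

out = torch.stack(outputs)
print(out.shape)
```

The only structural difference from the GRUCell version is that both states are threaded through every step and both must be initialised before the loop.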

whuhangzhang · 2020-07-06T10:44:27Z

@BboyHanat, thank you very much for your reply. I have two final questions.

First, I found a bug in the code:

Decoupled-attention-network/dataset_scene.py

Line 54 in 64806d7

self.target_ratio = img_width / float(img_width)

If I change it to

self.target_ratio = img_width / float(img_height)

performance drops by 3-4 points. Have you run this experiment?

Second, I see that you set the image size to 320*320. Does the input size of the Feature_Extractor module below then need to change from 'input_shape': [1, 32, 128], # C x H x W to 'input_shape': [1, 320, 320], # C x H x W?

Decoupled-attention-network/cfgs_scene.py

Lines 49 to 55 in 64806d7

net_cfgs = {
    'FE': Feature_Extractor,
    'FE_args': {
        'strides': [(1,1), (2,2), (1,1), (2,2), (1,1), (1,1)],
        'compress_layer' : False,
        'input_shape': [1, 32, 128], # C x H x W
    },
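
To make the first question concrete, the sketch below simply evaluates the two expressions for a couple of image sizes. The function names are hypothetical, and how `target_ratio` is consumed later in `dataset_scene.py` is not shown in this thread, so this only illustrates what value each variant produces.

```python
def target_ratio_original(img_width: int, img_height: int) -> float:
    # Expression as quoted from line 54: the width divided by itself.
    return img_width / float(img_width)


def target_ratio_changed(img_width: int, img_height: int) -> float:
    # Expression after the proposed change: a true width/height aspect ratio.
    return img_width / float(img_height)


for w, h in [(128, 32), (320, 320)]:
    print((w, h), target_ratio_original(w, h), target_ratio_changed(w, h))
```

The quoted expression is always 1.0 regardless of the configured size, while the changed one is 4.0 for a 128x32 input, so any code that compares an image's aspect ratio against `target_ratio` would behave differently under the two variants.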

BboyHanat · 2020-07-06T11:11:09Z

Because I load data differently from this project, I use a dataloader I wrote myself for this part; the data processing itself is unchanged.
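
On the `input_shape` question raised earlier, one way to reason about it is to work out the feature-map size implied by the `strides` list in the quoted config. The sketch below assumes the listed strides are the only source of downsampling and ignores padding and rounding effects; `feature_map_size` is a hypothetical helper, not part of the repository.

```python
from math import prod  # available in Python 3.8+

# Strides copied from the quoted 'FE_args' config.
strides = [(1, 1), (2, 2), (1, 1), (2, 2), (1, 1), (1, 1)]


def feature_map_size(height, width, strides):
    """Approximate output H x W if only the listed strides downsample."""
    down_h = prod(s[0] for s in strides)
    down_w = prod(s[1] for s in strides)
    return height // down_h, width // down_w


print(feature_map_size(32, 128, strides))
print(feature_map_size(320, 320, strides))
```

Under these assumptions the two strided stages give a 4x reduction per axis, so the feature map that later modules see changes with the input size, which suggests that `input_shape` is worth checking when the image size changes.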

chibohe · 2020-07-13T09:57:24Z

Thanks to the original poster for sharing these tuning notes. Have you tested on long Chinese text?

BboyHanat · 2020-07-16T02:20:40Z

My target text lengths are between 1 and 20, and accuracy is decent.

chibohe · 2020-07-16T03:20:21Z

Thanks for the reply 🙏

whereitogo · 2020-12-01T14:30:11Z

Hello, is the following a correct way to build the labels?

import os
import re
import numpy as np

import inspect

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))

# Here label_dict/icdar_labels.txt is my Chinese character file
def get_dict(path=os.path.join(currentdir, 'label_dict/icdar_labels.txt'), add_space=False, add_eos=False):
    """
    Load text label dict from preprocessed text file.
    Args:
        path: label dict text file path.
        add_space: whether add additional space character to label dict.
        add_eos: whether add EOS which represents end of sequence to label dict.
    Returns:
        label_dict: text label dict.
    """
    label_dict = dict()
    with open(path, 'r', encoding="utf-8") as f:
        lines = f.readlines()
        for line in lines:
            m = re.match(r'(\d+) (.*)', line)
            idx, label = int(m.group(1)), m.group(2)
            label_dict[idx] = label
        if add_space:
            idx = idx + 1
            label_dict[idx] = ' '
        if add_eos:
            idx = idx + 1
            label_dict[idx] = 'EOS'
    return label_dict

if __name__ == '__main__':
    label_dict = get_dict()
    print(label_dict)

The Chinese character label file above looks like this:
0 UNK
1 EOS
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
10 8
11 9
12 a
Is this right?
Then, when building the dataset:

# anno_data holds the text image annotations
transcripts = [anno['transcription'] for anno in anno_data]
for char in transcript:
    seq_label.append(self.label_dict[char])
Is that how it is done?
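
One detail worth noting in the snippets above: `get_dict` builds a mapping from integer index to character, whereas the dataset loop looks entries up by character. A minimal sketch of one way to bridge the two is to invert the mapping before encoding. The names `parse_label_lines`, `encode`, and `SAMPLE` are hypothetical, the sample is a shortened version of the label file shown above, and falling back to index 0 (UNK) for unseen characters is an assumption, not something confirmed in this thread.

```python
import re

# Shortened copy of the label file format shown above.
SAMPLE = """0 UNK
1 EOS
2 0
3 1
4 2
12 a"""


def parse_label_lines(text):
    """Parse 'index label' lines into an index -> label dict."""
    label_dict = {}
    for line in text.splitlines():
        m = re.match(r'(\d+) (.*)', line)
        if m:
            label_dict[int(m.group(1))] = m.group(2)
    return label_dict


def encode(transcript, char_to_idx, unk_idx=0):
    """Map each character to its index, using unk_idx for unknown ones."""
    return [char_to_idx.get(ch, unk_idx) for ch in transcript]


idx_to_char = parse_label_lines(SAMPLE)
# Invert the mapping so that characters can be looked up directly.
char_to_idx = {label: idx for idx, label in idx_to_char.items()}

print(encode("12a", char_to_idx))
```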

xiaolingCao · 2021-07-19T02:59:50Z

Hello, have you published a paper on this? Could you share a link to it?

BboyHanat closed this as completed Jul 6, 2020

BboyHanat reopened this Jul 6, 2020
